Reference: the Scratchapixel website.

Suppose there is an object, described by a set of points in world coordinates, and we want to project it onto the image plane of a camera. What should we do? First, we need to transform the object's coordinates from the world coordinate system to the camera coordinate system, so that we can perform the projection within a single coordinate system. For this transform we just need the camera's extrinsic matrix (which encodes the camera's location and pose in world coordinates). Then we can do the projection itself, which uses the camera's intrinsic matrix to compute the position of each point after projection. We know what the intrinsic matrix looks like (see this post: 三维重建之摄像机模型 | HDY blog), but how do we derive it?

1 Some Default Information

1.1 The default pose of the camera

1-defaultPoseOfCamera

By default, cameras point down the negative z-axis of the camera coordinate system. This means that the points visible to the camera have negative z-coordinates.

2 A Simple Perspective Projection Matrix

This is a simple way to do the perspective projection. It assumes that the image plane of the camera is located at $z = -1$, and that it is a square plane spanning the range $[-1, 1]$ in both the x- and y-coordinates.

And we need to remap the near clipping plane to $z' = 0$ and the far clipping plane to $z' = 1$.

Under these assumptions, we can use similar triangles to compute the position after projection:

2-computePositionAfterProjection

We need to use $-P.z$ (not $P.z$) as the denominator so that $P_s.x$ keeps the same sign as $P.x$, because $P.z < 0$ for visible points:

$$P_s.x = \frac{P.x}{-P.z}$$

Then we can use the same method to get $P_s.y = \dfrac{P.y}{-P.z}$.

3-frustum

Suppose we have a point $P = (P.x, P.y, P.z, 1)$ (in homogeneous coordinates) in the camera coordinate system. We will multiply it by an intrinsic matrix to project it. But the raw result $(x', y', z', w')$ is not the final position; the projected point is $(x'/w', y'/w', z'/w')$ instead (a basic property of homogeneous coordinates).
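This multiply-then-divide step can be sketched in plain Python (the helper name is my own; the matrix here is only a placeholder whose bottom row $(0, 0, -1, 0)$ produces $w' = -P.z$, which the following sections will justify):

```python
def project(m, p):
    """Multiply a 4x4 row-major matrix m by the homogeneous point
    (x, y, z, 1), then divide by w' to get Cartesian coordinates."""
    x, y, z = p
    xp, yp, zp, wp = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in m)
    return (xp / wp, yp / wp, zp / wp)

# Placeholder matrix: identity on x/y/z, bottom row (0, 0, -1, 0)
# so that w' = -P.z (remember P.z is negative for visible points).
M = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, -1, 0]]

print(project(M, (2.0, 3.0, -4.0)))  # (0.5, 0.75, -1.0)
```

Note how the divide by $w' = 4$ already gives the $x/-z$ and $y/-z$ behaviour derived above.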

2.1 Remapping x-coordinate and y-coordinate

According to the derivation above, we have these equations:

$$P_s.x = \frac{P.x}{-P.z}, \qquad P_s.y = \frac{P.y}{-P.z}$$

So we just need to find a matrix that satisfies the above equations. For now we do not consider the z-coordinate. The trick is to make $w' = -P.z$, so that the equations are satisfied when we convert the homogeneous result back to Cartesian coordinates (divide by $w'$). So we get the projection matrix, with the third row still undetermined:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \cdot & \cdot \\ 0 & 0 & -1 & 0 \end{pmatrix}$$

You can check that this matrix satisfies the required equations.

2.2 Remapping z-coordinate

We know that the x- and y-coordinates don't contribute to the calculation of the projected point's z-coordinate. So we can fill in the third row of the projection matrix with two unknowns $A$ and $B$:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \end{pmatrix}$$

So we can get:

$$P_s.z = \frac{A \cdot P.z + B}{-P.z}$$

And we know the result of remapping the z-coordinate in two special cases (on the near and the far clipping plane). Using $n$ for the distance of the near clipping plane and $f$ for the far one, $P.z = -n$ must map to 0 and $P.z = -f$ must map to 1:

$$\frac{A \cdot (-n) + B}{n} = 0, \qquad \frac{A \cdot (-f) + B}{f} = 1$$

Solving these two equations, we obtain:

$$A = -\frac{f}{f-n}, \qquad B = -\frac{fn}{f-n}$$

So we update the projection matrix:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -\frac{f}{f-n} & -\frac{fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$
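A quick numerical check of the two boundary conditions, using hypothetical clipping distances $n = 1$ and $f = 10$:

```python
n, f = 1.0, 10.0
A = -f / (f - n)      # third-row z coefficient
B = -f * n / (f - n)  # third-row translation term

# On the near plane (P.z = -n) the remapped z should be 0,
# on the far plane (P.z = -f) it should be 1. Divide by w' = -P.z.
near_z = (A * -n + B) / n
far_z = (A * -f + B) / f
print(near_z, far_z)  # approximately 0.0 and 1.0
```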

2.3 Considering Field of View

The angle of view or field-of-view (FOV) of the camera influences how much we see of a scene (the extent of the scene).

4-FOV

Zooming in corresponds to a decrease of the FOV, so we need to multiply the projected point coordinates by a value greater than 1; zooming out increases the FOV, so we need to multiply these coordinates by a value less than 1. The final equation for the zoom scale is:

$$S = \frac{1}{\tan\left(\frac{fov}{2}\right)}$$

4-FOV2

Now we get the final projection matrix:

$$\begin{pmatrix} S & 0 & 0 & 0 \\ 0 & S & 0 & 0 \\ 0 & 0 & -\frac{f}{f-n} & -\frac{fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$
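The whole section can be condensed into a short Python sketch (function names are my own; `fov_deg` is assumed to be the full angle in degrees, a common but not universal convention):

```python
import math

def simple_projection_matrix(fov_deg, n, f):
    """Build the simple perspective projection matrix derived above."""
    s = 1.0 / math.tan(math.radians(fov_deg) / 2.0)  # zoom scale S
    return [[s, 0, 0, 0],
            [0, s, 0, 0],
            [0, 0, -f / (f - n), -f * n / (f - n)],
            [0, 0, -1, 0]]

def project(m, p):
    """Homogeneous multiply followed by the divide by w'."""
    x, y, z = p
    xp, yp, zp, wp = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in m)
    return (xp / wp, yp / wp, zp / wp)

M = simple_projection_matrix(90.0, 0.1, 100.0)
# With a 90-degree FOV the scale S is (approximately) 1, so the
# projected x and y reduce to roughly x/-z and y/-z.
print(project(M, (1.0, 2.0, -10.0)))  # ≈ (0.1, 0.2, 0.99)
```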

3 The OpenGL Perspective Projection Matrix

The derivation of the OpenGL perspective projection matrix is quite similar to the one above. The primary difference is that we no longer assume the near clipping plane is at $z = -1$; its distance is an arbitrary value $n$, so the image plane lies at $z = -n$ instead.

5-computePositionAfterProjectionForOpenGL

Using similar triangles, we know:

$$P_s.x = \frac{n \cdot P.x}{-P.z}, \qquad P_s.y = \frac{n \cdot P.y}{-P.z}$$

3.1 Remapping x-coordinate and y-coordinate

As before, we’ll derive how to remap x- and y-coordinates at first.

The goal of the projection is to remap the values projected onto the image plane to a unit cube (a cube whose minimum and maximum extents are $(-1, -1, -1)$ and $(1, 1, 1)$ respectively). However, once the point $P$ is projected onto the image plane, $P_s$ is visible only if its x-coordinate lies within the range $[left, right]$ and its y-coordinate within $[bottom, top]$. We use $l$ for left, $r$ for right, $b$ for bottom and $t$ for top.

6-frustum2

So we have this derivation:

$$l \le P_s.x \le r$$

$$0 \le \frac{P_s.x - l}{r - l} \le 1$$

$$-1 \le 2 \cdot \frac{P_s.x - l}{r - l} - 1 \le 1$$

$$-1 \le \frac{2 P_s.x}{r - l} - \frac{r + l}{r - l} \le 1$$

Substituting $P_s.x = \dfrac{n \cdot P.x}{-P.z}$:

$$-1 \le \frac{2n \cdot P.x}{(r - l)(-P.z)} + \frac{(r + l) \cdot P.z}{(r - l)(-P.z)} \le 1$$

What is the use of the above derivation? It means we have found how to project a point in camera coordinates onto the image plane with the result remapped to the range $[-1, 1]$:

$$P_s.x' = \frac{\frac{2n}{r-l} \cdot P.x + \frac{r+l}{r-l} \cdot P.z}{-P.z}$$

And with the same process, we can get the remapping of the y-coordinate as well:

$$P_s.y' = \frac{\frac{2n}{t-b} \cdot P.y + \frac{t+b}{t-b} \cdot P.z}{-P.z}$$

So we can follow the same trick ($w' = -P.z$) to fill in the projection matrix:

$$\begin{pmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & \cdot & \cdot \\ 0 & 0 & -1 & 0 \end{pmatrix}$$

3.2 Remapping z-coordinate

In part 2.2 we remapped the z-coordinate to the range $[0, 1]$, but in OpenGL the range is $[-1, 1]$. We just need to slightly change the two constraint equations:

$$\frac{A \cdot (-n) + B}{n} = -1, \qquad \frac{A \cdot (-f) + B}{f} = 1$$

so the result changes to:

$$A = -\frac{f+n}{f-n}, \qquad B = -\frac{2fn}{f-n}$$

And the projection matrix:

$$\begin{pmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$
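As a sanity check, here is a sketch that builds this matrix (the same one the legacy `glFrustum(l, r, b, t, n, f)` call sets up) and verifies that the frustum corners land on corners of the unit cube; the frustum bounds are arbitrary test values:

```python
def opengl_projection_matrix(l, r, b, t, n, f):
    """OpenGL-style perspective projection matrix (row-major)."""
    return [[2 * n / (r - l), 0, (r + l) / (r - l), 0],
            [0, 2 * n / (t - b), (t + b) / (t - b), 0],
            [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
            [0, 0, -1, 0]]

def project(m, p):
    """Homogeneous multiply followed by the divide by w'."""
    x, y, z = p
    xp, yp, zp, wp = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in m)
    return (xp / wp, yp / wp, zp / wp)

l, r, b, t, n, f = -0.5, 0.5, -0.4, 0.4, 1.0, 10.0
M = opengl_projection_matrix(l, r, b, t, n, f)

# The (r, t) corner of the near plane maps to ≈ (1, 1, -1) ...
print(project(M, (r, t, -n)))
# ... and the matching corner of the far plane maps to ≈ (1, 1, 1).
print(project(M, (r * f / n, t * f / n, -f)))
```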

3.3 Considering Field of View

In this perspective projection, users can change the distances of the near and far clipping planes. But what about the values of left, right, bottom and top? It may seem that they are only influenced by the values of $n$ and $f$? Nope, they are actually also influenced by the field-of-view (FOV).

7-FOV3

So the FOV influences the values of $t$, $b$, $l$ and $r$, which in turn influence the projection. And it's easy to get their relationship with trigonometry:

$$\tan\left(\frac{fov}{2}\right) = \frac{t}{n}$$

And therefore:

$$t = n \cdot \tan\left(\frac{fov}{2}\right)$$

If the camera image plane is square, then $r = t$ and $l = b = -t$; otherwise the image plane is rectangular (aspect ratio $a = \frac{width}{height} \neq 1$), and then $b = -t$, $r = t \cdot a$ and $l = -r$.
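These relations are what the classic `gluPerspective(fovy, aspect, near, far)` helper computes before building the matrix of part 3.2. A Python sketch with hypothetical parameter values (the function name is my own):

```python
import math

def frustum_bounds(fovy_deg, aspect, n):
    """Derive l, r, b, t from the vertical FOV and the aspect ratio."""
    t = n * math.tan(math.radians(fovy_deg) / 2.0)
    b = -t
    r = t * aspect  # aspect = width / height
    l = -r
    return l, r, b, t

# Square image plane: aspect = 1, so r = t and l = b = -t.
l, r, b, t = frustum_bounds(60.0, 1.0, 0.1)
print(r == t and l == b == -t)  # True

# Rectangular image plane (e.g. 16:9): r and l are scaled by the aspect.
l, r, b, t = frustum_bounds(60.0, 16.0 / 9.0, 0.1)
print(r == t * 16.0 / 9.0)  # True
```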