In-depth exploration of Perspective Projection Transformation

Source: Internet
Author: User
Reposted article: Http:// article. I remember reading this article when I first came into contact with OpenGL. I can't help but praise the author for writing in such detail. Thank you for your excellent work!

Perspective projection is an important part of the 3D fixed-function pipeline. It transforms points in camera space from the view frustum into the canonical view volume (CVV); perspective division follows after clipping. Algorithmically, it is completed in two steps: multiplication by the perspective matrix and perspective division.

Perspective projection transformation is a mysterious and confusing technique for many developers who have just entered the 3D graphics field. The difficulty lies in its many steps and its heavy reliance on some basic knowledge. If you are unfamiliar with any one piece, your understanding stalls immediately.

Yes, mainstream 3D APIs such as OpenGL and D3D do encapsulate the details of perspective projection; gluPerspective(...), for example, generates a perspective projection matrix from its input. In most cases you can get the job done without knowing the algorithm inside. But if you want to become a professional graphics programmer or game developer, can you really afford to skip perspective projection? Let's start with the necessary basic knowledge and go step by step (this knowledge can be found piecemeal in many places, but I have never found it all in one place; now you have :) ).


First, we will introduce two things that must be mastered. With these, we will not lose our way in understanding the perspective projection transformation process.
Homogeneous coordinate representation
Perspective projection transformation is carried out in homogeneous coordinates, and homogeneous coordinates are themselves a confusing concept, so let us get them clear first.
For a vector v and a basis oabc (origin o, basis vectors a, b, c),

we can find a set of coordinates (v1, v2, v3) such that
v = v1 a + v2 b + v3 c (1)
For a point p, we can find a set of coordinates (p1, p2, p3) such that
p - o = p1 a + p2 b + p3 c (2)

From these expressions for vectors and points we can see that, to represent a point (such as p) in the coordinate system, we regard the position of the point as a displacement from the origin o of the basis, that is, the vector p - o (some books call this a position vector: a special vector that starts at the coordinate origin). While expressing this vector, we express the point p in an equivalent way:
p = o + p1 a + p2 b + p3 c (3)

Equations (1) and (3) are the expressions of a vector and a point in the coordinate system. Although both are expressed with algebraic components, representing a point requires more information than representing a vector. If I write the components (1, 4, 7), who can tell whether it is a vector or a point?

We now write (1) and (3) in matrix form:
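The matrix form itself is missing from this copy of the article. Reconstructed from (1) and (3), and assuming the basis vectors and the origin are arranged as the columns (a, b, c, o), it would read:

\[
\mathbf{v} = \begin{pmatrix}\mathbf{a} & \mathbf{b} & \mathbf{c} & \mathbf{o}\end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ 0 \end{pmatrix},
\qquad
\mathbf{p} = \begin{pmatrix}\mathbf{a} & \mathbf{b} & \mathbf{c} & \mathbf{o}\end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}
\]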

Here (a, b, c, o) is the coordinate basis matrix, and the column vectors on the right are the coordinates of the vector v and the point p in that basis. In this way, vectors and points have different expressions in the same basis: the 4th algebraic component of a 3D vector is 0, and the 4th algebraic component of a 3D point is 1. Representing 3D geometric concepts with four algebraic components like this is the homogeneous coordinate representation.
"Homogeneous coordinate representation is one of the important tools of computer graphics. It clearly distinguishes vectors from points, and it also makes affine (linear) geometric transformations easier to perform." - F. S. Hill, Jr.
In this way, if the above (1, 4, 7) is written as (1, 4, 7, 0), it is a vector; if it is written as (1, 4, 7, 1), it is a point.
The following describes how to convert between ordinary coordinates and homogeneous coordinates.
When converting from ordinary coordinates to homogeneous coordinates:
If (x, y, z) is a point, it becomes (x, y, z, 1);
If (x, y, z) is a vector, it becomes (x, y, z, 0).

When converting from homogeneous coordinates to ordinary coordinates:
If it is (x, y, z, 1), it is known to be a point and becomes (x, y, z);
If it is (x, y, z, 0), it is known to be a vector and still becomes (x, y, z).

The method above uses homogeneous coordinates to distinguish vectors from points. Thinking about it further, of the three most common affine transformations, namely translation T, rotation R, and scaling S, translation is meaningful only for points, because an ordinary vector has no position, only magnitude and direction. This can be seen clearly in the following formula:
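The formula is missing from this copy. A reconstruction, assuming a translation by (t_x, t_y, t_z) written for column vectors (the symbol names are illustrative), shows the point moving while the vector stays unchanged:

\[
\begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix} x + t_x \\ y + t_y \\ z + t_z \\ 1 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 0 \end{pmatrix}
=
\begin{pmatrix} x \\ y \\ z \\ 0 \end{pmatrix}
\]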

Rotation and scaling are meaningful for both vectors and points; you can check this with the same homogeneous representation as above. Homogeneous coordinates are therefore very convenient for affine transformations.

In addition, an ordinary point P = (Px, Py, Pz) corresponds to a whole family of homogeneous coordinates (wPx, wPy, wPz, w), where w is not zero. For example, the homogeneous coordinates of P (1, 4, 7) include (1, 4, 7, 1), (2, 8, 14, 2), (-0.1, -0.4, -0.7, -0.1), and so on. Therefore, to convert a point from ordinary to homogeneous coordinates, multiply x, y, and z by the same non-zero number w and append w as the 4th component; to convert from homogeneous to ordinary coordinates, divide the first three components by the 4th and then drop the 4th component.
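The conversion rules above can be sketched in a few lines of Python. The function names to_homogeneous and to_ordinary are illustrative and do not come from any graphics API; treating w = 0 as "drop the component" follows the vector rule given earlier.

```python
def to_homogeneous(xyz, w=1.0):
    """Ordinary point -> one member of its homogeneous family (w must be non-zero)."""
    if w == 0:
        raise ValueError("w = 0 denotes a vector, not a point")
    x, y, z = xyz
    return (w * x, w * y, w * z, w)


def to_ordinary(h):
    """Homogeneous -> ordinary: divide by w for points, drop w for vectors."""
    x, y, z, w = h
    if w == 0:
        return (x, y, z)  # a vector: the 4th component is simply removed
    return (x / w, y / w, z / w)


print(to_ordinary((2, 8, 14, 2)))  # (1.0, 4.0, 7.0)
```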

Because homogeneous coordinates use four components to express 3D concepts, translation can also be carried out as a matrix multiplication and, as F. S. Hill, Jr. said, affine (linear) transformations become more convenient. Graphics hardware now generally supports homogeneous coordinates and matrix multiplication, which has further promoted their use and made homogeneous coordinates something of a standard in graphics.

Simple linear interpolation

This is a basic technique widely used in graphics, for example in enlarging and shrinking 2D bitmaps, in tweening, and in the perspective projection transformation we are about to discuss. The basic idea: given x in [a, b], find y in [c, d] such that the ratio of the distance from x to a to the length of [a, b] equals the ratio of the distance from y to c to the length of [c, d]. A mathematical expression makes this easier to understand:
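The expression is missing from this copy. Written out from the description above, it is presumably:

\[
\frac{x - a}{b - a} = \frac{y - c}{d - c}
\quad\Longrightarrow\quad
y = c + \frac{(x - a)(d - c)}{b - a}
\]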

In this way, each point in [a, b] corresponds to a unique point in [c, d]: given an x, we can obtain a y.
In addition, if x is not in [a, b], that is, x < a or x > b, then the resulting y satisfies y < c or y > d. The proportion is unchanged, so the interpolation still applies.
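As a minimal sketch of this interpolation, including the out-of-range case, the hypothetical helper remap below maps x from [a, b] to [c, d]:

```python
def remap(x, a, b, c, d):
    """Map x from [a, b] to [c, d], preserving the distance ratio."""
    if a == b:
        raise ValueError("source interval must have non-zero length")
    t = (x - a) / (b - a)  # 0 at a, 1 at b; outside [0, 1] if x is outside [a, b]
    return c + t * (d - c)


print(remap(5, 0, 10, -1, 1))   # 0.0  (midpoint maps to midpoint)
print(remap(15, 0, 10, -1, 1))  # 2.0  (x > b gives y > d)
```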

Perspective Projection Transformation
Well, with these two pieces of theory in hand, we can start analyzing the perspective projection transformation itself. Here we analyze OpenGL's perspective projection transformation. Other APIs differ in some respects, but the main idea is similar and can be derived in the same way. After the camera (view) matrix is applied, vertices are in camera space. At this point polygons may need to be clipped by the frustum, but clipping against such an irregular volume is not easy. So, after careful analysis by our graphics predecessors, clipping was arranged to take place in the canonical view volume (CVV). The CVV is a cube in which x, y, and z each range over [-1, 1], and polygon clipping is done against this regular volume. In fact, therefore, the perspective projection transformation consists of two steps:
1) Use the perspective transformation matrix to transform vertices from the frustum into the CVV of clip space.
2) After CVV clipping is complete, perform perspective division (explained later).

We will first examine the projection relationship from one direction.

The figure shows a vertex in camera space in a right-handed coordinate system. P(x, z) is the point after the camera transformation. The frustum is defined by eye (the eye position), np (the near clipping plane), and fp (the far clipping plane). N is the distance from the eye to the near clipping plane, and F is the distance from the eye to the far clipping plane. The projection plane can be any plane parallel to the near clipping plane; here we choose the near clipping plane itself. If P'(x', z') is the point after projection, then z' = -N. From similar triangles we have:
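The similar-triangle relation is missing from this copy. With the quantities defined above it can be reconstructed as follows (the y component comes from the same argument in the yz plane):

\[
\frac{x'}{x} = \frac{z'}{z} = \frac{-N}{z}
\quad\Longrightarrow\quad
x' = -\frac{N x}{z},
\qquad
y' = -\frac{N y}{z}
\]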


In this way, we get P', the projection of P: P' = (-N x/z, -N y/z, -N)

From the above we can see that the projected z' is always -N, the position of the projection plane. In fact, z' carries no meaning for the projected point P'; this slot of information is useless. For a 3D graphics pipeline, however, later operations such as z-buffer hidden-surface removal need the z value from before projection. So we use this otherwise useless slot to store z: P' = (-N x/z, -N y/z, z)

This form makes full use of all three slots and achieves the original projection, but it is too blunt and a little dry; I don't think it should be our final result. What do you say? With the CVV in mind, let us write it in a more elegant and consistent way that is also easier for programs to process. The form above can be rewritten as:

Then we can easily express the projection transformation using matrices and homogeneous coordinates:
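Both formulas are missing from this copy. A reconstruction consistent with the rest of the derivation (the common denominator -z and the as yet unknown coefficients a and b) would be the rewritten point

\[
P' = \left( -\frac{N x}{z},\; -\frac{N y}{z},\; -\frac{a z + b}{z} \right)
\]

and its matrix form in homogeneous coordinates, where dividing the result by its 4th component -z gives P':

\[
\begin{pmatrix} N & 0 & 0 & 0 \\ 0 & N & 0 & 0 \\ 0 & 0 & a & b \\ 0 & 0 & -1 & 0 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix} N x \\ N y \\ a z + b \\ -z \end{pmatrix}
\]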


Ha, there are the homogeneous coordinates in action; no longer strangers, are they? This new form not only achieves the original projection above,

but also uses homogeneous coordinates to make the processing more uniform. Note that the next step applies the rule for converting homogeneous coordinates to ordinary coordinates. In the perspective projection process this step is called perspective division, and it is the 2nd step of the perspective projection transformation. After it, the original z value is discarded (what is obtained is the corresponding z value in the CVV, explained later) and the vertex has been projected. CVV clipping takes place between the two steps. Clip space therefore uses homogeneous coordinates, mainly because perspective division loses some necessary information (such as the original z, which the 4th component -z preserves), and that would make clipping harder. We will not discuss the details of CVV clipping here and will focus only on the two steps of the perspective projection transformation. The matrix in this form is the first version of our projection matrix. You may ask why z is written in the form (az + b)/(-z). There are two reasons:
1) All three algebraic components of P' are divided by the same denominator -z, which makes it easy to convert to ordinary coordinates using homogeneous coordinates, so processing is more consistent and efficient.
2) The CVV that follows is a regular volume in which x, y, and z range over [-1, 1], which makes polygon clipping easy. We can choose the coefficients a and b so that the expression (az + b)/(-z) takes the value -1 when z = -N and 1 when z = -F, which builds the CVV in the z direction.

Next we can solve for a and b:
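The solution is missing from this copy. Working it out from the two boundary conditions just stated gives:

\[
\begin{cases}
\dfrac{-aN + b}{N} = -1 \\[2mm]
\dfrac{-aF + b}{F} = 1
\end{cases}
\quad\Longrightarrow\quad
a = -\frac{F + N}{F - N},
\qquad
b = -\frac{2FN}{F - N}
\]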

In this way, the first version of the perspective projection matrix is obtained:
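The matrix is missing from this copy. Assuming the reconstruction of a and b from the boundary conditions z = -N and z = -F, the first version would be:

\[
\begin{pmatrix}
N & 0 & 0 & 0 \\
0 & N & 0 & 0 \\
0 & 0 & -\dfrac{F + N}{F - N} & -\dfrac{2FN}{F - N} \\
0 & 0 & -1 & 0
\end{pmatrix}
\]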

This version of the perspective projection matrix builds the CVV in the z direction, but x and y are not yet restricted to [-1, 1]. The next version of our perspective projection matrix solves this problem.

To bring vertices from the frustum into the CVV in the x and y directions as well, we now process x and y. First, look at the transformation result obtained so far:

We know that the valid range of -Nx/z runs from the left boundary (call it left) to the right boundary (call it right) of the projection plane, that is, [left, right], and the valid range of -Ny/z is [bottom, top]. We now want to map -Nx/z in [left, right] to x in [-1, 1], and -Ny/z in [bottom, top] to y in [-1, 1]. What does that remind you of? Ha, exactly: our simple linear interpolation, which you have already mastered! Let's work it out:
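The interpolation formulas are missing from this copy. Applying the linear interpolation rule to the two ranges gives the following reconstruction, where x_cvv and y_cvv are illustrative names for the CVV coordinates:

\[
\frac{-\frac{Nx}{z} - left}{right - left} = \frac{x_{cvv} - (-1)}{1 - (-1)}
\quad\Longrightarrow\quad
x_{cvv} = \frac{2\left(-\frac{Nx}{z}\right)}{right - left} - \frac{right + left}{right - left}
\]

\[
\frac{-\frac{Ny}{z} - bottom}{top - bottom} = \frac{y_{cvv} - (-1)}{1 - (-1)}
\quad\Longrightarrow\quad
y_{cvv} = \frac{2\left(-\frac{Ny}{z}\right)}{top - bottom} - \frac{top + bottom}{top - bottom}
\]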

Then we get the final projection point:

What we need to do next is derive the next version of the perspective projection matrix backwards from this new form. Note that P' is in post-perspective-division form and only its x and y components have changed; az + b and -z are unchanged. So we undo the perspective division by multiplying each component of P' by -z, which gives

The result is as follows:

Then we finally get:

M is the final perspective transformation matrix. A camera-space vertex inside the frustum is transformed to a point inside the CVV, and a vertex outside the frustum ends up outside the CVV. The regularity of the CVV is very favorable for polygon clipping. OpenGL builds its perspective projection matrix in the form of M. Note that the last row of M is not (0 0 0 1) but (0 0 -1 0), which shows that the perspective transformation is not an affine transformation; it is non-linear. You may also have noticed that the width and height of a projection plane usually differ, that is, the aspect ratio is not 1 (640/480, for example), whereas the width and height of the CVV are equal, so its aspect ratio is always 1. This distorts polygons: a square on the projection plane may become a rectangle in the CVV. The distortion is corrected in the viewport transformation, which is applied to the normalized device coordinates obtained after the perspective transformation, clipping, and perspective division. It maps the normalized vertices to the viewport in the same proportions as the projection plane, removing the distortion introduced by the perspective projection transformation. The precondition for this correction is that the aspect ratio of the projection plane equals the aspect ratio of the viewport.
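The matrix M itself is missing from this copy. The sketch below assumes it has the standard glFrustum layout that the derivation above leads to, and checks that frustum corners land on CVV corners. The function names frustum_matrix and project are illustrative, not OpenGL calls.

```python
def frustum_matrix(left, right, bottom, top, near, far):
    """OpenGL-style perspective matrix M (column-vector convention)."""
    return [
        [2 * near / (right - left), 0.0, (right + left) / (right - left), 0.0],
        [0.0, 2 * near / (top - bottom), (top + bottom) / (top - bottom), 0.0],
        [0.0, 0.0, -(far + near) / (far - near), -2 * far * near / (far - near)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def project(m, point):
    """Step 1: matrix multiply into clip space. Step 2: perspective division."""
    x, y, z = point
    p = (x, y, z, 1.0)
    clip = [sum(m[i][j] * p[j] for j in range(4)) for i in range(4)]
    w = clip[3]  # equals -z, the depth kept for clipping
    return tuple(c / w for c in clip[:3])


m = frustum_matrix(-2.0, 2.0, -1.0, 1.0, 1.0, 3.0)
print(project(m, (2.0, 1.0, -1.0)))  # right-top corner of the near plane
print(project(m, (6.0, 3.0, -3.0)))  # right-top corner of the far plane
```

With near = 1 and far = 3, the far plane is three times as large as the near plane, so (6, 3, -3) is its right-top corner.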

Convenient projection matrix generation function

3D APIs provide functions such as gluPerspective(fov, aspect, near, far) or D3DXMatrixPerspectiveFovLH(pOut, fovY, Aspect, zn, zf), which give you a quick way to generate a perspective matrix. We again use the OpenGL version to analyze how it works.
gluPerspective(fov, aspect, near, far)
fov is the field of view, the opening angle of the frustum in the xz plane or the yz plane. OpenGL and D3D both use the yz plane.
aspect is the aspect ratio (width/height) of the projection plane.
near is the distance to the near clipping plane.
far is the distance to the far clipping plane.

The left figure computes the frustum bounds in the xz plane, and the right figure computes them in the yz plane. Step 3 on the left, top = right/aspect, uses division (something graphics programmers hate), while the corresponding step on the right, right = top × aspect, uses multiplication. This may be why graphics APIs use the yz plane!
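The yz-plane computation can be sketched as follows. The helper frustum_bounds is hypothetical; it assumes a symmetric frustum and a vertical field of view in degrees, as gluPerspective takes it.

```python
import math


def frustum_bounds(fov_y_deg, aspect, near):
    """Projection-plane bounds from a vertical field of view (yz plane)."""
    top = near * math.tan(math.radians(fov_y_deg) / 2.0)
    right = top * aspect  # multiplication only, no division needed
    return (-right, right, -top, top)  # left, right, bottom, top


left, right, bottom, top = frustum_bounds(90.0, 2.0, 1.0)
print(left, right, bottom, top)
```

The four values can then be fed to a general frustum matrix together with near and far.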

Basic differences between OpenGL and D3D

As mentioned above, basic differences between APIs lead to differences in the final transformation matrix. The differences between the OpenGL and D3D perspective projection matrices are as follows:
(1) OpenGL uses a right-handed coordinate system by default, while D3D uses a left-handed coordinate system by default.

(2) OpenGL multiplies matrices with column vectors, while D3D multiplies matrices with row vectors.

(3) The Z range of OpenGL CVV is [-1, 1], and the Z range of D3D CVV is [0, 1].

These differences lead to the final differences between OpenGL and D3D Perspective Projection matrices.

Derivation of D3D Perspective Projection Matrix

Let's first look at the most basic perspective relationship diagram:

Here we examine the relationship in the xz plane; the relationship in the yz plane is the same. O is the camera position. NP is the near clipping plane, which is also the projection plane, and N is its distance from the camera. FP is the far clipping plane, and F is its distance from the camera. p is the point to be projected, and p' is the point after projection. By similar triangles, we have

Then there is

Note that OpenGL uses a right-handed coordinate system, so -N is used there, while D3D uses a left-handed coordinate system, so N is used; this is one of the differences between the two. In this way, we get the point after projection.

Its third component is the transformed z, the position of the projection plane, that is, N. It is useless. Let's rewrite p' as

Here we use the otherwise useless third component to store z (if this is unfamiliar, refer to the OpenGL derivation above). Next, we find a and b to build the CVV in the z direction. Note another difference between OpenGL and D3D here: the z range of the OpenGL CVV is [-1, 1], while that of the D3D CVV is [0, 1]. That is, in D3D a point on the near clipping plane lands on the z = 0 plane of the CVV after projection, and a point on the far clipping plane lands on the z = 1 plane. Our equations are therefore
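The equations are missing from this copy. Assuming the stored third component has the form (az + b)/z, the left-handed counterpart of the OpenGL form, the two boundary conditions give:

\[
\begin{cases}
\dfrac{aN + b}{N} = 0 \\[2mm]
\dfrac{aF + b}{F} = 1
\end{cases}
\quad\Longrightarrow\quad
a = \frac{F}{F - N},
\qquad
b = -\frac{FN}{F - N}
\]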

The first version of the perspective projection matrix is obtained.

That is

At this point, the third component has been brought into the CVV, whose z range is [0, 1]. Next, as in the OpenGL derivation, we bring the first two components into the CVV, whose x and y ranges are [-1, 1], as shown in the figure:

Using Linear interpolation, we have:

Here left and right are the left and right bounds of the projection plane, and top and bottom are its upper and lower bounds. xcvv and ycvv are the x and y in the CVV, that is, the results we want to compute. Before computing them, we first rewrite the formulas above as:

Note that if the projection plane is centered in the x direction, then

the 1/2 terms on both sides of the first formula cancel, and it can be written as

Similarly, if the projection plane is centered in the y direction, the second formula can be written as

We will discuss two cases:

(1) The projection plane is centered on the x-y plane (centered in both the x and y directions)
(2) The general case

We discuss each in turn:

(1) Special Case Equations

This set of equations is special and relatively simple, but it is also the most frequently used (this is what D3DXMatrixPerspectiveLH and D3DXMatrixPerspectiveFovLH use). We derive it:

Then we work backwards to the perspective projection matrix:


r - l and t - b can be regarded as the width w and height h of the projection plane respectively. The last matrix is one of the perspective projection matrices of D3D. In addition, if we do not know right, left, top, and bottom, we can obtain them from the field of view (FOV). The following figure shows the relationship in the two planes:

The two FOVs are the fields of view in the x-z and y-z planes respectively. If only one field of view is given, the aspect ratio of the projection plane can be used as follows:

Use the field of view to calculate w or h, and then use the aspect ratio to calculate the other one.
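The special case can be sketched as follows, assuming the centered left-handed matrix has the layout documented for D3DXMatrixPerspectiveLH and that only the vertical field of view (in radians) is given. The names plane_size_from_fov, perspective_lh, and project_row are illustrative, not D3DX calls.

```python
import math


def plane_size_from_fov(fov_y, aspect, zn):
    """Width w and height h of the projection plane at distance zn."""
    h = 2.0 * zn * math.tan(fov_y / 2.0)
    w = h * aspect
    return w, h


def perspective_lh(w, h, zn, zf):
    """Centered left-handed matrix for row vectors (v * M), CVV z in [0, 1]."""
    q = zf / (zf - zn)
    return [
        [2.0 * zn / w, 0.0, 0.0, 0.0],
        [0.0, 2.0 * zn / h, 0.0, 0.0],
        [0.0, 0.0, q, 1.0],
        [0.0, 0.0, -zn * q, 0.0],
    ]


def project_row(point, m):
    """Row vector times matrix, then perspective division by w (= z)."""
    v = (point[0], point[1], point[2], 1.0)
    clip = [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]
    return tuple(c / clip[3] for c in clip[:3])


w, h = plane_size_from_fov(math.pi / 2.0, 2.0, 1.0)
m = perspective_lh(w, h, 1.0, 3.0)
print(project_row((w / 2.0, h / 2.0, 1.0), m))  # near-plane corner
```

Note that the vector is multiplied on the left of the matrix here, the opposite of the OpenGL convention.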

(2) General Equation

This set of equations is more cumbersome but more general (it matches the derivation of the general OpenGL matrix; D3DXMatrixPerspectiveOffCenterLH and D3DXMatrixPerspectiveOffCenterRH use it). We derive it:

We again work backwards to the perspective projection matrix:


The final matrix is the general perspective projection matrix of D3D.
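The matrix is missing from this copy. Reconstructed from the derivation (row-vector convention, left-handed, CVV z in [0, 1], with l, r, b, t for left, right, bottom, top), it would be the following, which appears to agree with the layout documented for D3DXMatrixPerspectiveOffCenterLH:

\[
\begin{pmatrix}
\dfrac{2N}{r - l} & 0 & 0 & 0 \\
0 & \dfrac{2N}{t - b} & 0 & 0 \\
-\dfrac{r + l}{r - l} & -\dfrac{t + b}{t - b} & \dfrac{F}{F - N} & 1 \\
0 & 0 & -\dfrac{FN}{F - N} & 0
\end{pmatrix}
\]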

Now we have derived two perspective projection matrices for D3D. Next I write out the OpenGL perspective projection matrix derived earlier, so you can compare it with the general D3D perspective projection matrix just derived.

Looking carefully, we can see that the layouts of the elements are transposes of each other, which is caused by the left-handed versus right-handed coordinate systems and the row-vector versus column-vector conventions they use. Some elements also differ in detail, because the z range of the CVV in D3D is different. We can see that, under the same principle, small differences in environment can cause large changes in the result, which is why there are so many different versions of the perspective projection matrix. The general perspective projection matrix can also be defined from the field of view, in the same way as in the special case.
