Original post address: http://blog.csdn.net/popy007/article/details/1797121
Perspective projection is an important part of the 3D fixed-function pipeline. It transforms points in camera space from the view frustum into the canonical view volume (CVV), and performs the perspective division after clipping. Algorithmically it is completed in two steps: multiplication by the perspective matrix, and perspective division.
Perspective projection is a technique that many developers new to 3D graphics find mysterious and confusing. The difficulty lies in its many steps and its heavy reliance on background knowledge: if any one piece is unfamiliar, understanding stalls immediately.
It is true that mainstream 3D APIs such as OpenGL and D3D encapsulate the details of perspective projection. For example, gluPerspective(...) generates a perspective projection matrix from its inputs, and in most cases you can get the job done without knowing the algorithm inside. But if you want to become a professional graphics programmer or game developer, shouldn't you really get to grips with perspective projection? Let's start from the necessary basics and work up step by step. (These pieces of knowledge can be found separately in many places, but I never found them all in one place. Now you have.)
First we introduce two things that must be mastered; with them we will not lose our way through the perspective projection process. (We will use some vector geometry and matrix knowledge here. If you are not familiar with it, see "Use of Vector Geometry in Game Programming".)
Homogeneous coordinate representation
Perspective projection is carried out in homogeneous coordinates, and homogeneous coordinates are themselves a confusing concept, so let us get them clear first.
Recall the concept of a basis from part 6 of "Use of Vector Geometry in Game Programming". For a vector v and a basis oabc (origin o, basis vectors a, b, c), we can find a set of coordinates (v1, v2, v3) such that
v = v1 a + v2 b + v3 c (1)
For a point p, we can find a set of coordinates (p1, p2, p3) such that
p - o = p1 a + p2 b + p3 c (2)
From these expressions for a vector and a point we can see that, to represent a point (such as p) in the coordinate system, we treat its position as a displacement from the origin o of the basis, that is, as the vector p - o. (Some books call this a position vector: a special vector that starts at the coordinate origin.) Expressing this vector gives an equivalent expression for the point p:
p = o + p1 a + p2 b + p3 c (3)
(1) and (3) are the expressions of a vector and a point under the coordinate basis. Although both are written with algebraic components, a point requires more information than a vector. If I write the components (1, 4, 7), who can tell whether it is a vector or a point?
We now write (1) and (3) in matrix form:
$$
\mathbf{v} = (\mathbf{a}\ \ \mathbf{b}\ \ \mathbf{c}\ \ \mathbf{o})
\begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ 0 \end{pmatrix},
\qquad
\mathbf{p} = (\mathbf{a}\ \ \mathbf{b}\ \ \mathbf{c}\ \ \mathbf{o})
\begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ 1 \end{pmatrix}
$$
Here (a, b, c, o) is the basis matrix, and the column vectors on the right are the coordinates of the vector v and the point p under that basis. So vectors and points have different expressions under the same basis: the 4th component of a 3D vector is 0, while the 4th component of a 3D point is 1. Using four components to represent 3D geometric concepts in this way is the homogeneous coordinate representation.
"Homogeneous coordinate representation is one of the important means of computer graphics. It can be used to distinguish vectors and points clearly, and it is also easier to use for affine (linear) geometric transformations." -- F. S. Hill, Jr.
With this, if the (1, 4, 7) above is written as (1, 4, 7, 0), it is a vector; if it is written as (1, 4, 7, 1), it is a point.
The following describes how to convert between ordinary (Cartesian) coordinates and homogeneous coordinates.
When converting from ordinary coordinates to homogeneous coordinates:
If (x, y, z) is a point, it becomes (x, y, z, 1);
If (x, y, z) is a vector, it becomes (x, y, z, 0).
When converting from homogeneous coordinates to ordinary coordinates:
If it is (x, y, z, 1), we know it is a point and it becomes (x, y, z);
If it is (x, y, z, 0), we know it is a vector and it still becomes (x, y, z).
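The above uses homogeneous coordinates to distinguish vectors from points. Now consider the three most common affine transformations: translation T, rotation R, and scaling S. Translation is only meaningful for points, because an ordinary vector has no position, only magnitude and direction. This can be seen clearly in the following:
$$
\begin{pmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix} x + T_x \\ y + T_y \\ z + T_z \\ 1 \end{pmatrix},
\qquad
\begin{pmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 0 \end{pmatrix}
=
\begin{pmatrix} x \\ y \\ z \\ 0 \end{pmatrix}
$$
The 0 in the 4th component of a vector cancels the translation column, so the vector is left unchanged.
Rotation and scaling are meaningful for both vectors and points, which you can check with the same kind of representation. So homogeneous coordinates are very convenient for affine transformations.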
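A minimal Python sketch of the point above: the same translation matrix moves a point but leaves a vector alone. The helper names `mat_vec` and `translation` are made up for this illustration and are not part of any API.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def translation(tx, ty, tz):
    """Build a 4x4 homogeneous translation matrix."""
    return [
        [1, 0, 0, tx],
        [0, 1, 0, ty],
        [0, 0, 1, tz],
        [0, 0, 0, 1],
    ]


T = translation(10, 20, 30)
point = (1, 4, 7, 1)    # 4th component 1: a point
vector = (1, 4, 7, 0)   # 4th component 0: a vector

moved_point = mat_vec(T, point)    # (11, 24, 37, 1): the point is translated
moved_vector = mat_vec(T, vector)  # (1, 4, 7, 0): the vector is unchanged
print(moved_point, moved_vector)
```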
In addition, a point P = (px, py, pz) has a whole family of homogeneous coordinates (w·px, w·py, w·pz, w), where w is not zero. For example, the homogeneous coordinates of P = (1, 4, 7) include (1, 4, 7, 1), (2, 8, 14, 2), (-0.1, -0.4, -0.7, -0.1), and so on. Therefore, to convert a point from ordinary to homogeneous coordinates, multiply x, y, and z by the same nonzero number w and append w as the 4th component; to convert from homogeneous to ordinary coordinates, divide the first three components by the 4th and then drop the 4th.
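The conversion rules can be sketched in a few lines of Python. This is only an illustration; the function names `to_homogeneous` and `from_homogeneous` and the `is_point` flag are assumptions made up for this example.

```python
def to_homogeneous(xyz, is_point=True, w=1.0):
    """Convert ordinary (x, y, z) to homogeneous coordinates.

    A vector gets 4th component 0; a point is scaled by a nonzero w
    and gets w as its 4th component.
    """
    x, y, z = xyz
    if not is_point:
        return (x, y, z, 0.0)
    if w == 0:
        raise ValueError("w must be nonzero for a point")
    return (w * x, w * y, w * z, w)


def from_homogeneous(xyzw):
    """Convert homogeneous coordinates back to ordinary (x, y, z)."""
    x, y, z, w = xyzw
    if w == 0:
        return (x, y, z)              # a vector: just drop the 0
    return (x / w, y / w, z / w)      # a point: divide by w, then drop it


p = (1.0, 4.0, 7.0)
print(to_homogeneous(p))                         # (1.0, 4.0, 7.0, 1.0)
print(to_homogeneous(p, w=2.0))                  # (2.0, 8.0, 14.0, 2.0)
print(from_homogeneous((2.0, 8.0, 14.0, 2.0)))   # (1.0, 4.0, 7.0)
```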
Because homogeneous coordinates use four components to express 3D concepts, translation can be carried out with a matrix and, as F. S. Hill, Jr. says, affine (linear) transformations become more convenient. Graphics hardware generally supports homogeneous coordinates and matrix multiplication, which has further promoted their use and made them something of a standard in graphics.
Simple linear interpolation
This is a basic technique used widely in graphics: in enlarging and shrinking 2D bitmaps, in tweening, and in the perspective projection discussed here. The basic idea: given x in [a, b], find y in [c, d] such that the ratio of the distance from x to a to the length of [a, b] equals the ratio of the distance from y to c to the length of [c, d]. It is easiest to understand as a formula:
$$
\frac{x - a}{b - a} = \frac{y - c}{d - c}
\quad\Longrightarrow\quad
y = c + \frac{(x - a)(d - c)}{b - a}
$$
In this way every point in [a, b] corresponds to a unique point in [c, d]: given an x, we can compute the y.
If x lies outside [a, b], say x < a or x > b, then the resulting y satisfies y < c or y > d; the proportion is still preserved, so the interpolation still applies.
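As a small sketch of this interpolation in Python (the function name `remap` is made up for this example):

```python
def remap(x, a, b, c, d):
    """Map x from the range [a, b] to the range [c, d], preserving proportion."""
    if a == b:
        raise ValueError("source range [a, b] must not be empty")
    return c + (x - a) * (d - c) / (b - a)


print(remap(5, 0, 10, -1, 1))    # 0.0: the midpoint maps to the midpoint
print(remap(0, 0, 10, -1, 1))    # -1.0: a maps to c
print(remap(15, 0, 10, -1, 1))   # 2.0: x outside [a, b] gives y outside [c, d]
```

The last call shows the case x > b: the result lies beyond d, but the proportion is kept.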
Perspective Projection Transformation
With these two pieces of theory in hand, we can analyze perspective projection itself, the main subject of this article. We use OpenGL's perspective projection for the analysis; other APIs differ in some details, but the main idea is the same and can be derived similarly. After the camera (view) matrix is applied, vertices are in camera space. At this point polygons may need to be clipped against the view frustum, but clipping against such an irregular volume is not easy. So, after careful analysis by our graphics predecessors, clipping was arranged to take place in the canonical view volume (CVV). The CVV is a cube with x, y, and z each in the range [-1, 1], and polygon clipping is done against this regular volume. The perspective projection therefore actually consists of two steps:
1) Multiply by the perspective matrix to transform vertices from the view frustum into the CVV in clip space.
2) After clipping against the CVV, perform the perspective division (explained later).
We first examine the projection relationship in one direction.
Consider a vertex in camera space, in a right-handed coordinate system, viewed in the xz plane. P(x, z) is the point after the camera transformation. The view frustum is defined by eye (the eye position), np (the near clipping plane), and fp (the far clipping plane). n is the distance from the eye to the near clipping plane, and f is the distance from the eye to the far clipping plane. The projection plane can be any plane parallel to the near clipping plane; here we choose the near clipping plane itself. Let P'(x', z') be the projected point; then z' = -n. By similar triangles:
$$
\frac{x'}{x} = \frac{z'}{z} = \frac{-n}{z}
\quad\Longrightarrow\quad
x' = -\frac{n x}{z}
$$
Likewise,
$$
y' = -\frac{n y}{z}
$$
So the projection of P is the point P':
$$
P' = \left( -\frac{n x}{z},\ -\frac{n y}{z},\ -n \right)
$$
From the above we can see that the projected z' always equals -n, because P' lies on the projection plane. So z' carries no information about the projected point; this slot is useless. For the 3D graphics pipeline, however, later operations such as z-buffer hidden-surface removal need the z from before projection. We therefore use this otherwise useless slot to store z:
$$
P' = \left( -\frac{n x}{z},\ -\frac{n y}{z},\ z \right)
$$
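The similar-triangle projection, with the original z kept in the third slot, can be sketched as follows. This is an illustration under the article's assumptions (right-handed camera space, camera looking down the negative z axis); the function name is made up.

```python
def project_to_near_plane(p, n):
    """Project a camera-space point onto the near plane z = -n.

    Returns (x', y', z): x' and y' come from similar triangles, and the
    third slot keeps the original z for later depth tests.
    """
    x, y, z = p
    if z >= 0:
        raise ValueError("point must be in front of the eye (z < 0)")
    return (-n * x / z, -n * y / z, z)


print(project_to_near_plane((2.0, 1.0, -4.0), 1.0))   # (0.5, 0.25, -4.0)
```

A point four units in front of the eye is scaled down by a factor of four in x and y.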
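This form makes maximum use of the three slots and achieves the original projection, but it is too blunt and a little dry; I do not think it should be our final result. So let us think about it together with the CVV and write it in a mathematically more elegant and consistent way that is also easier for programs to process. Suppose the form above can be written as:
$$
P' = \left( -\frac{n x}{z},\ -\frac{n y}{z},\ -\frac{A z + B}{z} \right)
$$
Then we can conveniently express the projection with a matrix and homogeneous coordinates:
$$
\begin{pmatrix} n & 0 & 0 & 0 \\ 0 & n & 0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix} n x \\ n y \\ A z + B \\ -z \end{pmatrix}
$$
where the homogeneous result corresponds to the ordinary point
$$
\left( n x,\ n y,\ A z + B,\ -z \right)
\;\longrightarrow\;
\left( -\frac{n x}{z},\ -\frac{n y}{z},\ -\frac{A z + B}{z},\ 1 \right)
$$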
You have seen homogeneous coordinates used here; they should look familiar by now. This new form not only achieves the original projection above but also uses homogeneous coordinates to make the processing more uniform. Note that in going from (nx, ny, Az + B, -z) to (-nx/z, -ny/z, -(Az + B)/z, 1) we use the rule for converting homogeneous coordinates to ordinary coordinates. This step is called perspective division, and it is step 2 of the perspective projection. After this step the original z value is discarded (what we obtain is the corresponding z value in the CVV, explained below), and the vertex is finally considered projected. CVV clipping takes place between the two steps, so clip space uses the homogeneous coordinates (nx, ny, Az + B, -z). The main reason is that perspective division loses necessary information (such as the original z, which is retained as -z in the 4th component) and so would make clipping harder. We will not discuss the details of CVV clipping here and focus only on the two steps of the perspective projection.
The matrix
$$
\begin{pmatrix} n & 0 & 0 & 0 \\ 0 & n & 0 & 0 \\ 0 & 0 & A & B \\ 0 & 0 & -1 & 0 \end{pmatrix}
$$
is the first version of our projection matrix. You will surely ask why z should be written as
$$
-\frac{A z + B}{z}
$$
There are two reasons:
1) All three components of P' are then divided by the same denominator -z, which makes it easy to convert to ordinary coordinates using homogeneous coordinates, so processing is more consistent and efficient.
2) The CVV is a regular volume with x, y, and z each in [-1, 1], which makes polygon clipping convenient. We can choose A and B so that this expression is -1 when z = -n and 1 when z = -f, which builds the CVV in the z direction.
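Next we solve for A and B. Substituting z = -n and z = -f into -(Az + B)/z:
$$
\begin{cases}
\dfrac{-A n + B}{n} = -1 \\[2ex]
\dfrac{-A f + B}{f} = 1
\end{cases}
\quad\Longrightarrow\quad
\begin{cases}
-A n + B = -n \\
-A f + B = f
\end{cases}
\quad\Longrightarrow\quad
A = -\frac{f + n}{f - n},\qquad B = -\frac{2 f n}{f - n}
$$
This gives the first version of the perspective projection matrix:
$$
\begin{pmatrix}
n & 0 & 0 & 0 \\
0 & n & 0 & 0 \\
0 & 0 & -\dfrac{f + n}{f - n} & -\dfrac{2 f n}{f - n} \\
0 & 0 & -1 & 0
\end{pmatrix}
$$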
This version of the perspective projection matrix builds the CVV in the z direction, but x and y are still not restricted to [-1, 1]. The next version of our perspective projection matrix solves this.
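A quick numerical check of this first version, as a sketch only: with A = -(f + n)/(f - n) and B = -2fn/(f - n), which follow from requiring depth -1 at z = -n and 1 at z = -f, the near plane should land on -1 and the far plane on 1 after perspective division. The helper names are made up for this example.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def first_projection_matrix(n, f):
    """First-version matrix: projects x and y, maps z in [-n, -f] to [-1, 1]."""
    A = -(f + n) / (f - n)
    B = -2.0 * f * n / (f - n)
    return [
        [n, 0, 0, 0],
        [0, n, 0, 0],
        [0, 0, A, B],
        [0, 0, -1, 0],
    ]


def perspective_divide(clip):
    """Convert homogeneous clip coordinates to ordinary coordinates."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


n, f = 1.0, 10.0
M1 = first_projection_matrix(n, f)
near_ndc = perspective_divide(mat_vec(M1, (0.0, 0.0, -n, 1.0)))
far_ndc = perspective_divide(mat_vec(M1, (0.0, 0.0, -f, 1.0)))
print(near_ndc[2], far_ndc[2])   # approximately -1 and 1
```

Note that x and y are only projected here, not yet normalized: a point at (2, 1, -4) comes out as (0.5, 0.25, ...) regardless of how wide the projection plane is.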
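To bring vertices from the frustum into the CVV in the x and y directions as well, we now process x and y. First look at the final transformation result obtained so far:
$$
\left( -\frac{n x}{z},\ -\frac{n y}{z},\ -\frac{A z + B}{z},\ 1 \right)
$$
We know that the valid range of -nx/z is bounded by the left boundary (call it left) and the right boundary (call it right) of the projection plane, that is, [left, right]; likewise -ny/z lies in [bottom, top]. We now want to map -nx/z in [left, right] to x in [-1, 1], and -ny/z in [bottom, top] to y in [-1, 1]. What comes to mind? Yes, our simple linear interpolation, which you have already mastered! Applying it (writing l, r, b, t for left, right, bottom, top):
$$
\frac{-\frac{n x}{z} - l}{r - l} = \frac{x_{cvv} - (-1)}{1 - (-1)}
\quad\Longrightarrow\quad
x_{cvv} = -\frac{2 n x}{(r - l)\, z} - \frac{r + l}{r - l}
$$
$$
\frac{-\frac{n y}{z} - b}{t - b} = \frac{y_{cvv} - (-1)}{1 - (-1)}
\quad\Longrightarrow\quad
y_{cvv} = -\frac{2 n y}{(t - b)\, z} - \frac{t + b}{t - b}
$$
Then we get the final projected point:
$$
P' = \left( -\frac{2 n x}{(r - l)\, z} - \frac{r + l}{r - l},\ \ -\frac{2 n y}{(t - b)\, z} - \frac{t + b}{t - b},\ \ -\frac{A z + B}{z},\ \ 1 \right)
$$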
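What we do next is derive the next version of the perspective projection matrix backwards from this new form. Note that (writing l, r, b, t for left, right, bottom, top)
$$
P' = \left( -\frac{2 n x}{(r - l)\, z} - \frac{r + l}{r - l},\ \ -\frac{2 n y}{(t - b)\, z} - \frac{t + b}{t - b},\ \ -\frac{A z + B}{z},\ \ 1 \right)
$$
is the form after perspective division. Only the x and y components of P' have changed form; Az + B and -z are unchanged. So we undo the perspective division by multiplying each component of P' by -z, which gives
$$
\left( \frac{2 n}{r - l}\, x + \frac{r + l}{r - l}\, z,\ \ \frac{2 n}{t - b}\, y + \frac{t + b}{t - b}\, z,\ \ A z + B,\ \ -z \right)
$$
Written as a matrix product, this is:
$$
\begin{pmatrix}
\dfrac{2 n}{r - l} & 0 & \dfrac{r + l}{r - l} & 0 \\
0 & \dfrac{2 n}{t - b} & \dfrac{t + b}{t - b} & 0 \\
0 & 0 & A & B \\
0 & 0 & -1 & 0
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix}
\dfrac{2 n}{r - l}\, x + \dfrac{r + l}{r - l}\, z \\
\dfrac{2 n}{t - b}\, y + \dfrac{t + b}{t - b}\, z \\
A z + B \\
-z
\end{pmatrix}
$$
Substituting A = -(f + n)/(f - n) and B = -2fn/(f - n), we finally get:
$$
M =
\begin{pmatrix}
\dfrac{2 n}{r - l} & 0 & \dfrac{r + l}{r - l} & 0 \\
0 & \dfrac{2 n}{t - b} & \dfrac{t + b}{t - b} & 0 \\
0 & 0 & -\dfrac{f + n}{f - n} & -\dfrac{2 f n}{f - n} \\
0 & 0 & -1 & 0
\end{pmatrix}
$$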
M is the final perspective transformation matrix. For a vertex in camera space, if it is inside the frustum then the transformed vertex is inside the CVV; if it is outside the frustum, the transformed vertex is outside the CVV. The regularity of the CVV is very favorable for polygon clipping. OpenGL constructs its perspective projection matrix in the form of M. Note that the last row of M is not (0 0 0 1) but (0 0 -1 0). This shows that the perspective transformation is not an affine transformation: once the perspective division is included, it is nonlinear in ordinary coordinates.
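To check the final matrix numerically, here is a sketch that builds the OpenGL-style frustum matrix from left, right, bottom, top, n, and f and confirms that frustum corners land on CVV corners. The function names are made up for this example, and the sample bounds are arbitrary.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))


def frustum_matrix(left, right, bottom, top, n, f):
    """Perspective matrix mapping the view frustum to the CVV [-1, 1]^3."""
    return [
        [2 * n / (right - left), 0, (right + left) / (right - left), 0],
        [0, 2 * n / (top - bottom), (top + bottom) / (top - bottom), 0],
        [0, 0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0, 0, -1, 0],
    ]


def project(m, p):
    """Step 1: matrix multiply. Step 2: perspective division (no clipping here)."""
    x, y, z, w = mat_vec(m, (p[0], p[1], p[2], 1.0))
    return (x / w, y / w, z / w)


left, right, bottom, top, n, f = -1.0, 3.0, -1.0, 1.0, 1.0, 10.0
M = frustum_matrix(left, right, bottom, top, n, f)

near_corner = project(M, (left, bottom, -n))                     # about (-1, -1, -1)
far_corner = project(M, (right * f / n, top * f / n, -f))        # about (1, 1, 1)
print(near_corner, far_corner)
```

The far corner is the near corner (right, top) scaled by f/n, since the frustum's side planes pass through the eye. The bounds here are deliberately asymmetric (left = -1, right = 3) to exercise the (r + l)/(r - l) term.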
In addition, you may have noticed that the width and height of the projection plane usually differ, i.e. its aspect ratio is not 1 (for example 640/480), while the width and height of the CVV are equal, i.e. its aspect ratio is always 1. This distorts polygons: a square on the projection plane may become a rectangle in the CVV. The distortion is corrected by the viewport transformation, which comes after the perspective transformation, clipping, and perspective division: it maps the vertices in normalized device coordinates (NDC) onto the viewport with the same proportions as the projection plane, removing the distortion introduced by the perspective projection. The precondition for this correction is that the aspect ratio of the projection plane equals the aspect ratio of the viewport.
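A small sketch of this correction, using the linear interpolation from earlier. The numbers are assumptions chosen for illustration: a 4 x 3 projection plane and a 640 x 480 viewport, which share the same aspect ratio. The function names are made up, and a real viewport transformation may also have an origin offset and handle depth.

```python
def remap(x, a, b, c, d):
    """Map x from the range [a, b] to the range [c, d], preserving proportion."""
    return c + (x - a) * (d - c) / (b - a)


def plane_to_ndc(px, py, left, right, bottom, top):
    """Map a point on the projection plane into the CVV's x and y range."""
    return remap(px, left, right, -1.0, 1.0), remap(py, bottom, top, -1.0, 1.0)


def ndc_to_viewport(nx, ny, width, height):
    """Map normalized device coordinates to viewport coordinates."""
    return remap(nx, -1.0, 1.0, 0.0, width), remap(ny, -1.0, 1.0, 0.0, height)


# Projection plane 4 wide and 3 high; viewport 640 x 480 (same aspect ratio).
left, right, bottom, top = -2.0, 2.0, -1.5, 1.5

# Two opposite corners of a unit square on the projection plane.
ndc0 = plane_to_ndc(0.0, 0.0, left, right, bottom, top)
ndc1 = plane_to_ndc(1.0, 1.0, left, right, bottom, top)
ndc_size = (ndc1[0] - ndc0[0], ndc1[1] - ndc0[1])   # not square in the CVV

pix0 = ndc_to_viewport(ndc0[0], ndc0[1], 640.0, 480.0)
pix1 = ndc_to_viewport(ndc1[0], ndc1[1], 640.0, 480.0)
pix_size = (pix1[0] - pix0[0], pix1[1] - pix0[1])   # square again on screen
print(ndc_size, pix_size)
```

In the CVV the square is 0.5 wide but about 0.667 high; in the viewport both sides come out the same length again.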
Convenient projection matrix generation function
3D APIs provide functions such as gluPerspective(fov, aspect, near, far) or D3DXMatrixPerspectiveFovLH(pOut, fovy, Aspect, zn, zf) as a quick way to generate a perspective matrix. We again use the OpenGL function to analyze how it works.
gluPerspective(fov, aspect, near, far)
fov is the field of view: the opening angle of the frustum in the xz plane or the yz plane. OpenGL and D3D both use the yz plane.
aspect is the aspect ratio (width divided by height) of the projection plane.
near is the distance to the near clipping plane.
far is the distance to the far clipping plane.
The frustum bounds can be computed either in the xz plane or in the yz plane. Computing in the xz plane, the third step is top = right / aspect, which uses division (something graphics programmers hate); computing in the yz plane, the corresponding step is right = top × aspect, which uses multiplication. This may be why graphics APIs use the yz plane!
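The yz-plane computation can be sketched as below. This is an illustration, not the GLU source: the function names are made up, and it assumes fov is given in degrees as the full vertical opening angle, so that top = near × tan(fov / 2). The bounds are then fed into the same matrix form M derived above.

```python
import math


def perspective_bounds(fov_deg, aspect, near):
    """Compute left, right, bottom, top of the near plane from a vertical fov."""
    top = near * math.tan(math.radians(fov_deg) / 2.0)
    bottom = -top
    right = top * aspect      # multiplication, not division
    left = -right
    return left, right, bottom, top


def perspective_matrix(fov_deg, aspect, near, far):
    """Build the perspective matrix M from gluPerspective-style parameters."""
    l, r, b, t = perspective_bounds(fov_deg, aspect, near)
    return [
        [2 * near / (r - l), 0, (r + l) / (r - l), 0],
        [0, 2 * near / (t - b), (t + b) / (t - b), 0],
        [0, 0, -(far + near) / (far - near), -2 * far * near / (far - near)],
        [0, 0, -1, 0],
    ]


P = perspective_matrix(90.0, 2.0, 1.0, 10.0)
for row in P:
    print(row)
```

Because the bounds are symmetric, the (r + l)/(r - l) and (t + b)/(t - b) entries are zero, and the first two diagonal entries reduce to near/right and near/top.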
That completes my explanation of perspective projection. If you have followed along, you should now understand it in detail. Of course, you may well already be an expert in perspective projection; if so, please write to me and point out where my understanding falls short. I would be very grateful. Bye!