Derivation of the Projection Matrix (Deriving Projection Matrices)

Last Update:2015-07-27 Source: Internet

Author: User

This article is a translation of "Deriving Projection Matrices". The original is at:

http://www.codeguru.com/cpp/misc/misc/math/article.php/c10123__1/Deriving-projection-matrices.htm

My ability is limited, so wherever the translation is unclear, please refer to the original. Thank you ^-^!

A star on a meteor

If you want to reprint, please indicate the source, thank you!

Of the basic matrix transformations used in 3D graphics programming, the projection matrix is among the more complicated. Translation and scaling can be understood at a glance, and a rotation matrix can be worked out by anyone who knows some trigonometry, but the projection matrix is a bit tricky. If you have ever looked at one, you know that common sense alone will not tell you where it came from. Yet I haven't seen many resources online that explain how to derive projection matrices. That is the topic of this article.

For those who are just beginning with 3D graphics, I should point out that understanding how projection matrices are derived is a matter of mathematical curiosity, not a requirement. You can simply use the formula, and if you use a graphics API like Direct3D you don't even need the formula, because the API will build the projection matrix for you. So if this article seems a little difficult, don't be afraid. As long as you understand what the projection matrix does, you need not concern yourself with how it works unless you want to. This article is for programmers who like to know more than is strictly necessary.

Overview: What is a projection?

A computer monitor is a two-dimensional surface, so if you want to display three-dimensional images, you need a way to convert the 3D geometry into a form that can be rendered as a two-dimensional image. That is exactly what projection does. As a simple example, one way to project a 3D object onto a 2D surface is simply to drop the z-coordinate of every point. For a cube, the result might look like Figure 1:

Figure 1: Projecting to the XY plane by dropping the z-coordinate

Of course, this is too simple and not particularly useful in most cases. For a start, you don't actually project onto a plane; instead, the projection formulas transform your geometry into a new volume called the canonical view volume. The exact coordinates of the canonical view volume differ from one graphics API to another, but for the purposes of this discussion, think of it as a box that extends from (-1, -1, 0) to (1, 1, 1), which is what Direct3D uses. Once all vertices have been mapped into the canonical view volume, only their x and y coordinates are used to map them to the screen. That does not make the z-coordinate useless: it is typically used by the depth buffer for visibility testing. This is why you transform into a new volume rather than project onto a plane.

Note that Figure 1 depicts a left-handed coordinate system, in which the camera looks down the positive z-axis, with the y-axis pointing up and the x-axis pointing right. This is the coordinate system used in Direct3D, and the one I use throughout this article. The calculations are not significantly different for a right-handed coordinate system or for a slightly different canonical view volume, so everything discussed here still applies even if your graphics API follows different conventions than Direct3D.

Now you can get into the actual projection transformations. There are many projection methods; I'll cover the two most common: orthographic and perspective.

Orthographic projection

Orthographic projection, so called because all the projection lines are perpendicular (orthogonal) to the final drawing surface, is a relatively simple projection technique. The view volume, that is, the region of space containing all the geometry you want to display, is an axis-aligned box that is transformed into the canonical view volume, as shown in Figure 2:

Figure 2: Orthographic projection

As you can see, the view volume is bounded by six planes: left (x = l), right (x = r), bottom (y = b), top (y = t), near (z = n), and far (z = f).

Because both the view volume and the canonical view volume are axis-aligned boxes, this type of projection involves no distance correction. The end result is in fact much like Figure 1, where each point simply has its z-coordinate discarded. Objects of the same size in 3D space appear the same size in the projection, even if one is much farther from the camera than the other. Lines that are parallel in 3D space are also parallel in the final image. This type of projection would cause problems in something like a first-person shooter (imagine playing without being able to tell how far away anything is!), but it has its uses. You might use it in a tile-based game, for example, especially one with a fixed camera angle. Figure 3 shows a simple example:

Figure 3: A simple example of orthographic projection

So, without further ado, let's figure out how it works. The simplest approach is probably to consider the three axes separately and work out how to map points from the view volume to the canonical view volume along each axis. Start with the x-axis: the x-coordinates of points in the view volume lie in the range [l, r], and you want to transform them to the range [-1, 1]:

$$ l \le x \le r $$

Now, in preparation for scaling the range to the size you want, subtract l from each term so that the leftmost term becomes 0. Another option would be to translate the range so that it is centered on 0 rather than having one end at 0, but the algebra is cleaner this way, so I'll do it like this for readability:

$$ 0 \le x - l \le r - l $$

Now that one end of the range is at 0, you can scale it to the desired size. You want the range of x values to be 2 units wide, from -1 to 1, so multiply each term by 2/(r - l). Note that r - l is the width of the view volume and is therefore always positive, so there is no need to worry about the inequalities changing direction:

$$ 0 \le \frac{2(x - l)}{r - l} \le 2 $$

Next, subtract 1 from each term to produce the desired range [-1, 1]:

$$ -1 \le \frac{2(x - l)}{r - l} - 1 \le 1 $$

Basic algebra lets you write the middle term as a single fraction:

$$ -1 \le \frac{2x - r - l}{r - l} \le 1 $$

Finally, split the middle term into two fractions so that it takes the form px + q. You need the terms grouped this way so that the formulas you derive can be converted directly into matrix form:

$$ -1 \le \frac{2}{r - l}\,x - \frac{r + l}{r - l} \le 1 $$

The middle term of this inequality is the formula for converting x to the canonical view volume:

$$ x' = \frac{2}{r - l}\,x - \frac{r + l}{r - l} $$

The steps to get the formula for y are exactly the same (just replace x with y, r with t, and l with b), so I won't repeat them here and will just give the result:

$$ y' = \frac{2}{t - b}\,y - \frac{t + b}{t - b} $$

Finally, you need to derive the formula for z. The derivation is a little different here, because z is mapped to the range [0, 1] instead of [-1, 1], but it looks very similar. The z-coordinates start in the range [n, f]:

$$ n \le z \le f $$

Subtract n from each term so that the lower bound of the range becomes 0:

$$ 0 \le z - n \le f - n $$

Now all that is left is to divide by f - n, which produces the final range [0, 1]. As before, note that f - n is the depth of the view volume and is therefore always positive:

$$ 0 \le \frac{z - n}{f - n} \le 1 $$

Finally, split the middle term into two fractions so that it takes the form pz + q:

$$ 0 \le \frac{1}{f - n}\,z - \frac{n}{f - n} \le 1 $$

This gives the formula for z:

$$ z' = \frac{1}{f - n}\,z - \frac{n}{f - n} $$
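If you want to check the three per-axis formulas numerically, a minimal Python sketch might look like the following. The function name `ortho_map` and the sample view volume are made up for illustration; they are not part of Direct3D or the original article.

```python
def ortho_map(x, y, z, l, r, b, t, n, f):
    """Map a view-volume point to the canonical view volume, one axis at a time."""
    xp = 2.0 * x / (r - l) - (r + l) / (r - l)   # [l, r] -> [-1, 1]
    yp = 2.0 * y / (t - b) - (t + b) / (t - b)   # [b, t] -> [-1, 1]
    zp = z / (f - n) - n / (f - n)               # [n, f] -> [0, 1]
    return xp, yp, zp


# Arbitrary sample view volume: x in [-4, 6], y in [-1, 3], z in [1, 11].
box = dict(l=-4.0, r=6.0, b=-1.0, t=3.0, n=1.0, f=11.0)

near_corner = ortho_map(-4.0, -1.0, 1.0, **box)   # left, bottom, near corner
far_corner = ortho_map(6.0, 3.0, 11.0, **box)     # right, top, far corner
center = ortho_map(1.0, 1.0, 6.0, **box)          # center of the box

print(near_corner, far_corner, center)
```

Up to floating-point rounding, the two opposite corners should land on (-1, -1, 0) and (1, 1, 1), and the center of the box on (0, 0, 0.5).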

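Now you are ready to write the orthographic projection matrix. To summarize the work so far, you have derived three projection formulas:

$$ x' = \frac{2}{r - l}\,x - \frac{r + l}{r - l} $$

$$ y' = \frac{2}{t - b}\,y - \frac{t + b}{t - b} $$

$$ z' = \frac{1}{f - n}\,z - \frac{n}{f - n} $$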

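Written in matrix form, this gives:

$$
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\begin{bmatrix}
\frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\
0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\
0 & 0 & \frac{1}{f-n} & -\frac{n}{f-n} \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$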

That's it! Direct3D provides the D3DXMatrixOrthoOffCenterLH() function (what a mouthful!) to construct an orthographic projection matrix from this same formula; you can find it in the DirectX documentation. (Direct3D works with row vectors, so its documentation lists the transpose of the column-vector matrix used in this article.) The "LH" in the name indicates that you are using a left-handed coordinate system. But what exactly does "OffCenter" mean?
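As a rough illustration, the off-center orthographic matrix can be built and applied in a few lines of Python. The helper names `ortho_off_center` and `transform` are hypothetical, the sample view volume is arbitrary, and the matrix is laid out for column vectors as in this article, not in Direct3D's row-vector layout.

```python
def ortho_off_center(l, r, b, t, n, f):
    """Off-center orthographic matrix (column vectors, left-handed, z mapped to [0, 1])."""
    return [
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, 1.0 / (f - n), -n / (f - n)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


# Arbitrary sample view volume: x in [-4, 6], y in [-1, 3], z in [1, 11].
m = ortho_off_center(-4.0, 6.0, -1.0, 3.0, 1.0, 11.0)

near_corner = transform(m, [-4.0, -1.0, 1.0, 1.0])   # left, bottom, near corner
far_corner = transform(m, [6.0, 3.0, 11.0, 1.0])     # right, top, far corner

print(near_corner, far_corner)
```

The two corners should come out as the corners (-1, -1, 0) and (1, 1, 1) of the canonical view volume, with w left at 1, up to rounding.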

The answer to this question leads to a simplified form of the orthographic projection matrix. Consider two points. First, in view space the camera sits at the origin and looks along the z-axis. Second, you usually want your field of view to extend equally far to the left and right of the z-axis, and equally far above and below it. If that is the case, the z-axis passes directly through the center of your view volume, so r = -l and t = -b. In other words, you can forget about r, l, t, and b and simply define the view volume by a width w and a height h, along with the clipping planes f and n. Substituting r - l = w, t - b = h, and r + l = t + b = 0 into the orthographic projection matrix gives this considerably simplified version:

$$
\begin{bmatrix}
\frac{2}{w} & 0 & 0 & 0 \\
0 & \frac{2}{h} & 0 & 0 \\
0 & 0 & \frac{1}{f-n} & -\frac{n}{f-n} \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

This formula is what the D3DXMatrixOrthoLH() function in Direct3D implements. You can almost always use this matrix instead of the more general "OffCenter" version derived above, unless you are doing something unusual with your projection.
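To see the simplification at work, the sketch below builds the symmetric matrix from w and h and compares it with the x and y entries of the general matrix when r = -l and t = -b. The names `ortho` and `off_center_xy_entries` are invented for this example, and the dimensions are arbitrary.

```python
def ortho(w, h, n, f):
    """Symmetric orthographic matrix (column vectors, left-handed, z mapped to [0, 1])."""
    return [
        [2.0 / w, 0.0, 0.0, 0.0],
        [0.0, 2.0 / h, 0.0, 0.0],
        [0.0, 0.0, 1.0 / (f - n), -n / (f - n)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def off_center_xy_entries(l, r, b, t):
    """Scale and offset entries of the general matrix that depend on l, r, b, t."""
    return 2.0 / (r - l), -(r + l) / (r - l), 2.0 / (t - b), -(t + b) / (t - b)


w, h = 8.0, 6.0
m = ortho(w, h, 1.0, 11.0)

# A view volume centered on the z-axis: r = -l = w/2 and t = -b = h/2.
sx, tx, sy, ty = off_center_xy_entries(-w / 2.0, w / 2.0, -h / 2.0, h / 2.0)

print(sx, m[0][0], tx)   # x scale from both forms, and the x offset
print(sy, m[1][1], ty)   # y scale from both forms, and the y offset
```

The scales agree and both offsets vanish, which is exactly why the last column loses its x and y terms.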

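One more thing before finishing this part. It is instructive to note that this matrix can be written as the product of two simpler transformations: a translation followed by a scale. This makes sense geometrically, because all an orthographic projection does is map one axis-aligned box onto another; the view volume does not change shape, only position and size. Specifically:

$$
\begin{bmatrix}
\frac{2}{w} & 0 & 0 & 0 \\
0 & \frac{2}{h} & 0 & 0 \\
0 & 0 & \frac{1}{f-n} & -\frac{n}{f-n} \\
0 & 0 & 0 & 1
\end{bmatrix}
=
\begin{bmatrix}
\frac{2}{w} & 0 & 0 & 0 \\
0 & \frac{2}{h} & 0 & 0 \\
0 & 0 & \frac{1}{f-n} & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & -n \\
0 & 0 & 0 & 1
\end{bmatrix}
$$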

This form of the projection may be more intuitive, because it makes it easier to picture what is going on. First, the view volume is translated along the z-axis so that its near plane coincides with the origin; then a scale is applied to shrink it to the dimensions of the canonical view volume. Easy to understand, isn't it? The off-center orthographic projection matrix can likewise be written as a translation followed by a scale, but the result is similar to the one above, so I don't list it here.
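The translate-then-scale view is easy to verify by multiplying the two matrices out. The sketch below assumes column vectors, so "translation followed by scale" is the product scale times translation; `matmul`, `translation`, and `scaling` are small helpers written just for this example.

```python
def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def scaling(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


w, h, n, f = 8.0, 6.0, 1.0, 11.0   # arbitrary sample values

# Move the near plane to z = 0 first, then scale to the canonical size.
composed = matmul(scaling(2.0 / w, 2.0 / h, 1.0 / (f - n)),
                  translation(0.0, 0.0, -n))

# The symmetric orthographic matrix written out directly.
direct = [[2.0 / w, 0.0, 0.0, 0.0],
          [0.0, 2.0 / h, 0.0, 0.0],
          [0.0, 0.0, 1.0 / (f - n), -n / (f - n)],
          [0.0, 0.0, 0.0, 1.0]]

print(composed)
print(direct)
```

The two printed matrices should match entry for entry, up to rounding.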

That covers orthographic projection; now it's time to move on to something more challenging.

Perspective projection

Perspective projection is a slightly more complex projection method, and it is more commonly used because it creates a sense of distance and therefore produces more realistic images. Geometrically, it differs from orthographic projection in that its view volume is a frustum, that is, a truncated pyramid, rather than an axis-aligned box. See Figure 4:

Figure 4: Perspective projection

As you can see, the near plane of the view volume extends from (l, b, n) to (r, t, n). The extent of the far plane is found by casting rays from the origin through the four corners of the near plane until they intersect the plane z = f. Because the view volume gets wider the farther it extends from the origin, and you are transforming this shape into the canonical view volume box, the far end of the view volume is compressed more than the near end. As a result, objects toward the far end of the view volume appear smaller, which gives you the sense of distance.

Because the shape of the volume changes in this transformation, a perspective projection cannot be expressed as a simple translation and scale the way an orthographic projection can. You have to work out something different. That does not mean the work you did on orthographic projection is wasted, however. A handy approach to math problems is to reduce the problem to one you already know how to solve, and that is what you can do here. Last time you examined one coordinate at a time; this time you will handle the x and y coordinates together and then consider z afterward. Your handling of x and y takes two steps:

Step 1: Given a point (x, y, z) in the view volume, project it onto the near plane z = n. Because the projected point lies on the near plane, its x-coordinate is in the range [l, r] and its y-coordinate is in the range [b, t].

Step 2: Use the formulas you derived for orthographic projection to map the x-coordinate from [l, r] to [-1, 1] and the y-coordinate from [b, t] to [-1, 1].

Sounds great, doesn't it? Take a look at Figure 5:

Figure 5: Projecting a point onto the plane z = n using similar triangles

In this diagram, you draw a line from the point (x, y, z) to the origin and note the point where the line intersects the plane z = n, the one marked in black. From these two points, you drop perpendiculars to the z-axis, and suddenly you have a pair of similar triangles. If you recall your high-school geometry, similar triangles are triangles that have the same shape but not necessarily the same size. To prove that two triangles are similar, it suffices to show that their corresponding angles are equal, which is not hard to do here. Angle 1 is shared by both triangles, and it is obviously equal to itself. Angles 2 and 3 are corresponding angles formed by a line crossing two parallel lines, so they are equal. And the right angles are of course equal to each other, so the two triangles are similar.

What makes similar triangles interesting here is that the ratios of their corresponding sides are equal. You know the lengths of the sides along the z-axis: they are n and z. That means the ratio of the other corresponding sides is also n/z. So, think about what you know. By the Pythagorean theorem, the perpendicular from (x, y, z) to the z-axis has the following length:

$$ L = \sqrt{x^2 + y^2} $$

If you knew the length of the perpendicular from your projected point to the z-axis, you could calculate the x and y coordinates of that point. So what is that length? That's easy! Because you have similar triangles, it is simply L multiplied by n/z:

$$ L' = L \cdot \frac{n}{z} = \frac{n}{z}\sqrt{x^2 + y^2} $$

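The projected point lies on the same line through the origin as (x, y, z), so x and y both shrink by this same factor. Therefore, the x-coordinate is x · n/z and the y-coordinate is y · n/z. The first step is done.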
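Step 1 is small enough to try out directly. The sketch below is only an illustration: `project_to_near_plane` is a made-up name, the sample point is arbitrary, and the function assumes z > 0 (the point is in front of the camera).

```python
import math


def project_to_near_plane(x, y, z, n):
    """Project (x, y, z) along the line through the origin onto the plane z = n.

    Assumes z > 0, i.e. the point is in front of the camera.
    """
    scale = n / z   # ratio of corresponding sides of the similar triangles
    return x * scale, y * scale, n


x, y, z, n = 3.0, 4.0, 10.0, 2.0   # arbitrary sample point and near plane
px, py, pz = project_to_near_plane(x, y, z, n)

length = math.hypot(x, y)                # distance from (x, y, z) to the z-axis
projected_length = math.hypot(px, py)    # distance from the projected point to the z-axis

print(px, py, pz)
print(projected_length / length, n / z)  # both ratios should agree
```

The ratio of the two perpendicular lengths matches n/z, which is the similar-triangles argument in numeric form.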

The second step simply performs the same mapping as in the previous section, so it's time to revisit the formulas you derived for orthographic projection. Recall that the x and y coordinates were mapped to the canonical view volume like this:

$$ x' = \frac{2}{r - l}\,x - \frac{r + l}{r - l} $$

$$ y' = \frac{2}{t - b}\,y - \frac{t + b}{t - b} $$

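Now you can use these formulas again, except that you have to take the projection into account, so x is replaced with x · n/z and y is replaced with y · n/z:

$$ x' = \frac{2n}{r - l} \cdot \frac{x}{z} - \frac{r + l}{r - l} $$

$$ y' = \frac{2n}{t - b} \cdot \frac{y}{z} - \frac{t + b}{t - b} $$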

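Now multiply through by z:

$$ x'z = \frac{2n}{r - l}\,x - \frac{r + l}{r - l}\,z $$

$$ y'z = \frac{2n}{t - b}\,y - \frac{t + b}{t - b}\,z $$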

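These results are a bit odd. To write these equations directly as a matrix, you would need them in this form, where each c is a constant:

$$ x' = c_1 x + c_2 y + c_3 z + c_4 $$

$$ y' = c_5 x + c_6 y + c_7 z + c_8 $$

$$ z' = c_9 x + c_{10} y + c_{11} z + c_{12} $$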

But clearly that is not going to happen, so it looks as if you are stuck. What can be done? If you can find a formula for z'z like the ones for x'z and y'z, you can write a matrix that maps (x, y, z) to (x'z, y'z, z'z). Then you just divide every component by z, and you have what you want: (x', y', z').

Because you know that the conversion from z to z' does not depend on x or y, you know you want a formula of the form z'z = pz + q, where p and q are constants. You can find those constants easily, because you know what z' must be in two special cases: since you are mapping [n, f] to [0, 1], z' = 0 when z = n and z' = 1 when z = f. Substituting the first pair of values into z'z = pz + q gives q in terms of p:

$$ 0 = pn + q \quad\Rightarrow\quad q = -pn $$

Now substitute the second pair of values:

$$ f = pf + q $$

Substituting q = -pn into this equation, you can easily solve for p:

$$ f = pf - pn = p(f - n) \quad\Rightarrow\quad p = \frac{f}{f - n} $$

Now that you have p, and you found earlier that q = -pn, you can solve for q:

$$ q = -\frac{fn}{f - n} $$

Finally, substitute these expressions for p and q into the original formula z'z = pz + q:

$$ z'z = \frac{f}{f - n}\,z - \frac{fn}{f - n} $$
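A quick numeric check of p and q can be reassuring here. In the sketch below, `depth_coefficients` and `depth` are hypothetical helper names and the clipping planes are arbitrary sample values.

```python
def depth_coefficients(n, f):
    """Solve z'z = p*z + q for p and q, given z' = 0 at z = n and z' = 1 at z = f."""
    p = f / (f - n)
    q = -p * n
    return p, q


def depth(z, n, f):
    """Return z' for a view-space depth z, i.e. (p*z + q) after the divide by z."""
    p, q = depth_coefficients(n, f)
    return (p * z + q) / z


n, f = 1.0, 100.0   # arbitrary sample near and far planes

at_near = depth(n, n, f)
at_far = depth(f, n, f)
at_middle = depth((n + f) / 2.0, n, f)

print(at_near, at_far, at_middle)
```

The near and far planes map to 0 and 1 as required. Note that the halfway point between them does not map to 0.5: because of the divide by z, z' is not a linear function of z.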

You are almost done, but the unusual way you have approached the problem means you also have to deal with the homogeneous coordinate w. Normally you simply set w' = 1 (you may have noticed that the bottom row of a basic transformation matrix is always [0, 0, 0, 1]), but now you are writing a transformation to the point (x'z, y'z, z'z, w'z). So instead of w' = 1, write w'z = z. The final set of equations for perspective projection is therefore:

$$ x'z = \frac{2n}{r - l}\,x - \frac{r + l}{r - l}\,z $$

$$ y'z = \frac{2n}{t - b}\,y - \frac{t + b}{t - b}\,z $$

$$ z'z = \frac{f}{f - n}\,z - \frac{fn}{f - n} $$

$$ w'z = z $$

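Now, when you write these equations in matrix form, you get:

$$
\begin{bmatrix} x'z \\ y'z \\ z'z \\ w'z \end{bmatrix} =
\begin{bmatrix}
\frac{2n}{r-l} & 0 & -\frac{r+l}{r-l} & 0 \\
0 & \frac{2n}{t-b} & -\frac{t+b}{t-b} & 0 \\
0 & 0 & \frac{f}{f-n} & -\frac{fn}{f-n} \\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$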

When you apply this matrix to the point (x, y, z, 1), it produces (x'z, y'z, z'z, w'z). You then apply the usual homogeneous divide, dividing every component by the last one, to get (x', y', z', 1). That is perspective projection. Direct3D's D3DXMatrixPerspectiveOffCenterLH() function implements this formula. Just as with orthographic projection, if you assume that the view volume is symmetric and centered on the z-axis (that is, r = -l and t = -b), you can rewrite the matrix entries in terms of the width w and height h of the view volume, measured at the near plane:

$$
\begin{bmatrix}
\frac{2n}{w} & 0 & 0 & 0 \\
0 & \frac{2n}{h} & 0 & 0 \\
0 & 0 & \frac{f}{f-n} & -\frac{fn}{f-n} \\
0 & 0 & 1 & 0
\end{bmatrix}
$$

Direct3D's D3DXMatrixPerspectiveLH() function generates this matrix.
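Here is a small Python sketch of the off-center perspective matrix together with the homogeneous divide. The names `perspective_off_center` and `project` are made up for illustration, the frustum is arbitrary, and the layout assumes column vectors as in the rest of this article.

```python
def perspective_off_center(l, r, b, t, n, f):
    """Off-center perspective matrix (column vectors, left-handed, z mapped to [0, 1])."""
    return [
        [2.0 * n / (r - l), 0.0, -(r + l) / (r - l), 0.0],
        [0.0, 2.0 * n / (t - b), -(t + b) / (t - b), 0.0],
        [0.0, 0.0, f / (f - n), -f * n / (f - n)],
        [0.0, 0.0, 1.0, 0.0],
    ]


def project(m, x, y, z):
    """Apply the matrix to (x, y, z, 1), then divide by the resulting w."""
    v = (x, y, z, 1.0)
    xw, yw, zw, w = (sum(row[j] * v[j] for j in range(4)) for row in m)
    return xw / w, yw / w, zw / w


# Arbitrary sample frustum: near plane from (-1, -2, 1) to (3, 2, 1), far plane at z = 5.
l, r, b, t, n, f = -1.0, 3.0, -2.0, 2.0, 1.0, 5.0
m = perspective_off_center(l, r, b, t, n, f)

near_low = project(m, l, b, n)                          # near plane, left-bottom corner
near_high = project(m, r, t, n)                         # near plane, right-top corner
far_high = project(m, r * f / n, t * f / n, f)          # same ray, extended to the far plane

print(near_low, near_high, far_high)
```

The near corners should map to (-1, -1, 0) and (1, 1, 0). The far corner lies on the same ray from the origin as the near right-top corner, so it keeps x' = y' = 1 and only its depth changes, to 1.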

Finally, there is one more representation of the perspective projection that is frequently used. In this representation, you define the view volume in terms of the camera's field of view rather than worrying about the dimensions of the view volume. See Figure 6 for the idea:

Figure 6: The height of the view volume is defined by the vertical field-of-view angle a

The vertical field-of-view angle is a. The z-axis bisects this angle, so basic trigonometry lets you write the following equation relating a, the near plane n, and the height h of the view volume at the near plane:

$$ \tan\frac{a}{2} = \frac{h/2}{n} \quad\Rightarrow\quad \frac{2n}{h} = \cot\frac{a}{2} $$

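This expression can replace the height term in the projection matrix. In addition, replace the width using the aspect ratio r, defined as the ratio of the width of the display area to its height (here r is no longer the right plane). Since w = rh, this gives:

$$ \frac{2n}{w} = \frac{2n}{rh} = \frac{1}{r}\cot\frac{a}{2} $$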

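You therefore have a perspective projection matrix in terms of the vertical field-of-view angle a and the aspect ratio r:

$$
\begin{bmatrix}
\frac{1}{r}\cot\frac{a}{2} & 0 & 0 & 0 \\
0 & \cot\frac{a}{2} & 0 & 0 \\
0 & 0 & \frac{f}{f-n} & -\frac{fn}{f-n} \\
0 & 0 & 1 & 0
\end{bmatrix}
$$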

In Direct3D, you can use the D3DXMatrixPerspectiveFovLH() function to get a matrix of this form. This form is especially handy because you can set r directly to the aspect ratio of the render window, and a field-of-view angle of π/4 usually works well. So all you really need to worry about is defining the extent of the view volume along the z-axis.
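As a sanity check on the field-of-view form, the sketch below builds the matrix from an angle and an aspect ratio and compares its scale entries with 2n/w and 2n/h computed from the equivalent near-plane dimensions. `perspective_fov` is a hypothetical name, and the window size and clipping planes are arbitrary.

```python
import math


def perspective_fov(a, aspect, n, f):
    """Perspective matrix from a vertical field of view a (radians) and an aspect ratio.

    Column vectors, left-handed, z mapped to [0, 1].
    """
    y_scale = 1.0 / math.tan(a / 2.0)   # cot(a / 2)
    x_scale = y_scale / aspect
    return [
        [x_scale, 0.0, 0.0, 0.0],
        [0.0, y_scale, 0.0, 0.0],
        [0.0, 0.0, f / (f - n), -f * n / (f - n)],
        [0.0, 0.0, 1.0, 0.0],
    ]


a = math.pi / 4.0          # vertical field of view
aspect = 1280.0 / 720.0    # width / height of a sample render window
n, f = 0.5, 100.0          # arbitrary clipping planes

m = perspective_fov(a, aspect, n, f)

# Equivalent near-plane dimensions of the view volume.
h = 2.0 * n * math.tan(a / 2.0)
w = aspect * h

print(m[0][0], 2.0 * n / w)   # x scale from both forms
print(m[1][1], 2.0 * n / h)   # y scale from both forms
```

Both pairs agree, so this form describes the same matrix as the width-and-height version, just with more convenient parameters.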

Summary

That is all the mathematics you need to understand the concepts behind projection transformations. There are other, less common projection methods, and things will differ slightly if you use a right-handed coordinate system or a different canonical view volume, but you should be able to derive those formulas easily based on the results in this article. If you want to know more about projections or other transformations, take a look at Real-Time Rendering by Tomas Möller and Eric Haines, or Computer Graphics: Principles and Practice by James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes; both are excellent books on computer graphics.

If you have any questions about this article, or any corrections to point out, you can contact me through the CodeGuru forums, where my name is Smasher/Devourer.

Happy coding!
