Rotation Matrices, Matrix-Matrix Multiply
CS 381 Lecture, Dr. Lawlor
So now you're at least somewhat comfortable with using matrices and vectors to represent 3D points.
Hey, what's with the dang W coordinate?
Yes, W is a strange thing. We live in a 3D universe, so only 3 floats are needed to represent position.
However, the graphics hardware uses 4 floats. This is for a curious combination of reasons:
- 4 floats is 16 bytes, which is a nice power of two. This
means 4-float vectors are nicely aligned in memory, which is important
for high memory bandwidth. SSE and most other SIMD CPU extensions
require 16-byte alignment, so in software, you'd have to pad out your vectors anyway.
- An extra W of 1.0 gives you the chance to include translation
into a matrix, by tacking on an additional (rightmost) column that gets
multiplied by 1.0. Keeping all your rotation/translation/scalings
in a single matrix is convenient, as we'll see below.
- If W isn't 1.0, the hardware divides everything in the vector (X,
Y, Z, and W) by it before rendering. Actually, it divides by W
even if W==1.0, although dividing by 1 has no effect! Dividing by
W has two important effects:
- W is then 1.0 again. This is good for the translation trick in the point above.
- Dividing by W allows us to represent division inside a matrix,
which we couldn't do otherwise. Division is useful for
perspective, as you found in HW1.
So the bottom line is that W adds to matrix arithmetic the ability to do translation and perspective. This is pretty cool.
For example, consider a 3x3 matrix without any W coordinates.
Note that we can do scalings and rotations, but no
translation--every matrix entry gets multiplied by one of the input
coordinates, so there's no way to add a constant offset:
[ s.x ]   [ 1 0 0 ] [ v.x ]
[ s.y ] = [ 0 1 0 ] [ v.y ]
[ s.z ]   [ 0 0 1 ] [ v.z ]
Now, add W coordinates. Assume the input v.w ==1.0, and that we divide through by the output s.w.
[ s.x ]   [ 1 0 0 a ] [ v.x ]
[ s.y ] = [ 0 1 0 b ] [ v.y ]
[ s.z ]   [ 0 0 1 c ] [ v.z ]
[ s.w ]   [ 0 0 1 d ] [ v.w ]
Now notice that s.w=v.z*1.0 + v.w*d = v.z+d. The hardware then
divides by this finished s.w value; so this is equivalent to
s.x = (v.x + a)/(v.z+d);
s.y = (v.y + b)/(v.z+d);
s.z = (v.z + c)/(v.z+d);
s.w = 1.0;
So this is equivalent to translating the object by (a,b,c), shoving it d units down the Z axis, and computing perspective.
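For concreteness, here's one way that matrix might look in an old-style GLSL vertex shader. This is just a sketch--the constants a, b, c, d are made up, and remember that the GLSL mat4 constructor takes columns, not rows:
// Sketch only: translate by (a,b,c), then divide by v.z+d for perspective.
const float a=0.0, b=0.0, c=0.0, d=3.0;  // made-up constants
void main(void) {
    mat4 M = mat4(
        vec4(1.0, 0.0, 0.0, 0.0),  // column 0
        vec4(0.0, 1.0, 0.0, 0.0),  // column 1
        vec4(0.0, 0.0, 1.0, 1.0),  // column 2: the trailing 1.0 puts v.z into s.w
        vec4(  a,   b,   c,   d)); // column 3: translation, plus d into s.w
    vec4 v = gl_Vertex;            // incoming vertex, with v.w == 1.0
    gl_Position = M * v;           // hardware then divides x, y, z by w = v.z + d
}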
Why use a dang Matrix? Why not just write out an equation?
Translation is simple: v.x+=0.2 translates the object in the x direction.
Scaling is simple: v.x*=1.2 scales the object in the x direction.
Rotation is simple (this one rotates about the Y axis):
v2.x = cos(angle) * v.x + -sin(angle) * v.z;
v2.z = sin(angle) * v.x + cos(angle) * v.z;
Perspective is simple: s.x = v.x / (v.z + 2.0);
The equation for a scaled, rotated, translated, rotated again,
translated, scaled, and perspective transform is not simple. To
keep track of a big set of transformations efficiently and without
hassle, matrices are indispensable.
Because matrices can be multiplied.
If you've got to apply two matrices to a vector v, first R and then L, you can write
out = L * (R * v);
But you can also write
out = (L * R) * v;
where L * R is another mat4. Specifically, transforming the
columns of R by L gives the columns of the new product matrix:
mat4 P;
P[0]=L * R[0];
P[1]=L * R[1];
P[2]=L * R[2];
P[3]=L * R[3];
This operation is good. Learn it. Use it. Stick all your hundred matrices into one.
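For example, here's a sketch of a GLSL vertex shader that bakes a rotation and a translation into one matrix before using it. The 30-degree angle and the (0.2, 0, 0) translation are made-up values:
// Sketch: combine two transforms into one matrix.
void main(void) {
    float ang = radians(30.0);     // made-up rotation angle
    mat4 R = mat4(                 // the Y-axis rotation from above, written as columns
        vec4( cos(ang), 0.0, sin(ang), 0.0),
        vec4(      0.0, 1.0,      0.0, 0.0),
        vec4(-sin(ang), 0.0, cos(ang), 0.0),
        vec4(      0.0, 0.0,      0.0, 1.0));
    mat4 L = mat4(1.0);               // identity...
    L[3] = vec4(0.2, 0.0, 0.0, 1.0);  // ...with a made-up translation in the last column
    mat4 P = L * R;                   // one matrix: rotate, then translate
    gl_Position = P * gl_Vertex;      // same answer as L * (R * gl_Vertex)
}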
But what about Z?
Ah. So you've noticed Z. X and Y are the location onscreen. Z is the depth of the onscreen objects. OpenGL actually keeps a separate buffer, the depth or Z-buffer,
offscreen alongside the normal visible color image. In my ogl
applications, you can press D (shift-d) to read back and display the
depth buffer, showing depth==-1.0 in black and depth==+1.0 in white.
You normally clear the depth buffer to +1.0 at the start of each
frame. As you draw stuff, your vertex shader is supposed to
compute a depth value, and return it in the Z coordinate. The Z
coordinate is divided by W to give a depth. If the new stuff's
depth is smaller (closer) than the old depth, it gets drawn. If
the new stuff is deeper (farther away) than whatever else was drawn at
that pixel already, the hardware skips the new stuff and leaves what
was already drawn there. Overall, this means closer objects obscure
objects that are further away. (Actually, the depth test is
configurable with glDepthFunc--it
can be disabled, flipped around to only draw if greater, etc. You
can also use textures in a fragment shader to simulate your own
arbitrarily complicated depth test, although the builtin depth buffer
has dedicated hardware and is hence faster and better than yours.)
So the bottom line is that a working perspective matrix just has to
make sure the post-divide depth, which is Z/W, ranges between -1 and
+1, with lower values for closer stuff. For example, if your
object's (post-rotation) v.z values already range between -1 and +1,
and your W=(v.z+3), and so ranges between 2 and 4, then you can just
set Z=v.z, and Z/W=v.z/(v.z+3) will range between -1/2 and +1/4.
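Here's that example as a GLSL vertex shader sketch (it assumes the incoming gl_Vertex.z really does lie between -1 and +1, which is an assumption, not something the hardware enforces):
// Sketch: perspective with W = v.z + 3.
void main(void) {
    vec4 v = gl_Vertex;                            // assume -1 <= v.z <= +1
    gl_Position = vec4(v.x, v.y, v.z, v.z + 3.0);  // w ranges from 2 to 4
    // After the hardware divide, depth = v.z/(v.z+3), between -1/2 and +1/4.
}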
zNear and zFar
The range of v.z values you want to store in the depth buffer is
usually described as the "near and far clipping plane depths", which
are usually written as zNear and zFar (or even "hither" and "yon"
[snicker!]), which I'll abbreviate below as "n" and "f".
They're called clipping planes because finished depth values that lie
outside the -1 to +1 range are chopped off.
In a perfect world, you'd just linearly map v.z coordinates to depth buffer values like so:
depth = Z/W = (v.z - n) / (f - n);
(You could also multiply this by 2 and subtract 1, just to make the depth range from -1 to +1 instead of 0 to 1.)
Unfortunately, the above linear mapping is tough to actually implement, solely
because of the W divide. W is some linear function of v.z
already, which isn't what the above equation wants. The sensible thing
to do, then, would be to only divide X and Y, not Z, by W. The
hardware, however, divides X, Y, and Z by W whether you like it or
not.
You could, in fact, sneakily cancel out the W divide by doing something like this:
Z = W * (v.z - n) / (f - n);
This actually works fine, except that W is some function of v.z
already, and we can't represent the product of two linear functions of
v.z (a quadratic in v.z) using a matrix. You might just give up
on the matrix business at the last step, and compensate for the W
divide by finishing your vertex shader with
"gl_Position.z*=gl_Position.w;". This actually works pretty well
for points, but it's slower than needed, and breaks down for polygons
(we'll talk about the precise breakdown when we cover polygons in a
week). Some 'net folks actually prefer this method, and deal with or ignore the polygon breakdown.
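Here's a sketch of what that approach looks like in a vertex shader; n and f are made-up clip depths, and the shader itself is just illustrative:
// Sketch: linear depth, with the W divide cancelled by hand.
const float n = 1.0, f = 10.0;  // made-up near and far depths
void main(void) {
    vec4 v = gl_Vertex;
    gl_Position = vec4(v.x, v.y, (v.z - n)/(f - n), v.z);  // linear depth, w = v.z
    gl_Position.z *= gl_Position.w;  // pre-multiply so the hardware's divide cancels out
    // Final depth is (v.z-n)/(f-n): fine for points, breaks down for polygons.
}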
But the depth computation in most common use today, for example by gluPerspective, is the following hideous thing:
Z = (v.z * (f + n) - 2 * f * n) / (f - n);
W = v.z;
Note that if f and n are constants, both Z and W are linear functions
of v.z, which means this can drop right into a matrix. (OpenGL
curiously assumes the camera is pointing in the negative Z direction,
so everything scaled by v.z is negated in the gluPerspective documentation.)
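Here's a sketch of those two rows dropped into a mat4 in a vertex shader. The n and f values are made up, and this sketch leaves x and y alone (the real gluPerspective matrix also scales them by the field of view, and flips signs for the negative-Z camera convention):
// Sketch: the gluPerspective-style depth rows dropped into a mat4.
const float n = 1.0, f = 10.0;    // made-up near and far depths
void main(void) {
    mat4 M = mat4(                // constructor takes columns
        vec4(1.0, 0.0, 0.0,                0.0),
        vec4(0.0, 1.0, 0.0,                0.0),
        vec4(0.0, 0.0, (f+n)/(f-n),        1.0),   // Z picks up v.z*(f+n)/(f-n); W picks up v.z
        vec4(0.0, 0.0, -2.0*f*n/(f-n),     0.0));  // Z also picks up -2*f*n/(f-n)
    gl_Position = M * gl_Vertex;  // depth = Z/W = (v.z*(f+n) - 2*f*n) / ((f-n)*v.z)
}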
If v.z==n, we're at the near clipping plane, and the above expands to
depth = Z / W = ((n * (f + n) - 2 * f * n) / (f - n)) / n = (n * n - f * n) / (f * n - n * n) = -1
OK! The near clipping plane (after the algebraic smoke clears) expands out to a depth of -1, which is what we want.
If v.z==f, we're at the far clipping plane, and we get
depth = Z / W = ((f * (f + n) - 2 * f * n) / (f - n)) / f = (f * f - f * n) / (f * f - n * f) = +1
So overall this computation actually keeps the depth inside the range
we want, which is good. It works for polygons, which is also
good. It's ugly, which is bad. It also totally freaks out
if n or f is zero, while the linear scaling trick works fine! It
begins to display numerical problems if n or f is very small or very
large--this can cause weird roundoff problems called z-buffer
fighting. Overall, it's not perfect, but it works. There's
an alternative, the W-buffer, that works better in some cases.