Rotation Matrices, Matrix-Matrix Multiply

CS 381 Lecture, Dr. Lawlor

So now you're at least somewhat comfortable with matrices and vectors to represent 3D points. 

Hey, what's with the dang W coordinate?

Yes, W is a strange thing.  We live in a 3D universe, so only 3 floats are needed to represent position.

However, the graphics hardware uses 4 floats.  This is for a curious combination of reasons:
  1. 4 floats is 16 bytes, which is a nice power of two.  This means 4-float vectors are nicely aligned in memory, which is important for high memory bandwidth.  SSE and most other SIMD CPU extensions require 16-byte alignment, so in software, you'd have to pad out your vectors anyway.
  2. An extra W of 1.0 gives you the chance to include translation into a matrix, by tacking on an additional (rightmost) column that gets multiplied by 1.0.  Keeping all your rotation/translation/scalings in a single matrix is convenient, as we'll see below.
  3. If W isn't 1.0, the hardware divides everything in the vector (X, Y, Z, and W) by it before rendering.  Actually, it divides by W even if W==1.0, although dividing by 1 has no effect!  Dividing by W has two important effects: the onscreen X and Y shrink as W grows, which is exactly perspective foreshortening (distant stuff looks smaller), and the depth the hardware ends up storing is Z/W, as we'll see below.  (There's a tiny sketch of the divide just below this list.)
So the bottom line is that W adds to matrix arithmetic the ability to do translation and perspective.  This is pretty cool.
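
Here's a minimal sketch of that divide in plain C++ (the little vec4 struct here is just for illustration, not the real GL type): with W==1 nothing changes, and with a bigger W the onscreen X and Y come out smaller, which is the perspective effect.

    #include <cstdio>

    struct vec4 { float x, y, z, w; };

    // What the hardware does to every vertex position before rasterizing:
    // divide the whole vector through by its own W.
    vec4 divide_by_w(vec4 v) {
        return { v.x / v.w, v.y / v.w, v.z / v.w, v.w / v.w };
    }

    int main() {
        vec4 near_pt = { 1.0f, 1.0f, 0.5f, 1.0f };  // W == 1: divide is a no-op
        vec4 far_pt  = { 1.0f, 1.0f, 0.5f, 2.0f };  // W == 2: X and Y come out half as big
        vec4 a = divide_by_w(near_pt), b = divide_by_w(far_pt);
        printf("W=1: x=%.2f y=%.2f   W=2: x=%.2f y=%.2f\n", a.x, a.y, b.x, b.y);
        return 0;
    }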

For example, consider a 3x3 matrix without any W coordinates.  Note that we can do scalings and rotations, but no translation--every matrix entry gets multiplied by one of the input coordinates, so there's no way to just add in a constant:
[ s.x ]   [ 1 0 0 ] [ v.x ]
[ s.y ] = [ 0 1 0 ] [ v.y ]
[ s.z ]   [ 0 0 1 ] [ v.z ]
Now, add W coordinates.  Assume the input v.w ==1.0, and that we divide through by the output s.w.
[ s.x ]   [ 1 0 0 a ] [ v.x ]
[ s.y ] = [ 0 1 0 b ] [ v.y ]
[ s.z ]   [ 0 0 1 c ] [ v.z ]
[ s.w ]   [ 0 0 1 d ] [ v.w ]
Now notice that s.w=v.z*1.0 + v.w*d = v.z+d.  The hardware then divides by this finished s.w value; so this is equivalent to
    s.x = (v.x + a)/(v.z+d);
    s.y = (v.y + b)/(v.z+d);
    s.z = (v.z + c)/(v.z+d);
    s.w = 1.0;

So this is equivalent to translating the object by (a,b,c), shoving it d units down the Z axis, and computing perspective.
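
Here's a quick C++ sanity check of that claim (plain 4x4 arrays standing in for mat4, and made-up values for a, b, c, and d): apply the matrix, divide by the resulting s.w, and compare against the hand-expanded formulas above.

    #include <cstdio>

    int main() {
        float a = 0.1f, b = 0.2f, c = 0.0f, d = 3.0f;   // translation / perspective constants
        float M[4][4] = {                               // the matrix written out above
            { 1, 0, 0, a },
            { 0, 1, 0, b },
            { 0, 0, 1, c },
            { 0, 0, 1, d },                             // makes s.w = v.z + d
        };
        float v[4] = { 0.5f, -0.5f, 1.0f, 1.0f };       // some input point, with v.w == 1

        float s[4] = { 0, 0, 0, 0 };                    // s = M * v
        for (int r = 0; r < 4; r++)
            for (int k = 0; k < 4; k++)
                s[r] += M[r][k] * v[k];
        float w = s[3];                                 // the hardware's divide by s.w
        for (int r = 0; r < 4; r++) s[r] /= w;

        printf("via matrix: %f %f %f\n", s[0], s[1], s[2]);
        printf("by hand:    %f %f %f\n",
               (v[0] + a) / (v[2] + d), (v[1] + b) / (v[2] + d), (v[2] + c) / (v[2] + d));
        return 0;
    }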

Why use a dang Matrix?  Why not just write out an equation?

Translation is simple: v.x+=0.2 translates the object in the x direction.
Scaling is simple: v.x*=1.2 scales the object in the x direction.
Rotation is simple:
    v2.x = cos(angle) * v.x + -sin(angle) * v.z;
    v2.z = sin(angle) * v.x + cos(angle) * v.z;
Perspective is simple: s.x = v.x / (v.z + 2.0);
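
Chaining even a few of these "simple" steps by hand already starts to sprawl; here's a sketch in plain C++ (the angle and offsets are made-up numbers):

    #include <cmath>
    #include <cstdio>

    int main() {
        float x = 1.0f, y = 0.5f, z = 0.5f;     // input point
        float angle = 0.3f;                     // made-up rotation angle, in radians

        // Rotate about the Y axis (the cos/sin equations above):
        float rx = cosf(angle) * x + -sinf(angle) * z;
        float rz = sinf(angle) * x +  cosf(angle) * z;

        // Then translate in x, then do the perspective divide:
        rx += 0.2f;
        float sx = rx / (rz + 2.0f);
        float sy = y  / (rz + 2.0f);
        printf("onscreen: %f %f\n", sx, sy);

        // Tack on another rotation or a scale and the bookkeeping
        // gets out of hand quickly.
        return 0;
    }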

The equation for a scaled, rotated, translated, rotated again, translated, scaled, and perspective transform is not simple.  To keep track of a big set of transformations efficiently and without hassle, matrices are indispensable.

Because matrices can be multiplied.

If you've got to apply two matrices to a vector v, first R and then L, you can write
    out = L * (R * v);
But you can also write
    out = (L * R) * v;
where L * R is another mat4.  Specifically, transforming the columns of R by L gives the columns of the new product matrix:
    mat4 P;
    P[0]=L * R[0];
    P[1]=L * R[1];
    P[2]=L * R[2];
    P[3]=L * R[3];
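
Here's that column-at-a-time product in self-contained C++ (plain 4x4 arrays instead of mat4, column-vector convention, made-up rotation and translation): it checks that (L * R) * v really does match L * (R * v).

    #include <cmath>
    #include <cstdio>

    // out = M * v, column-vector convention, M[row][col].
    void mat_vec(float M[4][4], float v[4], float out[4]) {
        for (int r = 0; r < 4; r++) {
            out[r] = 0;
            for (int k = 0; k < 4; k++) out[r] += M[r][k] * v[k];
        }
    }

    // P = L * R: transform each column of R by L, exactly the P[0..3] = L*R[0..3] idea.
    void mat_mat(float L[4][4], float R[4][4], float P[4][4]) {
        for (int c = 0; c < 4; c++) {
            float col[4] = { R[0][c], R[1][c], R[2][c], R[3][c] }, out[4];
            mat_vec(L, col, out);
            for (int r = 0; r < 4; r++) P[r][c] = out[r];
        }
    }

    int main() {
        float ang = 0.3f;
        float R[4][4] = {                       // applied first: rotate about Y
            { cosf(ang), 0, -sinf(ang), 0 },
            { 0,         1,  0,         0 },
            { sinf(ang), 0,  cosf(ang), 0 },
            { 0,         0,  0,         1 },
        };
        float L[4][4] = {                       // applied second: translate by (0.2, 0, 0)
            { 1, 0, 0, 0.2f },
            { 0, 1, 0, 0    },
            { 0, 0, 1, 0    },
            { 0, 0, 0, 1    },
        };
        float v[4] = { 1.0f, 0.5f, 0.5f, 1.0f };

        float Rv[4], a[4];                      // L * (R * v)
        mat_vec(R, v, Rv);  mat_vec(L, Rv, a);
        float P[4][4], b[4];                    // (L * R) * v
        mat_mat(L, R, P);   mat_vec(P, v, b);

        printf("L*(R*v) = %f %f %f\n", a[0], a[1], a[2]);
        printf("(L*R)*v = %f %f %f\n", b[0], b[1], b[2]);
        return 0;
    }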

This operation is good.  Learn it.  Use it.  Stick all your hundred matrices into one.

But what about Z?

Ah.  So you've noticed Z.  X and Y are the location onscreen.  Z is the depth of the onscreen objects.  OpenGL actually keeps a separate buffer, the depth or Z-buffer, offscreen alongside the normal visible color image.  In my ogl applications, you can press D (shift-d) to read back and display the depth buffer, showing depth==-1.0 in black and depth==+1.0 as white.

You normally clear the depth buffer to +1.0 at the start of each frame.  As you draw stuff, your vertex shader is supposed to compute a depth value, and return it in the Z coordinate.  The Z coordinate is divided by W to give a depth.  If the new stuff's depth is smaller (closer) than the old depth, it gets drawn.  If the new stuff is deeper (farther away) than whatever was already drawn at that pixel, the hardware skips the new stuff and leaves the old stuff there.  Overall, this means closer objects obscure objects that are farther away.  (Actually, the depth test is configurable with glDepthFunc--it can be disabled, flipped around to only draw if greater, etc.  You can also use textures in a fragment shader to simulate your own arbitrarily complicated depth test, although the builtin depth buffer has dedicated hardware and is hence faster and better than yours.)
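
In code, the usual setup is just a few GL calls at the top of each frame; a minimal sketch (assuming a GL context already exists, from GLUT or whatever framework you're using):

    #include <GL/gl.h>

    void begin_frame(void) {
        glClearDepth(1.0);                                   // depth clears to the far value
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);  // wipe color and depth together
        glEnable(GL_DEPTH_TEST);                             // turn the depth test on
        glDepthFunc(GL_LESS);                                // keep a pixel only if its new depth is smaller (closer)
    }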

So the bottom line is that a working perspective matrix just has to make sure the post-divide depth, which is Z/W, ranges between -1 and +1, with lower values for closer stuff.  For example, if your object's (post-rotation) v.z values already range between -1 and +1, and your W=(v.z+3), and so ranges between 2 and 4, then you can just set Z=v.z, and Z/W=v.z/(v.z+3) will range between -1/2 and +1/4.
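
A tiny check of those numbers in C++:

    #include <cstdio>

    // From the example: Z = v.z and W = v.z + 3, so depth = Z/W = v.z/(v.z+3).
    float depth(float vz) { return vz / (vz + 3.0f); }

    int main() {
        printf("nearest  (v.z=-1): %f\n", depth(-1.0f));  // -0.5
        printf("middle   (v.z= 0): %f\n", depth( 0.0f));  //  0.0
        printf("farthest (v.z=+1): %f\n", depth(+1.0f));  // +0.25
        return 0;
    }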

zNear and zFar

The range of v.z values you want to store in the depth buffer is usually described by the "near and far clipping plane depths", written as zNear and zFar (or even "hither" and "yon" [snicker!]), which I'll abbreviate below as "n" and "f".   They're called clipping planes because finished depth values that lie outside the -1 to +1 range are chopped off. 

In a perfect world, you'd just linearly map v.z coordinates to depth buffer values like so:
    depth = Z/W = (v.z - n) / (f - n);
(You could also multiply this by 2 and subtract 1, just to make the depth range from -1 to +1 instead of 0 to 1.)

Unfortunately, the above linear mapping is tough to actually implement solely because of the W divide.  W is some linear function of v.z already, which isn't what the above equation wants. The sensible thing to do, then, would be to only divide X and Y, not Z, by W.  The hardware, however, divides X, Y, and Z by W whether you like it or not. 


You could, in fact, sarcastically cancel out the W divide by doing something like this:
    Z = W * (v.z - n) / (f - n);
This actually works fine, except that W is some function of v.z already, and we can't represent the product of two linear functions of v.z (a quadratic in v.z) using a matrix.  You might just give up on the matrix business at the last step, and compensate for the W divide by doing as your last step "gl_Position.z*=gl_Position.w;".  This actually works pretty well for points, but it's slower than needed, and breaks down for polygons (we'll talk about the precise breakdown when we cover polygons in a week).  Some 'net folks actually prefer this method, and deal with or ignore the polygon breakdown.
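
Numerically the cancellation does what you'd hope; here's a sketch (using the same v.z range and W=(v.z+3) from the earlier example, and letting depth run 0 to 1):

    #include <cstdio>

    int main() {
        float n = -1.0f, f = 1.0f;              // near/far v.z values, as in the earlier example
        for (float vz = n; vz <= f; vz += 0.5f) {
            float W = vz + 3.0f;                // W is already some linear function of v.z
            float Z = W * (vz - n) / (f - n);   // pre-multiply by W so the divide cancels...
            printf("v.z=%+.1f  depth=%.2f\n", vz, Z / W);  // ...and depth comes out linear, 0 to 1
        }
        // The catch: that Z is W * (linear in v.z), i.e. quadratic in v.z,
        // so no matrix row can produce it; hence the
        // gl_Position.z *= gl_Position.w; trick in the vertex shader instead.
        return 0;
    }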


But the depth computation in most common use today, for example by gluPerspective, is the following hideous thing:
    Z = (v.z * (f + n) - 2* f * n) / (f - n);
    W = v.z;
Note that if f and n are constants, both Z and W are linear functions of v.z, which means this can drop right into a matrix.  (OpenGL curiously assumes the camera is pointing in the negative Z direction, so everything scaled by v.z is negated in the gluPerspective documentation.)

If v.z==n, we're at the near clipping plane, and the above expands to
    depth = Z / W = ((n * (f+n) - 2* f * n) / (f - n)) / n = (n * n - f * n) / (f * n - n * n) = -1
OK!  The near clipping plane (after the algebraic smoke clears) expands out to a depth of -1, which is what we want.

If v.z==f, we're at the far clipping plane, and we get
    depth = Z / W = ((f * (f+n) - 2* f * n) / (f - n)) / f = (f * f - f * n) / (f * f - n * f) = +1
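
It checks out numerically too; a quick C++ sketch with made-up n and f, evaluating at the near plane, the midpoint, and the far plane:

    #include <cstdio>

    int main() {
        float n = 1.0f, f = 10.0f;              // made-up near/far values
        float samples[3] = { n, (n + f) / 2, f };
        for (int i = 0; i < 3; i++) {
            float vz = samples[i];
            float Z = (vz * (f + n) - 2 * f * n) / (f - n);
            float W = vz;
            printf("v.z=%5.2f  depth=Z/W=%+.3f\n", vz, Z / W);
        }
        // Prints -1 at the near plane and +1 at the far plane, as derived above,
        // but the midpoint does NOT come out as 0: the mapping is nonlinear in v.z.
        return 0;
    }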

So overall this computation actually keeps the depth inside the range we want, which is good.  It works for polygons, which is also good.  It's ugly, which is bad.  It also totally freaks out if n or f is zero, while the linear scaling trick works fine!  It begins to display numerical problems if n or f is very small or very large--this can cause weird roundoff problems called z-buffer fighting.  Overall, it's not perfect, but it works.  There's an alternative, the W-buffer, that works better in some cases.