Programmable Shaders with GLSL
CS 481 Lecture, Dr. Lawlor
Non-Programmable Shaders Stink
Back in the day (2000 AD), graphics cards had finally managed to
compute all of OpenGL in hardware. They had hardware projection
matrices, hardware clipping, hardware transform-and-lighting, hardware
texturing, and so on. Folks were thrilled, because glQuake looked amazing and ran great.
There's a problem with hardware, though. It's hard to change.
And no two programmers ever want to do, say, bump mapping exactly the
same way. Some want shadows. Some want bump-and-reflect. Some want
bump-and-light. Some want light-and-bump. nVidia and ATI were going
crazy trying to support every developer's crazy desires in hardware.
For example, my ATI card still supports these OpenGL extensions, just for variations on bump/environment mapping:
GL_EXT_texture_env_add, GL_ARB_texture_env_add, GL_ARB_texture_env_combine,
GL_ARB_texture_env_crossbar, GL_ARB_texture_env_dot3, GL_ARB_texture_mirrored_repeat,
GL_ATI_envmap_bumpmap, GL_ATI_texture_env_combine3, GL_ATIX_texture_env_combine3,
GL_ATI_texture_mirror_once, GL_NV_texgen_reflection, GL_SGI_color_matrix, ...
This was no good. Programmers had good ideas they couldn't get
into hardware. Programmers were frustrated trying to understand
what the heck the hardware guys had created. Hardware folks were
tearing their hair out trying to support "just one more feature" with
limited hardware.
The solution to the "too many shading methods to support in hardware" problem is to support every possible shading method in hardware. The easy way to do that is to make the shading hardware programmable.
So, they did.
Programmable Shaders are Very Simple in Practice
The graphics hardware now lets you do anything you want to incoming
vertices and fragments. Your "vertex shader" code literally gets
control and figures out where an incoming glVertex should be shown
onscreen, then your "fragment shader" figures out what color each pixel
should be.
Here's what this looks like. The following is C++ code, relying
on the "makeProgramObject" shader-handling function listed below.
The vertex and fragment shaders are the strings in the middle.
These are very simple shaders, but they can get arbitrarily complicated.
void my_display(void) {
    glClearColor(0,0,0,0); /* erase screen to black */
    glClear(GL_COLOR_BUFFER_BIT|GL_DEPTH_BUFFER_BIT);

    /* Set up programmable shaders */
    static GLhandleARB prog=makeProgramObject(
        "//GLSL Vertex shader\n"
        "void main(void) {\n"
        "  gl_Position=gl_ModelViewProjectionMatrix * gl_Vertex;\n"
        "}\n"
        ,
        "//GLSL Fragment (pixel) shader\n"
        "void main(void) {\n"
        "  gl_FragColor=vec4(1,0,0,1); /* that is, all pixels are red. */\n"
        "}\n"
    );
    glUseProgramObjectARB(prog);

    ... glBegin, glVertex, etc. Ordinary drawing here runs with the above shaders! ...

    glutSwapBuffers(); /* as usual... */
}
A few meta-observations first:
- Even with programmable shaders, you've still clearly got plenty of normal C++ OpenGL code.
- The GLSL programmable shader language is suspiciously similar to C++, Java, C#, etc. This is by design!
- The programmable shader goes into OpenGL as a *runtime string*. This
means shaders get compiled for your graphics hardware at runtime. This
is good! It means the same (C++) executable can run on ATI and nVidia
cards (as well as hypothetical future cards like the Spear™ Asparagon-9000). Your program can supply the shader-strings by:
- Hardcoding the shaders into your program, like above.
- Reading the shaders from a file (I like "vertex.txt" and "fragment.txt", when I don't hardcode.)
- Downloading shaders from the net.
- Creating new shaders on the fly (with just string processing!)
The stuff in strings is all "OpenGL Shading Language" (GLSL) code. The official GLSL Language Specification
isn't too bad--chapter 7 lists the builtin variables, chapter 8 the
builtin functions. Just think of GLSL as plain old C++ with a nice set
of 3D vector classes, and you're pretty darn close.
- gl_Position is the onscreen location of the vertex. This is
the one value the vertex shader is required to output.
gl_Position is a "vec4" in the usual OpenGL clip coordinates: after the
divide by w, visible geometry ends up between -1 and +1 on each axis.
- gl_Vertex is the vertex's raw location as passed in from C++, for example by a "glVertex3f(x,y,z);" call.
- gl_ModelViewProjectionMatrix is the whole OpenGL matrix stack--the product of the GL_PROJECTION and GL_MODELVIEW matrices.
- gl_FragColor is the onscreen color of the pixel. This is
the one value the fragment shader is required to output. It's a
"vec4", and I'm using the constructor-style syntax to initialize it
above.
Data types in GLSL work exactly like in C/C++/Java/C#. There are some beautiful builtin datatypes, though:
- float. Works exactly like C/C++/Java/C#.
- vec4. A class with four floats in it, which you can think of as
the XYZW components of a vector, or the RGBA components of a color.
vec4 supports + - * / exactly like you'd expect. vec4 is the native
datatype of the graphics hardware, so all of these operations are
single-clock-cycle.
- You can get to the first component of a vec4 named "v" as follows:
- "v.x", treating the vec4 as a spatial position or vector.
- "v.r", treating the vec4 as a color. This is the same data,
the same speed, the same everything as ".x"; it's basically just a
comment or a hint to the human reader that you're dealing with a color.
- "v[0]", treating the vec4 as an array. Again, it's the same underlying data.
- You can initialize a vec4 as follows:
- "vec4 v=vec4(0.0);\n", sets all four components to zero.
- "vec4 v=vec4(0.1,0.2,0.3,0.4);\n" sets all four components independently.
- "vec3 d=vec3(0.1,0.2,0.3);\n"
"vec4 v=vec4(d,0.4);\n"
You can make a 3-vector into a 4-vector by just adding the missing components.
- The "w" component is used for homogenous coordinates.
It's 1.0 for ordinary position vectors, and 0.0 for direction or offset
vectors. You care about this when you're deriving a new projection matrix, but otherwise you usually ignore it.
- vec3. A class with three floats in it. Doesn't have a ".w" or
".a" component. Useful for representing directions (surface normals,
light directions, etc) when you don't want the "w" component messing up your dot products.
- vec2. A class with just two floats. Missing ".z" or ".b" and
".w" or ".a". Useful for representing 2D texture coordinates, or
complex numbers.
- mat4, mat3, mat2. Matrices that operate on vec4's, vec3's, and vec2's. See my caveats
on how to load up the matrix values (the constructor takes column-major
order), or just load them from C++ via a builtin like
gl_ModelViewMatrix.
- "int" usually *isn't* supported for computation (the graphics
hardware usually doesn't have integer math!), although GLSL allows it
for a loop counter.
- A variable declared as "varying" gets transmitted from the vertex
shader to the fragment shader. This is the only way to communicate
between your vertex and fragment shaders!
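To make that concrete, here's a small illustrative shader pair (my own sketch, not one of the course demos) that exercises the vec3/vec4 constructors, the .x/.r/[ ] accessors, the w component, and a "varying":
"//GLSL Vertex shader\n"
"varying vec4 myColor; /* sent to the fragment shader */\n"
"void main(void) {\n"
"  vec3 offset=vec3(0.1,0.0,0.0); /* a direction, so its w should be 0.0 */\n"
"  vec4 v=gl_Vertex+vec4(offset,0.0); /* a vec3 plus a w makes a vec4 */\n"
"  myColor=vec4(v.x,v.y,v[2],1.0); /* .x, .y, and [2] all read the same underlying floats */\n"
"  gl_Position=gl_ModelViewProjectionMatrix * v;\n"
"}\n"
,
"//GLSL Fragment shader\n"
"varying vec4 myColor; /* interpolated from the vertex shader */\n"
"void main(void) {\n"
"  gl_FragColor=vec4(myColor.r,myColor.g,myColor.b,1.0); /* .r is the same data as .x */\n"
"}\n"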
Bottom line: programmable shaders really are quite easy to use.
Example GLSL Shaders
Try these out in the 481_glsl demo program! (Zip, Tar-gzip)
Stretch out incoming X coordinates, by dividing by z:
"//GLSL Vertex shader\n"
"void main(void) {\n"
" vec4 sv = gl_Vertex; \n"
" sv.x=sv.x/(1.0+sv.z); /* stretch! */\n"
" gl_Position=gl_ModelViewProjectionMatrix * sv;\n"
"}\n"
,
"//GLSL Fragment shader\n"
"void main(void) {\n"
" gl_FragColor=vec4(1,0,0,1);\n"
"}\n"
Transmit the incoming vertex colors (gl_Color) to the fragment shader, where they're multiplied by red:
"//GLSL Vertex shader\n"
"varying vec4 myColor; /*<- goes to fragment shader */ \n"
"void main(void) {\n"
" myColor = gl_Color;\n"
" gl_Position=gl_ModelViewProjectionMatrix * gl_Vertex;\n"
"}\n"
,
"//GLSL Fragment shader\n"
"varying vec4 myColor; /*<- comes from vertex shader */ \n"
"void main(void) {\n"
" gl_FragColor=vec4(1,0,0,1)*myColor;\n"
"}\n"
Color incoming vertices by their position (red=X, green=Y, blue=Z; a common debugging trick!):
"//GLSL Vertex shader\n"
"varying vec4 myColor; /*<- goes to fragment shader */ \n"
"void main(void) {\n"
" myColor = gl_Vertex; /* color-by-position */\n"
" gl_Position=gl_ModelViewProjectionMatrix * gl_Vertex;\n"
"}\n"
,
"//GLSL Fragment shader\n"
"varying vec4 myColor; /*<- comes from vertex shader */ \n"
"void main(void) {\n"
" gl_FragColor=myColor;\n"
"}\n"
The Joy(?) of the OpenGL Interface
Like many APIs, OpenGL takes a lot of
calls to get anything useful done. Programmable shaders
are especially call-heavy--for each of the vertex and fragment shaders,
you've got to create a GLhandleARB "ShaderObject", put in your source
code, compile that source code, and check for compile errors.
Then you've got to create a "ProgramObject", attach the vertex and
fragment shaders, link the program, check for link errors, and finally
"glUseProgramObjectARB". Then you can render stuff.
The code below does everything but the "glUseProgramObjectARB" and
rendering. I've used it for years, and haven't looked at it since
2005. I can't recommend looking at it, or the official ARB_shader_objects extension that describes how these functions work in unintelligible excruciating legalese.
#include <GL/glew.h> /*<- for gl...ARB extensions. Must call glewInit after glutCreateWindow! */
#include <stdio.h>
#include <stdlib.h> /* for "exit" */
#include <iostream> /* for std::cout */
#include <fstream>
#include <string>
// Print an error and exit if this object had a compile error.
void checkShaderOp(GLhandleARB obj,int errtype,const char *where)
{
    GLint compiled;
    glGetObjectParameterivARB(obj,errtype,&compiled);
    if (!compiled) {
        printf("Compile error on program: %s\n",where);
        enum {logSize=10000};
        char log[logSize]; int len=0;
        glGetInfoLogARB(obj, logSize,&len,log);
        printf("Error Log: \n%s\n",log); exit(1);
    }
}
// Create a vertex or fragment shader from this code.
GLhandleARB makeShaderObject(int target,const char *code)
{
    GLhandleARB h=glCreateShaderObjectARB(target);
    glShaderSourceARB(h,1,&code,NULL);
    glCompileShaderARB(h);
    checkShaderOp(h,GL_OBJECT_COMPILE_STATUS_ARB,code);
    return h;
}
// Create a complete shader object from these chunks of GLSL shader code.
// You still need to glUseProgramObjectARB(return value);
// THIS IS THE FUNCTION YOU PROBABLY *DO* WANT TO CALL!!!! RIGHT HERE!!!!
GLhandleARB makeProgramObject(const char *vertex,const char *fragment)
{
    if (glUseProgramObjectARB==0)
    { /* glew never set up, or OpenGL is too old. */
        std::cout<<"Error! OpenGL hardware or software too old--no GLSL!\n";
        exit(1);
    }
    GLhandleARB p=glCreateProgramObjectARB();
    glAttachObjectARB(p,
        makeShaderObject(GL_VERTEX_SHADER_ARB,vertex));
    glAttachObjectARB(p,
        makeShaderObject(GL_FRAGMENT_SHADER_ARB,fragment));
    glLinkProgramARB(p);
    checkShaderOp(p,GL_OBJECT_LINK_STATUS_ARB,"link");
    return p;
}
// Read an entire file into a C++ string.
std::string readFileIntoString(const char *fName) {
    char c; std::string ret;
    std::ifstream f(fName);
    if (!f) {ret="Cannot open file ";ret+=fName; return ret;}
    while (f.read(&c,1)) ret+=c;
    return ret;
}
// Create a complete shader object from these GLSL files.
GLhandleARB makeProgramObjectFromFiles(const char *vFile="vertex.txt",
    const char *fFile="fragment.txt")
{
    return makeProgramObject(
        readFileIntoString(vFile).c_str(),
        readFileIntoString(fFile).c_str()
    );
}
Do what I do, kids: write and debug the above code *once*, wrap it in a
nice library, call it from everywhere, and get on with your life!
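For example, a display routine like the one at the top of this page could swap in the file-based version (a sketch, assuming vertex.txt and fragment.txt sit next to the executable):
void my_display(void) {
    glClearColor(0,0,0,0);
    glClear(GL_COLOR_BUFFER_BIT|GL_DEPTH_BUFFER_BIT);

    /* Compile and link once, on the first frame; reuse the program object after that. */
    static GLhandleARB prog=makeProgramObjectFromFiles("vertex.txt","fragment.txt");
    glUseProgramObjectARB(prog);

    ... glBegin, glVertex, etc. as before ...

    glutSwapBuffers();
}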
Programmable Shaders are Very Weird Underneath
"May your children live in historic times, and come to the attention of the Emperor."
- (Supposedly) Ancient Chinese Curse. (Hint: are historic times usually good times?)
These are very interesting and historic times for computer science in
general, and computer graphics in particular. For fifty years,
we've built faster and faster machines that all operate in exactly the
same way--they execute the instructions of a stored program
(conceptually) one at a time. Consider that Fortran was invented in 1956, for vacuum-tube based computers, but it's still a viable language for programming a Core2 Solo.
Sadly, running one top-down sequential program is just not enough anymore: a plain Fortran program only uses one core of a Core2 Duo.
Today (2004+) is the dawn of a new era of parallelism, from multi-core CPUs to Field-Programmable Gate Arrays
(programmed in VHDL) and other, weirder logic. Graphics cards are
actually among the most interesting parallel hardware out there.
Consider the job of running your pixel shader. Your shader
compiles into 20 machine-code instructions. Each of those
instructions is likely to take ten clock cycles or more (because
floating-point is slow, there's a divide, etc.). A normal CPU
will start the first instruction of the first pixel, and because the
second instruction of the first pixel depends on the output of the
first instruction, even a fancy superscalar CPU has to just sit there
and wait until the first instruction finishes. But your screen
has a million pixels. So you have to wait 20 instructions/pixel *
10 cycles/instruction * 1 million pixels = 200 million clock cycles, or
about 1/10 second, before the rendering is finished.
By contrast, a GPU
pixel shader unit knows how big the screen is. So one GPU pixel
shader unit will actually immediately fire off the first instruction of
the *second* pixel before it's even done with the first instruction of
the first pixel. The GPU pixel shader unit will be starting the
first instruction of the *tenth* pixel before the first instruction is
finished, and will have *two hundred* pixels "in flight" before the
first pixel's shader is complete! Said another way, a GPU uses the
natural parallelism of the graphics rendering problem to keep the
arithmetic pipelines full, eventually cranking out one result every
clock cycle. Ignoring the 200-cycle startup time, you only have
to wait 20 million clock cycles, or 1/100 second, before the screen is
finished rendering.
Read that again. The GPU is ten times faster, because each pixel unit is busy executing ten instructions at once.
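If you like, here's the same back-of-the-envelope arithmetic as a tiny C++ program (the 2 GHz clock rate is my assumption; the "1/10 second" figure above implies roughly that):
#include <cstdio>
int main(void) {
    double instructionsPerPixel=20;
    double cyclesPerInstruction=10; /* latency of each slow floating-point instruction */
    double pixels=1.0e6;            /* about one screenful */
    double clockHz=2.0e9;           /* assumed clock rate */

    /* Sequential CPU: each instruction waits out the full latency of the one before it. */
    double cpuCycles=instructionsPerPixel*cyclesPerInstruction*pixels; /* 200 million */
    /* GPU pixel unit: pipelines stay full, so one instruction retires every clock. */
    double gpuCycles=instructionsPerPixel*pixels; /* 20 million */

    printf("CPU: about %.2f s   GPU: about %.2f s\n", cpuCycles/clockHz, gpuCycles/clockHz);
    return 0;
}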
It gets better. Even the crappiest embedded motherboard graphics
card has at least two pixel units. A high-end card like an nVidia
GeForce 8800 has 32 pixel shader units. So a GPU might actually
be several hundred times faster than a sequential machine, and keep
several thousand instructions "in flight" and in progress at once.
Superscalar CPUs dream about being able to achieve this sort of
parallelism, but they have to painfully, carefully squeeze parallelism
from dry sequential machine code designed in the 1950's, dodging
dependencies all the way. A GPU is hooked directly to the fire
hose of pure natural parallelism inherent in the graphics problem (and
many other problems in today's world).
The beautiful part about this is that pixel-level parallelism lets you hide almost any source of latency:
- Arithmetic operations can be deeply pipelined, causing latency.
- Arithmetic operations can be complicated, like reciprocal-square-root, causing latency.
- Memory operations need not hit in the cache for high performance, since memory just adds latency.
Bottom line: GPUs run fast by exploiting rendering's inherent problem-level parallelism.