Sunday, October 30, 2011

Evolution of the Graphics Pipeline: a History in Code (part 1)

Few markets have seen such a vertiginous rise in performance and complexity like 3D graphics hardware. Some people, including me, like to think that this journey started fifteen years ago, when the original 3dfx Voodoo Graphics board was released to the public. I thought it was time to look back and see how we got this far.

Being a programmer, I witnessed this evolution as APIs extended and grew over time, to cover the features made possible by newer hardware. It was lots of fun: graphic boards gradually went from mere triangle rasterizers to massively parallel computation engines, and with every major generation came a quantum leap in possibilities.

In this article we will walk down this path again, showing how OpenGL grew in response to the evolution of hardware and the new features that programmers wanted to use. In detail, I am going to describe a common NPR (Non Photo Realistic) rendering technique called Cel Shading (or Toon Shading, or Cartoon Shading, depending whom you ask to), and show its implementation on each major hardware generation using the OpenGL API.

Cel Shading

First things first, let’s define the 3D effect we are going to implement for the rest of this article.

Cel Shading is a technique designed to make objects look like they are hand-drawn cartoons. Cartoonists draw only a small set of tones on their subjects instead of fully shading them: for example a dark one for shadows, a midtone for ambient light and a bright tone for the highlights. Then the silhouette is underlined with a black outline.

Choosing the lighting model

To reproduce this effect on 3D hardware, we must thus start defining a lighting model, which will determine how objects are lit.

If you want your renderings to be faithful to reality, your lighting model must keep several things into account: materials’ (in)ability to reflect light, shadows and self-shadows, object transparency (which is non-trivially tied to the first two features) to name a few, and you even haven’t started handling multiple lights and the traits of light itself. In the end, defining a complete lighting model is not an easy task by any mean, and is far outside the scope of this article.

Luckily, our stated goal concerns implementing a rendering technique which is non-photorealistic to start with, so it won’t hurt to simplify the model (quite) a bit:

  • our source of light is omnidirectional;
  • intensity stays the same regardless of distance from the light source;
  • materials don’t reflect light;
  • materals are lit according to the angle of incidence of light (the smaller the angle, the higher the intensity).

With these assumption, we can use a very simple approach to determine light intensity on a specific point of the object; we’ll use the angle (A) between the normal vector on that point (N) and the vector from that point to the light source (L), and say that the smaller the angle, the higher the intensity; this makes sense, because if the two vectors are parallel it means that the light is pointing straight there; if the two vectors are orthogonal instead it means that the light is running parallel to the surface, thus not touching it.

The relationship between these three entities is described by the formal definition of the dot product between two vectors:

N dot L = |N| * |L| * cos(A)

Since |N| and |L| are both 1.0 for normalized vectors, we can get the cosine of the enclosing angle between the light and normal vectors just by calculating their dot product. It’s worth noting at this point that we don’t need the actual angle value; in fact, cos(A) is 1.0 for a couple of parallel vectors, a value we can map to maximum light intensity, and it’s 0.0 for a couple of orthogonal vectors, a value we can map to minimum light intensity.

You can see the result of such a lighting model applied to a torus. The light in this scene comes from the camera point of view. As you can see, portions of the torus directly facing the light are brighter, and there are smooth transactions of tones to the darker regions.

Lighting model

Putting the “Cel” in “Cel Shading”

Now, let’s implement the Cel Shading technique over this lighting model. A possible implementation can roughly be summarized in these two steps:

  • Ink outline: in wireframe mode, only draw the back-facing polygons of the objects, with a thick line width;
  • Hand-drawn look: clamp the intensity of light for each point of the object within a set of discretized intensities, then use those to fill polygons over the ink outline.

This is the wireframe skeleton that will be used as ink outline. Lines are 5 pixels wide. Note that we are rendering the back-facing polygons only!

Ink outline

Then we draw the object using discretized intensities over the outline: the smooth transitions of the first screenshot are gone, replaced by abrupt changes when the light intensity crosses a series of defined thresholds, making the object look like hand-drawn.

Enabling depth test when drawing ensures that front facing polygons will be rendered ahead of back facing ones, so that most of the wireframe torus is covered, except for the borders; this gives the viewer the illusion that the object has a black ink contour.

Cel Shading

As a final requirement, it must be possible for intensity changes to happen on the middle of a single polygon.

This is where it gets hard, because otherwise we could get away by calculating a final discretized lighting value for a whole polygon, and rendering it in flat color. This, however, would have the downside of making the polygonal representation of your object evident; this is especially an issue when working on a low-poly object. It wouldn’t either be anywhere near a cartoon artist representation of your model.

To better understand the expected behaviour of our cel shading code, this screenshot shows a cel shaded object with wireframe superimposed, with a detail of a region when a tone change occurs. As you can see, it happens in the middle of the surrounding polygons, whose existance you wouldn’t imagine (…sort of) without the wireframe.

Cel shading on mesh

Now, on to the implementation!

Fixed functionality

History context

Until mid ’90s, consumer-level 3D dedicated hardware simply didn’t exist: from the definition of vertices to the final rasterization, 3D graphics were entirely performed on the CPUs.

This changed with the introduction of the first 3D dedicated graphics boards; those cards implemented what is now commonly referred to as fixed functionality pipeline. The following diagram (courtesy of summarizes how the fixed functionality pipeline is exposed in OpenGL.

Fixed functionality pipeline diagram

In practice, this meant that this kind of hardware was very specific about the features it could deliver: it offered a set of pre-defined processing and rendering steps, each of which could be controlled via the API. The presence of an explicit “Fog” stage is especially indicative of the hardwired approach offered by the fixed functionality pipeline.

Fixed functionality has been the only programming model available on graphics boards for about five years after their inception; hardware from that time actually implemented the same fixed stages that the API made available (the nVidia GeForce256 card went as far as mapping them 1:1 to hardware). With recent hardware this is not true anymore, but OpenGL drivers still offer a fixed functionality API, whose commands and states are translated by a compatibility layer to run on the much different architecture that lies beneath.

The base set-up

The torus object shown in the “Cel Shading” section is procedurally generated, so that we can raise or lower its polygon complexity easily.

The code generates vertices, normal to vertices, and face indices for the torus; we can store those information in VBOs, so to have this data uploaded in the graphics board video memory for faster operations.

With our object defined, and recalling the description of our lighting model, we can already compute the light intensity for every vertex. This is the CPU calculation for a single vertex of the object:

vec4 vertex;
vec3 normal;
vec4   light_dir = normalize(light_pos - vertex);
double light_int = dot((vec3(light_dir), normal);

We first compute the normalized vector from the vertex to the light position (so that the result is pointing out of the object) which is the light vector, then compute the dot product between the light vector and the normal to that vertex, obtaining the cosine of the enclosing angle. As previously discussed, this value being 1.0 means bright highlight, it being 0.0 means pitch black.

This is all great, but this is the per-vertex light intensity only; we still need to find a way to fill polygons, and it seems that with fixed functionality we’re out of luck: the API offers flat shading, which is useless to us since intensity changes need to happen in the middle of polygons, smooth shading, which is not a solution either since by definition it doesn’t allow for sharp transitions, and texture mapping, but no thing such a light texture exists.

…Wait. Doesn’t it?

The texturing twist

Let’s take a step back. In texture mapping, texture coordinates are interpolated between vertices, and for each fragment a texture lookup is made using the interpolated coordinate. We can leverage this fact, building an indirection: we can’t convince the hardware interpolator to discretize its output value, but it’s very possible to build a texture which maps interpolated values to discretized ones.

void celTexture(void)
  /*texture array*/
  texture = (GLfloat *)calloc(256 * 3, sizeof(GLfloat));

  int i;
  for(i=0; i<256; ++i)
    GLfloat val;
    if (i < (4.0 / 32.0) * 256.0)
      val = 0.2;
    else if (i < (15.0 / 32.0) * 256.0)
      val = 0.5;
      val = 1.0;

    texture[i * 3 + 0] = val;
    texture[i * 3 + 1] = val;
    texture[i * 3 + 2] = val;

  glGenTextures(1, &tex_id);
  glBindTexture(GL_TEXTURE_1D, tex_id);
  glTexImage1D(GL_TEXTURE_1D, 0, GL_RGB, 256, 0, GL_RGB, GL_FLOAT, texture);

  glBindTexture(GL_TEXTURE_1D, 0);

This will be a 1D texture, so that it’s possible to access it using the single coordinate for each vertex which is (you guessed it at this point) the same light_int we have calculated per-vertex before.

In the picture you can see a representation of this mapping: above are the computed intensities, below the texture map that is used to discretize them.

Light intensities after discretization

In the end, we’ve found a way to bind the API to our willing. Actually, this is an effective but still cheap trick, because the hardware is simply interpolating our texture coordinates linearly between the vertices: this means that an intensity transition within a polygon will always be a straight line, and the slope will abruptly change between polygons. Still, it’s the best we can do with fixed functionality (without messing with the input geometry, of course).

Outlining the result

So, after computing intensities for each vertex and uploading this data as texture coordinates, we can render our object, but we still haven’t discussed how to draw the outline. OpenGL API is very helpful on this side, because we can just set a few renderer states to obtain the wireframe effect described in the “Cel Shading” section:

  // Draw the object here

  // Switch to outline mode
  glPolygonMode(GL_BACK, GL_LINE);

  // Draw the outline
  glDrawElements(GL_TRIANGLES, n, GL_UNSIGNED_INT, 0);

  // Be kind to your OpenGL state and clean up accordingly here

We are going to re-use this outline approach verbatim in the next sections: focusing on how to implement the shading is enough for a single article, and actually the code above already does pretty everything we could wish for outline rendering.

Vertex programs

History context

The first quantum leap in 3D graphics came when nVidia released the NV20 chip, commercially branded as GeForce 3, in 2001. It was special because it made vertex processing fully programmable, allowing the programmer to bypass part of the fixed functionality pipeline, through the introduction of vertex programs.

A vertex program is executed for each and every vertex processed by the graphics chip. What’s peculiar about vertex programs is that they are thought to be executed on the graphics board itself rather than on the host computer. This kind of programmability, which was a distinctive trait of CPUs only up to that point, is why the NV20 was advertised by nVidia as the first “GPU” (Graphics Processing Unit).

Vertex programs are exceptionally simple: one vertex gets in, one vertex gets out. They don’t create new geometry, they just modify the existing one. They receive the raw vertices as they are sent to the graphics card, and take care of treating the vertices after forwarding them to the fragment stage; the simplest vertex program is a no-op that forwards the vertices raw and untransformed (still in object coordinates), the most complicate vertex program is whatever you can come up with, respecting hardware limitation (getting to these in a minute).

All GPUs since the NV20 are capable to run several vertex programs in parallel, due to their execution being largely independent one from the other and requiring no external communication with the host program. So, programmers that moved vertex processing from the CPU to the GPU via vertex programs were also in for a performance gain.

This said, this performance gain was admittedly very small with early GPU models, whose focus was still on introducing programmability rather than delivering performance via parallel execution. Vertex programs also had a size limit, a common figure in the early days being 128 hardware instructions.

Vertex programs in OpenGL

OpenGL offers the ability to write vertex programs via the extension ARB_vertex_program . Faithful to the stateful and bind-to-use nature of OpenGL, you need to explicitly enable vertex program mode and bind to your vertex program of choice.

They have access to the OpenGL state, which is useful for example to read the current transformation matrix (remember: you still have to do the transformation yourself!), and they can be fed parameters from the main program via the OpenGL API.

This is a selection of vertex attributes accessible by the vertex program:

  Vertex Attribute Binding  Components  Underlying State
  ------------------------  ----------  ------------------------------
  vertex.position           (x,y,z,w)   object coordinates
  vertex.normal             (x,y,z,1)   normal
  vertex.color              (r,g,b,a)   primary color
  vertex.fogcoord           (f,0,0,1)   fog coordinate
  vertex.texcoord           (s,t,r,q)   texture coordinate, unit 0
  vertex.texcoord[n]        (s,t,r,q)   texture coordinate, unit n
  vertex.attrib[n]          (x,y,z,w)   generic vertex attribute n

This kind of programmability brought great advances in the 3D graphics field. Animation techniques like key-frame interpolation or vertex skinning suddenly became much easier to implement, now that the programmer had full control over the vertex stage.

The language used to write vertex program is an assembly dialect. To give you an idea of the potential, this is the full instruction set of the 1.0 version of the language:

  Instruction    Inputs  Output   Description
  -----------    ------  ------   --------------------------------
  ABS            v       v        absolute value
  ADD            v,v     v        add
  ARL            s       a        address register load
  DP3            v,v     ssss     3-component dot product
  DP4            v,v     ssss     4-component dot product
  DPH            v,v     ssss     homogeneous dot product
  DST            v,v     v        distance vector
  EX2            s       ssss     exponential base 2
  EXP            s       v        exponential base 2 (approximate)
  FLR            v       v        floor
  FRC            v       v        fraction
  LG2            s       ssss     logarithm base 2
  LIT            v       v        compute light coefficients
  LOG            s       v        logarithm base 2 (approximate)
  MAD            v,v,v   v        multiply and add
  MAX            v,v     v        maximum
  MIN            v,v     v        minimum
  MOV            v       v        move
  MUL            v,v     v        multiply
  POW            s,s     ssss     exponentiate
  RCP            s       ssss     reciprocal
  RSQ            s       ssss     reciprocal square root
  SGE            v,v     v        set on greater than or equal
  SLT            v,v     v        set on less than
  SUB            v,v     v        subtract
  SWZ            v       v        extended swizzle
  XPD            v,v     v        cross product

(“v” denotes vectors, “s” scalar input, “ssss” a scalar output replicated in all the four components of a vector, and “a” is an address).

As you could expect, there are dedicated instructions for common operations in computer graphics (dot/cross product, multiply and add, etc…) and vectors are first-class citizens in this instruction set.

Vertex program syntax include swizzling, which allows to specify arbitrary read/write masks for vector components: for example, should you want to reverse one vector’s components, you could use

MOV vec, vec.wzyx

Or, to replicate one component of vec1 all over vec2:

MOV vec2, vec1.xxxx

(the implicit default for swizzling is of course .xyzw). Swizzle operations used to come at little to no cost performance-wise. For more advanced swizzling magic, consult the documentation of the dedicated SWZ instruction.

The experienced reader will notice the lack of a branching instruction. This makes sense considering that those programs run on execution units whose pipelines are an order of magnitude longer than CPU’s: in this scenario the penalty for a failed branch prediction (or worse, waiting to know which branch should be taken) is so high that early GPUs shipped with no branching support at all.

Even today, while branching on CPU is somewhat of a solved problem thanks to accurate branch prediction systems, branching on GPU should still be a matter of second thoughts. The interested can find a more in-depth discussion on this topic on chapter 34 “GPU Flow-Control Idioms” of GPU Gems 2.

Back to our torus.

Programming the torus

Now that you’re convinced that vertex programs were the big thing in graphics programming during the first 2000’s, it’s time to see what they can do for our cel-shaded torus. Cel-shading has more to do with the fragment stage than with the vertex one, so having vertex programs at our disposal isn’t going to change our light texture approach that we used with fixed functionality; still, we would like to offload some of the involved calculation to the GPU via a vertex program.

The light intensity calculation is an obvious candidate: it is run for every vertex, it manipulates one vertex attribute (the texture coordinate), and it requires an external parameter from the application (light’s position). Without much further ado, let’s step into our vertex program:


A vertex program starts declaring what specification of the language it conforms to. In this case, we’re using ARB-flavour vertex program, version 1.0.

# Current model/view/projection matrix
PARAM mvp[4]    = { state.matrix.mvp };

# Position of light
PARAM lightPosition = program.env[0];

Parameters are read-only values that stay constant during the execution of the vertex program, that represent states of the host application. In this case we access the current model / view / projection matrix, which is at our disposal as part of OpenGL state, and the current light position, that we’ll specify as a vertex program parameter from the host application.

# Transform the vertex position
DP4 result.position.x, mvp[0], vertex.position;
DP4 result.position.y, mvp[1], vertex.position;
DP4 result.position.z, mvp[2], vertex.position;
DP4 result.position.w, mvp[3], vertex.position;

The first operation of our vertex program is, unsurprisingly, the model / view / projection transformation. The four instructions above essentially perform a matrix/vector multiplication: in detail, each column of the transformation matrix is multiplied with the input vector position, yielding the transformed vector position. “result” is a reserved keyword for the vertex program output.

To better understand the rest of the vertex program, let’s look at the fixed functionality code we write before. This was the first step of calculating light intensity, find the light vector:

vec4 light_dir = normalize(light_pos - vertex);

Which translates to this vertex program code:

TEMP lightVector;
TEMP normLightVector;

# Ray from vertex to light
SUB lightVector, lightPosition, vertex.position;

# By-the-book normalization for lightVector
DP3 normLightVector.w, lightVector, lightVector;
RSQ normLightVector.w, normLightVector.w;
MUL, normLightVector.w, lightVector;

TEMP is used to define temporary vector variables, then we compute the light vector from the current vertex using the SUB instruction.

Now, why the following three instruction actually yield the normalized version of lightVector? Normalizing a vector is performed by dividing each component of it by the length of the vector itself. Now, because the first two instructions calculate the reciprocal of lightVector length (recall the properties of the dot product), the last MUL instruction is actually performing the normalization and writing the result to the xyz subvector via swizzling.

# Store intensity as texture coordinate
DP3 result.texcoord.x, normLightVector, vertex.normal;
MOV result.color, {1.0, 1.0, 1.0, 1.0};


It’s downway from now on: we calculate the dot product between the vertex normal and the light vector, and write the result to the final texture coordinate.

To run this vertex program, we need to run through the familiar generate and bind mechanism of OpenGL, then compile the plain text program via glProgramStringARB:

glGenProgramsARB(1, &vp_id);
glBindProgramARB(GL_VERTEX_PROGRAM_ARB, vp_id);
                   strlen(program), program);

Later, in our drawing code, we enable vertex program mode and select our vertex program to be the current one. Now the circle can be closed by specifying the current light position as a parameter. All vertices processed from now on will pass through our vertex program.

glBindProgramARB(GL_VERTEX_PROGRAM_ARB, vp_id);
glProgramEnvParameter4fvARB(GL_VERTEX_PROGRAM_ARB, 0, &rotated_light_pos[0]);

Wrapping it up

So far we've discussed the first step of OpenGL evolution that enabled programmers to tap into the new possibilities offered by programmable hardware. We'll pick up from where we left in part 2, discussing the introduction of a fragment programmable stage and how it was exposed in OpenGL.