Tuesday, March 27, 2012

UBO Wan Kenobi


Nothing to do with this entry, but some of the current programming progress: Screw up the repeating patterns by (vertex)painting details on your textures!

I thought it would be nice to have some nerd-talk between all those "Making of" posts. Tonight’s special guest: UBO's. No, UBO's are not a new shoe brand or Unidentified-Blowjob-Objects. Though "Uniform Buffer Objects" kinda sounds the same. If you make extensive use of shaders -and any modern game engine does- you may have noticed you are passing quite a lot of parameters to these shaders. For example, pretty much all shaders that involve lighting or reflections need to know the camera position and/or light properties such as color, falloff, position, or the projection matrix in case of spotlights.

Tower22 has a few hundred different shader programs (built from ubershaders based on selected options), and the number grows as the amount of options grows. Normally, I would need to pass parameters such as the camera position or light props again for each shader. Feels like a waste of time, since those values are the same for all shaders. Or how about shaders that need a large array of parameters? For example, you may want to do all lights at once in a single shader program. And “Vertex Skinning” (animating via the GPU) is also a technique that needs a big pile of matrices (or quaternions). It's perfectly fine to pass all those parameters one by one before the rendering starts, but it's not the fastest way. Too bad a shader can't grab data from a fixed location somewhere in the videocard memory... Or can it...

If those issues sound familiar, UBO's can be your angel in darkness, the spray can in a stinky toilet. Not only may they give a (slight) performance boost, they also allow programming in a more natural way with structs and stuff. Instead of passing all those parameters individually, you make a data buffer, which is basically just a block of vectors (float4) stored somewhere in the videocard memory. Pretty much the same idea as texture buffers or vertex buffers:

1- Make a buffer
2- Fill it with data (once, or at the start of each renderCycle, or whenever an update is needed)
3- Define the same buffer in your shadercode. As an array or struct for example.
4- At creation, get the buffer-parameter from your shader and link it with the block you made earlier

In other words, the parameters have been moved from the CPU/RAM to the GPU/videomem, so we don't have to pass all the values (and slow down the pipeline) anymore. Down below you can find a simple implementation, using OpenGL and Cg. With GLSL it works almost the same, though. Probably even simpler.



Preparations
---------------------------------
Before hitting your coding typewriter, make sure your drivers are prepared though. First of all, if you are using Cg (like me), download Cg 3.1 or higher. And if you are in love with OpenGL, make sure your driver exposes the GL_ARB_uniform_buffer_object extension (UBO's only became a core feature in OpenGL 3.1). AND, you may need to update your videocard drivers as well!! I tried and I tried, but making a buffer the pure OpenGL way caused crashes. Then I realized the laptop videocard comes from 2009. UBO's weren't even born back then, or at least still pooping their diapers. A videocard driver update did the magical fix in my case.

Unless you are using up-to-date headers, you may need to define some new OpenGL functions first. Well, I'm using Delphi so don't count on up-to-date functions. Ditto for the Cg libraries. Luckily adding new functions is pretty simple. If you can’t find a particular function, just search for it and the OpenGL docs tell you exactly how the function works, what it returns, what parameters to give, et cetera. Also good to know, the nVidia OpenGL10 SDK has some examples as well.



Example: lighting with structs
---------------------------------
Now let's make a practical example. Lights. Our space-ice-hockey game uses a number of simple pointlights, let's say 128 at max. Several shaders such as the environment and character shaders need those lights. We could define all lights as follows:

struct PointLight
{
    float3 position;
    float  range;          // Falloff distance
    float3 diffuseColor;
};
// ! Don't use this code! There is a problem with the layout, I'll explain below

struct SceneLights
{
    int        lightCount;
    PointLight light[128]; // Array size goes behind the name
} _sceneLights;

With such a struct, we could do the lighting inside a shader as follows:

for (int i=0; i < _sceneLights.lightCount; i++)
{
    // Pick light i from the array inside the struct
    PointLight light = _sceneLights.light[i];

    float  attenuation = getAttenuation( pixelPos, light.position, light.range );
    float3 diffuse     = saturate( dot( pixelNormal,
                                        normalize(light.position - pixelPos) ) );
    diffuse           *= light.diffuseColor.rgb * attenuation;
    totalDiffuse.rgb  += diffuse;
} // for i

And chaps, don't forget you can also still pass traditional parameters as lookup indices. This can be useful when rendering lots of stuff in a single breath, when using instancing for example.

... uniform int myID )
{
MyData d = dataArray[ myID ];


1 Making the buffer:
---------------------------------
Pretty cool huh? Step1 is to make a buffer. Simple stuff:

{ Create an empty buffer }
glGenBuffersARB( 1, @ubo.glHandle );
glBindBufferARB( GL_UNIFORM_BUFFER, ubo.glHandle );
err := glGetError;
if err <> GL_NO_ERROR then
  ubo.glHandle := 0; // Arh! Check your drivers matey

{ Size & Fill it (or pass nil if you don't want to fill it yet) }
if isDynamic then
  glBufferDataARB( GL_UNIFORM_BUFFER, byteSize, dataPtr, GL_DYNAMIC_DRAW_ARB )
else
  glBufferDataARB( GL_UNIFORM_BUFFER, byteSize, dataPtr, GL_STATIC_DRAW_ARB );

glBindBufferARB( GL_UNIFORM_BUFFER, 0 ); // Detach

{ Make a Cg Buffer }
ubo.cgHandle := cgGLCreateBufferFromObject( cgContext, ubo.glHandle, CG_FALSE );


Some notes. First, the last parameter of glBufferDataARB (GL_DYNAMIC_DRAW_ARB or GL_STATIC_DRAW_ARB above) tells OpenGL how the buffer will be used. I showed 2 variations, but there are more flavours, check the OpenGL documentation. Basically you need to decide how often the buffer will get updated. Only once? Each cycle? Even more?
* Another note, at the bottom I'm making a Cg-specific variant of this buffer to use it with Cg shaders. If you use GLSL or something else, you can skip that line. Another note to self: need to buy milk for tomo... wait, nevermind.
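
The "how often will it get updated" decision can be tucked into a tiny helper. Below a minimal sketch in C (the native language of the OpenGL API, not the Delphi used in this post). The UpdateRate enum and pick_usage_hint() are names made up for illustration; the three numeric values are the standard OpenGL usage hints (the _ARB-suffixed constants carry the same numbers). The mapping is one reasonable choice, not a rule: the hint only helps the driver decide where to park the buffer.

```c
/* Standard OpenGL usage-hint values, repeated here so the sketch
   compiles without any GL headers. */
#define GL_STREAM_DRAW  0x88E0
#define GL_STATIC_DRAW  0x88E4
#define GL_DYNAMIC_DRAW 0x88E8

/* Hypothetical enum: how often the application rewrites the UBO. */
typedef enum {
    UPDATE_ONCE,       /* filled at load time, never touched again  */
    UPDATE_PER_FRAME,  /* refreshed at the start of each renderCycle */
    UPDATE_PER_DRAW    /* rewritten (almost) every time it is used  */
} UpdateRate;

/* Hypothetical helper: map the update rate to a usage hint. */
static unsigned int pick_usage_hint(UpdateRate rate)
{
    switch (rate) {
    case UPDATE_ONCE:      return GL_STATIC_DRAW;
    case UPDATE_PER_FRAME: return GL_DYNAMIC_DRAW;
    default:               return GL_STREAM_DRAW;
    }
}
```

The returned value would go in as the last argument of glBufferDataARB.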


2 Filling the buffer & Layout:
---------------------------------
We already saw glBufferDataARB being used to fill the buffer. In this case dataPtr would be a pointer to the struct(s) I showed earlier. You can update the (sub)contents with:

// OpenGL (when using GLSL)
glBindBufferARB( GL_UNIFORM_BUFFER, ubo.glHandle );
glBufferDataARB( GL_UNIFORM_BUFFER, byteSize, dataPointer, GL_DYNAMIC_DRAW_ARB ); // Replace everything
glBufferSubDataARB( GL_UNIFORM_BUFFER, byteOffset, byteSize, dataPointer );       // Or replace just a part
// Cg
cgSetBufferData( ubo.cgHandle, byteSize, dataPointer );                  // Replace everything
cgSetBufferSubData( ubo.cgHandle, byteOffset, byteSize, dataPointer );   // Or replace just a part
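
The byteOffset and byteSize for a partial update follow directly from the buffer layout. A minimal sketch in C, assuming the float4-only layout used later in this post (a 16-byte int4 header followed by 128 lights of 32 bytes each). UpdateRange and light_update_range() are hypothetical names, not part of OpenGL or Cg.

```c
#include <stddef.h>

#define HEADER_BYTES ((size_t)16)   /* int4 lightCount           */
#define LIGHT_BYTES  ((size_t)32)   /* two float4's per light    */
#define MAX_LIGHTS   ((size_t)128)  /* array size in the shader  */

typedef struct {
    size_t byteOffset;
    size_t byteSize;
} UpdateRange;

/* Hypothetical helper: byte range covering lights [first, first + count).
   Returns 1 and fills *out on success; returns 0 and leaves *out untouched
   when the range is empty or falls outside the array. */
static int light_update_range(size_t first, size_t count, UpdateRange *out)
{
    if (count == 0 || first >= MAX_LIGHTS || count > MAX_LIGHTS - first)
        return 0;
    out->byteOffset = HEADER_BYTES + first * LIGHT_BYTES;
    out->byteSize   = count * LIGHT_BYTES;
    return 1;
}
```

The two resulting numbers are what you would hand to glBufferSubDataARB or cgSetBufferSubData, together with a pointer to the first changed light.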

There are some catches though. First of all, you can't just mix datatypes like I did (float3, float, int, ...). If you do, you may get weird results, because the GPU pads and aligns the fields differently than your tightly packed CPU struct. Correct me if I'm wrong, but the datablock is expected to follow the std140 layout rules (in GLSL you have to ask for them explicitly with layout(std140), the default "shared" layout is up to the driver). There are documents out there describing these rules. But if you take the lazy path like me, just make sure everything is using float4 (or float4x4 for matrices) or int4 types:

struct PointLight
{
    float4 positionRange; // XYZ  W = Falloff distance
    float4 diffuseColor;  // RGB  A = not used
}; // 2 x 16 = 32 bytes

struct SceneLights
{
    int4       lightCount; // X = pointlight count
    PointLight light[128];
} _sceneLights;
// 16 + 128 * 32 = 4,112 bytes


Yes, that may give some overhead (unused fields on the color and count variable). You don't have to use this way of formatting, but filling the buffer gets a whole lot more difficult then, as you need to know the offsets for each variable in that case. OpenGL and Cg have functions to calculate those btw.
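
To get a feel for what "knowing the offsets" means, here is a minimal sketch in C that applies the std140 alignment rules to the original mixed-type PointLight (float3, float, float3). It only assumes the two rules needed for that struct: a float is aligned to 4 bytes, a float3 to 16 bytes, and a struct is padded to a multiple of 16. Std140Type, place() and mixed_point_light_layout() are illustrative names; in real code you would rather ask OpenGL or Cg for the offsets.

```c
#include <stddef.h>

/* Round offset up to the next multiple of alignment. */
static size_t align_up(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

/* Base alignment and size (in bytes) of a type under std140. */
typedef struct { size_t alignment, size; } Std140Type;

static const Std140Type STD140_FLOAT  = {  4,  4 };
static const Std140Type STD140_FLOAT3 = { 16, 12 };

/* Place one member behind the current end of the struct.
   Returns the member's offset and moves *cursor past it. */
static size_t place(size_t *cursor, Std140Type type)
{
    size_t offset = align_up(*cursor, type.alignment);
    *cursor = offset + type.size;
    return offset;
}

typedef struct {
    size_t position, range, diffuseColor;  /* member offsets            */
    size_t stride;                         /* bytes per array element   */
} PointLightLayout;

/* Layout of: float3 position; float range; float3 diffuseColor; */
static PointLightLayout mixed_point_light_layout(void)
{
    PointLightLayout layout;
    size_t cursor = 0;
    layout.position     = place(&cursor, STD140_FLOAT3);
    layout.range        = place(&cursor, STD140_FLOAT);  /* fills the float3's spare slot */
    layout.diffuseColor = place(&cursor, STD140_FLOAT3);
    layout.stride       = align_up(cursor, 16);          /* struct padded to 16 bytes */
    return layout;
}
```

Under these rules each light occupies 32 bytes on the GPU, while a tightly packed CPU struct with the same fields is only 28 bytes (7 floats). Every light after the first would land a few bytes further off, which matches the "weird results" mentioned earlier.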

Second rule, be aware there is a maximum size. For now, 4096 float4's (64 KB) on my hardware, to be more precise; the exact limit depends on the videocard and driver. That means I could define up to 2047 lights (don't forget the int4 lightCount variable) in the example above, as each pointlight takes 2 float4's. If that is not enough for you, you can bind multiple UBO's at the same time. You could make one UBO with all pointlights, another one with all spotlights, and so on.
* Oh, and Delphi boys, don't forget to pack your records ( TPointLight = packed record )!
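
The CPU-side mirror of the float4-only layout and the size arithmetic can be checked by the compiler. A minimal sketch in C11, with max_lights_for_block() as a made-up helper name. Because every field is a block of four 32-bit values, the C struct needs no packing directive. Note that the OpenGL spec itself only guarantees 16 KB (1024 float4's) per uniform block; the real limit can be queried with glGetIntegerv and GL_MAX_UNIFORM_BLOCK_SIZE.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_POINT_LIGHTS 128

typedef struct {
    float positionRange[4];  /* XYZ = position, W = falloff distance */
    float diffuseColor[4];   /* RGB = color,    A = not used         */
} PointLight;                /* 2 x 16 = 32 bytes */

typedef struct {
    int32_t    lightCount[4];             /* X = pointlight count */
    PointLight light[MAX_POINT_LIGHTS];
} SceneLights;               /* 16 + 128 * 32 = 4112 bytes */

/* Compile-time checks that the CPU struct matches the shader layout. */
_Static_assert(sizeof(PointLight) == 32, "PointLight must be 2 float4's");
_Static_assert(offsetof(SceneLights, light) == 16, "lights start behind the int4");
_Static_assert(sizeof(SceneLights) == 4112, "unexpected SceneLights size");

/* Hypothetical helper: how many lights fit in a block that may hold
   maxFloat4s vectors, keeping one float4 for the lightCount header. */
static size_t max_lights_for_block(size_t maxFloat4s)
{
    const size_t headerFloat4s = sizeof(int32_t[4]) / 16;  /* 1 */
    const size_t lightFloat4s  = sizeof(PointLight) / 16;  /* 2 */
    if (maxFloat4s < headerFloat4s)
        return 0;
    return (maxFloat4s - headerFloat4s) / lightFloat4s;
}
```

With a limit of 4096 float4's this gives the 2047 lights mentioned above; with the guaranteed minimum of 1024 float4's it drops to 511.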

3 Defining the structs in your shader
---------------------------------
Depends on the language you use, but it's pretty much the same as you did in C++, Delphi, or whatever it is you are using. In Cg, the "BUFFER[x]" semantic behind the declaration (for example "} _sceneLights : BUFFER[0];") is an optional addition that tells Cg on which fixed "slot" the buffer is bound. Like textures, you can bind up to 32 (I think) UBO's at the same time. If you don’t care about the specific index, just type “: BUFFER;”.


4 Final step, connect shader parameters with the UBO's
---------------------------------
Not sure how it's done with GLSL, but with Cg you need to find the UBO parameter first, and pass the cgHandle we got before with cgGLCreateBufferFromObject(). For each program that uses UBO's:

var uboParamHandle : CGParameter;
begin
uboName := 'SceneLights';
uboParamHandle:= cgGetNamedProgramUniformBuffer( programHandle, pchar(uboName) );

I bet there are more ways to wire up the whole thing, but this is at least one of them. Anyway, now that we have the parameter handle, we can pass the UBO:
cgSetUniformBufferParameter( uboParamHandle, ubo.cgHandle );

As said, use the buffer handle we got via cgGLCreateBufferFromObject(), not the one we got from OpenGL with glGenBuffers(). Unless you have crazy ideas, you only have to pass this value once by the way. So typically that would be at the start. Some final important notes (and maybe I'm doing something wrong):

* When defining large arrays in your shader, the compile time can get a LOT longer. It seems the entire array gets unrolled.
* That's why I suggest pre-compiling the shaders and loading those as long as you didn't make changes.
* Too bad cgGetNamedProgramUniformBuffer() does not seem to work with pre-compiled shaders... I think this is a bug in the Cg 3.1 library, so I asked on the nVidia forums... with no result yet.
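
For the GLSL route, which step 4 leaves open, the wiring goes through three core OpenGL 3.1 calls: glGetUniformBlockIndex, glUniformBlockBinding and glBindBufferBase. Below a minimal sketch in C, not tested against the Tower22 engine. The UboApi struct, attach_ubo() and the fake_* stand-ins are names invented here; the stand-ins only exist so the call sequence can be exercised without a GL context, and in a real program the three pointers would be loaded from the driver. It assumes the GLSL shader declares a uniform block named "SceneLights".

```c
#include <string.h>

typedef unsigned int GLuint;
typedef unsigned int GLenum;
typedef char         GLchar;

#define GL_UNIFORM_BUFFER 0x8A11
#define GL_INVALID_INDEX  0xFFFFFFFFu

/* The three OpenGL entry points this step needs, as function pointers. */
typedef struct {
    GLuint (*GetUniformBlockIndex)(GLuint program, const GLchar *blockName);
    void   (*UniformBlockBinding)(GLuint program, GLuint blockIndex, GLuint bindingPoint);
    void   (*BindBufferBase)(GLenum target, GLuint bindingPoint, GLuint buffer);
} UboApi;

/* Hypothetical helper: connect the named uniform block of a program to a
   UBO through a binding point ("slot"). Returns 1 on success, 0 when the
   program has no active block with that name. */
static int attach_ubo(const UboApi *gl, GLuint program, const GLchar *blockName,
                      GLuint bindingPoint, GLuint buffer)
{
    GLuint blockIndex = gl->GetUniformBlockIndex(program, blockName);
    if (blockIndex == GL_INVALID_INDEX)
        return 0;
    gl->UniformBlockBinding(program, blockIndex, bindingPoint); /* block  -> slot */
    gl->BindBufferBase(GL_UNIFORM_BUFFER, bindingPoint, buffer); /* buffer -> slot */
    return 1;
}

/* --- Stand-ins for the driver, so the sketch runs without a GL context --- */
static GLuint fake_last_binding = GL_INVALID_INDEX;
static GLuint fake_last_buffer  = 0;

static GLuint fake_get_block_index(GLuint program, const GLchar *blockName)
{
    (void)program;
    /* Pretend the program has exactly one block, "SceneLights", at index 0. */
    return strcmp(blockName, "SceneLights") == 0 ? 0u : GL_INVALID_INDEX;
}

static void fake_block_binding(GLuint program, GLuint blockIndex, GLuint bindingPoint)
{
    (void)program; (void)blockIndex;
    fake_last_binding = bindingPoint;
}

static void fake_bind_buffer_base(GLenum target, GLuint bindingPoint, GLuint buffer)
{
    (void)target; (void)bindingPoint;
    fake_last_buffer = buffer;
}

static const UboApi FAKE_GL = {
    fake_get_block_index, fake_block_binding, fake_bind_buffer_base
};
```

Just like with Cg, this only has to happen once per program; the buffer passed in is the plain OpenGL handle (ubo.glHandle in the Delphi code), not the Cg one.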

May the UBO be with you
