Monday, December 27, 2010

Need for Speed

Right, how many kilograms of beef, potatoes, ice cream, chocolate sauce and bread did you eat last weekend? Enough to promise losing some weight in 2011? Ah Christmas... All those magical cozy lights, Wham! music, and not to forget: Home Alone, Critters and Gremlins 1..18. But the best memories are probably those of unwrapping Command & Conquer, Goldeneye (N64), Zelda OOT / Majora's Mask. Each year my little brother and I would nervously wait for our next game. Inspecting all the packages beneath the tree and knowing the exact dimensions / weight of an N64 game, we already knew which box to keep an eye on weeks before 24 December.

The time of getting presents has turned into giving them. I wouldn't even know what to ask for anymore. The downside of getting older is also getting more spoiled. At least where I live. What do I need anyway? A working computer, a chair, a bike to go to work. Clothes maybe... There is more joy in buying Shrek for our daughter, or giving a Blu-Ray player to grandpa.

However... I realized my videocard was pretty old again. Bought it at the end of 2007, so that is ~25 in dog years, and 2.435 BC in hardware-years. In other words, extremely old in hardware-land, where videocards age as fast as they render pixels. So, after donating Santa some money, a shiny box with an EVGA GeForce 4700 GTS came in. And damn, it even worked right after swapping it in for the older card. Our family has a long history of fooling around with computer parts. Dad never bought a complete (working) system. 4 MB RAM here, a 60 Hz processor there. A 0Kb modem elsewhere, etcetera. And of course, it NEVER worked. Had to travel the entire country with dad to get computer parts in the summer of 1994, waiting weeks and weeks before I could finally play Doom2 (with PC-speaker).

Was it worth the money? Hmmm, I can imagine there are more useful things in life, but:

- Tower22 on GeForce 8800 GTS (640 MB) : ~30 FPS
- Tower22 on EVGA GeForce 4700 GTS : ~56 FPS

Almost doubled, which pulled the T22 Engine out of the mud. But don't worry, we'll bring that card down to its knees again in no time, begging for mercy. More lights, realtime volumetric lightshafts/fog and updated ambient lighting are on the menu.

Work in progress: improved volumetric light. Not blurring a bright spot, but raytracing through space to see "how many particles" were lit. The lower-left corner shows the lightshaft-buffer.

FBO Sandwich
Talking about speed. As a programmer, I'm sure you have wondered several times: "How the hell can Crysis/Halflife/... run that fast on my machine, while my game runs like a crippled grandma?" If so, here is one last piece of programmer's advice for 2010.

After the transformation into the Inferred rendering pipeline we discussed earlier was completed, the speed dropped from ~30 to ~22 FPS (on the old card). Inferred Rendering has slightly more overhead, but that drop was ridiculous. Where did we go wrong?! Bad shaders? Maybe the new shadowMapping storage technique (I'll discuss that another time)?

Then an old fiend flashed by: Captain Framebuffer. In OpenGL terms, an FBO (framebuffer object) is a collection of target buffers you can render to. Well, with all those (background) techniques, we change that FBO plenty of times. But as I discovered years ago, when playing around with shadowMaps for the first time, mistakes are easily made. One wrong switch or MRT setting, and your engine's neck snaps like a matchstick. Here are a few important pieces of advice:

Try to prevent switching resolutions
Switching targets always takes time, but especially when hopping from one resolution to another. For example, a pipeline might do this:

- render to 4 1024 x 768 textures for Deferred input
- render to a 256 x 256 texture for a light shadowMap
- render to a 512 x 512 SSAO buffer
- render to another 1024 x 768 texture for depth
- render to a 512 x 512 DOF input buffer

Five switches. With all those techniques, switching is inevitable. But at least you can order things better:

- render to 4 1024 x 768 textures for Deferred input
- render to another 1024 x 768 texture for depth
- render to a 512 x 512 SSAO buffer
- render to a 512 x 512 DOF input buffer
- render to a 256 x 256 texture for a light shadowMap

See that? Only 3 switches instead of 5.
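To make the counting concrete, here is a small sketch in Python (not engine code) that counts switches for a given pass order and then groups the passes by resolution. The pass list is the example above; the function names `count_switches` and `group_by_resolution` are made up for this illustration, and the first bind is counted as a switch, just like in the text.

```python
# Pass list from the example above: (name, width, height).
passes = [
    ("deferred input (4 targets)", 1024, 768),
    ("light shadowMap", 256, 256),
    ("SSAO", 512, 512),
    ("depth", 1024, 768),
    ("DOF input", 512, 512),
]

def count_switches(pass_list):
    """Count the first bind as one switch, plus one for every later
    change of resolution between two consecutive passes."""
    switches = 0
    previous = None
    for _name, width, height in pass_list:
        resolution = (width, height)
        if resolution != previous:
            switches += 1
            previous = resolution
    return switches

def group_by_resolution(pass_list):
    """Stable sort, largest target first, so passes that share a
    resolution end up next to each other in their original order."""
    return sorted(pass_list, key=lambda p: p[1] * p[2], reverse=True)

reordered = group_by_resolution(passes)
print(count_switches(passes))     # original order
print(count_switches(reordered))  # grouped by resolution
```

A blind sort like this ignores dependencies between passes (SSAO needs the depth pass first, for example), so in a real pipeline you would only reorder passes that don't rely on each other's output.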


Make a FBO for each resolution
Not 100% sure about this, but people say it's best to make an FBO for each possible resolution, instead of only changing the rendertarget of a single FBO. In the example above, we would need 3 FBOs: 1024 x 768, 512 x 512 and 256 x 256. Each one can have its own depth buffer.
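One way to organize that is a little pool that hands out one FBO per resolution and creates it on first use. The sketch below is Python with a fake factory function so it runs without a GL context; `FboPool` and `fake_create_fbo` are hypothetical names, and in a real engine the factory would do the actual OpenGL work (`glGenFramebuffers`, attaching a depth buffer of the right size, and so on).

```python
class FboPool:
    """Hands out one framebuffer handle per resolution (hypothetical helper)."""

    def __init__(self, create_fbo):
        # create_fbo(width, height) stands in for the real GL setup code.
        self._create_fbo = create_fbo
        self._by_resolution = {}

    def get(self, width, height):
        """Return the FBO for this resolution, creating it only once."""
        key = (width, height)
        if key not in self._by_resolution:
            self._by_resolution[key] = self._create_fbo(width, height)
        return self._by_resolution[key]

    def __len__(self):
        return len(self._by_resolution)


# Stand-in factory: records what was created and returns fake ids 1, 2, 3...
created = []

def fake_create_fbo(width, height):
    created.append((width, height))
    return len(created)


pool = FboPool(fake_create_fbo)
# The five passes from the example above only need three FBOs.
for width, height in [(1024, 768), (1024, 768), (512, 512), (512, 512), (256, 256)]:
    pool.get(width, height)
print(len(pool), created)
```

Whether this is really faster than re-attaching textures to a single FBO probably depends on the driver, so measure before and after.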

Atlas renderTargets
Use bigger atlas textures to perform multiple passes in a single buffer texture. When you have to blur or downscale, you quickly end up with a large number of different resolutions. For example, the HDR technique requires downsampling the screen contents to get the average luminance. At first, my engine would do this:
1.- Render luminance values to a 128 x 128 texture
2.- Downscale step 1 texture to 64 x 64
3.- Downscale step 2 texture to 16 x 16
4.- Downscale step 3 texture to 3 x 3

4 switches. But you could also perform everything in a single, larger buffer.

Only 1 switch, oh hurray! I also render all shadowMaps into a single large atlas texture by the way, but I'll give details about that another time.
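As a rough idea of how such an atlas could be laid out, here is a Python sketch that places the four luminance steps side by side in one texture and returns the rectangle for each step. The left-to-right packing and the name `layout_downscale_atlas` are my assumptions for this example, not necessarily how the engine does it. In OpenGL, each rectangle would become the viewport (`glViewport`) for that step.

```python
def layout_downscale_atlas(sizes):
    """Place square regions of the given sizes left to right in one atlas.

    Returns (atlas_width, atlas_height, rects), where every rect is
    (x, y, width, height) in pixels, with the origin at the lower left.
    """
    rects = []
    x = 0
    for size in sizes:
        rects.append((x, 0, size, size))
        x += size
    return x, max(sizes), rects


# The four luminance steps from the list above.
atlas_width, atlas_height, rects = layout_downscale_atlas([128, 64, 16, 3])
print(atlas_width, atlas_height)
for rect in rects:
    print(rect)
```

Each step reads from the previous rectangle and writes into its own. Keep in mind that sampling a texture while it is also bound as the render target comes with restrictions in OpenGL, so check that this is safe on your hardware and GL version.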


After simply re-ordering the passes to reduce the number of FBO switches, the framerate was restored.

Don't worry. New content will come. In 2011!
