I’ve been meaning to write about this for a while now. I added a few more optimisations and extensions to the Tile-Based Forward Rendering demo I wrote back in March. I’ve inlined the updates in the main article, but I’ll outline the changes here for those who have already read it.
The first significant change is that I’ve switched to a Compute Shader for the light list generation. The Compute Shader uses a line-sphere test to determine the minimum and maximum depth values of the light at each corner of each tile. Lights are then added to either the opaque or transparent buffers based on whether the light intersects this section of the frustum.
Using the CS easily gives an order-of-magnitude speed increase. Beforehand I was primitive-assembly bound due to the number of spheres that I had to render, whereas running a compute shader (clipped to the bounds of the light in screen space) is much more efficient. Using the line test also means I don’t have to grow my spheres by a tile, which previously led to overly conservative light addition in some cases.
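As a rough illustration of the line-sphere test, here is a minimal CPU-side sketch in C++ rather than the actual shader code. It assumes view space with the eye at the origin and +z pointing into the screen, and that each tile corner is represented by a ray direction with positive z. The names `Vec3`, `DepthRange` and `lineSphereDepthRange` are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct DepthRange { float minZ, maxZ; };

static float dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Intersects the ray origin + t * dir (t >= 0) with a sphere in view space.
// Assumption: the eye is at the origin and dir.z > 0 (the ray points into the screen).
// On a hit, writes the view-space depth range covered by the sphere along the ray.
bool lineSphereDepthRange(const Vec3& dir, const Vec3& centre, float radius,
                          DepthRange& out)
{
    assert(radius >= 0.0f);

    // Solve |t * dir - centre|^2 = radius^2, a quadratic in t.
    const float a = dot(dir, dir);
    const float b = -2.0f * dot(dir, centre);
    const float c = dot(centre, centre) - radius * radius;
    const float disc = b * b - 4.0f * a * c;
    if (a == 0.0f || disc < 0.0f)
        return false;                      // degenerate ray, or the line misses the sphere

    const float root = std::sqrt(disc);
    float t0 = (-b - root) / (2.0f * a);   // entry point
    const float t1 = (-b + root) / (2.0f * a); // exit point
    if (t1 < 0.0f)
        return false;                      // sphere is entirely behind the eye

    t0 = std::max(t0, 0.0f);               // clamp when the eye is inside the sphere
    out.minZ = t0 * dir.z;
    out.maxZ = t1 * dir.z;
    return true;
}
```

Running this for the four corner rays of a tile and taking the smallest `minZ` and largest `maxZ` of the hits would give one depth range per light per tile. A sphere that overlaps a tile without touching any of its corner rays would need separate handling that this sketch does not cover.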
The second change follows on from this. Because most of the scene is opaque geometry, I optimise this case by building a separate light buffer for opaque surfaces. This means my light linked-list generation shader generates two separate lists for opaque-affecting lights and transparent-affecting lights. Because transparent objects could have any depth less than the value in the depth buffer, all lights that pass the depth test need to be recorded. However, for opaque objects, we can reject lights that we know don’t intersect anything.
For each tile I accumulate a minimum and maximum depth value. If a light doesn’t intersect [0, MaxZ] then it’s culled completely. If it does, it gets added to the TransparentLightBuffer. If it also intersects [MinZ, MaxZ], I add it to an OpaqueLightBuffer as well. Obviously this breaks down if a tile has a large depth range, but it’s still a significant saving in the general case.
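The per-tile decision above can be sketched as a small C++ function. This is only an illustration, not the shader itself: it assumes the light has already been reduced to a view-space depth interval for the tile, and the names `LightBucket` and `classifyLight` are hypothetical.

```cpp
#include <cassert>

// Which per-tile lists a light ends up in.
enum class LightBucket { Culled, TransparentOnly, TransparentAndOpaque };

// lightMinZ/lightMaxZ: depth interval the light covers within the tile (assumed input).
// tileMinZ/tileMaxZ:   min and max of the opaque depth buffer over the tile.
LightBucket classifyLight(float lightMinZ, float lightMaxZ,
                          float tileMinZ, float tileMaxZ)
{
    assert(lightMinZ <= lightMaxZ && tileMinZ <= tileMaxZ);

    // No overlap with [0, MaxZ]: the light is behind the eye or behind
    // the farthest opaque surface in the tile, so nothing can be lit by it.
    if (lightMaxZ < 0.0f || lightMinZ > tileMaxZ)
        return LightBucket::Culled;

    // Overlaps [MinZ, MaxZ]: it may touch opaque geometry, so it goes in both lists.
    if (lightMaxZ >= tileMinZ)
        return LightBucket::TransparentAndOpaque;

    // Entirely in front of the nearest opaque surface: only transparent
    // objects can be affected.
    return LightBucket::TransparentOnly;
}
```

For a tile with opaque depths in [10, 20], a light covering [2, 5] lands only in the transparent list, one covering [8, 12] lands in both, and one covering [25, 30] is culled.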
Finally, because we want the opaque materials to be as fast as possible, I serialise the light linked lists for each tile into a flat array using another CS. This has the obvious drawback of having to allocate a fixed maximum number of lights per tile, but it nets a huge speed win.
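To make the flattening step concrete, here is a minimal CPU-side C++ sketch of the idea. The buffer layout is an assumption made for illustration: a head index per tile, a shared node pool, and a sentinel marking both end of list and unused slots. `kMaxLightsPerTile`, `kEndOfList`, `LightNode` and `flattenLightLists` are hypothetical names, and the limit of 4 is just for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kEndOfList = 0xFFFFFFFFu;   // sentinel (assumed convention)
constexpr std::uint32_t kMaxLightsPerTile = 4;      // fixed per-tile budget (example value)

struct LightNode {
    std::uint32_t lightIndex;  // index into the scene's light array
    std::uint32_t next;        // index of the next node, or kEndOfList
};

// Walks each tile's linked list and copies the light indices into a flat
// array with kMaxLightsPerTile slots per tile. Lights beyond the budget
// are dropped; unused slots are filled with kEndOfList.
std::vector<std::uint32_t> flattenLightLists(
    const std::vector<std::uint32_t>& tileHeads,
    const std::vector<LightNode>& nodes)
{
    std::vector<std::uint32_t> flat(tileHeads.size() * kMaxLightsPerTile, kEndOfList);
    for (std::size_t tile = 0; tile < tileHeads.size(); ++tile) {
        std::uint32_t count = 0;
        for (std::uint32_t n = tileHeads[tile];
             n != kEndOfList && count < kMaxLightsPerTile;
             n = nodes[n].next) {
            flat[tile * kMaxLightsPerTile + count] = nodes[n].lightIndex;
            ++count;
        }
    }
    return flat;
}
```

A material can then loop over `flat[tile * kMaxLightsPerTile + i]` until it hits the sentinel or the budget, which replaces pointer chasing with a contiguous read. In a compute shader each tile would presumably be handled by its own thread rather than by the outer loop shown here.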