I’ve been meaning to write about this for a while now. I added a few more optimisations and extensions to the Tile-Based Forward Rendering demo I wrote back in March. I’ve inlined the updates in the actual main article, but I’ll outline the changes here for those who have already read it.
The first significant change is that I’ve switched to a Compute Shader (CS) for the light list generation. I used a line-sphere test in the Compute Shader to determine the maximum and minimum depth values of each light at each corner of each tile. I then added lights to the opaque or transparent buffers based on whether the light intersected this frustum section.
Using the CS is easily an order of magnitude faster. Beforehand I was primitive-assembly bound due to the number of spheres I had to render, whereas running a compute shader (clipped to the bounds of the light in screen space) was much more efficient. Using the line test also meant I didn’t have to grow my spheres by a tile, which had previously made the light addition overly conservative in some cases.
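To make the line-sphere test concrete, here is a minimal CPU-side sketch in Python rather than shader code. It is not the demo's actual implementation: the function name `line_sphere_depths`, the view-space convention (eye at the origin, looking down +z) and the tuple-based vectors are all assumptions made for illustration.

```python
import math

def line_sphere_depths(direction, centre, radius):
    """Intersect a line through the eye (origin) with a light's bounding sphere.

    direction: view-space direction through a tile corner (any non-zero length).
    centre, radius: the light sphere in view space.
    Returns (near_z, far_z) view-space depths of the two hit points,
    or None if the line misses the sphere.
    Assumes the camera looks down +z (an assumption of this sketch).
    """
    length = math.sqrt(sum(c * c for c in direction))
    d = [c / length for c in direction]

    # Distance along the line to the point closest to the sphere centre.
    t_mid = sum(di * ci for di, ci in zip(d, centre))

    # Squared distance from the sphere centre to the line.
    dist_sq = sum(c * c for c in centre) - t_mid * t_mid

    disc = radius * radius - dist_sq
    if disc < 0.0:
        return None  # the line passes outside the sphere

    half_chord = math.sqrt(disc)
    # Convert distances along the line to view-space depth.
    return ((t_mid - half_chord) * d[2], (t_mid + half_chord) * d[2])
```

Running this for the four corner lines of a tile and taking the smallest near depth and largest far depth gives one possible depth extent of the light over that tile.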
The second change follows on from this. Because most of the scene is opaque geometry, I optimise this case by building a separate light buffer for opaque surfaces. This means my light linked-list generation shader generates two separate lists for opaque-affecting lights and transparent-affecting lights. Because transparent objects could have any depth less than the value in the depth buffer, all lights that pass the depth test need to be recorded. However, for opaque objects, we can reject lights that we know don’t intersect anything.
For each tile I accumulate a minimum and maximum depth value. If a light doesn’t intersect [0, MaxZ] then it’s culled completely. If it does, it gets added to the TransparentLightBuffer. If it also intersects [MinZ, MaxZ], I add it to an OpaqueLightBuffer. Obviously this breaks down if a tile has a large depth range, but it’s still a significant saving in the general case.
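The culling rule above can be sketched as a small function. This is a Python illustration, not the shader from the demo; `classify_light` and its parameters are hypothetical names, and the light's depth extent is assumed to have already been computed for the tile.

```python
def classify_light(light_min_z, light_max_z, tile_min_z, tile_max_z):
    """Decide which per-tile buffers a light belongs in.

    Returns (in_transparent, in_opaque).
    light_min_z, light_max_z: depth extent of the light within the tile.
    tile_min_z, tile_max_z: MinZ and MaxZ accumulated from the depth buffer.
    """
    # No overlap with [0, MaxZ]: the light is culled completely.
    if light_max_z < 0.0 or light_min_z > tile_max_z:
        return (False, False)

    # Overlaps [0, MaxZ], so transparent surfaces may be lit by it.
    # It only matters to opaque surfaces if it also reaches [MinZ, MaxZ].
    in_opaque = light_max_z >= tile_min_z
    return (True, in_opaque)
```

For example, with a tile spanning depths 5 to 10, a light covering depths 1 to 3 goes into the transparent buffer only, one covering 4 to 6 goes into both, and one covering 12 to 14 is culled.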
Finally, because we want the opaque materials to be as fast as possible, I serialise the light linked lists for each tile into a flat array using another CS. This has the obvious drawback of having to allocate a fixed maximum number of lights per tile, but it nets a huge speed win.
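As a rough illustration of the serialisation step, the sketch below flattens per-tile linked lists into a fixed-stride array on the CPU. The data layout is an assumption, not taken from the demo: `heads` holds one head-node index per tile, `nodes` holds `(light_index, next_node)` pairs, and `MAX_LIGHTS_PER_TILE` and `END_OF_LIST` are hypothetical constants.

```python
MAX_LIGHTS_PER_TILE = 4   # hypothetical fixed budget per tile
END_OF_LIST = -1          # hypothetical sentinel for "no node" / "unused slot"

def flatten_light_lists(heads, nodes, max_lights=MAX_LIGHTS_PER_TILE):
    """Serialise per-tile linked lists into one flat, fixed-stride array.

    heads: head node index for each tile (END_OF_LIST if the tile has no lights).
    nodes: list of (light_index, next_node) pairs.
    Returns (flat, counts): tile t owns flat[t * max_lights : (t + 1) * max_lights],
    and counts[t] is how many of those slots are filled.
    """
    flat = [END_OF_LIST] * (len(heads) * max_lights)
    counts = [0] * len(heads)
    for tile, node in enumerate(heads):
        base = tile * max_lights
        # Walk the list, dropping any lights beyond the fixed budget.
        while node != END_OF_LIST and counts[tile] < max_lights:
            light_index, next_node = nodes[node]
            flat[base + counts[tile]] = light_index
            counts[tile] += 1
            node = next_node
    return flat, counts
```

The fixed stride is what makes the lookup cheap in the opaque pass: a tile's lights sit contiguously at a known offset, at the cost of silently truncating tiles that exceed the budget.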
Nice one Pete! I did some experiments with tiled forward too – including serialization on the light list. Did you actually get any speedup with this? In my tests (on NVIDIA hardware – GeForce 540M in my laptop) it actually got a bit slower (standard pass that utilizes the list is pretty much the same speed, but there is a small overhead of the CS that does the list serialization). I was curious how this looks on ATI or even just regular high-end desktop hardware, with faster shader units, where texture cache stalls might be more observable.
Yeah, I can’t remember the specific numbers but it was a significant win on my Radeon HD 5700. I was drawing significantly more opaque than transparent stuff. I’ll have to get some solid numbers again once I’ve finished tinkering!