> However, using a transparent color significantly slowed down the number that can be drawn too which doesn't make as much sense to me. I'd imagine that with hardware today transparency should be somewhat free.
That's because transparency limits how you can batch draws on the GPU. With opaque draws, you can use the depth buffer and draw in any order you like e.g. maximizing batching. With transparency, you need to draw things in the right order for blending to work (painters order).
I think it's more complex than that. While web browsers do use GPU rendering, they're not game engines. They don't draw every single object on the screen every frame, that could easily cause lag on a large complex page.
Chromium in particular tries to minimize the total number of layers. It renders each layer into a pixel map, then for each frame it composites all of the visible layers into the final image.
That works really well in practice because you often have lots of layers that move around but don't actually change their pixels. So those don't need to be rendered (rasterized) every frame, just composited.
If you have a bunch of box shadows without transparency, Chromium will rasterize the whole thing once as a single layer.
If you have a bunch of box shadows with transparency, Chromium might create a separate layer for each one. That's probably suboptimal in this particular case, but imagine if your partially transparent box shadows had to also slide around the page independently.
I think the above was a simplification. GUI rendering is a good example of jack of all trades, master of none. It doesn't use tight render loops like game engines but it much more flexible in terms of UI possibilities.
There is also the issue that GPU's are oddly terrible at generating 2D elements of which a desktop has thousands of them. There are things like Glyph caching but they can only go so far.
Having the CPU doing the majority of the work with a few rasterization tasks to the GPU makes sense.
> There is also the issue that GPU's are oddly terrible at generating 2D elements
I never knew this and honestly I’m a bit baffled - 2D seems simpler on its face, is it just that GPUs have been made with gaming in mind (typically) and most “hardcore gamers” are playing 3D games?
I can't find it right now, but I recall reading an article years ago where a newer, more powerful generation of GPUs turned out to have much less hardware for 2D operations/calls, and a fraction of the 2D performance, but the performance in 3D games went up so nobody really noticed.
Games are expected to have sole access to your machine, so if they use all the CPU/GPU resources, nobody cares. If my web browser was burning up my battery re-rendering the page 90 times a second, I'd be livid.
I think “consumes too much resources” is a valid concern. But I think “could cause lag because too many objects” is totally bogus. Rendering a web page isn’t that complicated.
I wouldn’t have thought so either, and then I learned the order in which CSS rules are applied and I’m convinced browsers have some of the most complex rendering engines out there.
> Games are expected to have sole access to your machine, so if they use all the CPU/GPU resources, nobody cares. If my web browser was burning up my battery re-rendering the page 90 times a second, I'd be livid.
Games 20 years ago had sole access to machines that were much less powerful than even partial access to today's machines.
And eg Nintendo Switch games (or games on the Steam deck, or just mobile phone games) still deal with power limitations; people are very aware when their games burn through their batteries.
Reverse-painter's-order beats painter's-order since it lets you skip fully-occluded objects:
Start with a buffer that's fully transparent (α=0.0)
for each face from front to back:
for each pixel of the face:
draw the pixel, blending 1.0-buffer.α of the new pixel into whatever's already in the buffer
(if buffer.α == 1.0 you can just skip it entirely, just like for depth buffering)
go back and double check your math relating to transparent objects behind other transparent objects
The tricky part is if you have faces that overlap in a cycle (can happen legitimately), or that penetrate each other (often avoidable if you think about it).
The game engines I've dealt with separate opaque and transparent geometry.
It is generally good to render opaque geometry back to front to reduce overdraw, but not going so far as sorting the objects. We would do stuff like render the hands first in an FPS or render the skybox last in most games.
Now for the transparent layer: First occlusion is handled by the z-buffer as usual. If you render from front to back I assume you render to another buffer first and then composite to the framebuffer? If you render from back to front you don't need alpha in your framebuffer and can assume each rendered pixel is opaque, not needing that composite.
There's also order independent transparency stuff though which IIRC does need another buffer, which requires a composite but then saves you having to sort the objects.
I could be wrong, but I remember folks that worked on Dreamcast games that loved how you could just throw the geometry at it in any old fashion you liked and the GPU would just sort it all out as needed. Transparencies and all.
GPUs also do not like overdraw, so it's generally good idea to avoid having many transparent elements on top of each other, its also the reason why drawing more triangles vs. transparent texture is generally better.
My big take away with the whole City Skylines 2 performance issue and the lack of LOD was that geometry processing is so cheap nowadays. So long as you aren't too reckless with geometry in terms of sub-pixel rendering, you don't really have to worry about it too much any more.
It isn't like the Ps2 era when geometry time was a real concern on render times. Even a modern low end GPU could process a few hundred million polygons a second without sweating it, now getting the result son screen is a very different issue.
Yeah, PS2's Graphics Synthesizer had a fill rate of 1.2 GB/s. For comparison, the OG Xbox had 0.932 GB/s, and the GameCube had 0.648GB/s. Assuming only 1 texture here.
The Xbox was released 1 year later, for context.
Sony also demo'd the GSCube once, it had 16 Graphics Synthesizers, achieving a fill rate of 37.7 GB/s (no textures, half that with 1... I think). Eventually they ditched the idea in favour of Nvidia's solution.
The "Ordering" step doesn't really matter that much. You're usually doing a sort anyway prior to submitting the drawcall. What hurts is the overdraw. If you're doing opaque rendering, you get to render front to back, rendering only what actually appears on the final framebuffer. The number of pixels (after the depth pass) is proportional to the framebuffer. When you're doing transparent rendering you render back to front, and you have to render a tonne of the scene that will eventually be (partially) obscured by other random polys. We call that overdraw. The amount of pixels through the shader pipeline balloons to be proportional to the size of your mesh.
If you're doing non-overlapping stuff, you'd actually expect (almost) no slowdown from transparency, since you'd have to touch every pixel once anyway, and the only thing that changed is the shader formula.
One more thing to consider is memory bandwidth, which can be limiting factor especially on mobile devices.
A non-transparent draw over another draw allows in best case to cull all overlapping drawing operations, in worst case means you only have to use as much bandwidth as the individual draws.
With transparency, especially if you can't somehow combine the operations together (from my understanding, very hard problem), it means you also need to read back the entire region you're overlapping with transparency - so every transparent draw involves at least twice the final framebuffer size bitmap going-over memory bus.
Now consider that many mobile devices had not enough memory bandwidth to do a full redraw (i.e. blit the full framebuffer image twice) of the screen in time to maintain 60fps and it becomes a considerable problem.
Yeah, somehow between comments I forgot this was about shadows and I was thinking more about drawing polygons. In that case, you can break up the polygons and work out the colors for each of the (theoretically 2^N) regions of overlap.
That's because transparency limits how you can batch draws on the GPU. With opaque draws, you can use the depth buffer and draw in any order you like e.g. maximizing batching. With transparency, you need to draw things in the right order for blending to work (painters order).