
It's not quite clear to me why rasterising of all things is slow. I realise GPUs have a separate rasterisation unit, but other than that, the ALUs are designed for this type of workload. I haven't experimented with later-era GPGPU APIs and languages, but random memory access in a basic rasteriser sounds suspicious. Bouncing rays? Sure, that'll destroy any locality of reference, but mapping triangles into screen space? No way.



Rasterization is slow when you have lots of small (often sub-pixel) triangles. Mapping a big triangle into screen space is fast because the fixed per-triangle setup cost is amortized over thousands of covered pixels; with many triangles per pixel you pay that setup over and over for almost no coverage, and on top of that you have to throw out the 99+% of the geometry that won't show up at all and blend what's left together in a convincing way. That takes a while.
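The cost breakdown is easy to see in a minimal bounding-box rasterizer (a sketch in plain C++ as you'd compile under nvcc; Vec2, Tri, and edgeFn are made-up names, not any particular engine's API):

    #include <algorithm>
    #include <cmath>

    struct Vec2 { float x, y; };
    struct Tri  { Vec2 a, b, c; };

    // Twice the signed area of (p, q, sample point); positive when the
    // sample lies to the left of the directed edge p -> q (assumes
    // counter-clockwise winding).
    static float edgeFn(Vec2 p, Vec2 q, float rx, float ry) {
        return (q.x - p.x) * (ry - p.y) - (q.y - p.y) * (rx - p.x);
    }

    void rasterize(const Tri& t, int w, int h, unsigned char* mask) {
        // Fixed per-triangle setup: screen-clipped bounding box. This is
        // paid in full no matter how small the triangle is.
        int x0 = std::max(0,     (int)std::floor(std::min(t.a.x, std::min(t.b.x, t.c.x))));
        int x1 = std::min(w - 1, (int)std::ceil (std::max(t.a.x, std::max(t.b.x, t.c.x))));
        int y0 = std::max(0,     (int)std::floor(std::min(t.a.y, std::min(t.b.y, t.c.y))));
        int y1 = std::min(h - 1, (int)std::ceil (std::max(t.a.y, std::max(t.b.y, t.c.y))));

        // Per-pixel work: the only part that amortizes the setup above.
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x) {
                float px = x + 0.5f, py = y + 0.5f;
                if (edgeFn(t.a, t.b, px, py) >= 0.0f &&
                    edgeFn(t.b, t.c, px, py) >= 0.0f &&
                    edgeFn(t.c, t.a, px, py) >= 0.0f)
                    mask[y * w + x] = 255;  // sample center is inside
            }
    }

A triangle covering 100x100 pixels runs the setup once for ~10,000 inner-loop iterations; a sub-pixel triangle runs the same setup for one or two, so throughput falls off a cliff as triangle density goes up.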

Real-time graphics does the best it can with a few big triangles (by e.g. texturing and bump-mapping them) out of necessity, but people are moving towards denser geometry with many tiny triangles, which is harder on the rasterizer.


Rasterization works per triangle - which means that each CUDA thread works on its own triangle, and therefore, by definition, reads/writes a completely different place in the Z-buffer and reads from a completely different place in the shadow buffer than every other thread. Hence the abysmal speed I experienced with my CUDA rasterizer, and why I switched to CUDA raycasting.
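Concretely, the access pattern looks something like this (a sketch of the one-thread-per-triangle scheme described above, not the original code; rasterKernel, Tri, and the flat per-triangle depth are illustrative simplifications):

    #include <cuda_runtime.h>

    struct Vec2 { float x, y; };
    struct Tri  { Vec2 a, b, c; float z; };  // flat per-triangle depth, for brevity

    // Twice the signed area of (p, q, sample point); positive when the
    // sample lies to the left of the directed edge p -> q (assumes CCW
    // winding).
    __device__ float edgeFn(Vec2 p, Vec2 q, float rx, float ry) {
        return (q.x - p.x) * (ry - p.y) - (q.y - p.y) * (rx - p.x);
    }

    __global__ void rasterKernel(const Tri* tris, int nTris,
                                 unsigned int* zbuf, int w, int h) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nTris) return;
        Tri t = tris[i];  // one thread <-> one triangle

        // Per-triangle setup, as in any bounding-box rasterizer.
        int x0 = max(0,     (int)floorf(fminf(t.a.x, fminf(t.b.x, t.c.x))));
        int x1 = min(w - 1, (int)ceilf (fmaxf(t.a.x, fmaxf(t.b.x, t.c.x))));
        int y0 = max(0,     (int)floorf(fminf(t.a.y, fminf(t.b.y, t.c.y))));
        int y1 = min(h - 1, (int)ceilf (fmaxf(t.a.y, fmaxf(t.b.y, t.c.y))));

        // For z >= 0 the float bit pattern is monotonic as an unsigned
        // int, so atomicMin on the bits doubles as a depth test.
        unsigned int zbits = __float_as_uint(t.z);

        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x) {
                float px = x + 0.5f, py = y + 0.5f;
                if (edgeFn(t.a, t.b, px, py) >= 0.0f &&
                    edgeFn(t.b, t.c, px, py) >= 0.0f &&
                    edgeFn(t.c, t.a, px, py) >= 0.0f) {
                    // The address depends on where THIS thread's triangle
                    // lands on screen, so the 32 threads of a warp scatter
                    // their atomics across unrelated cache lines.
                    atomicMin(&zbuf[y * w + x], zbits);
                }
            }
    }

Adjacent threads in a warp hold unrelated triangles, so none of those zbuf accesses coalesce, and wherever triangles overlap the atomics serialize on top of that.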



