
It's not quite clear to me why rasterising of all things is slow. I realise GPUs have a separate rasterisation unit, but other than that, the ALUs are designed for this type of workload. I haven't experimented with later-era GPGPU APIs and languages, but random memory access in a basic rasteriser sounds suspicious. Bouncing rays? Sure, that'll destroy any locality of reference, but mapping triangles into screen space? No way.



Rasterization is slow when you have lots of small (often sub-pixel) triangles. Mapping a big triangle into screen space is fast because the fixed per-triangle setup cost is amortized over thousands of covered pixels; with many triangles per pixel you pay that setup over and over for almost no coverage, and on top of that you have to throw out the 99+% of the geometry that won't show up at all and blend what's left together in a convincing way. That takes a while.
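The cost breakdown is easy to see in a minimal bounding-box rasterizer (a sketch in plain C++ as you'd compile under nvcc; Vec2, Tri, and edgeFn are made-up names, not any particular engine's API):

    #include <algorithm>
    #include <cmath>

    struct Vec2 { float x, y; };
    struct Tri  { Vec2 a, b, c; };

    // Twice the signed area of (p, q, sample point); positive when the
    // sample lies to the left of the directed edge p -> q (assumes
    // counter-clockwise winding).
    static float edgeFn(Vec2 p, Vec2 q, float rx, float ry) {
        return (q.x - p.x) * (ry - p.y) - (q.y - p.y) * (rx - p.x);
    }

    void rasterize(const Tri& t, int w, int h, unsigned char* mask) {
        // Fixed per-triangle setup: screen-clipped bounding box. This is
        // paid in full no matter how small the triangle is.
        int x0 = std::max(0,     (int)std::floor(std::min(t.a.x, std::min(t.b.x, t.c.x))));
        int x1 = std::min(w - 1, (int)std::ceil (std::max(t.a.x, std::max(t.b.x, t.c.x))));
        int y0 = std::max(0,     (int)std::floor(std::min(t.a.y, std::min(t.b.y, t.c.y))));
        int y1 = std::min(h - 1, (int)std::ceil (std::max(t.a.y, std::max(t.b.y, t.c.y))));

        // Per-pixel work: the only part that amortizes the setup above.
        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x) {
                float px = x + 0.5f, py = y + 0.5f;
                if (edgeFn(t.a, t.b, px, py) >= 0.0f &&
                    edgeFn(t.b, t.c, px, py) >= 0.0f &&
                    edgeFn(t.c, t.a, px, py) >= 0.0f)
                    mask[y * w + x] = 255;  // sample center is inside
            }
    }

A triangle covering 100x100 pixels runs the setup once for ~10,000 inner-loop iterations; a sub-pixel triangle runs the same setup for one or two, so throughput falls off a cliff as triangle density goes up.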

Real-time graphics does the best it can with a few big triangles (by e.g. texturing and bump-mapping them) out of necessity, but people are moving towards denser geometry with many tiny triangles, which is harder on the rasterizer.


Rasterization works per triangle - which means that each CUDA thread works on its own triangle, and therefore, by definition, reads/writes a completely different place in the Z-buffer and reads from a completely different place in the shadow buffer than every other thread. Hence the abysmal speed I experienced with my CUDA rasterizer, and why I switched to CUDA raycasting.
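Concretely, the access pattern looks something like this (a sketch of the one-thread-per-triangle scheme described above, not the original code; rasterKernel, Tri, and the flat per-triangle depth are illustrative simplifications):

    #include <cuda_runtime.h>

    struct Vec2 { float x, y; };
    struct Tri  { Vec2 a, b, c; float z; };  // flat per-triangle depth, for brevity

    // Twice the signed area of (p, q, sample point); positive when the
    // sample lies to the left of the directed edge p -> q (assumes CCW
    // winding).
    __device__ float edgeFn(Vec2 p, Vec2 q, float rx, float ry) {
        return (q.x - p.x) * (ry - p.y) - (q.y - p.y) * (rx - p.x);
    }

    __global__ void rasterKernel(const Tri* tris, int nTris,
                                 unsigned int* zbuf, int w, int h) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nTris) return;
        Tri t = tris[i];  // one thread <-> one triangle

        // Per-triangle setup, as in any bounding-box rasterizer.
        int x0 = max(0,     (int)floorf(fminf(t.a.x, fminf(t.b.x, t.c.x))));
        int x1 = min(w - 1, (int)ceilf (fmaxf(t.a.x, fmaxf(t.b.x, t.c.x))));
        int y0 = max(0,     (int)floorf(fminf(t.a.y, fminf(t.b.y, t.c.y))));
        int y1 = min(h - 1, (int)ceilf (fmaxf(t.a.y, fmaxf(t.b.y, t.c.y))));

        // For z >= 0 the float bit pattern is monotonic as an unsigned
        // int, so atomicMin on the bits doubles as a depth test.
        unsigned int zbits = __float_as_uint(t.z);

        for (int y = y0; y <= y1; ++y)
            for (int x = x0; x <= x1; ++x) {
                float px = x + 0.5f, py = y + 0.5f;
                if (edgeFn(t.a, t.b, px, py) >= 0.0f &&
                    edgeFn(t.b, t.c, px, py) >= 0.0f &&
                    edgeFn(t.c, t.a, px, py) >= 0.0f) {
                    // The address depends on where THIS thread's triangle
                    // lands on screen, so the 32 threads of a warp scatter
                    // their atomics across unrelated cache lines.
                    atomicMin(&zbuf[y * w + x], zbits);
                }
            }
    }

Adjacent threads in a warp hold unrelated triangles, so none of those zbuf accesses coalesce, and wherever triangles overlap the atomics serialize on top of that.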



