It needs to be done in the renderer. I think it's doable, though. The FidelityFX library looks like it can be ported; it'll just run a bit slow because of the lack of subgroups. This particular library isn't based on a fancy scan implementation the way the state-of-the-art CUDA implementations are. There's a bit more follow-up in the linked Zulip thread.