> What costs more: generating a random number and maybe a second, or a random number and a sin/cos?
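You’re adding a very unpredictable loop branch and a bunch of data dependencies doing that: a candidate point in the square lands inside the disk with probability π/4 ≈ 78.5%, so about one candidate in five is rejected, and whether you loop again depends on random values you just computed. For GPUs (and maybe even CPUs), the sqrt way (radius = sqrt(u1), angle = 2π·u2, then one sin/cos) is almost certainly faster.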
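A minimal Python sketch of the two approaches being compared, for illustration only: the function names are hypothetical, and in practice this would be shader or CUDA code, where the branch actually costs something. The rejection version loops a data-dependent number of times; the sqrt version always does exactly two draws, one sqrt and one sin/cos.

```python
import math
import random


def disk_rejection(rng: random.Random) -> tuple[float, float]:
    """Rejection sampling: draw points in [-1, 1)^2 until one lands in the unit disk.

    Each candidate is accepted with probability pi/4 (~78.5%), so the number of
    loop iterations is random and the exit branch is hard to predict.
    """
    while True:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return x, y


def disk_sqrt(rng: random.Random) -> tuple[float, float]:
    """Polar ("sqrt") method: two draws, one sqrt, one sin/cos, no loop.

    Area grows with r^2, so taking sqrt of a uniform sample for the radius keeps
    points uniform over the disk instead of clustering them near the center.
    """
    r = math.sqrt(rng.random())
    theta = 2.0 * math.pi * rng.random()
    return r * math.cos(theta), r * math.sin(theta)
```

Both produce uniformly distributed points; for example, about 25% of samples from either function should fall within radius 0.5.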
If you have ALUs to spare, you can generate several candidate points in parallel, say 4. Since each is accepted with probability π/4, all 4 are rejected only with probability (1 − π/4)^4 ≈ 0.2%, so at least one is almost always inside the disk and you can select it branchlessly. Then you only need to loop in that very rare case.
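A rough Python emulation of the "generate 4, select branchlessly" idea, assuming the selection is done with an arithmetic blend (the way a GPU `select`/`mix` would do it) rather than an early-exit branch. The names `disk_batch` and `all_rejected_probability` are hypothetical; Python itself gains nothing from this, it only shows the data flow.

```python
import math
import random


def all_rejected_probability(k: int) -> float:
    """Chance that all k independent candidates fall outside the disk."""
    return (1.0 - math.pi / 4.0) ** k


def disk_batch(rng: random.Random, k: int = 4) -> tuple[float, float]:
    while True:
        # Draw k independent candidates up front; on SIMD/GPU hardware these
        # have no dependencies on each other and can be computed in parallel.
        xs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        ys = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        # Pick the lowest-index accepted candidate without early exit:
        # scan from last to first and blend with a 0/1 mask.
        sx, sy, found = 0.0, 0.0, 0
        for i in reversed(range(k)):
            inside = int(xs[i] * xs[i] + ys[i] * ys[i] <= 1.0)  # 0 or 1
            sx = inside * xs[i] + (1 - inside) * sx
            sy = inside * ys[i] + (1 - inside) * sy
            found |= inside
        # Loop again only if all k were rejected (~0.2% of the time for k = 4).
        if found:
            return sx, sy
```

Taking the first accepted candidate from a batch of independent draws keeps the result uniform over the disk, so this matches plain rejection sampling in distribution while almost never executing the outer loop twice.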