If you want to increase performance, you could render with OpenGL but write the actual sim in CUDA C++.
The full CUDA/OpenGL interop API (cudaGraphicsGLRegisterBuffer, cudaGraphicsMapResources, and so on) is documented here: https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART_...
Essentially, this avoids any transfer through host memory: everything stays on the GPU.
You could then cap your rendering thread at 60 fps while the CUDA kernel runs non-stop.
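A rough sketch of that threading split in plain C++. `simulate_step()` and `render_frame()` are hypothetical stand-ins: the first is where you'd launch the CUDA kernel, the second is where you'd draw the shared OpenGL buffer. Here they only bump counters so the pattern can run anywhere. The sim thread spins freely while the render loop sleeps until each 1/60 s deadline.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical stand-ins: a real program would launch the CUDA kernel in
// simulate_step() and draw the interop-shared GL buffer in render_frame().
std::atomic<std::uint64_t> sim_steps{0};
std::atomic<std::uint64_t> frames_rendered{0};

void simulate_step() { sim_steps.fetch_add(1, std::memory_order_relaxed); }
void render_frame() { frames_rendered.fetch_add(1, std::memory_order_relaxed); }

// Runs an unthrottled sim thread and a ~60 fps render loop for `duration`.
void run_for(std::chrono::milliseconds duration) {
    using clock = std::chrono::steady_clock;
    std::atomic<bool> running{true};

    // Simulation thread: no throttling, steps as fast as it can.
    std::thread sim([&running] {
        while (running.load(std::memory_order_relaxed)) simulate_step();
    });

    // Render loop on the calling thread: one frame per 1/60 s deadline.
    const auto frame_period = std::chrono::duration_cast<clock::duration>(
        std::chrono::duration<double>(1.0 / 60.0));
    const auto end = clock::now() + duration;
    auto next_frame = clock::now();
    while (clock::now() < end) {
        render_frame();
        next_frame += frame_period;
        // Absolute deadlines: a late frame doesn't push every later frame back.
        std::this_thread::sleep_until(next_frame);
    }

    running.store(false);
    sim.join();
}
```

Calling `run_for(std::chrono::milliseconds(500))` should render roughly 30 frames while `sim_steps` climbs far higher. In a real program the OpenGL context would be current on the render thread, and you'd still need some synchronization (or double buffering) so a frame doesn't read a half-updated state buffer.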
I haven't actually tested this, though, because I had 2 GPUs for the sims above, and the beefy one running the sim was in TCC mode instead of WDDM mode (TCC doesn't allow an attached display) [1]. So I had the universe state buffer transferred to host memory, and then to the 2nd GPU for rendering to the attached display.
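For the two-GPU path, one way to hand snapshots from the sim thread to the render thread is a small "latest frame" mailbox in host memory. This is a hypothetical sketch, not necessarily how the sims above did it: `LatestFrame` is a made-up name, `std::vector<float>` stands in for the universe state, and the actual device-to-host copy and the upload to the second GPU are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical host-side mailbox: the sim thread publishes each snapshot it
// copied back from the compute GPU; the render thread takes only the newest
// one before uploading it to the display GPU.
class LatestFrame {
public:
    // Sim thread: call after the device-to-host copy. Overwrites any
    // snapshot the render thread hasn't picked up yet.
    void publish(std::vector<float> snapshot) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ = std::move(snapshot);
        has_new_ = true;
        ++published_;
    }

    // Render thread: call once per frame. Returns false if nothing new has
    // arrived since the last call. Swapping hands the caller's old buffer back.
    bool take(std::vector<float>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!has_new_) return false;
        std::swap(out, pending_);
        has_new_ = false;
        return true;
    }

    std::size_t published() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return published_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<float> pending_;
    bool has_new_ = false;
    std::size_t published_ = 0;
};
```

Dropping stale snapshots lets the sim keep running at full speed even when the render side only needs 60 updates per second.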
I'm not sure how much of a speedup TCC really provides over WDDM, but Nvidia says it makes "some difference."
[1] https://docs.nvidia.com/gameworks/content/developertools/des...