
I was really confused for a moment, because the article mentions CUDA a lot, which is an NVIDIA-specific API/framework/language. I guess that's mainly to appeal to the CUDA crowd? However, since Julia is based on LLVM, interfacing it with AMD GPUs should be quite doable:

> Much of the initial work focused on developing tools that make it possible to write low-level code in Julia. For example, we developed the LLVM.jl package that gives us access to the LLVM APIs. Recently, our focus has shifted towards generalizing this functionality so that other GPU back-ends, like AMDGPU.jl or oneAPI.jl, can benefit from developments to CUDA.jl. Vendor-neutral array operations, for example, are now implemented in GPUArrays.jl, whereas shared compiler functionality now lives in GPUCompiler.jl. That should make it possible to work on several GPU back-ends, even though most of them are maintained by only a single developer.

This is the takeaway for me: first-class access to GPU accelerators using the same syntax, regardless of the vendor :)
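To make that concrete, here is a minimal sketch of the vendor-neutral syntax (assuming CUDA.jl and AMDGPU.jl are installed; the array size is arbitrary):

    using CUDA        # NVIDIA back-end; swap in `using AMDGPU` for AMD

    xs = CuArray(rand(Float32, 1024))   # ROCArray(rand(Float32, 1024)) on AMD
    ys = 2f0 .* xs .+ 1f0               # broadcast compiles to a GPU kernel
    sum(ys)                             # vendor-neutral reduction from GPUArrays.jl

Everything past the `using` line is identical across back-ends, since both array types implement the GPUArrays.jl interface.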




Yes, it would be great if the .jl source code didn't even mention CUDA (but right now it does, with statements such as "using CUDA" and "CuArray(...)").
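Until then, one way to keep most of your own code free of CUDA references is to confine them to the top level and write everything else against AbstractArray. A sketch (`saxpy!` is a made-up name, not part of any of these packages):

    using CUDA

    # Generic code: never mentions CUDA, and also works on plain CPU arrays.
    function saxpy!(y::AbstractArray, a::Number, x::AbstractArray)
        y .= a .* x .+ y    # broadcast dispatches to the right back-end
        return y
    end

    x = CuArray(ones(Float32, 100))   # the only vendor-specific lines
    y = CUDA.zeros(Float32, 100)
    saxpy!(y, 2f0, x)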


Yes, that's fair. I focused on CUDA.jl because it is the most mature, easiest to install, etc. but as I mentioned we're actively working on generalizing that support as much as possible, and as a result support for AMD (AMDGPU.jl) and Intel (oneAPI.jl) GPUs is rapidly catching up.


This is a complete novice, ill-informed question, so forgive it in advance: why have an AMD-specific backend at all? Couldn't you just use AMD's HIP/HIPify tool on the CUDA backend and get an AMD-friendly version out?

https://github.com/ROCm-Developer-Tools/HIP

I realize these sorts of tools aren't magic and whatever it spits out will need work, but it seems like a really good, thin starting point for AMD support with a lower overhead for growth.

After the original CUDA bits can "cross-compile", the workflow is greatly reduced, right?

Workflow:

- Update the CUDA code

- Push it through the HIPIFY tool

- Fix what is broken (on the CUDA side, if you can)

After enough iterations, the CUDA code will grow friendly to HIPification...


> This is a complete novice, ill-informed question, so forgive it in advance: why have an AMD-specific backend at all? Couldn't you just use AMD's HIP/HIPify tool on the CUDA backend and get an AMD-friendly version out?

HIP and HIPify only work on C++ source code, via a Perl script. Since we start with plain Julia code, and we already have LLVM integrated into Julia's compiler, it's easiest to just change the LLVM "target" from Native to AMDGPU (or NVPTX in CUDA.jl's case) to get native machine code, while preserving Julia's semantics for the most part.
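To illustrate, here is roughly what that looks like from the user's side. A sketch; note that the index intrinsics are spelled differently per back-end (AMDGPU.jl uses workitemIdx/workgroupIdx, and @roc instead of @cuda), while the kernel body stays plain Julia:

    using CUDA

    function vadd!(c, a, b)
        i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
        if i <= length(c)
            @inbounds c[i] = a[i] + b[i]
        end
        return nothing
    end

    a = CUDA.rand(Float32, 256)
    b = CUDA.rand(Float32, 256)
    c = similar(a)
    @cuda threads=256 vadd!(c, a, b)   # compiled through LLVM's NVPTX back-end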

Also, interfacing to ROCR (AMD's implementation of the Heterogeneous System Architecture or HSA runtime) was really easy when I first started on this, and codegen through Julia's compiler and LLVM is trivial when you have CUDAnative.jl (CUDA.jl's predecessor) to look at :)

I should also mention that not everything CUDA does maps well to AMD GPUs: CUDA's streams are generally in-order (blocking), whereas AMD's queues are non-blocking unless barriers are scheduled. Also, things like hostcall (calling a CPU function from the GPU) don't have an obvious CUDA equivalent.
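For example, with CUDA.jl the in-order behaviour means dependent operations need no synchronization between them; a sketch using only documented CUDA.jl calls:

    using CUDA

    a = CUDA.rand(Float32, 1024)
    b = a .+ 1f0     # enqueued asynchronously on the task-local stream
    c = b .* 2f0     # guaranteed to run after the previous op: streams are in-order
    synchronize()    # block the CPU until the queued work has finished

On AMD's HSA queues, that intra-queue ordering would instead require explicit barrier packets.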


Thank you for taking the time! I found this quite helpful.


Something that is hinted at, but not spelled out, in our posts is that AMD actively upstreams and maintains an LLVM back-end for their GPUs, so it really is just a matter of switching the binary target for the generated code, at least in theory :)
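In other words, at the LLVM level the vendor difference largely comes down to which target the generated IR is compiled for. The triples below are real LLVM targets; all the surrounding compiler plumbing (GPUCompiler.jl) is elided:

    # LLVM target triples for the two back-ends discussed here:
    const NVPTX_TRIPLE  = "nvptx64-nvidia-cuda"   # CUDA.jl / NVIDIA
    const AMDGPU_TRIPLE = "amdgcn-amd-amdhsa"     # AMDGPU.jl / AMD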



