I hope they improve ROCm, and add support for normal GPUs, instead of only 6800xt+.
My 6700xt can't do AI, but a 3050 can, welcome to AMD drivers in the professional world.
They sort of screwed themselves on consumer-level hardware support with how they designed ROCm. CUDA compiles to an intermediate representation (PTX) that the driver can compile for the target hardware at load time, so everything that meets the minimum feature level can run it. ROCm, by contrast, ships device code for each GPU architecture, which means binary sizes would explode if they supported too many generations, and every new generation needs recompiled binaries.
That's also why they struggle to support new consumer hardware, why they strung people along about ROCm on the 5000 series until the 6000 series was around the corner (that move killed my interest in taking ROCm seriously for the near future), and why they dropped support for the RX 580 when it was the only consumer GPU still available with ROCm support.
They're going to have to fundamentally redesign ROCm's build process to come anywhere near CUDA's level of support.
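To make the CUDA-vs-ROCm contrast above concrete, here is a toy Python model of the two load-time strategies. The function names and data structures are invented for illustration; they are not real CUDA or ROCm APIs. The point is only that a CUDA fat binary carries PTX, an IR the driver can JIT for architectures that didn't exist at build time, while a ROCm binary carries only per-architecture code objects, with no fallback.

```python
# Toy model of the two loading strategies; names and structures are
# invented for illustration, not actual CUDA/ROCm APIs.

def load_cuda_style(fatbin: dict, gpu_arch: str) -> str:
    """Prefer prebuilt machine code, fall back to JIT-compiling the PTX IR."""
    if gpu_arch in fatbin["sass"]:
        return f"run native code for {gpu_arch}"
    # Any GPU meeting the PTX feature level can still run the binary.
    return f"JIT-compile PTX for {gpu_arch}"

def load_rocm_style(binary: dict, gpu_arch: str) -> str:
    """No IR fallback: the exact ISA must have been compiled in."""
    if gpu_arch in binary["code_objects"]:
        return f"run native code for {gpu_arch}"
    raise RuntimeError(f"no code object for {gpu_arch}")

fatbin = {"sass": {"sm_70"}, "ptx": "compute_70"}
binary = {"code_objects": {"gfx906", "gfx1030"}}

print(load_cuda_style(fatbin, "sm_89"))   # newer GPU: still runs via PTX JIT
try:
    load_rocm_style(binary, "gfx1100")    # newer GPU, no code object: fails
except RuntimeError as e:
    print("error:", e)
```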
Wow, that's interesting. Do you think this is because they had to design this kind of system much faster than Nvidia in order to release something to compete with CUDA? Or are there other reasons for this sort of design decision?
Yeah, I think the decision was made that way because they wanted to catch up to CUDA but probably didn't have a good enough tie-in with the driver team to put ROCm there. IIRC, for a while you also had to install a custom driver to use ROCm, although at least that hasn't been necessary for some time.
With ROCm on the 5000 series, they promised status updates several times which never came, and the eventual unofficial support only arrived after the 6000 series was out. Then with the RX 580, they claimed that while it was unsupported it should still work, and several of their developers claimed to be looking into the matter. I recall other similar incidents with their smaller projects under GPUOpen.
So overall it always seemed like they weren't communicating properly internally; all of their projects feel somewhat disconnected from each other, which leads to odd decisions like this one.
This is one of the reasons I don’t bet on AMD for GPU work outside of gaming. All the other GPU vendors (NVIDIA, Intel, Apple, Qualcomm, etc.) are investing strategically in making sure popular software is hardware-accelerated by their products. NVIDIA is clearly in the lead here, due to the excellent choices it made with CUDA, but AMD is the only vendor that doesn't seem to be pushing forward, because its HIP and ROCm strategy seems flawed.
On the other hand, I don't think any of the other vendors aim for compatibility with CUDA. Intel is laser-focused on oneAPI, which is just SYCL, IIRC. Sure, SYCL is cool and all, but you cannot trivially translate most CUDA programs to SYCL the way you can with HIP.
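For a sense of why a HIP port is often mechanical where a SYCL port is a rewrite: hipify-style tools largely apply a big rename table from CUDA API names to their HIP equivalents. A toy sketch of that idea, with a tiny hand-picked subset of renames (the real tools handle far more cases, including kernel launch syntax):

```python
# Tiny illustrative subset of the CUDA -> HIP rename table that
# hipify-style tools apply; real tools cover far more of the API.
RENAMES = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def toy_hipify(source: str) -> str:
    """Apply the rename table to a CUDA source string."""
    for cuda_name, hip_name in RENAMES.items():
        source = source.replace(cuda_name, hip_name)
    return source

print(toy_hipify("cudaMalloc(&ptr, n); cudaDeviceSynchronize();"))
```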
It’s really insane to me that they target hardware directly. Especially as a GPU vendor, they’re among the most aware of how much variance there is between different GPU designs.
Pretty much every other GPU-targeted language does a runtime compilation from either source or an IR.
This has been a known problem, with a known solution, for ages, and their approach to ROCm is flummoxing.
It’s worse: because there is no IR in the driver, there is zero guarantee of forward compatibility, and you need to create binaries for every piece of hardware for backwards compatibility.
So not only do you have to choose which hardware to support at any point in time, you also have to maintain your codebase and release new binaries every time AMD releases a new GPU.
And it gets even more complicated, because even intra-generation compatibility isn’t guaranteed: different GPUs from the same generation can have slight variances that essentially require you to target them specifically.
On the other hand, CUDA binaries dating back to the days of Tesla and Fermi can still run on current hardware with no issues.
The architecture behind ROCm does not make sense outside of custom implementations for supercomputers and bespoke hyperscaler-scale deployments.
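A back-of-envelope sketch of the binary-size consequence described above. The per-architecture size is an assumed placeholder, not a measurement; the point is just that shipping native code objects multiplies the device-code payload by the number of supported targets.

```python
# Illustrative arithmetic only: an *assumed* average device-code size
# per architecture, multiplied by the number of shipped targets.
per_arch_mb = 50.0  # placeholder size of one library's device code per target
targets = ["gfx900", "gfx906", "gfx908", "gfx90a",
           "gfx1010", "gfx1030", "gfx1100"]
total_mb = per_arch_mb * len(targets)
print(f"{len(targets)} targets -> ~{total_mb:.0f} MB of device code")
```

With an IR-based scheme, the payload would instead stay roughly constant as new targets are added, since the driver specializes the IR at load time.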
The per-architecture code generation is annoying. Also, shipping LLVM IR has compatibility hazards across LLVM versions. It's solvable, probably with some performance overhead.
None of the consumer Navi 2x cards are 'officially' supported. Nevertheless, you can use the ROCm libraries anyway by setting:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
That will make your gfx1031 card pretend to be gfx1030, which is a supported architecture. Those processors were given different numbers in case an incompatibility was found, but I haven't heard of any thus far.
Obviously, that's not as good as official support, but I hope it helps.
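One gotcha if you set the override from inside a Python session rather than your shell: it has to be in the environment before the ROCm runtime initializes, i.e. before importing torch or any other HIP-backed library in that process. A minimal sketch (the torch import is left commented out, since it assumes a ROCm build of PyTorch):

```python
import os

# Must be set before the HIP/ROCm runtime starts up, so do it before
# importing any GPU library in the same process.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

# import torch  # assumed ROCm build of PyTorch; import only after the override
print(os.environ["HSA_OVERRIDE_GFX_VERSION"])
```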
They're not 'officially' supported, as you say, but Navi21 cards (6800-6950 XT) have undergone the same QA validation as the officially supported Pro cards.
Not many people know that CUDA stands for Compute Unified Device Architecture.
"Unified" is the key idea here. All NVIDIA GPUs have supported CUDA since the G80/G84/G86 generation, which arrived at the end of 2006 and the beginning of 2007.
It’s true, of course, that the older GPUs don’t support newer versions of CUDA, but the idea that CUDA is unified has been central to the project since the beginning. It also cost NVIDIA a lot of money and effort to put CUDA support in every GPU, even when it wasn’t extensively used. It took about 10 years of investment before it really started to pay off.
Trying to figure out the ROCm hardware support page for a solid 10 minutes and then finding out that my RX 5700, which would be pretty capable hardware-wise, isn't supported was super frustrating. According to some GitHub thread, GFX10 and 20 should have been supported by the end of 2021, but official support, as in being listed in the document, never came?
I get that Nvidia has a lot more resources, and I'm trying not to support their closed ecosystem, but AMD's non-support isn't exactly making it easy.
Has anyone here had any experience with Intel's new Arc line?
The driver and compiler work, but the math libraries were never updated to add gfx1010, aside from rocBLAS and rocSOLVER. The official binaries don't contain machine code for your architecture, aside from those two.
I would suggest building ROCm with Spack if you are using a gfx101x processor. I've been working to make sure that all of ROCm can be built for different targets. e.g.
That will build rocBLAS and run a subset of the test suite. The RX 5700 hardware is not tested by ROCm QA, so running the test suite is usually a good idea.
I have an RX 5700 XT available, which is also gfx1010, so if you encounter any problems and need some guidance, feel free to contact me. My email is in my profile.
I forgot that OpenMP is broken with llvm-amdgpu in Spack at the moment. I hope it will be fixed soon, because OpenMP is used in some of the tests. In the meantime, you may have to remove `--test root` from that install command.
> I hope they improve ROCm, and add support for normal GPUs, instead of only 6800xt+.
Also, AMD should finally support ROCm on Windows. Currently, the only application I know of that uses ROCm on Windows is Blender, and the respective Blender releases are built with a beta version of ROCm from AMD with Windows support that is not publicly available.
YMMV outside of the blessed list. I did a bunch of testing with a 5700XT a while ago and it worked about as well as a card on the blessed list. If you've already tried it, how does it fail?
Vulkan requires you to ship shaders in SPIR-V format; it doesn't have a source-level shader language. The generous interpretation is that AMD just forgot to ship the source for those SPIR-V blobs, since it would involve a different toolchain.
Looking at the size of the blobs, though, I'm not entirely sure why you're claiming "so much" of the functionality is in them. You could probably disassemble and understand most of them pretty trivially, especially given that all the inputs and outputs are not obfuscated. And the actual function code of many of them looks to be relatively small.
kllrnohj already wrote that Vulkan has no predefined source-code language for shaders. I can imagine that the SPIR-V code of these shaders was hand-optimized by some team at AMD, so a textual representation of this binary code is the version the engineers at AMD work on.
Also: the repository is under the MIT license, so you are free to reverse-engineer these shaders and port them to a high-level language of your choice.
What? It has been in Mesa for many months (I run a custom ELF/Linux distro for AMD hardware). BTW, it is kind of big and complex, and it is not cleanly compilable out of the box; some patches are still needed for that.
This is not the driver. As stated in the first sentence of the link, what is being discussed is "part of their developer software suite for helping to profile ray-tracing performance/issues on Windows and Linux with both Direct3D 12 and the Vulkan API."