The number of people on this site who now think they understand modern GPUs because they wrote some OpenGL back in the day...
1. Both AMD and NVIDIA have "tensorcore" ISA instructions (i.e. real silicon/data paths, not emulation) which have zero use case in graphics
2. Ain't no one playing video games on MI300/H100 etc., and the ISA/architecture reflects that
> but for graphics they are very solid.
Hmmm I wonder if AMD's overfit-to-graphics architectural design choices are a source of friction as they now transition to serving the ML compute market... Hmmm I wonder if they're actively undoing some of these choices...
AMD isn't overfit to graphics. AMD's GPUs were friendly to general-purpose compute well before Nvidia's were. Hardware-wise, anyway. AMD's memory access system and resource binding model were well ahead of Nvidia's for a long time. When Nvidia was stuffing resource descriptors into special palettes with addressing limits, AMD was fully bindless under the hood: everything was just one big address space, descriptors and data alike.
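Purely as an illustration (not any real driver API; every name below is made up), the difference between those two binding models is roughly this: a slot-based design forces descriptors through a small fixed palette per draw, while a bindless one treats them as ordinary data indexed out of one big heap.

```cpp
#include <array>
#include <cstdint>
#include <vector>

struct TextureDescriptor { uint64_t gpu_address; uint32_t width; uint32_t height; };

// Slot-based model: shaders can only see descriptors that were copied into a
// small, fixed-size palette before each draw call.
struct SlotBasedState {
    static constexpr int kMaxSlots = 16;               // hard addressing limit
    std::array<TextureDescriptor, kMaxSlots> slots{};  // rebound per draw
};

// Bindless model: descriptors live in ordinary GPU memory next to the data,
// and a shader just indexes into one big table, no per-draw rebinding.
struct BindlessState {
    std::vector<TextureDescriptor> heap;  // effectively unbounded
    const TextureDescriptor& fetch(uint32_t index) const { return heap[index]; }
};

int main() {
    BindlessState state;
    state.heap.resize(100000);                        // far past any slot limit
    const TextureDescriptor& d = state.fetch(42);     // just an index into memory
    (void)d;
    SlotBasedState legacy{};                          // 16 slots, take it or leave it
    (void)legacy;
}
```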
Nvidia 15 years ago was overfit to graphics. Nvidia just made smarter choices, sold more hardware, and reinvested the winnings into software and into improving their hardware. Now they're just as good at GPGPU, with a stronger software stack.
AMD has struggled to be anything other than a follower in the market and has suffered quite a lot as a result. Even in graphics. Mesh shaders in DX12 were the result of NVIDIA dictating a new execution model that was very favorable to their new hardware, while AMD had already had a similar (but not perfectly compatible) system since Vega, called primitive shaders.
This feels backwards to me, given that GPUs were created largely because graphics needed lots of parallel floating-point operations, a big chunk of which are matrix multiplications.
When I think of matrix multiplication in graphics I primarily think of transforms between spaces: moving vertices from object space to camera space, transforming from camera space to screen space, ... This is a big part of the math done in regular rendering and needs to be done for every visible vertex in the scene - typically in the millions in modern games.
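A minimal CPU-side sketch of that per-vertex transform chain (identity matrices stand in as placeholders for the real camera and object transforms):

```cpp
#include <array>
#include <cstdio>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major

Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[r][c] * v[c];
    return out;
}

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

int main() {
    Mat4 model = identity(), view = identity(), projection = identity();
    Vec4 object_space_vertex{1.0f, 2.0f, 3.0f, 1.0f};

    // This chain of small matrix multiplies runs once per visible vertex,
    // millions of times per frame in a modern game.
    Vec4 world  = mul(model, object_space_vertex);
    Vec4 camera = mul(view, world);
    Vec4 clip   = mul(projection, camera);

    std::printf("clip = (%f, %f, %f, %f)\n", clip[0], clip[1], clip[2], clip[3]);
}
```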
I suppose the difference here is that DLSS is a case where you primarily do large numbers of consecutive matrix multiplications with little other logic, since it's more ANN code than graphics code.
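As a rough contrast with the single 4x4 transform per vertex above, an ANN-style workload is dense layer after dense layer with almost nothing in between, which is the shape of work tensor-core hardware targets. Sizes and layer count below are arbitrary placeholders:

```cpp
#include <algorithm>
#include <vector>

// One fully connected layer: a large matrix-vector multiply plus a ReLU.
std::vector<float> dense_layer(const std::vector<float>& weights,  // n x n, row-major
                               const std::vector<float>& input,    // n
                               int n) {
    std::vector<float> out(n, 0.0f);
    for (int r = 0; r < n; ++r) {
        for (int c = 0; c < n; ++c)
            out[r] += weights[r * n + c] * input[c];
        out[r] = std::max(out[r], 0.0f);  // the only "other logic"
    }
    return out;
}

int main() {
    const int n = 256, layers = 8;
    std::vector<float> weights(n * n, 0.01f);
    std::vector<float> activations(n, 1.0f);

    // Matmul after matmul on large operands, back to back.
    for (int l = 0; l < layers; ++l)
        activations = dense_layer(weights, activations, n);
}
```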