If you are going to that effort, you probably also want a decent resolution. Say we aim for roughly one megapixel (720p) at 30 frames per second; then we have to shade about 27.6 megapixels per second. If you get your FPGA to run at 500 MHz, that gives you about 18 clock cycles per pixel. So you would probably want something like 100 cores, keeping in mind that we also have to run vertex shaders. We also need quick access to a sizable amount of memory, and I am not sure whether one can get away with integer or fixed-point arithmetic or whether floating point is pretty much necessary. Another complication I would expect is that it is probably much easier to build a long execution pipeline for a fixed-function design than for a programmable processor. Things like out-of-order execution are probably best offloaded to the compiler (i.e., scheduled statically) to keep the design simpler and more compact.
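For anyone who wants to check the arithmetic, here is the back-of-envelope version (the 500 MHz clock is just the assumption from above, not something a given FPGA is guaranteed to hit):

    /* Back-of-envelope pixel throughput, using the numbers assumed above. */
    #include <stdio.h>

    int main(void) {
        const double width = 1280.0, height = 720.0;  /* 720p ~= 0.92 megapixels */
        const double fps = 30.0;
        const double fpga_clock_hz = 500e6;           /* assumed fabric clock */

        double pixels_per_second = width * height * fps;              /* ~27.6 M */
        double cycles_per_pixel  = fpga_clock_hz / pixels_per_second; /* ~18     */

        printf("megapixels/s : %.1f\n", pixels_per_second / 1e6);
        printf("cycles/pixel : %.1f\n", cycles_per_pixel);
        return 0;
    }

Note that the ~18-cycle budget is per displayed pixel, before overdraw, vertex work, or memory stalls, which is why the core-count estimate balloons so quickly.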
So my guess is that it would be quite challenging to implement a modern GPU in an affordable FPGA if you want more than a proof of concept.
You've hit the nail directly on the head. For hitting 60Hz in FuryGpu, I actually render at 640x360 and then pixel-double (well, pixel->quad) the output to the full 720p. Even with my GPU cores running at 400MHz and the texture units at 480MHz with fully fixed-function pipelines, it can still struggle to keep up at times.
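For readers unfamiliar with the trick: the pixel->quad doubling is just nearest-neighbour 2x upscaling, so the GPU only has to shade a quarter of the output pixels. A minimal sketch of the idea (my own illustration with hard-coded 640x360 -> 1280x720 sizes, not FuryGpu code):

    #include <stdint.h>

    /* Nearest-neighbour 2x upscale: each 640x360 source pixel is written to a
     * 2x2 quad of the 1280x720 output. Purely illustrative, not FuryGpu code. */
    void upscale_2x(const uint32_t *src, uint32_t *dst) {
        for (int y = 0; y < 360; ++y) {
            for (int x = 0; x < 640; ++x) {
                uint32_t p = src[y * 640 + x];
                int dx = 2 * x, dy = 2 * y;
                dst[(dy + 0) * 1280 + (dx + 0)] = p;
                dst[(dy + 0) * 1280 + (dx + 1)] = p;
                dst[(dy + 1) * 1280 + (dx + 0)] = p;
                dst[(dy + 1) * 1280 + (dx + 1)] = p;
            }
        }
    }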
I do not doubt that a shader core could be built, but I have reservations about the ability to run it fast enough or have as many of them as would be needed to get similar performance out of them. FuryGpu does its front-end (everything up through primitive assembly) in full fp32. Because that's just a simple fixed modelview-projection matrix transform it can be done relatively quickly, but having every single vertex/pixel able to run full fp32 shader instructions requires the ability to cover instruction latency with additional data sets - it gets complicated, fast!
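For context, the front-end work described here boils down to one fp32 4x4 matrix-vector multiply per vertex, roughly like the sketch below (a generic MVP transform of my own, not FuryGpu's actual implementation):

    #include <stddef.h>

    /* Fixed modelview-projection transform: 16 fp32 multiply-adds per vertex.
     * Generic illustration of the front-end work, not FuryGpu's code. */
    typedef struct { float x, y, z, w; } vec4;
    typedef struct { float m[4][4]; } mat4;   /* row-major */

    static vec4 mvp_transform(const mat4 *mvp, vec4 v) {
        vec4 r;
        r.x = mvp->m[0][0]*v.x + mvp->m[0][1]*v.y + mvp->m[0][2]*v.z + mvp->m[0][3]*v.w;
        r.y = mvp->m[1][0]*v.x + mvp->m[1][1]*v.y + mvp->m[1][2]*v.z + mvp->m[1][3]*v.w;
        r.z = mvp->m[2][0]*v.x + mvp->m[2][1]*v.y + mvp->m[2][2]*v.z + mvp->m[2][3]*v.w;
        r.w = mvp->m[3][0]*v.x + mvp->m[3][1]*v.y + mvp->m[3][2]*v.z + mvp->m[3][3]*v.w;
        return r;
    }

    void transform_vertices(const mat4 *mvp, const vec4 *in, vec4 *out, size_t n) {
        for (size_t i = 0; i < n; ++i)
            out[i] = mvp_transform(mvp, in[i]);
    }

Because the operation is fixed, it pipelines cleanly; the hard part described above is keeping arbitrary fp32 shader programs fed once every vertex and pixel can run them.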
There's a new board from Trenz with a Versal chip that can do 440 GFLOPS with the DSP58 slices alone (at the lowest speed grade). It costs under 1000 euros, but currently you also need to buy a Vivado license.
Cheaper boards are definitely possible, since there are smaller parts in that family, but AMD/Xilinx would need to support some of them in the free version of Vivado...
I'm so sad Larrabee or similar things never took off. No, it might not have benchmarked well against contemporary graphics cards, but I think those arrays of x86 cores could have been put to great use for cool things not necessarily related to graphics.
Intel launched Larrabee as Xeon Phi for non-graphics purposes. Turns out it wasn't especially good at those either. You can still pick one up on eBay today for not very much.
The novelty of SSHing into a PCIe card is nice, though. I remember trying to use them on an HPC cluster: all the convenience of wrangling GPUs, but at a fraction of the performance.
Probably not aided by the fact that conventional Xeon core counts were sneaking up on them—not quite caught up, but anybody could see the trajectory—and offered a much more familiar environment.
Yes, I agree. Still unfortunate. I think the concept was very promising. But Intel had no appetite for burning money on it to see where it would go in the long run.
That's where we have to agree to (potentially) disagree. I lament that these or similar designs didn't last longer in the market, so people could learn how to harness them.
Imagine, for instance, hard real-time tasks, each task running on its own dedicated core.
I think Intel should have made more of an effort to get cheap Larrabee dev boards onto the market; they could have used chips that didn't run at full speed or had too many broken cores to sell at full price.
Larrabee was mostly x86 cores, but it did have sampling/texturing hardware because it's way more efficient to do those particular things in the 3d pipeline with dedicated hardware.
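To see why sampling stays in dedicated hardware, consider what even a single bilinear fetch costs in software: four texel reads plus a stack of lerps, at every shaded pixel. A rough illustration (my own simplified clamp-addressed RGBA8 sampler, not how any real texture unit is implemented):

    #include <stdint.h>
    #include <math.h>

    /* Illustrative bilinear sample from a row-major RGBA8 texture: four texel
     * fetches plus three lerps per channel. Texture units do this (plus
     * addressing modes, format conversion, and caching) in dedicated hardware. */
    typedef struct { float r, g, b, a; } rgba;

    static rgba fetch(const uint8_t *tex, int w, int h, int x, int y) {
        if (x < 0) x = 0; if (x >= w) x = w - 1;   /* clamp addressing */
        if (y < 0) y = 0; if (y >= h) y = h - 1;
        const uint8_t *p = tex + 4 * (y * w + x);
        return (rgba){ p[0] / 255.0f, p[1] / 255.0f, p[2] / 255.0f, p[3] / 255.0f };
    }

    static float lerp(float a, float b, float t) { return a + (b - a) * t; }

    rgba sample_bilinear(const uint8_t *tex, int w, int h, float u, float v) {
        float fx = u * w - 0.5f, fy = v * h - 0.5f;
        int x0 = (int)floorf(fx), y0 = (int)floorf(fy);
        float tx = fx - x0, ty = fy - y0;
        rgba c00 = fetch(tex, w, h, x0,     y0);
        rgba c10 = fetch(tex, w, h, x0 + 1, y0);
        rgba c01 = fetch(tex, w, h, x0,     y0 + 1);
        rgba c11 = fetch(tex, w, h, x0 + 1, y0 + 1);
        rgba out;
        out.r = lerp(lerp(c00.r, c10.r, tx), lerp(c01.r, c11.r, tx), ty);
        out.g = lerp(lerp(c00.g, c10.g, tx), lerp(c01.g, c11.g, tx), ty);
        out.b = lerp(lerp(c00.b, c10.b, tx), lerp(c01.b, c11.b, tx), ty);
        out.a = lerp(lerp(c00.a, c10.a, tx), lerp(c01.a, c11.a, tx), ty);
        return out;
    }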
Would be neat if someone made an FPGA GPU which had a shader pipeline honestly.