Incredible work, though. I had no idea GPU shaders were run as Von Neumann programs. I always thought they were really tiny sets of math operations with minimal branching, because they had to scale across a bunch of little cores. But apparently that's not entirely true!
We're basically approaching the part of the Myer and Sutherland wheel of reincarnation[0] where the display processor is as powerful as the main processor and therefore it should all be folded back together. Of course, this turn was anticipated by AMD well in advance, and the GCN architecture, APU hardware, etc. all play into it.
At the end of the day, the optimization envelope is always shifting around, and industrial computing architectures chase that incrementally, so they'll always be looping around the wheel as today's "narrow fast path" gradually becomes tomorrow's "general purpose".
I just started learning a little CUDA, but otherwise I know little about GPUs.
Is it surprising that they are Turing complete or surprising that they are using a von Neumann architecture?
You seem to be referring to Turing completeness when talking about branches. Von Neumann architecture means you can execute data as code, which seems to be closer to what the presentation is about(?)
Given that CUDA exists and runs on the same hardware, I don't see why it's really surprising that you can do advanced things with OpenGL shaders. CUDA is definitely Turing complete, and I think OpenCL is the same.
They can read and write memory (to access textures), do math (to calculate output colors), and jump (necessary for bounds checks, etc.). That's pretty much the definition of a Von Neumann program:
program variables ↔ computer storage cells
control statements ↔ computer test-and-jump instructions
assignment statements ↔ fetching, storing instructions
expressions ↔ memory reference and arithmetic instructions.
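To make the mapping concrete, here's a minimal sketch of my own (not from the presentation; the kernel name and parameters are made up) of a CUDA kernel that hits each item on that list:

    __global__ void brighten(const float* in, float* out, int n, float gain)
    {
        // expressions -> memory reference and arithmetic instructions
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        // control statement -> test-and-jump (the bounds check)
        if (i >= n) return;
        // program variable -> storage cell; assignment -> fetch and store
        float v = in[i];
        out[i] = v * gain;
    }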
This is all true, but... AFAIK GPUs are optimized for graphics and therefore suck at general-purpose programs. Sure, it would be fun to get some general code to run on them, just as it's fun to make a Game Boy or a toaster run Linux.
There's also the issue that GPUs are not preemptable, which kind of makes preemptive multitasking hard.
Modern GPUs (last 5 years) are optimized for GPGPU in addition to graphics. They also have preemptive multitasking on either a per-batch or per-workgroup basis.
Intel's latest GPU architecture has an embedded OS running on the GPU for scheduling command batches; I'm not sure what AMD and Nvidia do.
I still wouldn't write a general purpose OS for it.
I think we have different definitions of "preemptive multitasking". There is no GPU I know of that can be preempted once given a command to draw. Once it starts, if that drawing command takes 30 seconds, there's no preempting it. This is why Windows has a timeout that resets the GPU if it doesn't respond. (I believe other OSes have added that feature, but I'm not 100% sure.) Anyway, I've yet to use a GPU or an OS that supports preempting the GPU. I'd be happy to be proven wrong. I can also give you samples to test. They don't require fancy shaders; all they require is lots of large polygons in one draw call.
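For what it's worth, here's a rough CUDA analogue of that test (my sketch, not the samples mentioned above): a single kernel launch that busy-loops long enough that, on a GPU driving the display, the OS watchdog resets the driver instead of preempting the kernel mid-execution.

    // Hypothetical long-running kernel; nothing inside the launch yields
    // to the scheduler once it has started.
    __global__ void spin(long long cycles)
    {
        long long start = clock64();
        while (clock64() - start < cycles) {
            // burn GPU time
        }
    }

    // e.g. spin<<<1, 1>>>(some huge cycle count); on Windows the default
    // TDR timeout is around 2 seconds before the driver is reset.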
That's what I know too for graphics draw calls. For GPGPU there's been hard work toward finer-grained preemption; last I looked into it (~1 year ago) on Linux it was, to put it kindly, a work in progress.
If you're curious, look up the Intel Broadwell GPU specs; there are sections devoted to the various levels of preemption. If you're really curious, look up the workarounds needed for the finest-grained preemption (this would be preempting a single GPGPU draw call).
Then decide that enabling fine-grained preemption should probably wait for Skylake, unless you took too much Adderall and no challenge sounds impossible. Do I speak from personal experience? I plead the Fifth.
I've no experience with how fine-grained Nvidia's preemption is.
>Intel's latest GPU architecture has an embedded OS running on the GPU for scheduling command batches; I'm not sure what AMD and Nvidia do.
Same on AMD and Nvidia, except it's been like this for the past 10-15 years (depending on whether you count at the bottom or the top of the hardware release pipeline).
To be more precise, GPUs are optimized for embarrassingly parallel workloads. AMD's GCN, for example, has scalar and vector instructions, where the vector instructions are 64 items wide. For graphics, this is used for running a shader on up to 64 items (vertices or pixels) simultaneously. Furthermore, the individual compute units are ridiculously hyper-threaded.
The advantage is that most of the silicon can go towards the actual computation rather than stuff like branch prediction and out-of-order execution. The disadvantage is that branching and looping are problematic: when only one item wants to go down the other branch of an if-else statement, the GPU has to run through both branches for all items (with execution masked off on a per-item basis).
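A quick CUDA illustration of that divergence penalty (my own sketch, names invented): when lanes in the same warp/wavefront disagree at a branch, both sides execute and inactive lanes are masked off.

    __global__ void divergent(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        if (in[i] > 0.0f)
            out[i] = sqrtf(in[i]);   // runs with some lanes masked off...
        else
            out[i] = 0.0f;           // ...then the other side runs for the rest
    }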
This works extremely well for graphics and high-dimensional numerics workloads (linear algebra, finite elements). It doesn't work at all for, say, spell-checking.
This makes me... uneasy.