Having written a 2-d curve renderer, I can say that what I want is parallel compute and high bandwidth, and GPUs deliver both. (Especially with newer interfaces that support scatter writes, though I'm not sure how much penetration these have in the browser yet.) It's true that this is not what you want to program against at a higher level, but it serves as a fine base on which to implement higher-level abstractions. You could support those abstractions in hardware, but it's not at all obvious what the advantages would be; no one complains that CPUs lack architectural support for for-loops.
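To make the scatter-write point concrete, here is a rough CPU-side sketch in Python of one place it could matter in a curve renderer: binning line segments into screen tiles. Everything in it is an assumption for illustration (the 16-pixel tile size, the names `tiles_touched`, `bin_scatter` and `bin_gather`, non-negative coordinates, bounding-box overlap as the test); it is not how any particular renderer does it. With scatter, each segment writes itself into the tiles it touches; without it, each tile has to scan every segment.

```python
TILE = 16  # tile size in pixels (assumed for illustration)

def tiles_touched(seg, tile=TILE):
    """Tiles overlapped by the segment's bounding box (conservative).

    Assumes non-negative coordinates.
    """
    (x0, y0), (x1, y1) = seg
    tx0, tx1 = int(min(x0, x1)) // tile, int(max(x0, x1)) // tile
    ty0, ty1 = int(min(y0, y1)) // tile, int(max(y0, y1)) // tile
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]

def bin_scatter(segments):
    """Scatter: one unit of work per segment, writing to arbitrary tiles."""
    bins = {}
    for i, seg in enumerate(segments):
        for t in tiles_touched(seg):
            bins.setdefault(t, []).append(i)
    return bins

def bin_gather(segments, tiles_x, tiles_y):
    """Gather: one unit of work per tile, reading every segment."""
    bins = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            hits = [i for i, seg in enumerate(segments)
                    if (tx, ty) in tiles_touched(seg)]
            if hits:
                bins[(tx, ty)] = hits
    return bins
```

Both produce the same bins; the scatter version does work proportional to the tiles each segment actually touches, while the gather version does tiles times segments. On a real GPU the scatter version would also need atomics or a prefix sum to allocate the per-tile lists, which this sketch glosses over.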

Edit: upon a reread, I don't really understand what your problem with GPUs is. You can ignore the vertex-processing pipeline entirely and draw just a single fullscreen quad (or use a compute shader); the GPU will handle this with aplomb, and this is the sort of thing the linked article is talking about too.
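For anyone unfamiliar with the fullscreen-quad approach, the programming model amounts to "run one pure function per pixel". Below is a minimal CPU-side sketch of that model in Python, not actual shader code: `shade` plays the role of the fragment (or compute) shader, and `render` stands in for the GPU invoking it over every pixel. The function names, the flattening of a quadratic Bézier into 16 line segments, and the one-pixel antialiasing ramp are all assumptions made for the example.

```python
import math

def quad_bezier(p0, p1, p2, t):
    """Point on a quadratic Bezier curve at parameter t."""
    u = 1.0 - t
    return (u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
            u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1])

def flatten(p0, p1, p2, n=16):
    """Approximate the curve with n line segments (n + 1 points)."""
    return [quad_bezier(p0, p1, p2, i / n) for i in range(n + 1)]

def dist_to_segment(px, py, ax, ay, bx, by):
    """Distance from point (px, py) to the segment from a to b."""
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    if len2 == 0.0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def shade(x, y, polyline, half_width):
    """Per-pixel 'shader': stroke coverage in [0, 1], a function of (x, y) only."""
    cx, cy = x + 0.5, y + 0.5  # sample at the pixel centre
    d = min(dist_to_segment(cx, cy, *a, *b)
            for a, b in zip(polyline, polyline[1:]))
    return max(0.0, min(1.0, half_width + 0.5 - d))

def render(width, height, polyline, half_width=1.0):
    """Stand-in for the fullscreen pass: every pixel is shaded independently."""
    return [[shade(x, y, polyline, half_width) for x in range(width)]
            for y in range(height)]
```

Since no pixel depends on any other, the double loop in `render` is exactly the part the GPU parallelizes; the vertex pipeline contributes nothing beyond covering the screen.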