This is essentially how Pathfinder works, with 16x16 tiles instead of 32x32. It also has a pure-GPU mode that does tile setup on GPU instead of CPU, which is a nice performance boost, though a lot more complex. Note that if you're doing the setup on CPU, the work can be parallelized across cores and I highly recommend this for large scenes. The main difference, which isn't large, is that Pathfinder draws the tiles directly instead of drawing a large canvas-size quad and doing the lookup in the fragment shader.
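For anyone curious what that CPU-side tile setup looks like, here is a minimal sketch (not Pathfinder's actual code: it assumes 16x16 tiles, curves already flattened to line segments, and uses bounding-box binning as a conservative stand-in for the exact tile walk a real implementation does):

    // Bin flattened segments into 16x16 pixel tiles so the GPU only has to
    // look at each tile's own segment list when rasterizing it.
    const TILE_SIZE: f32 = 16.0;

    #[derive(Clone, Copy)]
    struct Segment {
        from: (f32, f32),
        to: (f32, f32),
    }

    struct Tiles {
        width_in_tiles: usize,
        height_in_tiles: usize,
        bins: Vec<Vec<Segment>>, // one segment list per tile
    }

    fn bin_segments(segments: &[Segment], width: f32, height: f32) -> Tiles {
        let width_in_tiles = (width / TILE_SIZE).ceil() as usize;
        let height_in_tiles = (height / TILE_SIZE).ceil() as usize;
        let mut bins = vec![Vec::new(); width_in_tiles * height_in_tiles];

        for &seg in segments {
            // Clamp the segment's bounding box to the canvas, then push the
            // segment into every tile that box overlaps.
            let x0 = seg.from.0.min(seg.to.0).max(0.0);
            let x1 = seg.from.0.max(seg.to.0).min(width - 1.0);
            let y0 = seg.from.1.min(seg.to.1).max(0.0);
            let y1 = seg.from.1.max(seg.to.1).min(height - 1.0);

            for ty in (y0 / TILE_SIZE) as usize..=(y1 / TILE_SIZE) as usize {
                for tx in (x0 / TILE_SIZE) as usize..=(x1 / TILE_SIZE) as usize {
                    bins[ty * width_in_tiles + tx].push(seg);
                }
            }
        }

        Tiles { width_in_tiles, height_in_tiles, bins }
    }

The loop over segments is the part that parallelizes nicely across cores: give each thread a slice of the segments and its own set of bins, then merge.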
When I originally checked, Slug worked in a similar way but didn't do tiling, so it had to process more edges per scanline than Pathfinder or piet-gpu. Slug has gone through a lot of versions since then though, and I wouldn't be surprised if they added tiling later.
Would you recommend Pathfinder for real world use? Of course, I know that you're no longer working on it, but I would like to know if there are any significant bugs/drawbacks in using it. For context, I'm coding a simple vector graphics app that needs to resize and render quite complex 2d polycurves in real time. So far, the only thing I found working was Skia, which is good but not fast enough to do the stuff I need in real time (at least on low-end devices).
Yes, I know that the project hasn't got a bright future unfortunately. What I was more interested in was its current stability and correctness. I wouldn't mind using it if it was relatively stable and free of bugs.
Tiling doesn't work too well under domain transforms--3d environments, dynamic zoom, etc. That's why I am betting on high-quality space partitioning. Slug space partitioning is not amazing; I believe it still processes O(n) curves per fragment in a horizontal band.
You are literally talking to the Pathfinder author.
I just implemented basically the same tiling mechanism and it works OK for me: I can already do about 1000 line segments, covering ~10000 tiles (counting a tile multiple times when it is touched by different polygons), in about 1ms each on CPU and GPU, meaning the work is already quite well divided for my test cases. This is end to end, without any caching. For static scenes, doing world-space partitioning is an idea I considered, but for now I'm trying to optimize further to see what the limits of this basic approach are.
> You are literally talking to the Pathfinder author.
No worries, I think they know :)
And yeah, I considered doing fancier partitioning, but really toward the end (before work stopped due to layoffs) I was pushing on shipping, which meant deprioritizing fancy partitioning in favor of implementing all the bells and whistles of SVG. I certainly believe that you could get better results with more sophisticated acceleration structures, especially in cases like texturing 3D meshes with vector textures.
A post about vector graphics and the word "stroke" appears zero times ...
> Much better approach for vector graphics is analytic anti-aliasing. And it turns out, it is not just almost as fast as a rasterizer with no anti-aliasing, but also has much better quality than what can be practically achieved with supersampling.
> Go over all segments in shape and compute area and cover values continuously adding them to alpha value.
This approach is called "coverage to alpha". The author will be surprised to learn about the problem of coverage-to-alpha conflation artifacts: e.g. if you draw two shapes with exactly the same geometry on top of each other, but with different colors, the correct result includes only the color of the last shape, but with "coverage to alpha" you will get a bit of color bleeding around the edges (the conflation artifacts). I think Guzba also gives other examples in this thread.
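To make the conflation artifact concrete, here is the edge-pixel arithmetic for that two-identical-shapes case (a made-up single-pixel example; the colors and the 50% coverage are just for illustration):

    // Two identical shapes drawn over white: first red, then blue.
    // A pixel on the shared edge is covered 50% by both shapes.
    fn over(src: [f32; 3], alpha: f32, dst: [f32; 3]) -> [f32; 3] {
        // "Source over" blend with coverage folded into alpha.
        [
            src[0] * alpha + dst[0] * (1.0 - alpha),
            src[1] * alpha + dst[1] * (1.0 - alpha),
            src[2] * alpha + dst[2] * (1.0 - alpha),
        ]
    }

    fn main() {
        let (white, red, blue) = ([1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]);

        // Coverage-to-alpha, one shape at a time: red bleeds through on the edge.
        let after_red = over(red, 0.5, white);       // [1.0, 0.5, 0.5]
        let after_blue = over(blue, 0.5, after_red); // [0.5, 0.25, 0.75] -- red tint remains
        println!("conflated edge pixel: {:?}", after_blue);

        // Correct result: the second shape completely hides the first, so only
        // blue at 50% coverage over white should remain.
        println!("expected edge pixel:  {:?}", over(blue, 0.5, white)); // [0.5, 0.5, 1.0]
    }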
Also, they did not mention the other hard problems like stroke offsetting, nested clipping and group opacity etc.
This seems like a silly way to do vector graphics in a shader.
What I've done in the past is to represent the shape as a https://en.wikipedia.org/wiki/Signed_distance_function so that each pixel can figure out if it is inside or outside the shape. This avoids the need to figure out the winding.
Anti-aliasing is implemented as a linear interpolation for values near zero. This also allows you to control the "thickness" of the shape boundary: the edge becomes more blurry if you increase the lerp length.
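A sketch of that boundary lerp, with plain Rust standing in for the shader code (the circle SDF and the aa_width knob are just for illustration):

    // Signed distance to a circle: negative inside, positive outside.
    fn circle_sdf(px: f32, py: f32, cx: f32, cy: f32, radius: f32) -> f32 {
        ((px - cx) * (px - cx) + (py - cy) * (py - cy)).sqrt() - radius
    }

    // Linear ramp across the zero crossing of the distance field.
    // aa_width is the lerp length in pixels: ~1 gives crisp anti-aliasing,
    // larger values make the edge progressively blurrier.
    fn coverage(d: f32, aa_width: f32) -> f32 {
        (0.5 - d / aa_width).clamp(0.0, 1.0)
    }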
Signed distance fields only work well for relatively simple graphics.
If you have highly detailed characters like Chinese or emojis, you need larger resolution to faithfully represent every detail. The problem is that SDFs are sampled uniformly over the pixel grid. If the character is locally complex, a high resolution is required to display it, but if the character has simple flat regions, memory is wasted. One way to get around excessive memory requirements is to store the characters in their default vector forms and only render the subset of required characters on demand, but then you might as well render them at the required pixel resolution and do away with the additional complexity of SDF rendering.
SDF is cool but not a generally good solution to GPU vector graphics. It only works for moderate scaling up before looking bad, the CPU prep figuring out the data the GPU needs takes far longer than just rasterizing on CPU would, etc. It's great as a model for games where there are many renders as world position changes but that's about it.
The decomposition method used by msdfgen works well enough for fonts, but I'm not sure if it would work for more complex vector drawings. Probably would need much more than 3 color channels (= decompositions) to render something like the tiger SVG...
Seems like you're thinking of precomputed SDFs that are stored in a texture. They don't have to be :) (for example, my little library: https://github.com/audulus/vger)
If you are talking about analytic SDFs, then you need a good "decomposition" method that can divide an arbitrary SVG into simple SDF-friendly primitives. Maybe it could be done; I think it would be a good research topic.
I don't know the details of using SDF (especially MSDF!) for doing vector graphics, but my understanding is that essentially it's a precomputation that involves _already_ a rasterization.
I would like to know why you think the described approach is silly? It doesn't involve a final rasterization but merely a prefiltering of segments.
What about SDFs of cubic bezier curves and rational bezier curves? Because these appear in vector graphics and I think there is no analytic solution for them (yet?).
SDF of a cubic bezier involves solving a quintic, so it's not analytic. There are approximations, of course, but for an outline, using an sdf is just silly. (For a stroke, you don't really have much choice--though it's common to approximate strokes using outlines.) I'll add that sdf aa is not as good as analytic aa.
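For reference, the quintic comes from the closest-point condition (the standard derivation, nothing specific to any particular renderer):

    \min_{t \in [0,1]} \lVert B(t) - p \rVert^2
    \;\Longrightarrow\;
    f(t) = \bigl(B(t) - p\bigr) \cdot B'(t) = 0,
    \qquad \deg f = \deg B + \deg B' = 3 + 2 = 5

and a general degree-5 polynomial has no solution in radicals, hence the approximations.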
IIRC Loop-Blinn gives you a pretty good distance approximation using the 2D generalization of Newton's method to avoid having to solve the quintic. (Though I've never actually seen anybody implement Loop-Blinn for cubics because the edge cases around cusps/loops are annoying. Every Loop-Blinn implementation I've actually seen just approximates cubics with quadratics.)
(Fun fact: Firefox uses the same 2D Newton's method trick that Loop-Blinn does for antialiasing of elliptical border radii in WebRender—I implemented it a few years back.) :)
> Though I've never actually seen anybody implement Loop-Blinn for cubics
I implemented them (implicit formulation of rational cubic bezier curves) in my renderer (see one of my other replies in this thread). Here is an extract of the relevant code in a shader toy: https://www.shadertoy.com/view/WlcBRn
Even in Jim Blinn's book "Notation, Notation, Notation" he leaves some of the crucial equations as an exercise to the reader. I remember spending 2 or 3 weeks reading and trying to understand everything he wrote to derive these equations he hinted at myself.
Out of curiosity, where have you seen loop-blinn implemented? I was under the impression that it was generally a no-go due to the patent (which, incidentally, expires in 2024).
Of course you can implement smooth circles directly in a shader the way you describe, but note that there are vector outline shapes that are not circles...
Check out the outlines you can do using SVG - paths composed of straight line segments as well as cubic bezier curves and various arcs. Also color gradients, stroking...
I did vector graphics using SDFs in this library (https://github.com/audulus/vger). Works pretty well for my uses which are rendering dynamic UIs, not rendering SVGs. But I can still do some pretty gnarly path fills!
Here's the approach for rendering path fills. From the readme:
> The bezier path fill case is somewhat original. To avoid having to solve quadratic equations (which has numerical issues), the fragment function uses a sort-of reverse Loop-Blinn. To determine if a point is inside or outside, vger tests against the lines formed between the endpoints of each bezier curve, flipping inside/outside for each intersection with a +x ray from the point. Then vger tests the point against the area between the bezier segment and the line, flipping inside/outside again if inside. This avoids the pre-computation of Loop-Blinn, and the AA issues of Kokojima.
It works pretty well, and doesn't require as much preprocessing as the code in the article. Also doesn't require any GPU compute (though I do use GPU compute for some things). I think ultimately the approach in the article (essentially Piet-metal, aka tessellating and binning into tiles) will deliver better performance, and support more primitives, but at greater implementation complexity. I've tried the Piet-metal approach myself and it's tricky! I like the simpler Shadertoy/SDF inspired approach :)
Using SDFs. It looks fairly good! Maybe not as good as the proper coverage methods, but when rendering UIs on a high resolution screen, it seems good enough.
I don't want to be the guy that doesn't read the entire article, but the first sentence surprised me quite a bit:
> Despite vector graphics being used in every computer with a screen connected to it, rendering of vector shapes and text is still mostly a task for the CPU.
Do modern vector libraries really not use the GPU? One of the very first things I did when learning Vulkan was to use a fragment shader to draw a circle inside a square polygon. I always assumed that we've been using the GPU for pretty much any sort of vector rasterization, whether it was bezier curves or font rendering.
SVG paths can be arbitrarily complex. This article really doesn't discuss any of the actual hard cases. For example, imagine the character S rotated 1 degree and added to the path on top of itself in a full rotation. This is one path composed of 360 shapes. These begin and end fill sections (winding order changes) coincide in the same pixels at arbitrary angles (and the order of the hits is not automatically sorted!) but the final color cannot be arrived at correctly if you do not process all of the shapes at the same time. If you do them one at a time, you'll blend tiny (perhaps rounded to zero) bits of color and end up with a mess that looks nothing like what it should. These are often called conflation artifacts IIRC.
There's way more to this than drawing circles and rectangles, and these hard cases are why much of path / vector graphics filling still ends up being better on CPU, where you can accumulate, sort, etc., which takes a lot of the work away. The CPU approach is basically per-Y whereas the GPU here is per-pixel, so perhaps they're almost equal if the GPU has roughly the square of the CPU's power. Obviously this isn't quite right, but it gives you an idea.
Video discussing path filling on CPU (super sampling and trapezoid): https://youtu.be/Did21OYIrGI?t=318 We don't talk about the complex cases but this at least may help explain the simple stuff on CPU for those curious.
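For the curious, here is a minimal sketch of that per-Y CPU approach in its simplest form (even-odd rule, line segments only, no anti-aliasing; a real rasterizer accumulates fractional coverage and deals with the winding/sorting issues described above):

    // For each scanline, collect the x positions where segments cross the
    // scanline center, sort them, and fill between alternating pairs.
    fn fill_even_odd(
        segments: &[((f32, f32), (f32, f32))],
        width: usize,
        height: usize,
    ) -> Vec<u8> {
        let mut image = vec![0u8; width * height];
        for y in 0..height {
            let sample_y = y as f32 + 0.5;
            let mut crossings: Vec<f32> = Vec::new();
            for &((x0, y0), (x1, y1)) in segments {
                // Half-open test avoids double-counting shared endpoints.
                if (y0 <= sample_y) != (y1 <= sample_y) {
                    let t = (sample_y - y0) / (y1 - y0);
                    crossings.push(x0 + t * (x1 - x0));
                }
            }
            crossings.sort_by(|a, b| a.partial_cmp(b).unwrap());
            for pair in crossings.chunks(2) {
                if let [start, end] = pair {
                    let x_start = start.max(0.0) as usize;
                    let x_end = end.min(width as f32) as usize;
                    for x in x_start..x_end {
                        image[y * width + x] = 255;
                    }
                }
            }
        }
        image
    }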
I want to say yes, but it depends on what you're actually doing as a final goal.
Detecting when a path is antagonistic to most GPU approaches takes time, as does preparing the data however it needs to be prepared on the CPU before being uploaded to the GPU. If you can just fill the whole thing on CPU in that time, you wasted your time even thinking about the GPU.
If you can identify a simple case quickly, it's probably totally a good idea to get the path done on the GPU unless you need to bring the pixels back to the CPU, maybe for writing to disk. The upload and then download can be way slower than just, again, filling on CPU.
If you're filling on GPU and then using on GPU (maybe as a web renderer or something), GPU is probably a big win. Except, this may not actually matter. If there is no need to re-render the path after the first time, it would be dumb to keep re-rendering on the GPU each frame / screen paint. Instead you'd want to put it into a texture. Well.... if you're only rendering once and putting into a texture, this whole conversation is maybe pointless? Then what is simple is probably the best idea. Anyway lots to 2d graphics that goes underappreciated!
No, mostly it is not practical to offload the edge cases.
The reason for this is that the single 2D application that people most want to speed up is font rendering. And font rendering is also the place where the edge cases are really common.
Rendering everything else (geometric shapes) is trivial by comparison.
Why is that? Glyphs can be cached in a texture. That's what nanovg does and it works quite well. That's what my little library does too (https://github.com/audulus/vger)
Doesn't work in the face of real-time dynamic transforms; 3-d, smooth zoom, etc. Atlases are also a bit heavy, so now you need a cache replacement policy, and you have annoying worst-case performance...
Only if you give up doing any script-based languages even remotely properly. And, it really doesn't even work in character-based languages with heavy kerning.
Text rendering is really complicated. There is a reason why we have so few text shaping engines.
Skia mostly uses the CPU -- it can draw some very basic stuff on the GPU, but text and curves are a CPU fallback. Quartz 2D is full CPU. cairo never got an acceptable GPU path. Direct2D is the tessellate-to-triangle approach. If you name a random vector graphics library, chances are 99% of the time it will be using the CPU.
Skia has code paths for everything: CPU path drawing, CPU tessellation followed by GPU rasterization with special paths for convex vs. concave paths, NV_path_rendering, Spinel/Skia Compute... It's actually hard to figure out what it's doing because it depends so much on the particular configuration.
3D vector graphics are not as full featured as 2d vector graphics.
2d vector graphics include things like "bones" and "tweening", which are CPU algorithms. (Much like how bone processing in 3d world is also CPU-side processing).
---------
Consider the creation of a Bezier curve, in 2d or 3d. Do you expect this to be a CPU algorithm, or GPU algorithm? Answer: clearly a CPU algorithm.
GPU algorithms are generally triangle-only, or close to it (ex: quads), as far as geometry goes. Sure, there are geometry shaders, but I don't think it's common practice to take a Bezier curve definition, write a tessellation shader for it, and output (in parallel) a set of vertices. (And if someone is doing that, I'm interested in hearing / learning more about it. It seems like a parallelizable algorithm to me, but the devil is always in the details...)
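For what it's worth, the common CPU-side answer is just flattening: evaluate or subdivide the curve on the CPU and hand the GPU plain vertices. A rough sketch (uniform steps for brevity; real code subdivides adaptively against a flatness tolerance):

    #[derive(Clone, Copy, Debug)]
    struct Point {
        x: f32,
        y: f32,
    }

    // Direct evaluation of the cubic Bernstein form at parameter t.
    fn cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: f32) -> Point {
        let u = 1.0 - t;
        Point {
            x: u * u * u * p0.x + 3.0 * u * u * t * p1.x + 3.0 * u * t * t * p2.x + t * t * t * p3.x,
            y: u * u * u * p0.y + 3.0 * u * u * t * p1.y + 3.0 * u * t * t * p2.y + t * t * t * p3.y,
        }
    }

    // Flatten one cubic Bezier into a polyline the GPU can consume as vertices.
    fn flatten(p0: Point, p1: Point, p2: Point, p3: Point, steps: usize) -> Vec<Point> {
        (0..=steps)
            .map(|i| cubic_bezier(p0, p1, p2, p3, i as f32 / steps as f32))
            .collect()
    }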
GPUs have evolved away from being strictly triangle rasterizers. There are compute shaders that can do general purpose computing. The approach described here could in theory be set up by "drawing" a single quad - the whole screen, and it doesn't even need compute shaders but can be implemented using conventional vertex/fragment shaders with global buffer access (in OpenGL, UBOs or SSBOs).
There is a well-known paper that describes how to draw Bezier curves by "drawing" a single triangle. Check out Loop-Blinn from 2005.
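For quadratics the per-fragment part of that paper is tiny: assign canonical coordinates (0,0), (1/2,0), (1,1) to the triangle over the three control points, and the curve becomes u^2 - v = 0 in the interpolated coordinates. A sketch of the fragment-side decision, with plain Rust standing in for the shader:

    // Loop-Blinn style test for a quadratic Bezier drawn as a single triangle
    // over its control points. u and v are the interpolated canonical
    // coordinates; the curve itself is where u^2 - v = 0. Which sign counts
    // as "inside" depends on the curve's orientation.
    fn inside_quadratic(u: f32, v: f32) -> bool {
        u * u - v < 0.0
    }

The paper then anti-aliases by dividing the implicit value by the length of its screen-space gradient to get an approximate signed distance to the curve.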
Your project sounds very impressive. I would like to try it out, but unfortunately I'm unlikely to be able to get it to run by building the Rust code myself. If I understand correctly it should be possible to run it on the Web; do you have a demo somewhere? Or a video?
You will need to try different nightly browsers (I think Chrome works ATM), because the WebGPU API changes and breaks all the time. Also don't forget to enable WebGPU, you can check that here: https://wgpu.rs/examples-gpu/
The WASM port is highly experimental:
It currently does not use interval handlers, so for animations to run you need to constantly move the mouse to provide frame-triggering events. In WebGPU, MSAA is limited to 4 samples ATM, so anti-aliasing will look kind of bad in browsers. And the keyboard mapping is not configured, so typing in text fields produces gibberish.
That's a bummer, I tried with Chrome and Firefox but no luck. Can't be arsed to try different versions or obscure settings right now.
2 years ago I had a similar experience with WASM/WebGL, I tried to make use of emscripten in a sane way but it was painful to get things like event handling, file I/O and quality graphics to work. Results weren't great. When using specific libraries and coding the app in the right way from the start, porting GPU applications to the Web is allegedly easier.
If you could provide a fool-proof description how to build and set the project up in a few minutes, I would very much be willing to try your project out, it still sounds great. Or provide a few screenshots/videos just to get the idea across how it looks.
Seeing that this is from Microsoft Research, and that Microsoft's font renderer has always looked nicer (and is known to be GPU-rendered to boot), this makes a lot of sense.
Still, my point stands that this is relatively uncommon even in the realm of 3d programmers. Unity / Unreal Engine don't seem to do GPU-side Bezier curve processing, even though the algorithm was published by Microsoft researchers back in 2005.
What font renderer do you mean? I don't know about the internals of Microsoft renderers but vector graphics and font rasterization generally are distinct disciplines. This has started to change with higher-resolution displays, but font rasterization traditionally has been a black art involving things like grid snapping, stem darkening etc. Probably (but can't back this up) the most used font rasterization technologies are still Microsoft ClearType (are there implementations that use GPU??) and Freetype (strictly a CPU rasterizer). Don't know about MacOS, but I heard they don't do any of the advanced stuff and have less sharp fonts on low-dpi displays.
I would also like to know where Loop-Blinn is used in practice? I once did an implementation of quadratic Beziers using it, but I'm not up to doing the cubic version, it's very complex.
It's a black box. But Microsoft is very clear that it's "hardware accelerated", whatever that means (i.e. I think it means they have GPU shaders handling a lot of the details).
GDI / etc. etc. are legacy. You were supposed to start migrating towards Direct2D and DirectWrite decades ago. Cleartype itself moved to DirectWrite (though it still has GDI renderer for legacy purposes).
I'm not really experienced when it comes to GPU programming, so forgive me if I'm wrong with this, but some of the things you say don't make a lot of sense to me:
> 2d vector graphics include things like "bones" and "tweening", which are CPU algorithms. (Much like how bone processing in 3d world is also CPU-side processing).
Changing the position of bones does seem like something you would do on a CPU (or at least setting the indices of bone positions in a pre-loaded animation), but as far as I'm aware, 99% of the work for this sort of thing is done in a vertex shader as it's just matrix math to change vertex positions.
> Consider the creation of a Bezier curve, in 2d or 3d. Do you expect this to be a CPU algorithm, or GPU algorithm? Answer: clearly a CPU algorithm.
Why is it clearly a CPU algorithm? If you throw the bezier data into a uniform buffer, you can use a compute shader that writes to an image to just check if each pixel falls into the bounds of the curve. You don't need to use the graphics pipeline at all if you're not using vertices. Or even just throw a quad on the screen and jump straight to the fragment shader like I did with my circle vector.
A few of the previous approaches are mentioned in Other work near the end. And from reading a few articles on the topic I got the impression that, yes, drawing a single shape in a shader seems almost trivial, vector graphics in general means mostly what PostScript/PDF/SVG are capable of these days. This means you don't just need filled shapes, but also strokes (and stroking in itself is a quite complicated problem), including dashed lines, line caps, etc. Gradients, image fills, blending modes are probably on the more trivial end, since I think those can all be easily solved in shaders.
There's definitely a lot of code out there that still does this only on the CPU, but the optimized implementations used in modern OSes, browsers and games won't.
The best GPU vector rendering library I have seen is https://sluglibrary.com/. The principal use case is fonts, but it appears that the underlying mechanism can be used for any vector graphics.
I think the issue with slug is that it requires a fair amount of pre-computation. So it's great for its use case: rendering glyphs, especially on surfaces in games.
A possibly dumb question. GPUs are really, really good at rendering triangles. Millions of triangles per second good. Why not convert a vector path into a fine enough mesh of triangles/vertexes and make the GPU do all the rasterization from start to finish instead of doing it yourself in a pixel shader?
You can do that, except now you've moved the bulk of the work from the GPU to the CPU -- triangulation is tricky to parallelize. And GPUs are best at rendering large triangles -- small triangles are much trickier since you risk overdraw issues.
Also, typical GPU triangle antialiasing like MSAAx16 only gives you 16 sample levels, which is far from the quality we want out of fonts and 2D shapes. We don't have textures inside the triangles in 2D like we do in 3D, so the quality of the silhouette matters far more.
That said, this is what Direct2D does for everything except text.
I've used Direct2D and DirectWrite to render vector graphic and text (basically various HUD displays) in one of my products ( game like application) and was overall happy with the quality / performance.
8 or 9 years ago I had need to rasterize SVG for a program I had written back then and looked into gpu vs cpu, but a software rasterizer ended up being fast enough for my needs and was simpler, so I didn't dig any further.
At the time I looked at an nvidia rendering extension, which was described in this 2012 paper:
In addition to the paper, the linked page has links to a number of youtube demos. That was 10 years ago, so I have no idea if that is still a good way to do it or if it has been superseded.
There's an OpenGL-ish GPU graphics library (whose name I can't currently remember) that's in Mesa, but not built by default in most distros, and IIRC is also supported on the Raspberry Pi.
I played with it a bit, wrote a python wrapper for it, borked a fedora install trying to get real gpu support, fun times all around. Seems nobody cares about an accelerated vector graphics library.
Not exactly - the article you link to is about SVG/CSS filters, not path drawing. Modern Chrome (skia) supports accelerated path drawing but only some of the work is offloaded to the GPU. In even older Chrome the GPU was used for compositing bitmaps of already-rendered layers.
I’m glad it seems more and more people are looking into rendering vector graphics on the GPU.
Has anyone done any image comparisons between CPU and GPU rendering? I would be worried about potential quality and rendering issues of a GPU-rendered image vs a CPU-rendered reference image.
The interesting primitives are: add mul fma sqrt. All of these are mandated by ieee-754 to be correctly rounded. While gpus have been found to cut corners in the past, I wouldn't worry too much about it.
Shouldn't a GPU render (given a correct algorithm implementation) be more correct in environments where zooming and sub-pixel movements are common (e.g. browsers)? The GPU runs the mathematical computations every frame for the exact pixel dimensions, while the CPU may often use techniques like upscaling.
I would at least think it would be faster/smoother during operations like zooming.
My concern was about precision of math operations on the GPU and potential differences between GPU vendors (or even different models of GPUs from the same vendor).
There's nothing to worry about. You can do the same things on the GPU as on the CPU. The tricky part is finding a good way to distribute the work on many small cores.
> a very good performance optimization is not to try to reject segments on the X axis in the shader. Rejecting segments which are below or above current pixel boundaries is fine. Rejecting segments which are to the right of the current pixel will most likely increase shader execution time.
Thread groups are generally rectangular IME--nv is 8x4, others 8x8. So it doesn't make sense to distinguish X from Y in this respect. But yes, you do want a strategy for dealing with 'branch mispredictions'. Buffering works, and is applicable to the cpu too.
Is there no built in GPU path draw command? Seems like it would be similar (although not identical) to what the GPU does for vertices visibility.
Especially when you consider what tile based renderers do for determining whether a triangle fully covers a tile (allowing rejection of any other draw onto that tile) it seems like GPUs could have built in support for 'inside a path or outside a path.' Even just approximating with triangles as a pre-pass seems faster than the row based method in the post.
Are arbitrary paths just too complex for this kind of optimization?
From my understanding, there is no closed-form solution for arbitrary paths defined in that way. So to figure out what the shape looks like, and whether a point is inside or outside, you need to run all the commands that form the path.
Meet "Wavelet Rasterization" by J. Manson and S. Schaefer, from the paper:
> Because we evaluate wavelet coefficients through line integrals in 2D, we are able to derive analytic solutions for polygons that have Bézier curve boundaries of any order, and we provide solutions for quadratic and cubic curves.
> ... seems faster than the row based method in the post.
But the row-based method in the post is not what they describe doing on the GPU version of the algorithm. The row-based method is their initial CPU-style version.
The GPU version handles each pixel in isolation, checking it against the relevant shape(s).
At least, if I understand things correctly (:
As far as I can tell, the approach described here is probably similar to what a built-in "draw path" command would do. Checking if something is inside a triangle is just extremely easy (no concavity, for instance) and common, and more complex operations are left up to shader developers — why burn that special-case stuff into silicon?
What I don't understand - why is there this "cover table" with precomputed per-cell-row coverage? I.e. why is the cover table computed per-cell-row when the segments are specialized per-cell? There is the paper "ravg.pdf" that gets by with basically one "cover" integer per tile, plus some artificial fixup segments that I believe are needed even in the presence of such a cover table. I'm probably missing something, someone who is deeper in the topic please enlighten me?
Does this mean it is mostly done in the fragment shader and there is no tessellation, like how Bezier patches are rendered in 3D land? That's quite different from what I thought I knew.
I think this article is more about alternative rasterization algorithm for 2D geometry. 'vector graphics' is misleading as vertices used in graphics APIs are vectors.