A Tiny Grammar of Graphics (observablehq.com)
240 points by pavpanchekha on June 14, 2022 | 37 comments



A larger, more detailed introduction to this topic is Hadley Wickham's seminal paper "A Layered Grammar of Graphics".

https://byrneslab.net/classes/biol607/readings/wickham_layer...

Wickham is the Chief Scientist at RStudio and created R packages such as ggplot2 and the tidyverse.


Even more detailed is the original book by Leland Wilkinson, on which ggplot2 is based: https://link.springer.com/book/10.1007/0-387-28695-0.

The original implementations go back to SYSTAT and SPSS GPL (Graphics Production Language).

GPL especially, with its statement-based approach, has arguably better ergonomics for interactively and iteratively producing plots compared to function-based approaches.


I use ggplot2 regularly and have read Wilkinson but honestly I have a bit of trouble seeing how ggplot2 is an implementation of Wilkinson's book.

(It's been a few years, maybe I should take another look at the book.)


The overview links to the book. See: https://ggplot2.tidyverse.org/

> ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.
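As a rough illustration of what "declaratively" means here, this Python sketch keeps a plot spec (data, aesthetic mappings, a mark) separate from the code that compiles it into drawable primitives. `Spec`, `compile_spec`, and the `point` mark are hypothetical names, not ggplot2's API, and the scales are simplified to plain linear ones.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    data: list      # rows as dicts
    mapping: dict   # aesthetic -> column name, e.g. {"x": "wt", "y": "mpg"}
    mark: str       # geometry to draw, e.g. "point"
    width: float = 400.0
    height: float = 300.0


def linear(domain, rng):
    """Map a numeric domain linearly onto a pixel range."""
    (d0, d1), (r0, r1) = domain, rng
    span = (d1 - d0) or 1.0  # avoid dividing by zero for constant columns
    return lambda v: r0 + (v - d0) / span * (r1 - r0)


def compile_spec(spec):
    """Turn a declarative spec into a list of primitive shapes."""
    xs = [row[spec.mapping["x"]] for row in spec.data]
    ys = [row[spec.mapping["y"]] for row in spec.data]
    # Scales are "trained" on the extent of the mapped columns.
    sx = linear((min(xs), max(xs)), (0.0, spec.width))
    sy = linear((min(ys), max(ys)), (spec.height, 0.0))  # SVG y grows downward
    if spec.mark == "point":
        return [{"shape": "circle", "cx": sx(x), "cy": sy(y), "r": 3}
                for x, y in zip(xs, ys)]
    raise ValueError(f"unsupported mark: {spec.mark}")


spec = Spec(data=[{"wt": 2, "mpg": 30}, {"wt": 4, "mpg": 20}],
            mapping={"x": "wt", "y": "mpg"}, mark="point")
shapes = compile_spec(spec)
```

Swapping the mark or the mapping changes the picture without touching the compile step, which is the point of the declarative split.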


I have the book! I just should figure out where it is on my shelves.


Rather expensive by my standards these days. When I was a starving student, I would not have hesitated to buy it from that corner in the back of Cody's Books.

Springer runs sales with discounts of up to 40% about once a year, but I don't recall whether "The Grammar of Graphics" was eligible last time.


They're running a sale right now, through June 30, but it doesn't look like G of G is eligible: https://link.springer.com/shop/springer/yellow-sale/en-eu/


Love that paper. Wickham also made his ggplot2 book freely available:

https://ggplot2-book.org/


H2O Wave has a similar (probably tinier) Grammar of Graphics API: https://wave.h2o.ai/docs/plotting

Leland Wilkinson (GoG inventor) and I designed it together a couple of years back.

The function for creating marks (a layer) tries to be as "flat" as possible, in the sense that it should be possible to render most common kinds of plots without having to pass nested/hierarchical options: https://wave.h2o.ai/docs/api/ui#mark
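A rough sketch of the "flat" idea in Python, with made-up names (this is not H2O Wave's actual `ui.mark` signature): the caller passes only top-level keyword arguments, and the helper builds whatever nested layer structure a renderer might need internally.

```python
def mark(kind, x=None, y=None, color=None, size=None, stack=None):
    """Hypothetical flat constructor for one layer: every option is a top-level keyword."""
    encoding = {}
    for channel, field in (("x", x), ("y", y), ("color", color), ("size", size)):
        if field is not None:
            encoding[channel] = {"field": field}
    # Options that belong to a channel are folded in here, not by the caller.
    if stack is not None and "y" in encoding:
        encoding["y"]["stack"] = stack
    return {"mark": kind, "encoding": encoding}


layer = mark("interval", x="product", y="price", color="region")
```

The nesting still exists, but it is an implementation detail the user never has to spell out.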


I’ve been building data visualizations for the web for almost ten years. Most of the time it was some kind of dashboard with custom charts, interactivity and, of course, brand styling.

The grammar of graphics was always my North Star. It is very helpful to go through the papers and books and look for inspiration on how to organize your system. But direct implementations are finicky to work with. And in my hubris I attempted to write yet another implementation of the grammar of graphics, and it ran into exactly the same problems! With complex marks it is ambiguous what is a data point and what is a series. Tuning the look requires configuration objects scattered around the chart definition, and composition sometimes requires injecting something into two different parts of the definition.

Now I treat the grammar of graphics as a collection of patterns and good practices, but I surrender to pragmatic solutions when necessary.

Anyway, I think I owe a big part of my career to the work of Wickham and Wilkinson.


> With complex marks it is ambiguous what is a data point and what is a series.

I sort of agree with you. I've implemented the Grammar of Graphics from the ground up four times professionally (!), twice in collaboration with Leland, all in different products.

The main reason it might be finicky to work with directly is that the point vs. series vs. series-of-series distinction can run arbitrarily deep, so the library's user has to do some mental gymnastics to restructure the data and present it in a form the library can work with.
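A small Python sketch of that restructuring step: the same flat rows can be drawn as individual points by a scatter mark, but a line mark needs them grouped into one ordered polyline per series. `to_series` and the column names are hypothetical.

```python
from collections import defaultdict


def to_series(rows, x, y, by):
    """Group flat rows into one x-sorted list of (x, y) points per series key."""
    series = defaultdict(list)
    for row in rows:
        series[row[by]].append((row[x], row[y]))
    # A line mark connects points in order, so each series is sorted along x.
    return {key: sorted(points) for key, points in series.items()}


rows = [
    {"month": 2, "sales": 5, "store": "A"},
    {"month": 1, "sales": 3, "store": "A"},
    {"month": 1, "sales": 7, "store": "B"},
]
lines = to_series(rows, "month", "sales", "store")
```

A series-of-series (say, stores within regions) would need yet another level of grouping, which is where the mental gymnastics start.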

Tableau, which is also a GoG system, sort of deals with this by having slots for "Dimensions", "Pages", "Color", etc. as proxies for multi-level aggregation ("group/slice/dice" in BI terms). So even though it's not immediately apparent to new users how to present data correctly to the rendering system to get the kind of vis they want, at least it's pretty low-friction UX to shuffle variables between those slots till satisfied.
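Roughly speaking, each field placed on a slot acts as a group-by key, and the measure is aggregated within each group. A minimal Python sketch of that idea with a hypothetical `aggregate_by_slots` helper (Tableau itself does much more, such as filters and table calculations):

```python
from collections import defaultdict


def aggregate_by_slots(rows, slots, measure):
    """Sum `measure` for every combination of the fields placed in slots.

    `slots` maps a visual slot (e.g. "columns", "color") to a field name.
    """
    dims = list(slots.values())
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return [dict(zip(dims, key), **{measure: total}) for key, total in totals.items()]


rows = [
    {"region": "East", "year": 2021, "sales": 10},
    {"region": "East", "year": 2021, "sales": 5},
    {"region": "West", "year": 2021, "sales": 7},
    {"region": "East", "year": 2022, "sales": 1},
]
by_region = aggregate_by_slots(rows, {"columns": "region"}, "sales")
# "Dragging" year onto the color slot just adds another grouping key.
by_region_year = aggregate_by_slots(rows, {"columns": "region", "color": "year"}, "sales")
```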

With programmatic use, that shuffling around gets cumbersome, because now you have to write code to munge the data into submission.

Tableau introduced the "Show Me" feature precisely for this reason - most new users would rather get stuff done quickly than figure out how the GoG can best solve their vis problem.


I'm really curious how Observable will fare in the VC bear market. They seem to be a fantastic team of conceptual thinkers, but I'm not sure what the cash flow looks like for this kind of tooling.


I’ve never wanted to use a product more, yet not been able to use it.

Javascript data products fall into a weird gap of having the best visualization tooling and the worst data manipulation tooling. I know there are efforts like arquero and DuckDB which make data more accessible, but there’s no really strong scipy/numpy/statsmodels/scikit-learn equivalent.


> the worst data manipulation tooling

For your situation, is it possible to separate the tasks: Using Python to handle manipulating the data, then JavaScript to display it?


I've been having fun using Observable Plot to graph various things related to digital signal processing. Here are my notes. Feedback welcome!

https://observablehq.com/collection/@skybrian/digital-signal...


I am increasingly a fan of just manually generating SVG. Creating a DSL abstraction layer on top is fine, I guess. But ultimately I know what I want, and it’s easier to just make it directly instead of fighting a tool to try to induce it to make the thing I want.

That said, I’m weird. My blog is artisanally hand-crafted HTML, JS, and CSS for the same reason.


This is definitely possible, and gives you a high degree of control over the visual design of visualizations.

I built an experimental system to do just that - design the layout and all data visualizations in a single Sketch/Figma document, export to SVG, then map data to the SVG elements in the browser. It's all 100% declarative.
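A minimal Python sketch of the "map data onto exported SVG" step, assuming the designer names elements with ids like `bar-0`, `bar-1` (a hypothetical convention) and the bars' baseline sits at the bottom of the artboard. It uses the standard `xml.etree.ElementTree` instead of the browser DOM, but the mapping logic is the same.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep SVG as the default namespace on output

# Stand-in for a file exported from Sketch/Figma.
TEMPLATE = f"""<svg xmlns="{SVG_NS}" width="100" height="100">
  <rect id="bar-0" x="10" y="0" width="20" height="100"/>
  <rect id="bar-1" x="40" y="0" width="20" height="100"/>
</svg>"""


def bind_bars(svg_text, values, max_value, baseline=100.0):
    """Resize each designer-drawn bar so its height encodes the matching value."""
    root = ET.fromstring(svg_text)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    for i, value in enumerate(values):
        bar = by_id[f"bar-{i}"]
        h = baseline * value / max_value
        bar.set("height", f"{h:g}")
        bar.set("y", f"{baseline - h:g}")  # grow upward from the baseline
    return ET.tostring(root, encoding="unicode")


print(bind_bars(TEMPLATE, [25, 50], max_value=50))
```

Everything visual (colors, fonts, positions) stays in the designer's file; code only touches the attributes that encode data.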

Here's an over-the-top example: https://youtu.be/S9cmi89fvT8

There are no limits to what you can achieve with this approach, but it obviously requires Sketch/Figma skills (or a colleague who has them).

edit: typo


I’ve done it in both JavaScript and Rust. At the end of the day it’s just a bunch of lines, circles, polygons, and text.
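In that spirit, here is a tiny Python sketch of string helpers for those primitives, composed into a minimal line chart. The helper names are made up for illustration, and coordinates are assumed to be in pixel space already.

```python
from xml.sax.saxutils import escape


def line(x1, y1, x2, y2, stroke="black"):
    return f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" stroke="{stroke}"/>'


def circle(cx, cy, r=3, fill="steelblue"):
    return f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="{fill}"/>'


def polyline(points, stroke="steelblue"):
    pts = " ".join(f"{x},{y}" for x, y in points)
    return f'<polyline points="{pts}" fill="none" stroke="{stroke}"/>'


def text(x, y, s):
    return f'<text x="{x}" y="{y}">{escape(s)}</text>'  # escape &, <, >


def svg(width, height, *children):
    body = "\n  ".join(children)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
            f'height="{height}">\n  {body}\n</svg>')


pts = [(10, 80), (50, 40), (90, 60)]
chart = svg(100, 100,
            line(0, 90, 100, 90),                 # x axis
            polyline(pts),                        # the series
            *(circle(x, y) for x, y in pts),      # point markers
            text(5, 12, "Sales & returns"))       # title
```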


That's true of almost all computer UI (excepting 3D modeling). It's all just boxes and text.


Kind of. Except with SVG data viz you likely don’t need a layout engine. It doesn’t need to be “responsive” and render differently on different devices.


I also find that writing SVG directly is easiest.

There are a few primitives that d3 uses, and once you implement those you can produce easy SVG results. `Scale` is the most important, for mapping your x,y plane into SVG pixel coordinates. Then some of the ticks helpers can be handy too.
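A minimal Python sketch of those two primitives: a linear scale (domain to pixel range, with the y range flipped because SVG's y axis points down) and a tick helper that picks a "nice" step of 1, 2, or 5 times a power of ten, in the spirit of d3's tick logic. This is a simplified re-implementation under those assumptions, not d3's code.

```python
import math


def linear_scale(domain, rng):
    """Return a function mapping values in `domain` linearly onto `rng`."""
    (d0, d1), (r0, r1) = domain, rng
    return lambda v: r0 + (v - d0) * (r1 - r0) / (d1 - d0)


def nice_step(start, stop, count):
    """Step of 1, 2, 5 or 10 times a power of ten giving roughly `count` ticks.

    Assumes stop > start and count > 0.
    """
    raw = (stop - start) / count
    power = math.floor(math.log10(raw))
    error = raw / 10 ** power
    if error >= math.sqrt(50):
        factor = 10
    elif error >= math.sqrt(10):
        factor = 5
    elif error >= math.sqrt(2):
        factor = 2
    else:
        factor = 1
    return factor * 10 ** power


def ticks(start, stop, count=10):
    step = nice_step(start, stop, count)
    first, last = math.ceil(start / step), math.floor(stop / step)
    # Multiply integer indices by the step to limit floating-point drift.
    return [round(i * step, 12) for i in range(first, last + 1)]


y = linear_scale((0, 100), (200, 0))  # 200px-tall plot, 0 at the bottom
print(ticks(0, 100, 5))  # [0, 20, 40, 60, 80, 100]
```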

Writing the SVG directly is just the fastest way to get what you need.


Relevant: the OG, Bertin's Matrix Theory of Graphics.

http://www.geo-informatie.nl/courses/grs60312/visualisation/...


His masterpiece Semiology of Graphics is a treasure trove of novel visualizations.


The thing I hate about all of these GoG approaches is the wastefulness of the translation of data + style into visual representations. For example, if you have a dashboard with 8+ charts visible, the scaffolding of the charting library starts to weigh down the system in both performance and memory usage. Vega-Lite, especially, seems to make a copy of the data being passed in. Looking at the examples of Observable Plot, I can see more wasteful processing in the form of dataset.map(d => d.property) sprinkled in several places.


This has nothing to do with the GoG.

This applies to any charting library that forces you to provide both spec and unaggregated data to memory/cpu constrained clients (e.g. Javascript in the browser). This is done for implementation-simplicity (Vega, for example), but obviously doesn't scale to larger datasets.

I've implemented a system where the data part of the spec is munged in-database, and aggregated data is provided to the browser, along with hints for axes, scales, legends, etc. It requires a part of the GoG interpreter to be resident on the server-side.
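A minimal Python sketch of that split, with an in-memory SQLite database standing in for the real one: the server runs the aggregation and ships only the aggregated rows plus scale hints to the browser. The `sales` table, its columns, and the payload shape are all hypothetical.

```python
import json
import sqlite3


def aggregated_payload(conn, dimension, measure, table="sales"):
    """Aggregate in the database and return a small JSON payload for the client.

    Identifiers are interpolated into the SQL, so they must come from a trusted spec.
    """
    rows = conn.execute(
        f"SELECT {dimension}, SUM({measure}) FROM {table} "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    ).fetchall()
    values = [total for _, total in rows]
    return json.dumps({
        "data": [{"key": k, "value": v} for k, v in rows],
        # Hints let the client build axes/scales without ever seeing raw rows.
        "hints": {"x": {"type": "band", "domain": [k for k, _ in rows]},
                  "y": {"type": "linear", "domain": [0, max(values, default=0)]}},
    })


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 10), ("West", 7), ("East", 5)])
payload = json.loads(aggregated_payload(conn, "region", "amount"))
```

The browser-side renderer then only needs the part of the GoG interpreter that turns hints and aggregated rows into marks.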


That sounds very similar to VizML (the visualization/data processing library underlying Tableau). That has been my big complaint about most visualization libraries: there is no sharing of the underlying data set for multiple projections across the same large data set. Grid/table libraries have the same issues.


Yes. Tableau would have to separate rendering from data select/filter/aggregation, especially because integrating with customer databases live is a key use case. Hence the built-in buffet of connectors/drivers.

It looks like with later versions they switched to kind of a hybrid approach (part-remote, part-local) with Hyper to reduce latency for interactivity.

> there is no sharing of the underlying data set for multiple projections across the same large data set

But that would require some kind of open standard for portability, no?


>But that would require some kind of open standard for portability, no?

I like the approach AG Grid uses: they provide a viewport-based interface that the grid uses to display data, and you can implement that interface on top of your data model - https://www.ag-grid.com/javascript-data-grid/viewport/. Unfortunately it's only available in their enterprise version, but this approach scales to both grid- and chart-based UIs. D3 has a bit of that flavor as well, since you can map visual attributes onto your underlying data any way you'd like.
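The general shape of such a viewport interface, sketched in Python with hypothetical names (this is not AG Grid's actual API): each view asks only for the row range it displays, and several projections can sit on one shared data model without copying it.

```python
from typing import Protocol, Sequence


class ViewportSource(Protocol):
    """Hypothetical interface a grid or chart calls to fetch visible rows only."""
    def row_count(self) -> int: ...
    def rows(self, first: int, last: int) -> Sequence[dict]: ...


class ListSource:
    """One shared in-memory data model; views never copy it wholesale."""
    def __init__(self, data):
        self._data = data

    def row_count(self) -> int:
        return len(self._data)

    def rows(self, first, last):
        # Inclusive range, clamped to the data; only this slice is materialized.
        return self._data[max(first, 0):min(last, len(self._data) - 1) + 1]


class Projection:
    """A view over a shared source that exposes only some fields."""
    def __init__(self, source: ViewportSource, fields):
        self.source, self.fields = source, fields

    def visible(self, first, last):
        return [{f: row[f] for f in self.fields} for row in self.source.rows(first, last)]


shared = ListSource([{"id": i, "price": i * 10, "qty": i % 3} for i in range(1000)])
prices = Projection(shared, ["id", "price"])
qtys = Projection(shared, ["id", "qty"])
```

Swapping `ListSource` for something backed by a database cursor would not change the views at all, which is what makes the approach scale.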


I was assuming you were referring to a declarative approach in your previous message.

The AG Grid approach makes sense if data and vis need to be wired together programmatically.


I didn't know GoG existed when it came to writing up a couple of tutorials[1][2] on how to go about building a (very, very simple) charting tool[3] on top of my canvas library. I'm going to have to re-assess those lessons, and add some links to other guides, now that I know about them.

Luckily for me, the main purpose of the lessons was not so much about how to build a charting tool, but rather concentrated on how to break the code into modules in the hope that some of the modules could be reused in other, similar projects.

If I'm making obvious mistakes in the approach, or code, that I set out in the lessons then feedback is always welcome so corrections/improvements can be made to them!

[1] - Building the chart frame, code management, etc - https://scrawl-v8.rikweb.org.uk/learn/eighth-lesson/

[2] - Generate bar charts and line charts from crime data - https://scrawl-v8.rikweb.org.uk/learn/ninth-lesson/

[3] - demo of the final code - https://scrawl-v8.rikweb.org.uk/demo/modules-001.html


Here's a ~4K-line implementation of a useful GoG subset that might be of interest:

https://github.com/h2oai/lightning/blob/master/src/lightning...

It's used in H2O: https://github.com/h2oai/h2o-3


As the other guy mentioned, this has nothing to do with GoG. A good data language or library should give the user (and plotting libraries) copy-free, cheap, immutable slices of the data being handled. JavaScript just doesn't really have one. It shouldn't be the concern of the plotting library, however.
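For comparison, Python's standard library gets close to this: a `memoryview` over an `array.array` can be sliced without copying, and `toreadonly()` (Python 3.8+) stops consumers from writing to it. A small sketch of handing such a slice to some hypothetical plotting code:

```python
from array import array

prices = array("d", [1.0, 2.5, 3.0, 4.5, 5.0])  # one contiguous column of doubles
view = memoryview(prices).toreadonly()

window = view[1:4]      # slicing a memoryview does not copy the buffer
print(window.tolist())  # [2.5, 3.0, 4.5]

prices[2] = 99.0        # the owner updates the column in place...
print(window[1])        # ...and the slice sees it: 99.0

try:
    window[0] = 0.0     # ...but consumers such as a plotting library cannot write
except TypeError as err:
    print("read-only:", err)
```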


I never quite got the allure of this "grammar of graphics" business. I mean, I think I "get it", but I don't get it.

For me GoG is basically "use ggplot2 instead of base R plotting libraries when in R".

Which is fine and all, but the graphing utilities of matlab/octave are far simpler and more flexible.

To me, ggplot2 seems to add a layer of complexity to achieve the "grammar" part, without a real practical benefit over matlab etc.


Boy, as a newbie trying to learn DV, it's really confusing these days to pick a platform: R, Python, D3, ObservableHQ.


Start with the platform you're already using for data munging/transformation. I think R with its ggplot2 library is very good. Python's Matplotlib is also not bad. ObservableHQ is also good when the data is closer to its visual representation. Overall, I find that data transformation does 80% of the work when it comes to data visualization.


You can also generate Vega(-Lite) JSON (https://vega.github.io/vega-lite/) and then pick your favourite language/library that generates it (https://vega.github.io/vega-lite/ecosystem.html). If needed, you can switch to a different library/language while keeping the end result the same, or use different libraries/languages for different parts of your visualization, depending on which is best at a particular task.
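Because a Vega-Lite spec is plain JSON, any language can emit one. A minimal Python sketch using only the standard `json` module; the property names (`$schema`, `data.values`, `mark`, `encoding` with `field`/`type`) follow the Vega-Lite documentation, while the data and the `bar_chart_spec` helper are made up for illustration:

```python
import json


def bar_chart_spec(rows, x, y):
    """Build a Vega-Lite bar chart spec from a list of records."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            "x": {"field": x, "type": "nominal"},
            "y": {"field": y, "type": "quantitative"},
        },
    }


spec = bar_chart_spec([{"fruit": "apple", "count": 3}, {"fruit": "pear", "count": 5}],
                      x="fruit", y="count")
print(json.dumps(spec, indent=2))  # hand this to any Vega-Lite renderer
```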


Once you've decided which programming language you're using, it's easier to choose a plotting library.



