Visualization techniques for different data sets

Theodores · on April 20, 2019

What surprises me with visualisation is that you soon find yourself on your own when it comes to making a 'simple' graph. This takes you out into a whole world of coding that nobody else will ever re-use.

The examples out there will not do for your data and even if you wonder 'am I using the wrong chart for this?' you can find that you are not.

Recently I needed to do a stacked bar chart with different things in each stack. Imagine column 1 - fruit - with that broken down by apples/bananas/oranges and column 2 - vegetables - with that broken down by tomatoes/potatoes/carrots.
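A minimal sketch of one way such a chart could be laid out (not necessarily how it was done here): keep a running bottom per column, then draw one bar segment per item. The counts, `stack_segments` and `draw` are hypothetical, made up for illustration, and the drawing step assumes matplotlib is installed.

```python
# Hypothetical data: each column has its own, unrelated set of categories.
data = {
    "fruit": {"apples": 5, "bananas": 3, "oranges": 4},
    "vegetables": {"tomatoes": 2, "potatoes": 6, "carrots": 1},
}


def stack_segments(columns):
    """Return a (column, label, bottom, height) tuple for every segment."""
    segments = []
    for column, parts in columns.items():
        bottom = 0  # each column starts stacking from zero
        for label, height in parts.items():
            segments.append((column, label, bottom, height))
            bottom += height
    return segments


def draw(segments):
    """Draw the segments as a stacked bar chart and return the figure."""
    # Imported here so the layout code above has no dependencies.
    import matplotlib.pyplot as plt

    # Column names in first-seen order, mapped to numeric x positions.
    names = list(dict.fromkeys(column for column, _, _, _ in segments))
    x_of = {name: i for i, name in enumerate(names)}

    fig, ax = plt.subplots()
    for column, label, bottom, height in segments:
        x = x_of[column]
        ax.bar(x, height, bottom=bottom, edgecolor="white")
        # Label each segment in place, since a shared legend makes little
        # sense when the categories differ per column.
        ax.text(x, bottom + height / 2, label, ha="center", va="center")
    ax.set_xticks(list(x_of.values()))
    ax.set_xticklabels(names)
    ax.set_ylabel("count")
    return fig
```

Calling `draw(stack_segments(data))` returns a figure that can be shown or saved. Splitting layout from drawing keeps the awkward part (what sits on top of what) easy to check without rendering anything.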

The other problem area was colouring the chart. I generated colours in the HSL colour space so that where something was on the chart determined its hue and how much of it there was determined saturation/lightness. Some people get offended by Excel-style hard colours so I could not just use random clashing colours and call it a day.
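A rough sketch of that colouring idea using only the standard library. The function name `segment_colour` and the specific saturation and lightness ranges are assumptions picked for illustration, not the values used for the original chart.

```python
import colorsys


def segment_colour(position, n_positions, share):
    """Hex colour for one segment.

    position / n_positions picks the hue, so where a segment sits on the
    chart decides its colour. share (0..1, the segment's fraction of its
    column) drives saturation and lightness, so bigger segments look
    stronger and darker.
    """
    share = min(max(share, 0.0), 1.0)   # clamp to a sane range
    hue = position / n_positions        # spread hues evenly round the wheel
    saturation = 0.35 + 0.45 * share    # assumed range: muted, never garish
    lightness = 0.75 - 0.30 * share     # assumed range: pale to mid-tone
    # Note the argument order: colorsys uses HLS, not HSL.
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


# Example: three segments in one column with shares 0.5, 0.3 and 0.2.
palette = [segment_colour(i, 3, s) for i, s in enumerate([0.5, 0.3, 0.2])]
```

Capping saturation well below 1.0 is what keeps the result away from hard spreadsheet-style colours.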

Beautiful visualisations are hard. Even with the best graph tools out there, I expect to hit some edge case sooner or later and have to find solutions the hard way. The goal is that nobody even notices the hard work put in. Only if the graph is ultra intuitive and simple is it going to work with some audiences. Simplicity is hard.

rocker_pj · on April 20, 2019

Experimentation is pretty inherent to visualization. Whenever, and I mean literally every time, I start creating a graph, I end up opening at least 5-7 SO tabs plus the matplotlib and seaborn docs.

It is just that visualizations are not a solved thing yet.

alexilliamson · on April 20, 2019

I'm finding this to be pretty accurate for python viz, but I gotta say, when I was working in R, I got to the point where I could make most graphs without looking at the docs.

igg · on April 20, 2019

While you still need some experimentation to get a plot right, following the grammar of graphics approach can make it quick and methodical. For any plot, I just need to answer a few questions:

* What kind of plot do I want? Bar chart? Scatter plot? Boxplot? Pick the geom based on this.

* What column goes on each axis? Do I need to transform the axis? For example, is a log scale better?

* Do I split data by some other column? Use different colors/fill, shapes, or facets.

* Do I need any annotations to point out interesting parts of the plot?

ggplot and plotnine are a joy to work with once you grok the ideas of tidy data and grammar of graphics. Since the API is consistent, experimentation can be quick:

* That scatter plot looks too busy? Change the geom to a boxplot.

* Want to see if an attribute has any predictive power? Assign it to any unused aesthetic.

Seaborn can provide some higher-level functions compared to plotnine/ggplot, like pairplot. But as you said, you need to look up seaborn's documentation all the time to work with it.

xtiansimon · on April 20, 2019

Nice article. Concise. Code examples. The plots look good. I like the way the author takes several passes on each problem--'do better' as the author says.

Even when focused carefully on the coding and choosing the right plot for the data, an article about plotting is still so much about the data. And so, it's ultimately disappointing if you can't relate the data to your most common problems.

the8472 · on April 20, 2019

Box+swarm plots are OK if you only have a few data points. If you have more, there are better visualization options that don't obscure the PDF.

https://wellcomeopenresearch.org/articles/4-63/v1

lichtenberger · on April 20, 2019

This is very nice. It reminds me that I'd love to spend some time on novel visual analytics tasks again (as it was one of the main topics while studying) :-)

For instance in the context of text mining or geospatial stuff.

Thanks for the submission and happy Easter :-)

rocker_pj · on April 20, 2019

Thank you, lichtenberger. Keep learning, and happy Easter.