Show HN: Turn your Pandas dataframe into a Tableau-style UI for visual analysis (github.com/kanaries)
712 points by AwsmDef on Feb 20, 2023 | 61 comments
Hey, guys. I've just made a plugin which turns your pandas dataframe into a Tableau-style component. It allows you to explore the dataframe with an easy drag-and-drop UI.

You can use PyGWalker in Jupyter, Google Colab, or even Kaggle Notebook to easily explore your data and generate interactive visualizations.

PyGWalker (pronounced like "Pig Walker", just for fun) is named as an abbreviation of "Python binding of Graphic Walker".

Here are some links to check it out:

The Github Repo: https://github.com/Kanaries/pygwalker

Use PyGWalker in Kaggle: https://www.kaggle.com/asmdef/pygwalker-test

Feedback and suggestions are appreciated! Please feel free to try it out and let me know what you think. Thanks for your support!




I love this, it seems like the heavy lifting is done by the web app here: https://github.com/Kanaries/graphic-walker

I’m amazed that this is open source, it’s incredibly useful.

I wish there was a profiler implementation, the best profiler is in GCP’s DataPrep.


Ah, there’s a really nice profiler implemented in one of their other projects here (AGPLv3): https://github.com/Kanaries/Rath/tree/master/packages/rath-c...

There’s a lot of really nice features in this other tool, the author’s thought of everything: https://github.com/Kanaries/Rath


Holy shit that's incredible! Thanks for sharing..!


> it seems like the heavy lifting is done by the web app here: https://github.com/Kanaries/graphic-walker

FWIW, both are made by the same entity, Kanaries.


The Tableau algorithm is patented. In particular, the algorithm that chooses the type of the chart depending on the properties of the data. It was developed under the name Polaris at MIT (?).

A few patent applications were accepted in the US and declined in the EU.

This is based on graphic-walker which is based on vega as far as I see. It would be interesting to see if vega is different enough from Polaris/Tableau.


This is very cool. I'm the creator of Mito [1] -- we're also building a data visualization tool in JupyterLab. The Tableau approach that you took is really interesting! Going to send you a message -- would love to learn more!

[1] https://www.trymito.io


I was working on building something like this as an extension to Datasette and I still believe that would be a very powerful combination. Maybe this could be embedded with datasette via an extension? Maybe something I'll look into if I can find the time.


Very cool!

Anyone know how this compares to Apache Superset? https://superset.apache.org/


Superset doesn't run in a Jupyter Notebook. So they are used differently. This seems very easily embeddable for instance. Or otherwise really practical.


Any idea why it doesn't? I'm uninformed about the architectural decisions behind Superset, but this seems like a huge wasted opportunity.


No I don't know the technical reasons why it is not feasible to get it running in a Jupyter Notebook.

But my best assumption is that it isn't a Python library. You cannot import Apache Superset from a Python program.


This was quite slow and never worked for 100K rows and 20 columns. Any work on improving performance?


Awesome, but I just filed #28 (https://github.com/Kanaries/pygwalker/issues/28) because it makes it a trifle hard to do timeseries visualizations with relatively high res data.


This is really cool, but I wonder if it will get into legal trouble because it looks very much like Tableau.


They have a boatload of patents, too.


Mega impressed, I can see myself using this regularly. Charting and viz with matplotlib and pandas transforms is great, but a lot of time there’s a benefit of dumping to point-and-click mode with tableau to quickly spin data around without much coding. This will make that workflow much smoother


Yep. And there are also so many dataviz alternatives that I struggle to decide which one to use. You learn one syntax, then a newer and more beautiful visualization lib comes along and you have to learn all the basic syntax again.


Nice! Do you think it's useful for working with non-numeric tabular data too? I'm using a dataframe in vscode as a kind of mini SQL database. Vscode visualizations of dataframe are pretty bad, would be great to have another option for rendering.


Yes, it's a general visualization tool. But it depends on what you need for your data. Dimensions in non-numeric tabular data can be used with a 'count' to make charts of the distribution of values in the dimensions.
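To make the 'count' idea concrete, here is a minimal sketch in plain Python of what counting the values of a non-numeric dimension amounts to. The rows, the column names, and the `count_by_dimension` helper are all made up for illustration; this is not how PyGWalker computes it internally.

```python
from collections import Counter

# Hypothetical rows standing in for a non-numeric dataframe.
rows = [
    {"city": "Berlin", "status": "open"},
    {"city": "Paris", "status": "closed"},
    {"city": "Berlin", "status": "open"},
    {"city": "Rome", "status": "open"},
]


def count_by_dimension(records, dimension):
    """Count how often each value of `dimension` occurs in the records."""
    return Counter(record[dimension] for record in records)


city_counts = count_by_dimension(rows, "city")

# Print a tiny text bar chart of the distribution, most frequent value first.
for value, count in city_counts.most_common():
    print(f"{value:<8} {'#' * count} ({count})")
```

In the UI, the equivalent is dragging the dimension onto one axis and a row count onto the other.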


According to the GitHub repo, VSCode support isn't available yet. Major bummer tbh.


Good news: it has supported the Jupyter extension for VSCode since release 0.1.3-alpha. https://github.com/Kanaries/pygwalker/issues/21


[Update]

>> Please follow us on twitter for latest updates https://twitter.com/kanaries_data


Can we "embed" these visualizations into a web application -- say a React app?

Any guidance on how one would go about that would be appreciated. Thanks!


If you want to embed the visualization part only, you can export a vega-lite/vega specification and then use Vega-Embed: https://github.com/vega/vega-embed or React-Vega to embed in your web app.

In the Graphic Walker toolbar, activate debug mode; a button will then appear in the top-right corner of the chart. Click it to export the chart spec.

Or embed the entire Graphic Walker as a React component: https://github.com/Kanaries/graphic-walker
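For the export-and-embed route, a rough sketch of the idea: take a Vega-Lite spec and drop it into a static HTML page that loads Vega-Embed from a CDN. The spec below is hand-written as a stand-in for an exported one, and `build_embed_page` is a hypothetical helper, not part of PyGWalker or Graphic Walker.

```python
import json

# Hand-written Vega-Lite spec standing in for one exported from Graphic Walker.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {
        "values": [
            {"category": "A", "amount": 28},
            {"category": "B", "amount": 55},
            {"category": "C", "amount": 43},
        ]
    },
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "amount", "type": "quantitative"},
    },
}

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <script src="https://cdn.jsdelivr.net/npm/vega@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-lite@5"></script>
  <script src="https://cdn.jsdelivr.net/npm/vega-embed@6"></script>
</head>
<body>
  <div id="vis"></div>
  <script>vegaEmbed("#vis", SPEC_PLACEHOLDER);</script>
</body>
</html>
"""


def build_embed_page(vega_lite_spec):
    """Return an HTML page that renders the given spec with Vega-Embed."""
    # Escape "</" so a string inside the spec cannot close the script tag early.
    spec_json = json.dumps(vega_lite_spec).replace("</", "<\\/")
    return PAGE_TEMPLATE.replace("SPEC_PLACEHOLDER", spec_json)


html_page = build_embed_page(spec)
print(html_page)
```

Saving `html_page` to a file and opening it in a browser should render the bar chart. In a React app, the same spec object could be handed to React-Vega instead.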


This looks great when trying it with the example data [0], but as far as I can tell, there is no way to change the column datatypes when they are wrongly typed after loading local files?

[0]: https://graphic-walker.kanaries.net/


Looks good! I was impressed with your web app; having the same functionality in a Jupyter notebook is a huge plus.


Thanks a lot for your suggestion. It'll be our next goal.


Hey mate: This is Vincent from the ILLA Team. Nice to see your product launch so well!! Hahah.


Amazing! Does it work with Pyspark?


I'm sure it'll be a useful feature!

You're welcome to subscribe to the discussion for it: https://github.com/Kanaries/pygwalker/discussions/23


I'm sure it'll be a great feature!

You're welcome to subscribe to the discussion for it: https://github.com/Kanaries/pygwalker/discussions/23


Hmm, I'm trying to open a few CSVs but none of them seem to work. The UI opens but I have no filters or ways to display anything; I can just see the names of the columns/fields from my data. Am I doing something wrong?


With a bit of effort you could consider writing a new issue for the maintainer to consider :)

https://github.com/Kanaries/pygwalker/issues


Is there any support or implementation for writeback functions? We use Tableau at work for lots of reporting, and it just pisses me off how expensive many functionalities are if you need to purchase them from third-party vendors.


What do you mean by writeback functions?


Edit the content of the viz and have it save back to the datasource.


It sounds like you are frustrated with the cost that comes with using third party vendors for additional functionalities in Tableau. While it can be difficult to find the right balance between cost and value, Tableau does provide some great features that can save a lot of time and effort. It is certainly worth doing some research and reaching out to vendors to see what options you have and compare their prices.


Thanks, chatGPT, great insight


I noticed this and the other library it uses assume you know what Tableau is - I don't, but these widgets look useful anyway.

It might be better if they were more up front about what they do, while still acknowledging Tableau.


Cool! I just created a course covering visualization with Pandas, Seaborn, Excel, Tableau, and a few more apps. Would be interested to see how easy it would be to recreate some of the visualizations with this.


This looks incredible. Well done! Can we give money to you for this?


Yes please send the money to rest in my account and I'll be sure to forward it to the authors


Yes, I can easily convince my boss this is worth some money too...


Wow this looks absolutely awesome. I'll need to play around with it a bit but this would be huge for me.


I think DfWalker would be a better name.


Is it possible to generate the matplotlib code (or the code for whatever charting library is used) that makes the visuals?


This is incredible! I've wanted something like this in a notebook for a long time.


Does the pandas data frame data get uploaded to kanaries web servers?


No, it is a Python package that runs on your machine.


I was curious about how you'd include this as a web component?


We took advantage of the IPython.display.HTML class, which handles HTML pages with embedded JS scripts very well.
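As a rough illustration of that mechanism (not PyGWalker's actual code), the sketch below builds an HTML fragment containing a `<div>` plus an inline script, and hands it to `IPython.display.HTML` when IPython is available. `build_widget_html` and the element id are hypothetical names chosen for this example.

```python
import json


def build_widget_html(records, element_id="demo-widget"):
    """Return an HTML fragment whose inline script receives `records` as JSON."""
    # Escape "</" so string values cannot terminate the script tag early.
    payload = json.dumps(records).replace("</", "<\\/")
    return f"""
<div id="{element_id}"></div>
<script>
  (function () {{
    const rows = {payload};
    const target = document.getElementById("{element_id}");
    target.textContent = "Loaded " + rows.length + " rows in the browser";
  }})();
</script>
"""


widget_html = build_widget_html([{"x": 1}, {"x": 2}])

try:
    # Inside a notebook this renders the fragment and runs the script.
    from IPython.display import HTML, display

    display(HTML(widget_html))
except ImportError:
    # Outside IPython, just show the generated markup.
    print(widget_html)
```

The data travels to the front end as JSON embedded in the page, so nothing needs to leave the machine for the widget to render.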


Gonna give this a try - I end up doing a lot of complicated data transforms in pandas for plotting purposes. Hopefully this can help out and minimize the time I have to spend on labelling etc.

Looks cool


oh my, thank you! This could definitely fill a very noticeable gap in my tools.

I'm looking forward to playing with it this week


This goes in my toolbox, thanks!


Yeah, I'm going to use this a lot.

Like, a LOT.

Thank you.


Wow! this is super cool!


This is the best thing since sliced bread! As someone who uses Jupyter notebooks in the lab (and loves to make vis) and uses Tableau to prototype, this is epic. It works out of the box.

However, the most important feature I love about tableau and why I’m not dropping it is: the data import and sharing dashboard section. I’m sure this is something that could be interesting to investigate.

The data section where you can link data columns (and filter across all datasets after linking) and do pivot work in an intuitive way (see melt in pandas).

The data dashboards are great for sharing the output of my analysis work with my clients. I’d love to move to a Bokeh-style setup, but customizing those dashboards is not for the faint of heart, although you get a very robust open-source product at the end and don’t need to pay license fees going forward. Clients that want to keep their analysis for the long term can go for this option.


This is pretty cool! It uses Vega transforms under the hood, right? The default backend uses JS for transformation, which is slow for large datasets. Did you consider using DuckDB under the hood via WASM? I ran into this project but I'm not sure how active it is: https://github.com/vega/vega-plus


There is a WIP PR for combining DuckDB with Graphic Walker: https://github.com/Kanaries/graphic-walker/pull/18

WASM of Graphic Walker is coming soon!


Cool! How does it work in practice? Looking at the PR, it seems to pull all the data from DuckDB, but is there any plan to apply the transformations (the filters and aggregates that the user selected in the UI) in DuckDB while rendering the chart?


This PR only tested running DuckDB-WASM. For more complex computation, I am planning to build a computation engine that generates SQL and pushes it to DuckDB or other databases. (I built a POC like this in another project, RATH, but there the SQL is pushed to ClickHouse instead.)

If you are interested in this, you're welcome to discuss it with us in the PRs/issues on GitHub or on our Discord: https://discord.gg/Z4ngFWXz2U
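To illustrate the "generate SQL and push it to the database" idea, here is a minimal sketch. It is only an assumption about what such an engine might look like: `build_query` and its parameters are hypothetical, the table is made up, and the standard library's `sqlite3` stands in for DuckDB or ClickHouse so the example is self-contained.

```python
import sqlite3

# Aggregates the hypothetical UI is allowed to request.
ALLOWED_AGGREGATES = {"sum", "avg", "count", "min", "max"}


def quote_identifier(name):
    """Quote a table or column name so it is safe to splice into SQL."""
    return '"' + name.replace('"', '""') + '"'


def build_query(table, dimension, measure, aggregate, filters=()):
    """Translate a chart description into a parameterized GROUP BY query."""
    if aggregate not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    conditions = []
    params = []
    for column, value in filters:
        conditions.append(quote_identifier(column) + " = ?")
        params.append(value)
    where_clause = " WHERE " + " AND ".join(conditions) if conditions else ""
    dim = quote_identifier(dimension)
    sql = (
        f"SELECT {dim}, {aggregate}({quote_identifier(measure)}) AS agg_value "
        f"FROM {quote_identifier(table)}{where_clause} "
        f"GROUP BY {dim} ORDER BY {dim}"
    )
    return sql, params


# In-memory SQLite database standing in for DuckDB or ClickHouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", 2022, 10.0),
        ("north", 2023, 15.0),
        ("south", 2023, 7.5),
        ("south", 2023, 2.5),
    ],
)

# The filter and aggregation run inside the database; only the small
# aggregated result comes back to be drawn.
sql, params = build_query(
    "sales", "region", "amount", "sum", filters=[("year", 2023)]
)
result = conn.execute(sql, params).fetchall()
print(sql)
print(result)
```

The point of this design is that the front end never has to hold the full table, which is what makes it attractive for larger datasets.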



