Hacker News new | past | comments | ask | show | jobs | submit login
A fast SQLite PWA notebook for CSV files (dirtylittlesql.com)
186 points by mwenge on Dec 30, 2021 | hide | past | favorite | 33 comments



There are a lot of different solutions knocking around for running queries on one or more CSV files. This is a web-based notebook that prioritizes loading large files quickly so you can get running queries on them as soon as possible. It supports most text files, Excel files, JSON. It can also display your results as graphs.


I quite like Q for this though it doesn't have graphs.

Jd is also decent. More expressive languages for data.


Visidata[0] is another great tool in this vein I often forget the name of when I really need it. Python, in the terminal, kind of like vim on CSV with graphing capabilities.

0. https://www.visidata.org/


Second this recommendation for the terminal. For my CLI toolbox, VisiData is my favorite.

I find VisiData is great for quickly exploring and querying data from the CLI. It can handle many types of files (SQLite, CSV, TSV, Excel, JSON, YAML, etc). Visidata loads all the data into memory, and so is very responsive when exploring the data. It allows you to quickly do all sorts of of adhoc queries interactively, without having to write a valid SQL query.

I haven't used Q. When I first heard of it, I liked the idea that Q allowed you to run random queries on CSV and TSV files. However, it seemed like it would be slow if you wanted to do follow up queries, since it had to repopulate the in memory SQLite file for each query. Though it looks like the latest version has a way to cache the generated sqlite file. So that seems like it could help.

Also, if I have some CSV, TSV, JSONL data sqlite-utils is useful for converting them to SQLite, and then exploring with Visidata or SQL queries.

Q: https://github.com/harelba/q sqlite-utils: https://github.com/simonw/sqlite-utils


In this space I absolutely love https://lnav.org -- the "mini-ETL powertool" featuring embedded SQLite and *nix-y CLI for chaining / scripting.

Edit:

PS Unaffiliated, just a fan since 2016(?)

PPS Despite the name and original / primary use case, lnav is useful for things beyond "just" logfiles(!)


A PWA and vanilla JavaScript; we need more of these. Fantastic job!


Not sure what's your definition of "vanilla JavaScript", but https://dirtylittlesql.com/separators.js is clearly produced by a bundler, bundling a bunch of npm packages like events, buffer, etc. I would be pretty alarmed if someone wrote a single 7659-line vanilla JavaScript file by hand in 2021.

And of course the SQLite part is wasm.


Vanilla JS nowadays means not using a framework, in yesteryears it meant not using jQuery, in either case, it doesn't preclude from having dependencies.


to be fair, in case the dependencies are pegged to specific known-good versions and are bundled into one or a few files then two concerns with dependencies go away, namely, insanely deep and wide file system trees and the lingering danger of any one of hundreds of software titles getting malware-ized. One could even add that it's probably a good idea to prefer tried-and-tested existing software over writing everything from scratch.


Why does it matter that it's vanilla Javascript? I don't think this would actually fit the general definition of vanilla Javascript, but I'm more curious why it matters at all?


Not OP, but in my opinion:

When I am reading through the source code of somebodies personal project, it's absolutely amazing to:

- see how to do X in discernible, concise, modern JavaScript

- follow along significant architectural decisions

- watch a web API in action, not a specialized abstraction that will necessarily hide some capabilities

That being said: This project doesn't offer much in that regard. It's hard to read and makes little use of modern JavaScript.


This is really wonderful! The discussion about lay people's knowledge of sql reminded me that the Pandas API is often useful for non-sql folk. Likewise there are some projects similar to dirtylittlesql to bring Python data manipulation to the browser.

https://github.com/jtpio/jupyterlite

https://github.com/gzuidhof/starboard-notebook


Also worth a look, built as a static web spa

https://github.com/lana-k/sqliteviz


Very cool, but I didn't found a way to configure the delimiter (most of my files use ; as a delimiter)


Forgot to add a semi-colon as a possible delimiter! Fixed it there now.


hey there, i've left a github issue / feature request.

Column UI would be awesome


What would be cool is integrating this with OpenAI Codex - the natural language to SQL transformations were impressive when I played around with them.

This could then be pitched to non-techie folk who want to simply ask questions from their data on flat file.


I’m skeptical of the utility of building for non techie people. It takes literally tens of minutes to Learn SQL necessary to a single csv file; and the overlap of people needing to query csv files and capable of learning basic select syntax is probably nearly perfectly overlapping.

I don’t knock the research at all, it’s very cool. But it has to be basically perfect for someone even slightly experienced to consider using, otherwise it will cause more frustration then time saving.


> It takes literally tens of minutes to Learn SQL necessary to a single csv file

It might take you, as a programmer or technically literate person, tens of minutes to learn the SQL necessary to parse a csv file.

But I have met a whole raft of people that would firstly take much longer to learn enough SQL to get by, then would still need help when basic syntax error messages came up, and then would get frustrated before getting someone else to do basic analysis for them.


I have met those people to, but none of those people have ever had a csv file to run queries on.

Analysis has a lot of meanings. Getting basic facts (there are N rows that X) and selecting subsets of data are realistic analysis non technical people might do on csv files.

And I want to be clear that parsing a csv file with sqlite or MySQL is much more complex than learning “SELECT .. FROM .. WHERE” (with no joins!) when there is a tool like OPs that magically does the “parsing”.


In reality, a basic query might be something like "Show me how many units of product code 010201 were sold each month this year", which actually isn't that easy to write in SQL (but very easy to express in plain English).


> And I want to be clear that parsing a csv file with sqlite or MySQL is much more complex than learning “SELECT .. FROM .. WHERE” (with no joins!) when there is a tool like OPs that magically does the “parsing”.

Sqlite CSV import:

.mode csv

.import file.csv tablename


You mean like this? - https://leesaapp.com/


Yes, thanks for sharing.


Nice app, best of all no Electron needed.


Welp, I was gonna post my web0 desktop Electron app for working with CSVs using SQL: https://superintendent.app


Here's another one I'm working on [0]. Run it as a desktop Electron app or as a server/web app. There's also a serverless/in-memory web app demo running here [1].

[0] https://github.com/multiprocessio/datastation

[1] https://app.datastation.multiprocess.io



Love it! Could you also provide some options for decimal point data? Comma instead of period.

It would be great if I could use it as a kernel in JupyterLite!


There is a sqlite kernel for regular jupyter. Not sure about jupyterlite though.


This is really good! Would be great if there was a way to rename the created tables!


Is it available as Docker Image?


Given it's a PWA couldn't you just install it and use it offline?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: