In addition to using a pager with sqlite3's fantastic text-only output .modes, if the CSV contains hyperlinks I use a custom UNIX filter I wrote that outputs simple, minimalist HTML. Then I view with text-only browser.
For example, this is how I use YouTube. I never use the YouTube website, with its gigantic pages and its "Javascript player", not to mention all of the telemetry. All the search results and information about videos is stored in SQL or CSV, viewed with a text-only sqlite3 output .mode, and optionally converted to simple HTML.
For me, this is better than a "modern" web browser that's too large for me to compile.
I use DuckDB for queries and Visidata for quick inspections.
Between those two, I can work with not only CSVs, but also JSON and Parquet files (which are blazing fast -- CSVs are good for human readability and editability, but they're horrendous for queries).
CLI CSV tools pop up every now and then, but there's too many of them and I feel that my use cases are sufficiently addressed with only 2 tools.
If you use jupyter check out what I'm building with Buckaroo. The aim is to have a table viewer that does the right/obvious thing for most data (formatting, sorting, downsampling large datasets, summary statistics, histograms). all customizable. Supports pandas and polars.
Nice! Once or twice I've used tad as a GUI to view csv files, but I usually use vi with nowrap or read the file in R. Now csvlens will be my default for csv files.
I like viewing most CSVs in vim. It can be a little annoying when the columns don’t line up well. Often I’ll replace all the commas with a tab, and then set the tab width to some very high value to make it all line up.
Of course, there exist plenty of CSV files that don’t follow any particular standard, so the trick doesn’t always work (if there are tabs in the data, if there’s some very wide field, or if there’s a comma in the data). Works good enough for the files I have, though.
If it's that important and you can afford a few bucks a month, I strongly urge you stop considering your donation, and (right now) follow the links on the GitHub to actually start supporting the project.
It's really easy to wallow in the "considering" phase of a decision indefinitely (or maybe that's just me :D)
Can confirm, I consider visidata to be the gold standard for data exploration.
I have largely given up on spreadsheets, replacing them with relational databases. And on top of being a world class tool for understanding the data for import, visidata makes for a pretty great query pager.
I've been thinking about a terminal spreadsheet editor for about a year now. I haven't deeply investigated this tool while I write this comment but I'd love for something like this for Terminal that supported `=A1+B2` kind of formulas.
Back then I stopped using sc-im because it could not import/export XLSX, if I remember correctly. Apparently it can today!
vim-table-mode always felt a little fragile and I don't want to be bound to vim anymore. That said, it still feels like a small miracle to me to have functional spreadsheet formulas inside markdown documents – calculation and typesetting all in one place.
(1) On one hand you can use it for data that is really tabular,
(2) But it also has an engine that can compute the dependency relationships between cells and recalculate the cells affected by a change. This is quite different from conventional programming languages where you are required to specify an order to put operations in. Of course this a problem for exploiting parallelism but I'd charge it is one more bit of cognitive load that makes it harder for beginners and non-professional programmers. (Professional programmers are just used to it and only run into problems in unusual cases where circularity is involved, but I think it's one more thing that beginners struggle with.)
The worst problem is a lack of separation between code and data. If you are doing an analysis you might put something like
=SUM(A1:A28)
in A30. Excel will change that to
=SUM(A1:A29)
if you insert a row in there, which helps, but they lock you into the mindset of "I'm making the December sales report" as opposed to "I'm making the monthly sales report". That is, the data and the analysis should be two separate things: just as you can put the December data or the January data into a Python script.
(Note some of the same problems still exist with "notebooks" and "workspaces" where you wind up with a file that has both code and data in it which can be problematic to check into git, particularly when your are working for people who would like to have a beautiful notebook with an analysis in it to view in GitHub but will then struggle to version control it. Many "data scientists" fail to rise above the December sales report even though there's a clear path to turn a Jupyter notebook into a Python script.)
---
As much as that is a rant I think there's a huge untapped market for things that are like spreadsheets but different. That is, something that looks like Excel but is specialized for editing tabular data (no formulas, CSV import 'just works' all the time, ...) or something that has formulas like Excel but not on a grid or not just on a grid. For the latter there was this product
"Unlike models in a spreadsheet, Javelin models are built on objects called variables, not on data in cells of a report. For example, a time series, or any variable, is an object in itself, not a collection of cells which happen to appear in a row or column... Calculations are performed on these objects, as opposed to a range of cells, so adding two time series automatically aligns them in calendar time, or in a user-defined time frame. Data are independent of worksheets..."
That was forty years ago and as far as I know there hasn't been a program that works the same way since. Perhaps someone could reverse-engineer it (the jav.exe file is only 56,448 bytes!)
Spreadsheets are kind of interesting tech. Because the user is more-or-less directly building the data dependency graph, I’d expect them to map better to actual hardware (massively parallel in so many dimensions) better than a sequential procedural language like C.
But, they are not very fashionable.
It would be fun to have a pipeline to compile a spreadsheet, that produces an executable. Potential outputs: cpu, gpu, fpga. Haha.
Yes, exactly - imagine a modern version written in Rust, with vi key-bindings, etc. Please someone be inspired by this idea and build this. TUIs are making a comeback and the spreadsheet is still one of the most useful and effective tools in all information science.
I've been using Visidata and love it for viewing, extracting rows or columns to a different file, but I know it is capable of much more that I haven't figured out yet
I enjoy using clickhouse-local for parsing csv files. I generally hit situations where I need custom delimiter and custom parsing rules, I find it handles all of these edge cases very well. Recently I found that if my csv files are compressed, i don't even need to uncompress them, it auto-magically figures out the compression format and process it for me.
many csv data sets cannot be handled by awk due to csv supporting things like commas inside fields, newlines inside fields, etc. Your solution would also fall over for any tabs within fields. Plenty of tools out there support RFC 4180 CSV files as well as common csv variants. And these tools have been around for a decade and have suites of cli commands that can be piped together, just speaking "CSV" as a format rather than pure text streams.
Good points on using proper CSV tooling 100%; so that’s why the disclaimer “more often than not”
BTW I just realized that I was omitting the field separator ‘-F,’ actually.
For those pesky nested commas from top off simplistic text hat still I’d go back to sed for fun and potential profit:
sed 's/","/"\t"/g; s/^"/"\t/; s/"$//; s/,,/, ,/g' | column -t
Not elegant nor hands-off and might be missing something else; def needs the right data / volume and fiddling as we know ymmv; can get unwieldy quickly etc etc
One thing I often do with CSVs is sum up all or specific rows in a column.
For example maybe you're doing end of year taxes and now you have this large CSV export from your bank or payment provider with multiple categories and you want to get the totals for certain things.
In a GUI tool it's really easy to sort by a column and drag your mouse to select what you want and see it summed in real time.
Oftentimes things aren't clean enough to have 100% confidence that you can solve this with an automated script because maybe something is spelled slightly different but it's really the same thing. This feels like one of those things where spending a legit 10-15 minutes once a year to do it manually is better than trying to account for every known and unknown edge case you could think of. The stakes are too high if you get it wrong since it's related to taxes.
Has anyone found a really good standalone basic spreadsheet app that "just works" which isn't Microsoft Excel that works on Windows or Linux? I don't know why but Libre and Open Office both struggle to parse columns out in certain types of CSVs and the sorting behavior is typically a lot worse than Google's spreadsheet app but I'd like to remove some dependence on using Google.
If you want to use the CLI, Visidata might be what you want. It does have a bit of a learning curve. Beyond that, I've found it quite handy to do quick data explorations. e.g. there are shortcuts for histograms, filtering, x-y plots, etc.
You can but it's back to depending on code to get the totals. This is one spot where IMO being able to visualize the data by seeing the rows and have immediate feedback on the sum is useful. You can continue dragging or use CTRL + SHIFT clicking to select more stuff as needed while letting your brain decide what should be grouped together.
With a SQL / code approach you have to account for these things without being able to see them and then adjust the code afterwards to include your custom groupings. It ends up taking more time. If the categories didn't change every year it would for sure be worth it to code up a solution since you'll know the edge cases by looking at the existing CSV but it's a moving target because it could change next year.
Have you looked at data wrangling software like Easy Data Transform, Knime or Alteryx? It is specifically written for cleaning, merging, summing and reshaping tabular data. They can all handle CSV and Excel files (plus a lot more).
Nope, but I just tried. I went to their site and noticed there's no binaries for Windows. The site is also served over HTTP (not HTTPS) and the default experience for apt installing it on Ubuntu 22.04 didn't work due to a bunch of packages no longer existing:
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/e/evince/evince-common_42.3-0ubuntu3_all.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/p/poppler/libpoppler118_22.02.0-2ubuntu0.2_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/p/poppler/libpoppler-glib8_22.02.0-2ubuntu0.2_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/g/ghostscript/libgs9-common_9.55.0%7edfsg1-0ubuntu5.5_all.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/g/ghostscript/libgs9_9.55.0%7edfsg1-0ubuntu5.5_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/e/evince/libevdocument3-4_42.3-0ubuntu3_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/e/evince/libevview3-3_42.3-0ubuntu3_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/e/evince/evince_42.3-0ubuntu3_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/w/webkit2gtk/libjavascriptcoregtk-4.0-18_2.42.1-0ubuntu0.22.04.1_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/w/webkit2gtk/libwebkit2gtk-4.0-37_2.42.1-0ubuntu0.22.04.1_amd64.deb 404 Not Found [IP: 91.189.91.83 80]
Does it allow specifying columns when filtering for rows?
For instance, the shown example filters for "Bug" but seems to filter for rows that contain "Bug" in any of the columns. Can I specify to filter for "Bug" only looking at a certain column?
At the same time - is filtering by numbers implemented, like filtering for rows that have a numeric value above X in a certain column?
If not these should be on TODO list - those are very common operations for .csv type data.
I've enjoyed using csvkit[^0] in the past. The viewer isn't as good as csvlens seem to be, but it comes with the ability to grep, cut and pipe CSV data which has come in handy.
I don't understand if this is meant as a joke or not. Do you mean that we then would have Tsvlens? Tabs don't automatically make columns aligned, nor do they eliminate the problem of quoting. I don't get it, but then again, I'm exceptionally dim-witted.
Other people have mentioned a lot of advantages but I'll go for another one: TSV is eminently more readable and parsable just off the bat. Empty space is always more readable than `,`, and `,` appears a lot in strings, tab appears a lot less.
TSV are far superior to CSV as the separator is a TAB and not commas, semicolons or other weird stuff depending on your locale. Same for single and double quotes. Only thing to be aware are tsv created by Excel with newlines embedded in text fields, which are a huge PITA for almost all parsers.
Because tabs are flexible space, you can render them really wide in most text editors, wide enough that all the columns line up. But it is a hacky solution!
I have been looking for a viewer exactly like this for so long! Visidata is nice, but way more complex than what I was after. This will fit perfectly into my workflow, thank you for sharing!
I tried some TUI tool that was pretty advanced at viewing tables (sorting, filtering etc.), but I don't remember the name. It was on a WSL instance on my work laptop which I nuked when they laid me off.
I remember that it was tab-based. Any idea what that could have been?
Call me lame, but are there any open-source projects that accomplish this type of gui in typescript? I have an idea but it uses something written in JavaScript.
You can use any javascript library in typescript, and TS is all JS at runtime anyway. You can even add types without touching the JS library, just write a .d.ts file and stick it anywhere that TS looks for sources.
I don't think this would work as a lesspipe.sh viewer, since it is interactive.
I have wanted this for VisiData for data files. You would need a wrapper around less that could start other programs instead of less.
You might be interested in Tidy Viewer with lesspipe.sh