I don’t agree. There is a great deal of room for improvement in jq performance. I profiled one invocation and it spent the majority of its time asserting that the stack depth was lower than some amount, which is crazy. I rebuilt it with NDEBUG defined and it was seriously ten times faster, but it’s not safe to run it that way because it has asserts with side effects, which is also crazy.
Rewriting all or parts of it in C++ would make it dramatically faster. I would start by ripping out the asserts and using a different strtod, which they spend an awful lot of time in.
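For anyone who hasn't hit this failure mode: assert is a macro, so under -DNDEBUG the entire argument expression is compiled out, not just the check. A minimal sketch (not jq's actual code) of why an assert with a side effect makes NDEBUG builds unsafe:

    #include <assert.h>
    #include <stdio.h>

    static int depth = 3;

    /* Has a side effect: shrinks the "stack". */
    static int pop(void) {
        return depth--;
    }

    int main(void) {
        /* Without NDEBUG this pops and checks; with -DNDEBUG the whole
         * expression vanishes and pop() is never called. */
        assert(pop() > 0);
        printf("depth is now %d\n", depth); /* 2 normally, 3 under -DNDEBUG */
        return 0;
    }

Defining NDEBUG silently changes what the program does, which is exactly why the ten-times-faster build isn't safe to run.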
Fair point! I don't mean to say jq performance can't or shouldn't be improved.
Just that jq does two things: 1) ingest and 2) query.
If you're doing a bunch of exploration on a single dataset over a short period, or if the dataset is large enough and you're selecting subsets of it, you can ingest the data into a database once (and optionally turn indexes on).
Then you can query as many times as you want and not worry about ingesting again until your data changes.
All three of the tools I listed have some variation of this sort of data caching built in. For dsq and q with caching turned on, repeat queries against files with the same hashsum run only against the data already in SQLite, with no re-ingestion.
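The gate is just a content-hash check in front of ingestion. A rough sketch of the idea in C against SQLite (the FNV-1a hash, the cache.db / ingest_cache names, and the stubbed-out ingest step are placeholders I made up, not what dsq or q actually do; build with something like cc sketch.c -lsqlite3):

    #include <sqlite3.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* FNV-1a over the whole file; any content hash would do. */
    static uint64_t hash_file(const char *path) {
        FILE *f = fopen(path, "rb");
        if (!f) { perror(path); exit(1); }
        uint64_t h = 1469598103934665603ULL;
        int c;
        while ((c = fgetc(f)) != EOF) {
            h ^= (uint64_t)c;
            h *= 1099511628211ULL;
        }
        fclose(f);
        return h;
    }

    /* Returns the hash recorded for `path`, or 0 if none. */
    static uint64_t cached_hash(sqlite3 *db, const char *path) {
        sqlite3_stmt *st;
        uint64_t h = 0;
        sqlite3_prepare_v2(db, "SELECT hash FROM ingest_cache WHERE path = ?",
                           -1, &st, NULL);
        sqlite3_bind_text(st, 1, path, -1, SQLITE_STATIC);
        if (sqlite3_step(st) == SQLITE_ROW)
            h = (uint64_t)sqlite3_column_int64(st, 0);
        sqlite3_finalize(st);
        return h;
    }

    int main(int argc, char **argv) {
        if (argc != 2) { fprintf(stderr, "usage: %s file.json\n", argv[0]); return 1; }

        sqlite3 *db;
        sqlite3_open("cache.db", &db);
        sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS ingest_cache"
                         " (path TEXT PRIMARY KEY, hash INTEGER)",
                     NULL, NULL, NULL);

        uint64_t now = hash_file(argv[1]);
        if (now != cached_hash(db, argv[1])) {
            /* Hash changed: this is where the JSON would be parsed and
             * loaded into data tables; then record the new hash. */
            printf("hash changed: re-ingest %s\n", argv[1]);
            sqlite3_stmt *st;
            sqlite3_prepare_v2(db, "INSERT OR REPLACE INTO ingest_cache"
                                   " (path, hash) VALUES (?, ?)", -1, &st, NULL);
            sqlite3_bind_text(st, 1, argv[1], -1, SQLITE_STATIC);
            sqlite3_bind_int64(st, 2, (sqlite3_int64)now);
            sqlite3_step(st);
            sqlite3_finalize(st);
        } else {
            printf("hash unchanged: skip ingestion, query the cached tables\n");
        }
        sqlite3_close(db);
        return 0;
    }

Once the hash matches, every repeat query is pure SQLite, which is the part that stays fast no matter how many times you re-run it.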
I have a large GeoJSON dataset I analyze to answer local government questions. It is of course loaded into a database for the common questions, but I also find myself doing ad hoc queries that aren't suited to the database structure, and that's where I end up waiting on jq. I also use jq as the ETL for that database.