Hacker News
jq 1.7 (github.com/jqlang)
434 points by wwader on Sept 6, 2023 | 205 comments



This is great, JQ is brilliant.

I love JQ so much we implemented a subset of JQ in Clojure so that our users could use it to munge/filter data in our product (JVM and browser based Kafka tooling). One of the most fun coding pieces I've done, though I am a bit odd and I love writing grammars (big shoutout to Instaparse![1]).

I learned through my implementation that JQ is a LISP-2[2] which surprised me as it didn't feel obvious from the grammar.

[1] https://github.com/Engelberg/instaparse

[2] https://github.com/jqlang/jq/wiki/jq-Language-Description#:~....


I can't stand jq. I realize this is an unpopular opinion, and our codebase at work has plenty of jq in the bash scripts, some of it even code that I wrote. I begrudgingly use it when it's the best option for me. But something about it rubs me the wrong way - I think it's the unintuitive query syntax and the need to search for every minute step of what I'm trying to do, and the frequency with which that leads to cryptic answers that I can only decipher if I am some sort of jq expert. But I have this instinctive reaction to all DSL languages that embed themselves into strings, like htmx and tailwind (both embedded in attribute string values). I realize some people like it, and it's a well-made piece of software, and I will even admit that sometimes there is no better choice. But I guess I just hate that it's necessary? I guess I could also admit it's the least-bad option, in the sense that it's a vast improvement over various sed/awk/cut monstrosities when it comes to parsing JSON in bash. Certainly once you find the right incantation, it's perfect - it transforms some raw stdin into parsed JSON that you can manipulate into exactly what you need. But for me, it ranks right next to regex in terms of "things I (don't) want to see in my code." I hate that the jq command is always some indecipherable string in the middle of the script. The only real alternative I've ever used is piping to a Python program that I define inline in a heredoc, but that ends up being at least as nasty as the JQ script.


> I hate that the jq command is always some indecipherable string in the middle of the script

It might be worthwhile to just learn how jq works. At the end of the day, you need to learn some language to parse json. I hate DSLs too, but I cannot think of anything as useful and concise as jq.

> but that ends up being at least as nasty as the JQ script

That's exactly why jq is so nice. Nice alternatives just don't exist.


> That's exactly why jq is so nice. Nice alternatives just don't exist

Write a simple Python script, parse JSON into native objects, manipulate those objects as desired with standard Python code, then serialize back into JSON if necessary. Voila, you have a readable, maintainable, straightforward solution, and the only dependency (the Python interpreter) is already preinstalled on almost every modern system.

Sure, you may need a few more lines of code than what would be possible with a tailor-made DSL like jq, but this isn't code golf. Good code targets humans, not "least possible number of bytes, arranged in the cleverest possible way".
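A minimal sketch of the parse, manipulate, serialize approach described above. The `status` field, the sample records, and the `keep_ok` helper are illustrative assumptions, not something from the thread:

```python
import json

def keep_ok(text):
    """Parse a JSON array of objects, keep those with status "ok", re-serialize."""
    records = json.loads(text)  # JSON text -> native lists and dicts
    kept = [r for r in records if r.get("status") == "ok"]
    return json.dumps(kept)     # native objects -> JSON text

print(keep_ok('[{"id": 1, "status": "ok"}, {"id": 2, "status": "failed"}]'))
# prints [{"id": 1, "status": "ok"}]
```

In a real script the text would typically come from `sys.stdin.read()` and the result would be written to stdout.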


The simple existence of DSL tools like jq is a testament to the fact that people don't want to go to a generic language to solve every kind of problem. I'm also convinced that a big subset of the "use a generic language for everything" crowd do it because they want to use their shiny hammer on that nail as well.


If I don't use something often enough I just forget how it works.


> Sure, you may need a few more lines of code than ...

jq integrates very nicely into bash scripts. Especially in between pipes a short&simple jq-snippet can work wonders for readability of the overall script.

On the other hand, if the bash script becomes too complex it may be a good idea to replace the entire bash script with python (instead of just the json-parsing-part)


> a short&simple jq-snippet can work wonders for readability of the overall script.

... if the reader happens to be familiar with the niche language "jq".

Otherwise, you may as well have put some Akkadian cuneiform in there.


> ... if the reader happens to be familiar with the niche language "jq".

Eh. Linux/Unix has always had an affinity for DSLs and mini-languages. If you're willing to work with bash, sed, awk, perl, lex, yacc, bc/dc etc. jq doesn't seem like it should cause too much consternation.


So awk or sed or maybe even grep?

jq seems slightly better than those...


Certainly for JSON.


> Especially in between pipes a short&simple jq-snippet

Many of them are not short and simple though. And each time you do some transformation, you pretty much need to go in and out of jq at each step if you want to make some decisions or get multiple types of results without processing the original multiple times.


The point in my career at which I used jq the most was when I was doing a lot of work with Elasticsearch doing exploratory work on indexed data and search results. Doing things such as trying to figure out what sort of values `key` might have, grabbing ids returned, etc.

Second to this, I've mostly used jq to look at OpenAPI/swagger files, again just doing one-off tasks, such as listing all api routes, listing similarly named schemas, etc.

From what I've seen in the companies I've worked for, this is fairly consistent, but naturally I can't speak for everyone's use-cases. At the end of the day, I don't think most people use jq in places where readable or maintainable would be most appropriate.


Yea except the python solution is probably going to be several hundred lines, instead of a few.

Python is often not installed in server environments unless it's a runtime environment for Python.

Want to use a non standard library? Now your coworkers are suddenly in Python dependency hell. Better hope anyone else that wants to use this is either familiar with the ecosystem, or just happens to have an identical runtime environment as you.

Or someone could just curl/apt/dnf a jq binary to use your 3 line query, instead of maintaining all of this + 200 lines of Python.


I go to jq for the same reason I go to regular expressions. If you tell me this is too complex

    (?:[A-Z][a-z]+_?(\d+))
Then I don't know what to tell you. Do you think that's too complex and should be a python script too? I don't think so. It looks complex, but if you just learn it, it's easier than a 'simple' script to do the same thing.

I'd argue it's good code if you don't have to sift through lines of boilerplate to do something so trivial in jq or regex syntax.
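For readers who want to see what the pattern above does, here is a small sketch using Python's `re` module. The sample input and the `trailing_numbers` helper are made up for illustration:

```python
import re

# A capitalized word, an optional underscore, then digits captured in group 1.
PATTERN = re.compile(r"(?:[A-Z][a-z]+_?(\d+))")

def trailing_numbers(text):
    """Return the digit runs that follow a capitalized word."""
    return PATTERN.findall(text)

print(trailing_numbers("Alpha_12 Beta7 gamma_3"))
# prints ['12', '7']
```

`gamma_3` is skipped because the pattern requires an uppercase first letter.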



I do lots of exploratory work on various structured data, in my case often debugging media files via https://github.com/wader/fq, which means doing lots of use-once queries on the command line or REPL. In those cases jq's line-friendly and composable syntax and generators really shine.


jq can be easily downloaded, unpacked, executed and deleted in one line of bash.

Its beauty is the simplicity and portability.


Something not having alternatives doesn't make it necessarily nice. It's okay to wish for better even if you have something that works.


> Something not having alternatives doesn't make it necessarily nice

Of course not, but compared to every alternative today, jq is eons better than everything else. Its conciseness, ease of use, and ease of learning all make it awesome. So as of right now, it is the nicest thing to use by far.

Personally though, I don't think I do wish for better. Jq is missing nothing that I want.


I think in this case "better" reduces to convincing the upstream data source to not use json.

Putting that frustration on jq seems like a case of transference.


And now you've turned a JSON traversal problem into a parsing problem.. congratulations?


Or maybe jq does have some design flaws.


I really like jq, but I think there is at least one nice alternative to it: jet [1].

It is also a single executable, written in clojure and fast. Among other niceties, you don't have to learn any DSL in this case -- at least not if you already know clojure!

[1] https://github.com/borkdude/jet


What about JSONPath?


It may be fine for hello-world stuff, but how would one JSONPath this?

  echo '[{"name":"_skip"},{"name":"alpha"},{"name":"_other"}]' | \
    jq '[ .[] | select(.name|test("^_.*")|not) | . ]'
The same is roughly true for JMESPath, also, although at least it does actually try to allow projections and some limited functions


I believe like this.

  $[?(@.name =~ /[^_].*/)]


Interesting, I only knew about <https://goessner.net/articles/JsonPath/index.html> which makes no reference to =~ nor any such // syntax but it seems there's an IETF draft that is (a) an actual specification (b) includes more modern stuff like what you used: https://www.ietf.org/archive/id/draft-ietf-jsonpath-base-07....

And, while whatever is powering https://jsonpath.com/ does honor your syntax, albeit with an absolutely useless result:

    [
      {
        "name": -1
      },
      {
        "name": -1
      },
      {
        "name": -1
      }
    ]
I found that `pip install jsonpath-ng` does not accept it nor mention it <https://github.com/h2non/jsonpath-ng/tree/v1.5.3?tab=readme-...> so I think it's out on the bleeding edge or something


Is ".*" necessary? I think "^_" should be enough.
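A quick way to check that intuition, sketched with Python's `re.search` as a stand-in for an unanchored regex test (the sample names are invented, and other regex engines may differ in details):

```python
import re

names = ["_skip", "alpha", "_other", "a_b", ""]

def matching(pattern):
    """Names for which an unanchored search with `pattern` succeeds."""
    return [n for n in names if re.search(pattern, n)]

print(matching(r"^_.*"))
print(matching(r"^_"))
# both print ['_skip', '_other']
```

Because the search only has to find a match somewhere, the trailing `.*` adds nothing once `^_` has matched.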


I hadn't seen this before. At a quick glance, the syntax looks fine. Though I don't know what command line utility I'd need to use it. It makes me wonder how hard a translator from jq syntax to jsonpath would be... Then we could have our cake and eat it too.


In my opinion (potentially not popular) jq has this appeal to nerds the same way that stuff like Perl does. I say this as someone who did Perl for 20 years but now prefers Python or JS…

For many people regexes are as bad as the jq queries… and vice versa. I would not recommend writing a Python script instead of a regexp, but indeed it may work the same for small data and be more readable.

I love regex and have been mastering it since 1999. So much that in 2013 I used it in production to parse a binary protocol with dynamically sized fields. I believe the project is still talking to 10k-plus devices. Google must’ve just released protocol buffers… I would love to finally see regexes which can work over a custom flow of objects and also on trees.

I also loved XPath, which is very powerful and very comprehensible, and then there is CSS1/2/3, which again are for querying structured, tree-like data.

The prospect of now learning jq does not appeal to me that much even though I appreciate its ingenuity. I may recommend it to dev/ops colleagues now and then, but for me this syntax is a lot of additional cognitive pressure which does not necessarily pay off. Of course if there is a large amount of JSON data, it is the Swiss Army knife.

But nowadays I’ll likely use some LLM to generate the jq query for me. I would also joke with my bash-diehard colleagues who would love one more DSL…


> For many people regexes are as bad as the jq queries… and vice versa

That's almost certainly because both have pervasive generators/backtracking.


jq supports regex so now you can mix!


I love and hate it.

For simple things like navigating down one key, or one array entry, I know it by heart, and it's incredibly useful. But for anything more complicated, I'm too lazy to look up the documentation.

jq will fall into the bucket along with sed/awk of "tools I once wished to become an expert on, but will never do so because ChatGPT came along".

Would also put regex into that bucket, but they're so ubiquitous that I've already learned regexes. I wonder if the new wave of coders learning coding via ChatGPT will think of regexes the same way I think of sed/awk.


I think these very terse languages are precisely the ones you shouldn't unleash ChatGPT on. It needs to be really exact and if it is wrong, you can easily end up with something that is an infinite loop or takes exponential time with respect to the input.


My way of using ChatGPT is just to ask it to give me some complicated sed/awk command, and then I can usually understand easily if the command is correct, or easily look it up. So it is very good for learning.


Ok but if "[you] can usually understand easily if the command is correct" or "easily look it up", what do you even need the ChatGPT step for there?


Many problems seem to have the property that it's easier to verify a solution than to come up with one. If someone provides a filled-out sudoku puzzle, it's relatively straightforward to check if they've followed the rules and completed it correctly. However, actually solving the puzzle from scratch requires a different kind of thinking and might take more time.


Yes exactly.

I've also found that learning by "ask ChatGPT, paste, verify" is so much faster and more fun than banging my head against concrete to deeply read documentation to reason about something new.

I've started doing this for new programming languages and frameworks as well, and it shortens the learning curve from months down to days.


A fair point.


sed became easy for me when I realised it's essentially the :s[ubstitute] command in Vim.


Agree - by the time I need more than grep and reach for json parsing, it’s already complicated enough for a Python script. stdin piped to json.loads ain’t that bad.

Def. seen jq thrown into sed/awk scripts where a readable programming language was the right move. People spend hrs finding the right syntax to these things ~ not always well spent.


I've got similar feelings about it and recently I started experimenting with writing scripts in Nushell rather than bash + jq. I get the json object as a proper type in the script, get reasonable operations available on it and don't have to think of weird escaping for either the contents or the jq script. It cuts down the size of my scripts by about a half and I'm very happy with the results.


Python is so much harder to process JSON data in than jq, that that is how I got into working with and on jq almost a decade ago.


Yeah, Python is like 10-20x the number of lines required to do the same thing as jq (especially with the boilerplate of consuming stdin), but that's also why it's more readable. But generally I agree - I would choose jq over some weird bash/python hybrid most of the time. I just wish it was more immediately readable.


Simple jq programs are easy to read because simple jq programs are just path expressions, and the jq language is optimized to make path expressions easy to read. Path expressions like

  .[].commit | select(.author == "Tom Hudson")
which basically says "find all commits by Tom Hudson" in the input.

`.[]` iterates all the values in its input (whether the input be an array or an object). `.commit` gets the value of the "commit" key in the input object. You concatenate path expressions with `|`, and array/object index expressions you can just concatenate w/o `|`, so `.[]` and `.commit` can be `.[] | .commit` and also `.[].commit`. Calls to functions like `select()` whose bodies are path expressions are.. also path expressions.
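One way to picture the path expression above is as a chain of Python generators. This is only a rough analogue under simplifying assumptions (every entry is an object with a `commit` object inside it); the `iterate` and `commits_by` names and the sample data are hypothetical:

```python
def iterate(value):
    """Rough analogue of jq's `.[]`: yield each value of an array or object."""
    if isinstance(value, dict):
        yield from value.values()
    else:
        yield from value

def commits_by(data, author):
    """Rough analogue of `.[].commit | select(.author == "...")`."""
    for entry in iterate(data):
        commit = entry["commit"]         # .commit
        if commit["author"] == author:   # select(.author == ...)
            yield commit

data = [
    {"commit": {"author": "Tom Hudson", "sha": "a1"}},
    {"commit": {"author": "Someone Else", "sha": "b2"}},
]
print(list(commits_by(data, "Tom Hudson")))
# prints [{'author': 'Tom Hudson', 'sha': 'a1'}]
```

Unlike this sketch, jq yields `null` for a missing key instead of raising an error.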

Perhaps the most brilliant thing about jq is that you can assign to arbitrarily complex path expressions, so you can:

  (.[].commit | select(.author == "Tom Hudson")) = "Anon"
The syntax is strange probably because of this trying to make path expressions so trivial and readable.
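For comparison, here is roughly what that path assignment does, written out by hand in Python. It is a sketch that assumes the input is an array of objects; `anonymize` is a made-up helper name:

```python
import copy

def anonymize(data, author, replacement="Anon"):
    """Replace every commit whose author matches, returning a new structure."""
    result = copy.deepcopy(data)  # like jq, leave the input untouched
    for entry in result:
        commit = entry.get("commit")
        if isinstance(commit, dict) and commit.get("author") == author:
            entry["commit"] = replacement  # the whole commit value is replaced
    return result

data = [
    {"commit": {"author": "Tom Hudson"}},
    {"commit": {"author": "Someone Else"}},
]
print(anonymize(data, "Tom Hudson"))
# prints [{'commit': 'Anon'}, {'commit': {'author': 'Someone Else'}}]
```

The jq version expresses the "which places to update" part with the same path expression used for reading, which is what the comment above is pointing at.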

jq programs get hard to read mainly when you go beyond path expressions, especially when you start doing reductions. The problem is that it resembles point free programming in Haskell, which is really not for everyone.

The other thing is that jq is very much a functional programming language, and that takes getting used to.


Also, here’s something that seems not widely appreciated: You can write super clever unreadable one-long-line jq programs embedded in bash scripts (I hear you on the point-free thing), or you can write jq programs that live in their own files, with multiple lines, indentation, comments, and intermediate assignments to variables with readable names. I recommend the latter!


Wait, really? I had no idea you could do that. Might have to try that next time I'm tempted to break out python or node for a bash script.


I found a random example on GitHub for you. Search `path:jq$` for more.

https://github.com/flox/flox/blob/019095f8bc40e49abc8e5cd0b1...


  data = json.load(sys.stdin)
  commits = [elt.commit for elt in data if elt.commit.author == "Tom Hudson"]
  json.dump(commits, sys.stdout)

Definitely not as straightforward... would be nice to have a bit more affordances for path expressions in Python.


That doesn’t quite work, because JSON objects are parsed to Python dicts, not Python objects with properties, so it would be:

  import json
  import sys

  data = json.load(sys.stdin)
  commits = [
    e["commit"]
    for e in data
    if e["commit"]["author"] == "Tom Hudson"
  ]
  json.dump(commits, sys.stdout)


This also won't work since it'll crash on missing fields. e.get("commit", {}).get("author", "") maybe (ignoring the corner case of non-list top level object).


Which is pretty useful - I will get a malformed JSON error as early as possible.

P.S. `some.get("A", {})["B"]` is a bad programming habit because there might be a list at `some["A"]`


You can do it like this with Jello (I am the author):

    jello '[e.commit for e in _ if e.commit.author == "Tom Hudson"]'
Jello lets you use python syntax with dot notation without the stdin/stdout/json.loads boilerplate.

https://github.com/kellyjonbrazil/jello


This is a non-problem solved by the jq example. Clearly nobody sane writes (or consumes) APIs which sometimes produce array of object, sometimes produce singular objects of the same shape... Or maybe I'm spoiled from using typed languages and cannot see the ingenuity of the python/javascript/other-untyped-hyped-lang api authors that it solves?


> Clearly nobody sane writes (or consumes) APIs which sometimes produce array of object, sometimes produce singular objects of the same shape...

Has nothing to do with arrays, it has to do with the fact that Python dicts with string indexes and Python objects with properties are different things, unlike JS where member and index access are just different ways of accessing object properties.

> Or maybe I'm spoiled from using typed languages and cannot see the ingenuity of the python/javascript/other-untyped-hyped-lang api authors that it solves?

This isn't an untyped thing: JavaScript (and thus JSON) and Python both have type systems (even if they usually don't statically declare them), and those type systems, and thus the syntax around objects, differ between the two.


I see. I am spoiled, I think. :)


Oops, yep totally. Even more futzy! Think if I was doing this a lot I'd totally pull out one of those "dict wrappers that allow for attr-based access" that lots of projects end up writing for whatever reason
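One lightweight version of such a wrapper, sketched with the standard library only: `json.loads` accepts an `object_hook`, and `types.SimpleNamespace` gives attribute access. The `load_with_attrs` name and the sample data are assumptions for illustration:

```python
import json
from types import SimpleNamespace

def load_with_attrs(text):
    """Parse JSON so that objects become namespaces with attribute access."""
    return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))

data = load_with_attrs(
    '[{"commit": {"author": "Tom Hudson", "id": 1}},'
    ' {"commit": {"author": "Someone Else", "id": 2}}]'
)
ids = [e.commit.id for e in data if e.commit.author == "Tom Hudson"]
print(ids)
# prints [1]
```

Keys that are not valid Python identifiers (for example `"my-key"`) are only reachable through `getattr`, and the result no longer serializes with plain `json.dumps`, so this is a convenience for reading rather than a full replacement for dicts.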


jmespath is your friend for this

    import jmespath
    import json
    import sys

    doc = json.load(sys.stdin)
    print(jmespath.search("[?commit.author == `Tom Hudson`].commit", doc))
I wish it had won over jq because JMESPath is a spec with multiple implementations and a test suite where jq is... well jq and languages have bindings not independent implementations.


`import jmespath` is a lot like importing jq...

> I wish it had won over jq because JMESPath is a spec with multiple implementations and a test suite where jq is... well jq and languages have bindings not independent implementations.

jq has multiple implementations too! In Go, Rust, Java, and... in jq itself.


So just picking Java https://github.com/eiiches/jackson-jq

> jackson-jq aims to be a compatible jq implementation. However, not every feature is available; some are intentionally omitted because thay are not relevant as a Java library; some may be incomplete, have bugs or are yet to be implemented.

Where JMESPath has fully compliant 1st party implementations in Python, Go, Lua, JS, PHP, Ruby, and Rust and fully compliant 3rd party implementations in C++, Java, .NET, Elixir, and TS.

Having a spec and a test suite means that all valid JMESPath programs will work and work the same anywhere you use it. I think jq could get there but it doesn't seem to be the project's priority.


Repeating an identifier like this is inelegant, it should be (untested)

  commit|[?author == `Tom Hudson`]


jmespath does look like an interesting thing. Wish it weren't stringly-typed but that is a bit unavoidable.


I've found Ruby much nicer for writing dirty parsing logic like this in a "real" language, it lets you be more terse and "DRY" than Python. Which in bigger software projects doesn't hurt me as much but when I'm primarily trying to write something that otherwise would be well handled by SQL or JQ I found Ruby the better middleground for me.


"Indecipherable string" to me means you likely don't understand the language or how it works. The language itself works very well for what it needs to do.

It does not work the same way as something like parsing an object and manipulating it in python. It is a query language. You are building up a result not manipulating objects.

Definitely unintuitive if you are coming from a programming language. Once learned it makes a lot more sense and is even preferable depending on your needs.


> But for me, it ranks right next to regex in terms of "things I (don't) want to see in my code."

Yeah if you don't like jq you likely won't like Regex, xpath, etc. Any syntax that is incredibly terse and complex.

Like Regex though, jq is too powerful to ignore and many times the best tool to use.


jq has regex support! :)


> it's the unintuitive query syntax and the need to search for every minute step

I love jq as a power tool and have the same challenges. I think the best path would have been for JavaScript to adopt something akin to JsonPath, although I more often reach to jq out of familiarity than use it in kubectl.

https://kubernetes.io/docs/reference/kubectl/jsonpath/


Maybe Kubernetes should make use of libjq (or gojq or...)


I hadn't looked into JsonPath as a standard, and on closer inspection, it looks to be stalled out. Maybe I'll keep piping kubectl get <resource> -ojson | jq '<what I'm looking for>'.

https://github.com/jsonpath-standard

https://datatracker.ietf.org/wg/jsonpath/about/


Also not a big fan, but I hate it for a very specific reason:

  echo 12345678901234567890|jq
  12345678901234567000
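The rounding shown above is what one would expect if the number is stored as a 64-bit floating-point double, which I am assuming was the cause here. A small Python sketch of the same effect, using `parse_int=float` to imitate a parser that keeps every number as a double:

```python
import json

TEXT = "12345678901234567890"

exact = json.loads(TEXT)                       # Python ints have arbitrary precision
as_double = json.loads(TEXT, parse_int=float)  # imitate a double-only parser

print(exact)
# prints 12345678901234567890
print(int(as_double) == exact)
# prints False
```

At this magnitude adjacent doubles are 2048 apart, so the low digits of the original integer cannot survive the round trip.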


Sounds like this is fixed in 1.7


Indeed, just tested it and it is fixed. Thanks!


The responses to this comment seem to miss a vital point that the comment is making: languages executed within a different primary language are usually opaque to the tools in use. Those tools are usually aimed purely at the primary language, not any secondary languages used within it. Tools for the secondary language are now much harder to use because they (usually) have to be invoked and used via the primary language.

If I’m working on a Python script which has some jq embedded in it, then these problems probably exist:

- My editor will only syntax colour the Python, and treat jq code as a uniform string with no structure

- My linter will only consider Python problems, not jq problems

- My compiler, which is able to show parsing errors at compile time rather than runtime, will not give me any parsing errors for jq until execution hits it (yes, Python has a compilation step)

- jq error messages that show a line number will give me a relative line number for the jq code, rather than the real line number for where that code lives in the Python file

- My debugger will only let me pause and inspect Python, and treat the jq execution as a black box of I/O

I’m discussing this as a jq problem, but this happens far more commonly with SQL inside any host language. No wonder ORMs are so popular: their value isn’t just about hiding/abstracting SQL, it’s about wrangling SQL as a secondary language inside a different primary one.

- Microsoft’s LINQ for C#

- Webdev-focused IDEs which aim to correctly handle HTML and Javascript inside server-side languages (e.g. PHP)

- what else?


"I can't stand jq."

jq is way too much for what I need. I hacked together a filter in C to reformat JSON and I like it better than every JSON library/utility I have tried. For simple reformatting, jq is slow and brittle by comparison. Also, I can extract JSON from web pages and other mixed input. All the JSON utilities I have tried expect perfectly-formed JSON and nothing else.
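For the "JSON buried in mixed input" case, a similar effect can be sketched in Python with `json.JSONDecoder.raw_decode`, which parses one value starting at a given offset and reports where it ended. This is not the C filter described above, just an illustration; `extract_json` is a hypothetical helper:

```python
import json

def extract_json(text):
    """Return every JSON object or array found inside arbitrary text."""
    decoder = json.JSONDecoder()
    found = []
    pos = 0
    while pos < len(text):
        if text[pos] not in "{[":
            pos += 1                  # not a possible start of an object/array
            continue
        try:
            value, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            pos += 1                  # looked like JSON but was not; move on
            continue
        found.append(value)
        pos = end                     # resume scanning after the parsed value
    return found

print(extract_json('<p>hi</p> {"a": 1} junk [1, 2] {bad'))
# prints [{'a': 1}, [1, 2]]
```

The scan is quadratic in the worst case, so it is a convenience for small inputs rather than a fast filter.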


Use gron for adhoc, then jq to script once you know what you need to do.
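The idea behind gron, as I understand it, is to flatten JSON into one greppable line per leaf value. A simplified Python sketch of that idea (it does not reproduce gron's exact output format, and it emits nothing for empty objects or arrays):

```python
import json

def flatten(value, path="json"):
    """Yield (path, leaf) pairs for every scalar in a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield path, value

def to_lines(text):
    """Render each leaf as `path = value`, one per line."""
    return [f"{p} = {json.dumps(v)}" for p, v in flatten(json.loads(text))]

for line in to_lines('{"a": {"b": [1, "x"]}, "c": null}'):
    print(line)
# prints:
# json.a.b[0] = 1
# json.a.b[1] = "x"
# json.c = null
```

Once the paths are visible like this, it is easier to work out which jq path expression to write.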


I'm going to leave a mention of jless [1] as well here. The purpose should be obvious from the name.

[1] https://jless.io/



I also find VisiData is useful for adhoc exploring of JSON data. You can also use it to explore multiple other formats. I find it really helpful, plus it gives that little burst of adrenaline from its responsive TUI, similar to fx and jless mentioned.

For my toolbox I include jq, gron, miller, VisiData, in addition to classics like sed, awk, and perl.

- https://github.com/saulpw/visidata - http://visidata.org/

Also there is a great introduction: - https://jsvine.github.io/intro-to-visidata/ "Intro to VisiData Tutorial" by Jeremy Singer-Vine


I understand where you're coming from and often feel the same, but I'm also afraid that this is a clear case of inherent complexity: querying JSON is just a complex problem and requires a complex query language, regardless of how well a piece of software implementing it is designed. The same is valid for regexes of course.


The main problem is treating one-thing and many-things the same way. It's not a great PL design choice (and it's why we can't have slurp as a filter). If streams (not arrays) were also first-class, we would easily have `smap`, `sselect` etc and the code would look like a functional programming language where | is the pipeline operator.

Otherwise, it's fine if you try to keep the thought "everything is a 'filter' or a composition of filters, and a 'filter' is a function that either maps, flatMaps or filters things" in your mind at all times
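That mental model can be sketched in Python by treating every filter as a function from one input to a list of zero or more outputs. The combinator names below (`iterate`, `field`, `select`, `pipe`) are made up for this sketch and only loosely mirror jq:

```python
def iterate(value):
    """Like `.[]`: many outputs per input (a flatMap)."""
    return list(value.values()) if isinstance(value, dict) else list(value)

def field(name):
    """Like `.name`: exactly one output per input (a map)."""
    def run(value):
        return [value.get(name)]
    return run

def select(predicate):
    """Like `select(f)`: the input itself or nothing (a filter)."""
    def run(value):
        return [value] if predicate(value) else []
    return run

def pipe(*filters):
    """Like `|`: feed every output of one filter into the next."""
    def run(value):
        stream = [value]
        for flt in filters:
            stream = [out for item in stream for out in flt(item)]
        return stream
    return run

names = pipe(iterate, field("name"), select(lambda n: not n.startswith("_")))
print(names([{"name": "_skip"}, {"name": "alpha"}, {"name": "_other"}]))
# prints ['alpha']
```

Seen this way, the one-thing versus many-things blur comes from every filter returning a stream, even when that stream happens to hold a single value.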


Maybe you'd like the nushell approach more, have the structured data understanding and extraction tools built into the shell language.


`jq` and `GNU Parallel` share a world in my brain where I know they're wonderful tools, but, as rarely as I need either, I spend more time grokking their syntax than I would just writing a bash/sed/awk/perl, ruby, or python script to do what I need.


`jq` solves the problem of JSON in legacy shells. But I think the real problem is that the world is stuck using Bash rather than a more modern shell that can parse JSON (as well as other data structures) as natively as raw byte streams.


how would parsing solve querying? the value of jq is the dsl


The problem with Bash is to do anything remotely sophisticated you end up embedding DSLs (a bit of awk, some sed, a sprinkle of jq, and so on and so forth) into something that is itself already a DSL (ie Bash).

Whereas a few more modern shells have awk, sed and jq capabilities baked into the shell language itself. So you don’t need to jump through mental hoops every time you need to parse a different type of structured data.

It’s a bit like how you wouldn’t run an embedded Javascript or Perl engine inside your C#, Java or Go code base just to parse a JSON file. Instead you’d use your language's native JSON parsing tools and control structures to query that JSON file.

Likewise, the only reason jq exists is because Bash is useless at parsing anything beyond lists of bytes. If Bash supported JSON natively, like Powershell does (and to be clear, I’m not a fan of Powershell but for whole different reasons) then there would be literally no need for jq.


Community refuses to admit that powershell is a much better alternative to the bash/python combo, and here we are stuck in this mess. CI/CD script spaghetti is usually the most unstable piece of code in a company.


> Community refuses to admit that powershell is much better alternative to bash/python combo

Because it's not.

Powershell is very nice as a glue language for .NET components, and it's better as a general purpose shell/scripting language than the old DOS-inspired Windows Command Prompt, for sure.


I think it’s better as a shell and scripting language than string parsing in unix shells too.


I'm not going to defend my opinion on powershell, because it's indefensible and arbitrary.

I just can't stand title case, and Microsoft/.net absolutely love it. Everything in power shell is DoSomethingLikeThis.

Powershell is a great piece of tech that I just can't use because I'm old and grumpy and like snake or kebab casing.

I've never tried it on Linux though, so maybe it's different there?


In this case, you might be glad to learn that PowerShell is case-insensitive, even for CLR methods.


I greatly dislike case-insensitivity. It's a source of many problems for users and implementors.

For implementors case-insensitivity means the need for full Unicode support is urgent, while Unicode canonical equivalence does not often make the need for full Unicode support urgent. In practice one often sees case-insensitivity for ASCII, and later when full Unicode support is added you either have to have a backwards compatibility break or new functions/operators/whatever to support Unicode case insensitivity.

For users case-insensitivity can be surprising.

For code reviewers having to constantly be on the lookout for accidental symbol aliasing via case insensitivity is a real pain.

Just say no to case insensitivity.
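To make the ASCII-versus-Unicode point above concrete, here is a small Python sketch contrasting an ASCII-only fold with full Unicode case folding. The helper names are hypothetical, and real caseless matching has more corner cases than this shows:

```python
import unicodedata

def ascii_fold(text):
    """Lowercase only ASCII A-Z, as early implementations often do."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)

def unicode_fold(text):
    """Full Unicode case folding followed by NFC normalization."""
    return unicodedata.normalize("NFC", text.casefold())

def same_name(a, b, fold):
    """Compare two identifiers under the given folding function."""
    return fold(a) == fold(b)

print(same_name("Get-ChildItem", "get-childitem", ascii_fold))
# prints True
print(same_name("\u00c9COLE", "\u00e9cole", ascii_fold))
# prints False
print(same_name("\u00c9COLE", "\u00e9cole", unicode_fold))
# prints True
print(same_name("STRASSE", "stra\u00dfe", unicode_fold))
# prints True
```

Switching an existing system from the first function to the second changes which names collide, which is the backwards-compatibility problem described above.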


Are underscores/dashes coalesced out too?


There are aliases, you can name things with a letter if you want. There are even good ones OTB.


Most builtins are aliased:

    Get-ChildItem == gci


Yeah agreed, especially now that PowerShell is available cross-platform.

Nushell[1] also seems like a promising alternative, but I haven’t had a chance to play with it yet.

[1]: https://www.nushell.sh/


Why does it have to be bash+python? I'm finding myself using node.js scripts glued together by bash ones these days unless I'm working on a lot of data. Doing that means you can work with json natively.


`json.loads` in Python exists, and Python does the intuitive thing when you do `{"a": 1} == {"a": 1}`, at least for most purposes (you want the other option? `is` is right there!). Stuff like argparse is not the easiest thing to use but it's in the standard library and relatively easy to use as well.

Not going to outright say that node.js scripts are the worst thing ever (they're not), but out-of-the-box Python is totally underrated (except on macOS where `urllib` fails with some opaque errors until you run some random script to deal with certs)
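The equality point above in a few lines of Python: parsed JSON objects compare by value with `==`, regardless of key order, while `is` checks identity.

```python
import json

a = json.loads('{"a": 1, "b": [1, 2]}')
b = json.loads('{"b": [1, 2], "a": 1}')

print(a == b)
# prints True
print(a is b)
# prints False
```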


I haven't had a great experience dealing with JSON in Python, but maybe I'm doing it wrong. What would be the Python equivalent of this JS code?

    JSON.parse(<data>).foo?.[0]?.bar
Basically just return the `bar` field of the first element of `foo`, or None/undefined if it doesn't exist.


Assuming <data> will be a key-value-object aka dict, it would be something like this:

    import json
    data = json.loads('<data>')
    bar = None
    if foo:=data.get('foo'):
        bar = foo[0].get('bar')
    print(bar)
    
If you can't be sure to get a dict, another type-check would be necessary. If you read from a file or file-like-object (like sys.stdin), json.load should be used.
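A more general take on the optional-chaining question, sketched as a small helper. `dig` is a hypothetical name (not part of the standard library), and it is stricter than JavaScript's `?.` in that any type mismatch along the path yields `None`:

```python
import json

def dig(value, *path):
    """Follow string keys and integer indexes; return None if any step fails."""
    for step in path:
        if isinstance(step, str) and isinstance(value, dict):
            value = value.get(step)
        elif isinstance(step, int) and isinstance(value, list):
            value = value[step] if -len(value) <= step < len(value) else None
        else:
            return None  # wrong container type, or already None
    return value

data = json.loads('{"foo": [{"bar": 42}]}')
print(dig(data, "foo", 0, "bar"))
# prints 42
print(dig({"foo": []}, "foo", 0, "bar"))
# prints None
```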


I love nodejs, it's my go-to language for server side stuff.

Even with that bias though, I have to admit that it's awful for typical command line script stuff.

Dealing with async and streams and stuff for parsing csv files is miserable (I just wrote some stuff to parse and process hundreds of gigs of files in node, and it wasn't fun).

Python is the right tool for that job IMHO.

Also, weirdly, maybe golang? I just came across this [1] and it has one of my eyebrows cocked.

[1] https://bitfieldconsulting.com/golang/scripting


Ruby is the clear answer here. The fact that more people don't use it for this purpose (its intended purpose!) is a travesty.


Any not-designed-specifically-for-shell language will suck for shell, more or less. Ruby, python, node, whatever, they all have the same problem - you have to write too much and care about things you shouldn't have to care about while in the shell.


You're probably right. I just wish there was an easier way to handle json on the command line that didn't turn into its own dsl. The golang scripting seems interesting, might be what motivates me to learn the language.


Apparently, the old community needs to literally die off with its old habits for the new to take its place. No amount of good argumentation can be fruitful here. And there is a ton of it; pwsh is simply on another level than the existing combos.


Has PowerShell learned how to pipe a byte stream without corrupting it?

https://stackoverflow.com/questions/33936074/decode-powershe...


The fact that you have to learn a new language to parse JSON is frankly insulting. If you've gotten to the point you're parsing JSON with a shell script, you should've switched to a real language a week ago.

Some people are weird and awe at the elegance of piping 8 obscure commands, but if I'm given this shit and have to keep it working, I'm rewriting it on the spot.


Are you rewriting it in the first language you learned?

Sometimes less general tools are nice. If they fit the problem space well, they can be very expressive without feeling unwieldy. And in some contexts reducing the power/expressivity is actually a good thing (e.g. not using a C interpreter to make your program and your config file use the same 'language')


I also just added a JQ parser/grammar to the online LALR(1)/FLEX grammar editor/tester at https://mingodad.github.io/parsertl-playground/playground/ : select "Jq parser (partially working)" from the examples, then click "Parse" to see a parse tree of the source in "Input source".

Any feedback is welcome !


Thanks to this, I found your fantastic list of interesting and powerful parser projects: https://mingodad.github.io/parsertl-playground/

Thank you so much for compiling it!

A related question for you and anyone else into this kind of tooling: if you had to automate some structural edits across a codebase that contains a wide range of popular languages (say: C++, C#, Java, Ruby, Python), and you had to do it with a single tool, which tool would you use?


Hey, that's an awesome playground! Thank you for this!


Just adding that a JQ implementation already exists on the JVM[1], although it's not 100% complete.

[1] https://github.com/eiiches/jackson-jq


jq is great for letting users munge their data; we do something similar letting users provision an incoming webhook endpoint, send us arbitrary json data, and set up mappings to do useful things with their data, along with regression tests, monitoring, etc. jq makes the majority of cases straightforward (basically json dot notation) and the long tail possible.


Is your product available yet? I would be glad to try it out!


Yes, sure is! https://kpow.io/

You can get started with the Community Edition for free, it includes our JQ implementation (we call it kJQ - https://docs.kpow.io/features/data-inspect/kjq-filters)

Just shout if you need any help.


I love jq, but I also use JMESPath [0] (especially with AWS CLI), yq [1] (bundled with tomlq and xq as well), and dasel [2]. I also wish hclq [3] wasn't so dead!

[0]: https://jmespath.org/

[1]: https://kislyuk.github.io/yq/

[2]: https://github.com/TomWright/dasel

[3]: https://hclq.sh/


Make JSON greppable!

https://github.com/tomnomnom/gron

I've been using `jq` for years and I'm always able to cobble together what I need, but I have yet to find it intuitive and I'm rarely able to arrive at a solution of any complexity without spending a lot of time reading its documentation. I wish I found it easier to use. :-(


I also love gron, if nothing else to find the paths I need to use with jq later.

But ChatGPT has genuinely solved my suffering writing jq; it does a pretty good job. It even almost replaces gron: if you feed it an example json and ask for jq, it gives you something. It usually needs a little adjusting but it gets me 90% of the way there and saves me a bit of time.

I rarely use it for much else but it's a jq winner :)


jq can be used trivially to approach gron-like output:

  jq -r 'paths(scalars) as $p | getpath($p) | "\($p|join(".")) = \(.)"'
See elsewhere in this subthread for a full gron implementation in jq.
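
For anyone who finds the one-liner opaque, here is a rough Python sketch of the same idea: walk the document and print a dotted path for every leaf. `leaf_paths` is a name made up for this sketch, and it emits every leaf (including null and false), which may not match exactly what the jq expression prints.

```python
import json

def leaf_paths(value, prefix=()):
    """Yield (path, leaf) pairs for every non-container value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, prefix + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from leaf_paths(child, prefix + (index,))
    else:
        yield prefix, value

doc = json.loads('[{"commit": {"author": {"name": "Tom Hudson"}}}]')
for path, leaf in leaf_paths(doc):
    print(".".join(map(str, path)), "=", leaf)
# 0.commit.author.name = Tom Hudson
```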


'Trivially'


Start with `jq -c 'paths(scalars)'` -- that's pretty trivial.


I think parent is referring to the habit of technically inclined folks of using "trivially" similar to words like "simply" and "just" [0][1] in a way that assumes too much about what the reader already knows.

[0] https://www.parkersoftware.com/blog/stop-using-simply-in-tec...

[1] https://news.ycombinator.com/item?id=35759449


Or maybe I'm trying to whet one's appetite for learning the thing by showing a relatively simple expression that demonstrates the power of the language.


You can trivially approximate gron with jq, and there are jq implementations of gron too, like https://gist.github.com/mkfmnn/7b637aa0ec9b0422b3b75ec181a13...

E.g.,

  # gron
  gron "https://api.github.com/repos/tomnomnom/gron/commits?per_page=1" |
  fgrep commit.author
  json[0].commit.author = {};
  json[0].commit.author.date = "2016-07-02T10:51:21Z";
  json[0].commit.author.email = "mail@tomnomnom.com";
  json[0].commit.author.name = "Tom Hudson";

  # jq
  curl -L "https://api.github.com/repos/tomnomnom/gron/commits?per_page=1" |
  jq -r 'paths(scalars) as $p | getpath($p) | "\($p|join(".")|select(contains("commit.author"))) = \(.)"'
  0.commit.author.name = Tom Hudson
  0.commit.author.email = mail@tomnomnom.com
  0.commit.author.date = 2022-04-13T14:23:37Z

  # jq with grep outside jq
  curl -L "https://api.github.com/repos/tomnomnom/gron/commits?per_page=1" |
  jq -r 'paths(scalars) as $p | getpath($p) | "\($p|join(".")) = \(.)"' |
  fgrep commit.author
  0.commit.author.name = Tom Hudson
  0.commit.author.email = mail@tomnomnom.com
  0.commit.author.date = 2022-04-13T14:23:37Z
With just a bit more work you can get it to output valid gron, and even to parse valid gron.


Looks handy, but I'd rather go the other way and extend grep (and diff, etc.) to also work on things that aren't restricted to be lines in a text file. The number of times I've needed to go through contortions for things that should be easy solved problems (e.g., grepping for patterns of line pairs, grepping for records that use a different delimiter when the record data itself could contain linefeed, etc.)...


Tip for pairs of lines. Use grep -A, -B, or -C to emit lines surrounding the first line to match, then pipe that into a second grep (also with -A, -B or -C). e.g.

  grep -A1 foo | grep -B1 bar
Will find a line with "foo" followed by a line with "bar" and emit both. Of course, it will also find a single line with both "foo" and "bar", so it's not perfect. This is a quick and dirty solution. Beyond that, break out sed and awk, or maybe the Practical Extraction and Report Language... it's really good at that stuff.
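
If the false positive matters, a few lines of Python can check strictly adjacent lines instead. This is a sketch, not a grep replacement; `adjacent_pairs` is a hypothetical helper and the sample lines are invented.

```python
import re

def adjacent_pairs(lines, first, second):
    """Yield (line, next_line) where `first` matches line and `second` matches next_line."""
    first_re, second_re = re.compile(first), re.compile(second)
    for current, following in zip(lines, lines[1:]):
        if first_re.search(current) and second_re.search(following):
            yield current, following

text = ["foo bar", "baz", "foo", "bar", "qux"]
print(list(adjacent_pairs(text, "foo", "bar")))
# [('foo', 'bar')]
```

Note that the single line "foo bar" is not reported, because the second pattern is only ever tested against the following line.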


Thanks, it's a brittle hack as you acknowledge but at least there won't be false negatives. Funny how often I wind up crawling back to Perl.

The problem here is literally that someone hardcoded "IT'S ALWAYS LINEFEED" into an algorithm that could work equally well with any record separator character -- in fact probably with any record separator regex. I notice there's now `grep -z` which is one small step towards sanity... but the fully general problem is so easy and so useful to solve it's exasperating.

I guess I should stop complaining and submit a patch to grep to add a `--dont-use-linefeed-instead-use <arg>` option already.
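
Until such a patch exists, the general idea is easy to sketch in Python: split on whatever separator the data uses, then match whole records. `grep_records` is a made-up name, and the NUL default is just an assumption borrowed from `grep -z`.

```python
import re

def grep_records(data, pattern, separator="\0"):
    """Return the records in `data` (split on `separator`) that match `pattern`."""
    matcher = re.compile(pattern)
    return [rec for rec in data.split(separator) if rec and matcher.search(rec)]

# Three NUL-separated records, each of which contains a linefeed.
blob = "name: a\nnote: has foo\0name: b\nnote: nothing\0name: c\nfoo again"
print(grep_records(blob, r"foo"))
# ['name: a\nnote: has foo', 'name: c\nfoo again']
```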


I really like the JMESPath interactive tutorial page (https://jmespath.org/tutorial.html). It helped me when I was first learning the syntax and I still go to it if I run into a particularly weird syntax that throws me off.


on a similar note I wrote a little tool to convert from many formats to many formats

the biggest usecase for me is taking some csv, toml, xml, whatever and converting that to json so I can pipe to jq

https://github.com/sentriz/rsl
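
The csv-to-json step on its own is small enough to sketch with the Python standard library. To be clear, this is not how rsl works internally (I haven't looked); it just illustrates the kind of conversion being described, with a made-up function name and sample data.

```python
import csv
import io
import json

def csv_to_json_lines(text):
    """Turn CSV text with a header row into one JSON object per line."""
    reader = csv.DictReader(io.StringIO(text))
    return "\n".join(json.dumps(row) for row in reader)

sample = "name,lang\njq,C\ngojq,Go\n"
print(csv_to_json_lines(sample))
# {"name": "jq", "lang": "C"}
# {"name": "gojq", "lang": "Go"}
```

One JSON object per line is a convenient shape to pipe onward, since each line can be processed independently.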


Nice!!



Another great alternative is JSONPath[1], which is unfortunately not as widely supported and known despite being brilliant!

It's inspired by XPath, so it feels familiar rather than being a completely new DSL. The killer feature imo is the recursive key lookup, so you can write `people..address` and it'll find all "address" keys that descend from "people" anywhere in the JSON. It's by far my favorite parsing language for JSON and I wrote an introductory blog post on how to use it in JSON dataset parsing [2] :)

1 - https://github.com/JSONPath-Plus/JSONPath

2 - https://scrapfly.io/blog/parse-json-jsonpath-python/
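
For readers who haven't seen recursive lookup before, this is roughly what `people..address` does, sketched in plain Python rather than with a JSONPath library. `find_key` and the sample document are invented for illustration, and a real JSONPath engine may order or filter results differently.

```python
def find_key(value, key):
    """Yield every value stored under `key`, at any depth."""
    if isinstance(value, dict):
        for k, child in value.items():
            if k == key:
                yield child
            yield from find_key(child, key)
    elif isinstance(value, list):
        for child in value:
            yield from find_key(child, key)

doc = {"people": [{"name": "a", "address": "1 Main St"},
                  {"name": "b", "home": {"address": "2 Side St"}}]}
print(list(find_key(doc["people"], "address")))
# ['1 Main St', '2 Side St']
```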


FYI jq also has recursive traversal.


If you’re like me and use jq infrequently enough that you have to consult the documentation every time, try gron. It’s greppable json

https://github.com/tomnomnom/gron


Wow, that's pretty cool - so simple but so useful.

I'll also share that I've been using `curl cheat.sh/jq` (cheat.sh in general is a great resource) for years.

Although now I'd probably use something like chatgpt.


Chad is super helpful with jq!


Damn, that's incredibly well thought out!

You can even `gron | grep | sed | gron -u`.

Awesome tool thanks for sharing.


I do this for archiving my YouTube watchlist into git - all the video/thumbnail/subtitle URLs are dynamic with expiry which makes them pointless to keep around. A quick `gron | sed | gron -u` replaces them all with "URL" and my diffs are much smaller and happier now.
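
If you'd rather skip the flatten/unflatten round trip, the same cleanup can be sketched directly in Python. This is a different route from the gron pipeline above, not a description of it; `scrub_urls` and the field names in the sample are assumptions made for the example.

```python
def scrub_urls(value, placeholder="URL"):
    """Return a copy of `value` with every http(s) string replaced by `placeholder`."""
    if isinstance(value, dict):
        return {k: scrub_urls(v, placeholder) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub_urls(v, placeholder) for v in value]
    if isinstance(value, str) and value.startswith(("http://", "https://")):
        return placeholder
    return value

video = {"title": "talk", "thumbnails": [{"url": "https://example.com/t.jpg?expire=123"}]}
print(scrub_urls(video))
# {'title': 'talk', 'thumbnails': [{'url': 'URL'}]}
```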


If you like gron, try fastgron :)

It supports everything gron supports, but 50x faster.


I was always struggling with the jq syntax...but I'm consistently impressed by ChatGPT coming up with the right commands from an example JSON.


One of the reasons I like/tolerate jq is that it's stable, i.e scripts written for it a few years ago still work the same today.

I have some code around for yq instead that keeps breaking because yq keeps improving in non-backward-compatible ways (I didn't investigate how often yq introduced backwards-incompatible changes, but the issue affected me several times in unrelated places, CI scripts or whatnot, that by their nature end up running with different versions of base tooling and update them at various paces).

I was always thus grateful to the great wisdom of the jq maintainers for their understanding of the importance of backwards compatibility.

I hope this announcement doesn't mean that this stability was just an accidental side product of stagnation and that once stagnation is "fixed" it will be done at the expense of stability.


Hey! as one of the maintainers i can safely say that we take backwards compatibility very seriously


I wonder how often someone will develop a script on Archlinux and later be surprised that it will not work in our Debian CI. One nice property about jq was that 1.6 was everywhere, remains to be seen how annoying this will be. Probably not that much.

Is there a way to get jq version inside the script?


awesome! Any plans for jq to get native support for YAML?


Nothing planned, but there have been some talks about it. https://github.com/jqlang/jq/issues/2855 is probably the latest issue mentioning it


gojq has support for yaml input (via a very annoying argument name) and also has the golang property of "curl binary; chmod; profit": https://github.com/itchyny/gojq#difference-to-jq

Its error reporting is also clang-vs-gcc level wizardry, and I often use it to get a helpful message instead of "ENOWORKY" from jq (I haven't tried 1.7 yet, so it could be better for all I know)


that's awesome!

I'd personally use that, but in the context of sharing scripts and snippets with colleagues, the strength of the incumbent `jq` is that we can all assume everybody will have it installed on their machine.


Not yet, but jq is getting into the kind of shape where we could add support for lots of formats. I'd like to have support for various binary JSON types (e.g., CBOR, but maybe also JSONB and others), YAML, and XML.



What are you using yq for? Unless you use YAML-only features (e.g. integers, arrays, and objects as object keys), it seems like it would be easier to just pipe-convert your YAML to JSON and process it with jq.


Not them but I use it in a few places to make automated edits to yaml files.


Depending on whether or not the resulting YAML is supposed to be human-readable[1], you can just produce JSON output, since YAML 1.2 is a superset of JSON. I did that at one of the places I worked in at the time, and it worked very well.

[1]: I personally think that JSON is plenty readable, but a lot of people seem to disagree.


I use it to make small edits to yaml configuration files which are supposed to remain human-editable and keep their comments and ideally also whitespace. I'm sure some people do enjoy working with raw JSON, but I'm very much not such a person.


Ah, understandable.


"Improving in non backwards compatible way" sounds like "deteriorate"


That's a very negative view. If Excel were to fix their 1900-is-a-leap-year bug, I'd call that a clear improvement, even though it would break some spreadsheets that work around the bug. Seen through that lens every major version of almost all programming languages would be a deterioration.


yes, I tried not to be too aggressive towards yq's authors, who surely don't deserve bad words, but at the same time I wanted to express how painful even the smallest backwards-incompatible change is to a tool that may end up being used in many tiny dark corners of your automation that everybody forgets to maintain.


In addition to my previous comment about jq-like tools, I want to share a couple of other interesting tools which I use alongside jq: jo [0] and jc [1].

[0]: https://github.com/jpmens/jo

[1]: https://github.com/kellyjonbrazil/jc


And jless [1] and gron [2].

This is the first I'm hearing of gron, but adding it here for completeness' sake. Meanwhile, JSON seems to be becoming a standard for CLI tools. The ideal scenario would be for every CLI tool to have a --json flag or something similar, so that jc is not needed anymore.

[1] https://jless.io/

[2] https://github.com/tomnomnom/gron


Related last month "First release of jq in 5 years" https://news.ycombinator.com/item?id=36951830

Huge fan, I use it all the time.


Woohoo!! Finally!

It's really awesome how the community pulled together and helped us recruit new maintainers to revive the project. Special thanks to, well, all involved, but especially @stedolan, @itchyny, and @owenthereal (all GitHub usernames).


jq and miller[1] are essential parts of my toolbelt, right up there with awk and vim.

[1]: https://github.com/johnkerl/miller


Thanks for pointing me to miller. Too good.


> Adds new builtin pick(stream) to emit a projection of the input object or array.

> jq -n '{"a": 1, "b": {"c": 2, "d": 3}, "e": 4} | pick(.a, .b.c, .x)'

This is a godsend! Thanks to the contributors! <3
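
For anyone unsure what "projection" means here, a rough Python sketch of the idea follows. It only handles nested dicts and key paths given as tuples, `pick` is my own toy function rather than a port of the jq builtin, and filling missing paths with None is an assumption about the intended behaviour.

```python
def pick(doc, *paths):
    """Build a new dict containing only the given key paths; missing paths become None."""
    result = {}
    for path in paths:
        source, target = doc, result
        for key in path[:-1]:
            # Walk down the input and create matching nesting in the output
            source = source.get(key) if isinstance(source, dict) else None
            target = target.setdefault(key, {})
        last = path[-1]
        target[last] = source.get(last) if isinstance(source, dict) else None
    return result

doc = {"a": 1, "b": {"c": 2, "d": 3}, "e": 4}
print(pick(doc, ("a",), ("b", "c"), ("x",)))
# {'a': 1, 'b': {'c': 2}, 'x': None}
```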


If you don't need to dig deep it's also possible to do:

  $ jq -n '{"a": 1, "b": {"c": 2, "d": 3}, "e": 4} | {a, e}'  
  {
    "a": 1,
    "e": 4
  }


I just installed from Git the other day to use this feature. It’s very useful!


This is a fantastic new feature. I would also love a version of ‘pick’ that works on streaming data, since that doesn’t seem to be possible currently without reassembling the stream first


I'll give a plug for jaq [0], a clone focused on correctness, speed, and simplicity. It only implements a subset of jq, but I've been enjoying it so far.

[0]: https://github.com/01mf02/jaq


Seeing this news today, I decided to give jq another try and ended up discovering jq-mode [1] for emacs. It doesn't just support jq filter file editing, it supports jq in org-mode and something else called 'jq-interactively'. This interactive mode allows you to apply jq interactively on a JSON or YAML (with yq) buffer. The buffer contents become the filtered value when you finish editing the jq filter with a return. This is especially impressive to see in yaml files.

[1] https://github.com/ljos/jq-mode


Didn’t know this. Thanks for the tip!

Personally, when I test REST APIs, I use „restclient.el“ all the time which also comes with a great JQ integration („jq-set-var“ for example for deriving request variables from responses). For traversing larger responses I use „counsel-jq“ in a customized JSON mode: https://github.com/200ok-ch/counsel-jq

But I’ll give the major mode a try, too.


Love jq. Solved many problems

But at one point I started writing long jq modules, and while it was pretty straightforward, there are fewer people familiar with jq.

So I declared jq bankruptcy and rewrote it as a nodejs script. The rest of the team was relieved


> I started writing long jq modules

If you're writing long jq modules, you probably do want a different (faster, better) language.


There seems to be a few comments from people that dislike jq. If you're that way inclined, gron[0] might be more your thing.

[0] https://github.com/tomnomnom/gron


jq is great. I don't know how many times I've had to explain to engineers the invalid numeric literal error means your json is bad. no really, don't trust me, copy it into the ide. it's not jq. your message is malformed.

Strangely, I also have ECMA-404 and RFC8259 open in other tabs. mostly annoyance with the occasional flashes of anger over number formats and duplicate keys.
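
On duplicate keys specifically: many parsers silently keep the last value, which is easy to miss. Here is a sketch of how that looks with Python's standard `json` module, plus one way to make it an error. `reject_duplicates` is a hypothetical hook name; `object_pairs_hook` is the real parameter of `json.loads`.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of silently keeping the last duplicate."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result

print(json.loads('{"a": 1, "a": 2}'))  # {'a': 2}: the last value wins by default

try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate key: 'a'
```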


Recent and related:

First release of jq in 5 years - https://news.ycombinator.com/item?id=36951830 - Aug 2023 (27 comments)


Works well enough, but it really tripped me up at first with the syntax. Had to constantly look things up while using it.

  jq -r \
  '.apps.http.servers.srv1.routes[0]
  | .match[0].header.[env.AUTH_USER_HEADER][0] = "$username"'
That's an error because you can't select an env var key '[env.AUTH_USER_HEADER]' in the middle of a chain like that, only immediately following a pipe:

  | .match[0].header | .[env.AUTH_USER_HEADER][0] = "$username"'
But then I need to preserve the parent object after the assignment and that pipe throws it out. Thankfully, parentheses fix that:

  | (.match[0].header | .[env.AUTH_USER_HEADER][0]) = "$username"'
After working out the gotchas, it's quite powerful, like regex, but a little clunky, though not nearly as much as regex.


>   jq -r \
>   '.apps.http.servers.srv1.routes[0]
>   | .match[0].header.[env.AUTH_USER_HEADER][0] = "$username"'

> That's an error because you can't select an env var key '[env.AUTH_USER_HEADER]' in the middle of a chain like that, only immediately following a pipe:

> | .match[0].header | .[env.AUTH_USER_HEADER][0] = "$username"'

You can use `env.AUTH_USER_HEADER` as a key the way you wanted. The issue is that you had to write `... | .match[0].header[env.AUTH_USER_HEADER] ...` -- no `.` between "header" and the index operator!

This complaint is a fairly frequent one, so in fact we did "fix" this in 1.7! You can now write `.a.[0]` and it works.


> Almost feels like I'm learning regex again.

But isn't that what jq is and always has been? I mean, what led you to believe that an entirely unrelated pattern matching language would work?


I can't stress enough how I love jq. It saved my life countless times as data engineer. Good job guys!


A slight tangential question. I use ripgrep-all along with fzf to interactively search through files from cli. Is it possible to integrate jq (or some equivalent) into this search ecosystem to search through json files?



I wish there was a faster version of jq, I was only able to get a few mb/s throughput out of jq vs few hundred mb/s throughput out of ripgrep.

I often use ripgrep to setup quick bash pipelines for rapid data analysis, would love to be able to use jq for that purpose. These days I am setting up scripts with simdjson but the cost of writing a script vs quickly setting up jq or ripgrep in a bash pipeline are orders of magnitude different.


Did you try gojq or jaq?


Thought this is an array programming language considering the name's made out of J and Q. Nice util


The way it works is sort of reminiscent of array programming.


It's great to see jq get some releases again.

It's a big part of our check evaluation infra at OpsLevel.


Let me drop my jq-zsh-plugin here: https://github.com/reegnz/jq-zsh-plugin

It's a line-editor, and allows me to query json documents ultra-fast. It optimizes for really fast feedback.


'jq' is a lot easier to use with ChatGPT assistance. Simply provide sample JSON and tell the LLM how you want to use JQ to parse and transform.


It would be nice to have this in something like coreutils so that I can be reasonably sure it's included in most systems.


A new jq release is exciting because it means that jq can finally stop mangling numbers that it does not do any math on.
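
For context, assuming the mangling in question comes from storing every number as a 64-bit double, the effect is easy to reproduce in Python. The specific integer below is just a convenient example, the first one a double cannot hold exactly.

```python
import json

big = 9007199254740993  # 2**53 + 1, not representable as an IEEE 754 double

print(float(big) == big)                 # False
print(int(float(big)))                   # 9007199254740992
print(json.dumps(json.loads(str(big))))  # 9007199254740993
```

The last line round-trips cleanly because Python's `json` module parses integer literals as arbitrary-precision ints rather than floats.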


The next killer feature for postman alternatives is integrated jq support for your json output.

Postman handles this especially poorly...



What is postman?


It's an API tool: https://www.postman.com/


The addition of raw 0-byte separated output will make reliable scripting (e.g. piping into xargs -0) a lot easier.
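
A quick sketch of why the separator matters, using made-up filenames: newline-separated output cannot be split back reliably once a value contains a newline, while NUL-separated output can.

```python
values = ["plain.txt", "two\nlines.txt", "with space.txt"]

newline_joined = "\n".join(values)
nul_joined = "\0".join(values)

# The embedded newline splits one value in two, so the round trip fails.
print(newline_joined.split("\n") == values)  # False
# None of these values contains a NUL byte, so the round trip is exact.
print(nul_joined.split("\0") == values)      # True
```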


Is this a proper summary of the use cases over Deno/Python scripts?

* Run directly from the command line

* Small interpreter ~1MB

* Compact language (for better or worse)

* Stable


This is a common refrain. So much so that when we changed the release artifact names in 1.7 we broke scripts and docker recipes that would download jq from the github releases 'latest' URIs! (This is fixed now.)


ChatGPT has massively improved my ability to work with jq. I use it just infrequently enough that I constantly have to read the docs for anything non trivial, but ChatGPT lets me pump out scripts quickly. It's been super nice for my work flow.


Reading “JQ” in this context makes me realize how terminally online I’ve been all these years


More programmers need to be aware of jq.


These are great release notes. Looks like the project is in good hands


jq is amazing, and I'd like it even more if it supported CBOR too. The forks are just too finicky and are quickly abandoned, and the format is basically binary JSON.


I'm working on binary support, then we can have CBOR, yeah.


my favorite tool; makes writing bash fun for me.


jq is a tool that is so powerful and useful it really should be made part of the POSIX standard. It should come with everything preinstalled. It’s an amazing tool like that of an earlier more refined age.



