Nothing, YAJL got a lot of things very right. We use it in our C daemons for a bunch of things. Not having to unpack the whole JSON into memory is pretty handy.
This Show HN makes me think there needs to be a site for more formalized code reviews of open software. Ideally with some great game mechanics to make sure engagement is high and things are getting reviewed well.
I am actually working on that as we speak. Here is my (very) alpha prototype. I plan on expanding the supported languages as I roll out each iteration.
What happens when the input length is longer than 2^31? You used an "int" for the length (also, why ever use a signed type for a length?). Even on LP64, int is still 32 bits, so that counter overflows.
(Same question applies to how you handle the max_memory computation).
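For what it's worth, here is a minimal sketch of the kind of thing I mean: lengths carried as size_t, and the size computation checked before it can wrap. The helper name and its parameters are made up for illustration; they are not taken from the parser under review.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the reviewed parser): computes
   count * elem_size + header into *out.
   Returns 0 on success, -1 if the result would not fit in size_t. */
static int checked_alloc_size(size_t count, size_t elem_size,
                              size_t header, size_t *out)
{
    /* Divide instead of multiplying so the check itself cannot wrap. */
    if (elem_size != 0 && count > (SIZE_MAX - header) / elem_size)
        return -1;
    *out = count * elem_size + header;
    return 0;
}
```

The caller would refuse the input when this returns -1, rather than passing a wrapped size to malloc.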
To clarify: Any "lookup table" that maps hex values to assumed character values is a portability red flag. When using them, it's polite to add comments to explicitly call out the code page dependency and argue (from a spec or RFC, say) why that assumption is okay.
Neither the execution nor source character set (of C) is guaranteed to be ASCII though. This makes the general parsing as well as lines like "if (c >= 'A' && c <= 'F')" non-portable.
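One way around this, sketched under the assumption that the input buffer holds JSON text in an ASCII-compatible encoding such as UTF-8: compare the input bytes against numeric code points instead of character literals, so the result does not depend on the compiler's character set. The function name is hypothetical.

```c
/* Hypothetical helper: value (0..15) of a hex digit given as an
   ASCII/UTF-8 byte, or -1 if the byte is not a hex digit.
   Numeric constants are used on purpose; 'A' need not be 0x41
   in the C execution character set. */
static int json_hex_value(unsigned char b)
{
    if (b >= 0x30 && b <= 0x39) return b - 0x30;        /* 0..9 */
    if (b >= 0x41 && b <= 0x46) return b - 0x41 + 10;   /* A..F */
    if (b >= 0x61 && b <= 0x66) return b - 0x61 + 10;   /* a..f */
    return -1;
}
```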
\u escapes are UTF-16 code units, so you should be able to put two of them back to back (a surrogate pair) to get a code point outside the BMP, which plain UCS-2 can't represent. You don't seem to handle this; not sure how many parsers do.
Also not sure you handle the case where the JSON invalidly terminates in the middle of a \u sequence.
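Roughly what I have in mind, as a sketch rather than a patch: read one \uXXXX unit with a bounds check, and if it is a high surrogate, require a low surrogate right after it. All names here are made up, the input is assumed to be ASCII-compatible bytes, and rejecting lone surrogates is a policy choice of this sketch (some parsers pass them through).

```c
#include <stddef.h>

/* Hex digit as an ASCII byte -> 0..15, or -1 if invalid. */
static int u_hex_digit(unsigned char b)
{
    if (b >= 0x30 && b <= 0x39) return b - 0x30;
    if (b >= 0x41 && b <= 0x46) return b - 0x41 + 10;
    if (b >= 0x61 && b <= 0x66) return b - 0x61 + 10;
    return -1;
}

/* Reads one "\uXXXX" at p with len bytes available. Returns 0 and
   stores the 16-bit unit, or -1 if truncated or malformed. */
static int read_u_escape(const char *p, size_t len, unsigned long *unit)
{
    size_t i;
    unsigned long v = 0;
    if (len < 6 || p[0] != 0x5C || p[1] != 0x75)   /* backslash, 'u' */
        return -1;
    for (i = 2; i < 6; i++) {
        int d = u_hex_digit((unsigned char)p[i]);
        if (d < 0)
            return -1;
        v = (v << 4) | (unsigned long)d;
    }
    *unit = v;
    return 0;
}

/* Decodes a single escape or a surrogate pair starting at p.
   On success stores the code point and the bytes consumed (6 or 12). */
static int decode_u(const char *p, size_t len,
                    unsigned long *cp, size_t *used)
{
    unsigned long hi, lo;
    if (read_u_escape(p, len, &hi) != 0)
        return -1;
    if (hi >= 0xDC00 && hi <= 0xDFFF)
        return -1;                        /* low surrogate with no lead */
    if (hi < 0xD800 || hi > 0xDBFF) {     /* ordinary BMP code point */
        *cp = hi;
        *used = 6;
        return 0;
    }
    /* High surrogate: a low surrogate must follow immediately. */
    if (read_u_escape(p + 6, len - 6, &lo) != 0)
        return -1;
    if (lo < 0xDC00 || lo > 0xDFFF)
        return -1;
    *cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    *used = 12;
    return 0;
}
```

Because every read goes through the len check, input that ends partway through an escape comes back as an error instead of reading past the buffer.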
I've used cJSON (http://sourceforge.net/projects/cjson/) in the past, which worked very well for what I needed (simple 1 file JSON parser for config files). Maybe I'll give this a shot the next time I need to do some simple JSON parsing.
If that's the same cJSON I was using a few months ago, I found it a lot more memory-hungry than it needed to be. I was doing some network code with lwIP on an embedded system, so the all-static nature of js0n (with some helper functions) was a better fit for me.
I've used it and think it's pretty neat. One of these days I'll get around to releasing the helper functions we've written to make it easier to use too.
why are the flag values not enums (and why is 4 missing?)? is using a lookup table for decoding hex really faster than the (minimal) logic (what if it causes cache misses)? do you really think that a state machine with bit flags is the best way to express the logic here? is string_add meant to increment string_length on subsequent passes? what is "json_value * cur_value" supposed to do at the top of json_value_free (maybe i am missing some c trick here?)?
[not dissing you, just bored on a sunday afternoon...]
> why are the flag values not enums (and why is 4 missing?)?
What would the advantage of using an enum be? (and I guess I used 4 and then removed it later.)
> is using a lookup table for decoding hex really faster than the (minimal) logic (what if it causes cache misses)?
No idea, that's just the way I did it. Feel free to try something else and profile if you're really that concerned.
> do you really think that a state machine with bit flags is the best way to express the logic here? is string_add meant to increment string_length on subsequent passes?
There's only two passes, and it increments the length on both (the first is to measure the string, the second is to know where to write in it).
> what is "[..] cur_value" supposed to do at the top of json_value_free (maybe i am missing some c trick here?)?
You're not supposed to mix code and variable declarations in ANSI C (declarations have to come at the start of a block), so I put it at the top of the function. It's just used to temporarily store the value while reading the parent.
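To illustrate the two-pass idea from the reply above (measure first, then write), here is a much simplified sketch. It is not the parser's actual string_add; the function names are invented, and the only "escape" it understands is a backslash followed by one byte.

```c
#include <stddef.h>
#include <stdlib.h>

/* Copies src to dst, dropping each backslash that escapes the next byte.
   With dst == NULL nothing is written. The length is counted either way,
   so the same loop serves as both the measuring and the writing pass. */
static size_t unescape(const char *src, size_t n, char *dst)
{
    size_t i, len = 0;
    for (i = 0; i < n; i++) {
        if (src[i] == '\\' && i + 1 < n)
            i++;                 /* skip the backslash, keep the next byte */
        if (dst != NULL)
            dst[len] = src[i];   /* len doubles as the write position */
        len++;                   /* incremented on both passes */
    }
    return len;
}

/* Returns a freshly allocated, NUL-terminated copy, or NULL on failure. */
static char *unescape_dup(const char *src, size_t n)
{
    size_t len = unescape(src, n, NULL);   /* pass 1: measure */
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    unescape(src, n, out);                 /* pass 2: write */
    out[len] = '\0';
    return out;
}
```

The appeal is a single exact-size allocation per string; the cost is walking the input twice.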
I've converted the lookup table to a few lines of logic. I think it's more readable, and I would definitely bet on it being faster, though since I haven't profiled I don't know how much difference it would make.
ha. on the last one i was confused by your spaces - thought it was a multiplication... (sorry)
[edit] on the bitfield / enum question, i've been looking around for a consistent, standard way of doing things and there doesn't seem to be any one best practice (although various people note that bit fields are normally unsigned ints, while enums are signed).
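for what it's worth, one common pattern (a sketch, with flag names invented here rather than taken from the parser) is to name the bits in an enum but keep the combined state in an unsigned int, which sidesteps the signedness question:

```c
/* Hypothetical flag names for illustration only. */
enum parse_flag {
    FLAG_NEXT       = 1 << 0,
    FLAG_REPROC     = 1 << 1,
    FLAG_NEED_COMMA = 1 << 2,
    FLAG_SEEK_VALUE = 1 << 3
};

/* The combined set lives in an unsigned int, not in the enum type. */
static unsigned flag_set(unsigned flags, enum parse_flag f)
{
    return flags | (unsigned)f;
}

static unsigned flag_clear(unsigned flags, enum parse_flag f)
{
    return flags & ~(unsigned)f;
}

static int flag_test(unsigned flags, enum parse_flag f)
{
    return (flags & (unsigned)f) != 0;
}
```

the practical gain over #define is mostly that debuggers can show the names and the values are grouped under one type.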
In ordinary usage, "ANSI C" is a synonym for strict C89. C99 is not well-supported by many compilers, so code intended to be widely portable is still frequently written to strict C89.
longjmp to an earlier stage where you can "retract" the error or somehow wrap it in a chainable form (e.g. add a union to your result to signal whether it's a value or error, or whatever)
That's what exceptions are supposed to do. C doesn't have exceptions, so you use setjmp/longjmp.
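A minimal sketch of that pattern, assuming a toy "parser" that only tracks bracket nesting. The struct and function names are made up for the example.

```c
#include <setjmp.h>
#include <stddef.h>

struct parse_ctx {
    jmp_buf on_error;
    /* volatile: written after setjmp and read after longjmp, so it must
       not be left indeterminate by the jump. */
    const char *volatile msg;
};

/* Plays the role of "throw": records a message and unwinds. */
static void fail(struct parse_ctx *ctx, const char *msg)
{
    ctx->msg = msg;
    longjmp(ctx->on_error, 1);
}

/* Deep helper that bails out via fail() instead of returning codes. */
static int count_depth(struct parse_ctx *ctx, const char *s)
{
    int depth = 0, max = 0;
    for (; *s != '\0'; s++) {
        if (*s == '[') {
            if (++depth > max)
                max = depth;
        } else if (*s == ']') {
            if (--depth < 0)
                fail(ctx, "unbalanced ]");
        }
    }
    if (depth != 0)
        fail(ctx, "unclosed [");
    return max;
}

/* Plays the role of "try/catch": returns the maximum nesting depth,
   or -1 with *err pointing at a message. */
static int max_depth(const char *s, const char **err)
{
    struct parse_ctx ctx;
    ctx.msg = NULL;
    if (setjmp(ctx.on_error) != 0) {
        *err = ctx.msg;
        return -1;
    }
    *err = NULL;
    return count_depth(&ctx, s);
}
```

The catch is that longjmp runs no cleanup, so anything allocated between the setjmp and the failure has to be tracked and freed by hand at the landing site.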
On a related note, if you want to get something done on a sunday afternoon, writing a simple recursive descent JSON parser from scratch is both doable and fun.
http://git.qemu.org/?p=qemu.git;a=blob;f=json-lexer.c;h=3cd3...
http://git.qemu.org/?p=qemu.git;a=blob;f=json-parser.c;h=849...
Among other things, this supports streaming, is fairly fast, and has gotten a fair bit of scrutiny against malicious input.
The lexer is a hand-written state machine, which seems like something you should never do, but it turned out to be pretty reasonable.
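To give a feel for the "doable on a Sunday afternoon" claim, here is a small recursive descent validator. It is a sketch written for this comment, not the QEMU code linked above, and it covers only a subset: arrays, strings (escaped bytes are skipped without being checked), integers, and true/false/null. All names are invented, and for brevity it compares against character literals, which assumes an ASCII-compatible execution character set.

```c
#include <stddef.h>
#include <string.h>

struct cursor { const char *p; const char *end; };

static void skip_ws(struct cursor *c)
{
    while (c->p < c->end &&
           (*c->p == ' ' || *c->p == '\t' || *c->p == '\n' || *c->p == '\r'))
        c->p++;
}

/* Consumes the literal if it is next in the input. */
static int match(struct cursor *c, const char *lit)
{
    size_t n = strlen(lit);
    if ((size_t)(c->end - c->p) < n || memcmp(c->p, lit, n) != 0)
        return 0;
    c->p += n;
    return 1;
}

static int parse_value(struct cursor *c, int depth);

static int parse_string(struct cursor *c)
{
    c->p++;                              /* opening quote */
    while (c->p < c->end && *c->p != '"') {
        if (*c->p == '\\') {
            c->p++;                      /* escaped byte is not examined */
            if (c->p >= c->end)
                return 0;
        }
        c->p++;
    }
    if (c->p >= c->end)
        return 0;                        /* unterminated string */
    c->p++;                              /* closing quote */
    return 1;
}

static int parse_number(struct cursor *c)    /* integers only */
{
    const char *start;
    if (c->p < c->end && *c->p == '-')
        c->p++;
    start = c->p;
    while (c->p < c->end && *c->p >= '0' && *c->p <= '9')
        c->p++;
    return c->p > start;
}

static int parse_array(struct cursor *c, int depth)
{
    c->p++;                              /* '[' */
    skip_ws(c);
    if (c->p < c->end && *c->p == ']') { c->p++; return 1; }
    for (;;) {
        if (!parse_value(c, depth + 1))
            return 0;
        skip_ws(c);
        if (c->p >= c->end)
            return 0;
        if (*c->p == ']') { c->p++; return 1; }
        if (*c->p != ',')
            return 0;
        c->p++;
    }
}

static int parse_value(struct cursor *c, int depth)
{
    if (depth > 64)
        return 0;                        /* bound recursion on hostile input */
    skip_ws(c);
    if (c->p >= c->end)
        return 0;
    switch (*c->p) {
    case '[': return parse_array(c, depth);
    case '"': return parse_string(c);
    case 't': return match(c, "true");
    case 'f': return match(c, "false");
    case 'n': return match(c, "null");
    default:  return parse_number(c);
    }
}

/* Returns 1 if the buffer holds exactly one value of the subset. */
static int json_subset_valid(const char *s, size_t n)
{
    struct cursor c;
    c.p = s;
    c.end = s + n;
    if (!parse_value(&c, 0))
        return 0;
    skip_ws(&c);
    return c.p == c.end;
}
```

Each grammar rule gets its own function and nesting becomes plain recursion, which is most of why this style is pleasant to write. Objects would be one more function shaped like parse_array.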