I love flatbuffers but they're only worthwhile in a very small problem space.
If your main concern is "faster than JSON" then you're better off using Protocol Buffers simply because they're way more popular and better supported. FlatBuffers are cool because they let you decode on demand. Say you have an array of 10,000 complex objects. With JSON or Protocol Buffers you're going to need to decode and load into memory all 10,000 before you're able to access the one you want. But with FlatBuffers you can decode item X without touching 99% of the rest of the data. Quicker and much more memory efficient.
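Roughly, the difference looks like this in TypeScript (a sketch only: ItemList and Item are hypothetical classes generated by flatc for a schema like the one sketched below; the flatbuffers npm package's ByteBuffer is the only real import):

    import * as flatbuffers from "flatbuffers";
    // Hypothetical flatc-generated accessor for a root table holding a vector of Item tables.
    import { ItemList } from "./generated/item-list";

    // JSON style: the whole document is materialized before you can index into it.
    function nameFromJson(jsonText: string, index: number): string {
      const everything = JSON.parse(jsonText); // all 10,000 objects decoded here
      return everything.items[index].name;
    }

    // FlatBuffers style: wrap the raw bytes and decode only what you touch.
    function nameFromFlatBuffer(bytes: Uint8Array, index: number): string | null {
      const buf = new flatbuffers.ByteBuffer(bytes); // no up-front parse step
      const list = ItemList.getRootAsItemList(buf);  // just reads the root offset
      return list.items(index)?.name() ?? null;      // only item `index` gets decoded
    }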
But it's not simple to implement. You have to write a schema then turn that schema into source files in your target language. There's an impressive array of target languages but it's a custom executable and that adds complexity to any build. Then the generated API is difficult to use (in JS at least) because of course an array isn't a JavaScript array, it's an object with decoder helpers.
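For the sketch above, the schema and codegen step would look something like this (flatc is the FlatBuffers schema compiler; the table and field names are illustrative):

    // item.fbs, compiled with something like `flatc --ts item.fbs`,
    // which is the extra codegen binary your build has to carry.
    table Item {
      id:   ulong;
      name: string;
    }

    table ItemList {
      items: [Item];
    }

    root_type ItemList;

And the generated vector really is accessed through method pairs along the lines of items(i) / itemsLength() rather than a plain JS array.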
It's also quite easy to trip yourself up in terms of performance by decoding the same data over and over again rather than re-using the first decode like you would with JSON or PB. So you have to think about which decoded items to store in memory, where, for how long, etc... I kind of think of it as the data equivalent of a programming language with manual memory management. Definitely has a place. But the majority of projects are going to be fine with automatic memory management.
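A minimal sketch of that bookkeeping, reusing the hypothetical ItemList accessors from above; the cache is exactly the kind of "who keeps this decoded value, and for how long" decision that JSON never asks of you:

    // Strings pulled out of the buffer are decoded on every getter call,
    // so repeated reads of a hot item are worth caching deliberately.
    const nameCache = new Map<number, string>();

    function itemName(list: ItemList, index: number): string | null {
      const cached = nameCache.get(index);
      if (cached !== undefined) return cached;

      const name = list.items(index)?.name() ?? null; // decodes from the buffer
      if (name !== null) nameCache.set(index, name);  // you decide when to evict
      return name;
    }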
> But it's not simple to implement. You have to write a schema then turn that schema into source files in your target language. There's an impressive array of target languages but it's a custom executable and that adds complexity to any build. Then the generated API is difficult to use
Worth noting that all these things are true for protobuf as well.
Less so. Many languages have a native protobuf implementation written in the language itself, so the build doesn't depend on a separate compiled binary (e.g. pbandk), and the generated code is relatively idiomatic.
Are there protobuf throughput benchmarks somewhere? I haven't been able to verify that they're faster than JSON.
Edit: I was able to find these at https://github.com/hnakamur/protobuf-deb/blob/master/docs/pe... but these numbers don't seem conclusive. Protobuf decode throughput for most schemas tested is much slower than JSON, but protobufs will probably also be a bit smaller. One would have to compare decode throughput for the same documents serialized both ways rather than just looking at a table.
While I haven't benchmarked JSON vs protobuf, I've observed that JSON.stringify() can be shockingly inefficient when you have something like a multi-megabyte binary object that's been serialized to base64 and dropped into an object. As in, multiple hundreds of megabytes of memory needed to run JSON.stringify({"content": <4-megabyte Buffer that's been base64-encoded>}) in Node.
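A rough way to see the shape of the problem in Node (just a sketch, not a rigorous benchmark; numbers vary by Node version and GC timing):

    import { randomBytes } from "node:crypto";

    // ~4 MB of binary becomes ~5.3 MB of base64 once encoded...
    const payload = { content: randomBytes(4 * 1024 * 1024).toString("base64") };

    const before = process.memoryUsage().heapUsed;
    const json = JSON.stringify(payload);   // ...and stringify allocates well beyond that
    const after = process.memoryUsage().heapUsed;

    console.log(`output ${(json.length / 1e6).toFixed(1)}M chars,`,
                `heap grew ~${((after - before) / 1e6).toFixed(1)} MB`);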
JSON is the default serialization format that most JS developers use for most things, not because it's good but because it's simple (or at least seems simple until you start running into trouble) and it's readily available.
Large values are by no means the only footgun in JSON. Another unfortunately-common gotcha is attempting to encode an int64 from a database (often an ID field) as a JSON number rather than a JSON string: JS numbers are IEEE 754 doubles, so anything that doesn't fit in 53 bits silently loses precision.
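A quick illustration:

    // 9007199254740993 is 2^53 + 1: representable as an int64, not as a JS number.
    console.log(JSON.parse('{"id": 9007199254740993}').id);   // 9007199254740992, off by one, no error
    console.log(JSON.parse('{"id": "9007199254740993"}').id); // "9007199254740993", exact, as a string
    console.log(BigInt("9007199254740993"));                  // 9007199254740993n if you need arithmetic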
A more thoughtful serialization format like proto3 binary encoding would avoid both the memory spike issue and the silent loss of numeric precision issue, with the tradeoff that the raw encoded value is not human readable.
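As a sketch (message and field names are illustrative), the same payload in proto3 keeps the bytes binary and the ID a genuine 64-bit integer on the wire:

    syntax = "proto3";

    message Blob {
      int64 id      = 1;  // full 64-bit range preserved on the wire
      bytes content = 2;  // raw bytes, no base64 expansion, no giant intermediate string
    }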
Isn't HTTP POST content similarly encoded? Likewise with small embedded images in CSS, though I am rusty on that topic. Likewise with binary email attachments in SMTP (though this may be uuencoded, same net effect).
The particular example of a trivial message that is mostly-binary just sounds like a useful test case, more than anything else.
Copying into a string is a safe default. Also, proto's API currently returns string references (not views), so making a copy is required in the open-source release.
(Although, now that std::string_view is common, I hear rumors that the proto API might change…)
Maybe that's true, but safety is an additional concern. You have way more lifetime headaches if you alias the underlying data. Copying avoids all that.
simdjson also does this sort of thing. All strings are decoded and copied to an auxiliary buffer. For strings without escape sequences in them or for end users who don't mind destroying the json document by decoding the strings in-place, these copies could be avoided. I may get around to shipping a version of simdjzon (the Zig port) that optionally avoids these copies (this sort of has to be optional because the current API lets you throw away your input buffer after parsing, and this option would mean you cannot do that), but porting this stuff back to C++ and getting it upstreamed sounds more difficult.
If you only want "faster than JSON", then a binary JSON like MessagePack or JSONB (is that the PostgreSQL one?) avoids having to deal with a schema. If you want a schema, you're in different territory from JSON anyway.
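For example, with the @msgpack/msgpack npm package the round trip looks just like JSON, only binary (a minimal sketch):

    import { encode, decode } from "@msgpack/msgpack";

    const original = { id: 42, name: "widget", tags: ["a", "b"] };

    const bytes = encode(original);  // Uint8Array, typically smaller than the JSON text
    const restored = decode(bytes);  // a plain JS value again, like JSON.parse

    console.log(bytes.byteLength, restored);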
MsgPack is great. However, I've been curious about Amazon's Ion, mainly since the namespace stuff lets you define tags up front as ints. MsgPack still transfers field names as strings unless you create your own string-to-int mapping.