Deeper in the site they have a benchmarks page[1] that has a short "why not Cap'n Proto":
"Cap'n'Proto promises to reduce Protocol Buffers much like FlatBuffers does, though with a more complicated binary encoding and less flexibility (no optional fields to allow deprecating fields or serializing with missing fields for which defaults exist). It currently also isn't fully cross-platform portable (lack of VS support)."
Hmm, my personal (biased) opinion is that using variable offsets and vtables (as FlatBuffer does) is a lot more complicated than Cap'n Proto's fixed offsets.
"Optional" fields (ones that don't take space on the wire at all) are not clearly much of an advantage. The usual use of optional fields in Protobufs was to emulate unions, which Protobufs itself never supported directly, but Cap'n Proto does. Also, when wire size really matters, Cap'n Proto offers a very fast compression pass which deflates zeros, and unset fields are always zero, largely mitigating this issue.
The lack of VS support in Cap'n Proto is of course a very fair criticism, and a huge sore point in Cap'n Proto adoption. MSVC is finally catching up in its C++11 support so I hope this can be solved soon.
(Disclosure: I'm the author of Cap'n Proto, and also the former maintainer of Protocol Buffers.)
I think you have maybe misunderstood the purpose of why flatbuffers exists; message packing for real time high performance games. It's not a replacement for Protocol Buffers or Cap'n Proto.
You've heard of Valve Software right? Their game engine, being a distant relative of Quake, used the old quake message packing format up until 1 or 2 years ago. Since then they have retrofitted all their popular games to use protocol buffers instead since it allows them to more easily extend their net code without breaking backwards compatibility (up to a point of course). In the context of games, optional fields are very important as are default values.
Also in this context; the author of flatbuffers is Wouter van Oortmerssen, author of the Cube and Cube2 game engines: http://sauerbraten.org
From the white paper:
> In particular, FlatBuffers focus is on mobile hardware (where memory size and memory bandwidth is even more constrained than on desktop hardware), and applications that have the highest performance needs: games.
I think there's some confusion here. To be clear, Cap'n Proto allows you to add new fields to a message without breaking backwards compatibility, and Cap'n Proto implements default values.
The question is how a field is represented on the wire if it's defined in the schema but you don't assign it a value before serializing. With FlatBuffers, it won't take any space on the wire (although it still takes space in the vtable, but that can be amortized with many instances of the same structure). With Cap'n Proto, all fields are located at fixed offsets from the start of the struct, therefore they take space even if they aren't set. You can truncate unused fields off the end, but not from the middle.
Also, note that Cap'n Proto uses pointers for variable-width fields. The pointer can be null if the field is not set. So the maximum wasted space for an unset field is 8 bytes, all of which are zero and therefore compress nicely.
Cap'n Proto's approach is exactly the approach used by in-memory data structures in C, C++, and all other high-performance programming languages. If your C struct contains a field you don't use, it still takes space in memory. I would think this would be preferable for games.
I think if you have a 50 field data object, and you have cases where you transmit only 5 of those fields, the problem is your data modelling, not the serialization library.
I think looking up values based on a simple offset is far simpler to all the clever rules and "parsing" that happens in capn' proto (some of which no doubt caused by the desire to support streaming "automatically", an anti-feature IMO, streaming should happen at a higher level to take advantage of domain knowledge).
I really don't see how you can even pretend that cap'n proto is less complicated. Yes there's an extra indirection at runtime, but the benefit is great simplicity, and quite possibly better performance when the vtable is hot in the cache.
I dont know full details of Cap'n Proto, so please correct me if I am wrong. My impression is that Cap'n Proto is a re-implementation of protocol buffers with additional functionality for RPC wrapped in a futures/promises abstraction. Would that be a reasonable summary of Cap'n Proto ?
Guessing from the name I expected FlatBuffers to be a unified file format for on-disk and on-RAM presence, so that you can just mmap the disk file and you have the data-structure in memory. No need to translate between on disk and on disk representations. I am not claiming that it is so, this is what I expected it to be, given the name. But after spending some time looking around I am still confused if that's what it is. Would appreciate confirmation either way. Have to look at the code when I get around to it.
I have an use case in mind for a tree of appendable/insertable matrices. HDf5 comes close, but isnt quite what I am looking for.
EDIT @vertex-four thanks for the clarification much appreciated.
> My impression is that Cap'n Proto is a re-implementation of protocol buffers with additional functionality for RPC wrapped in a futures/promises abstraction.
No. It's a serialisation format that's specifically designed so that data structures in this format can be used as-is by programs, without unnecessarily copying data into or out of the format to use it. The RPC layer is then built on top of that, but the format could also be used for mmapped files, etc.
They are compiled as separate libraries, so you can use the serialization layer without linking in the RPC layer.
But if anything it was the serialization that distracted me from the RPC. Cap'n Proto was originally (briefly) a capability-based RPC system on top of Protobufs. :)
Huh. So Google is releasing a competitor to Cap'n Proto. As the former maintainer of Protobufs (at Google) and author of Cap'n Proto (after Google), I'm pretty surprised that I hadn't heard about this. I also don't recognize any of the names, so this is not from the people who were working on Protobufs at the time I left.
I'm the main competitor, so take me with a grain of salt here.
The docs don't look very detailed, but taking a quick look through...
> "On purpose, the format leaves a lot of details about where exactly things live in memory undefined, e.g. fields in a table can have any order... To be able to access fields regardless of these uncertainties, we go through a vtable of offsets. Vtables are shared between any objects that happen to have the same vtable values."
Hrm, that sounds like a lot of complexity and a big performance hit. In order to read a field from a struct, you have to do a vtable lookup to find the offset? Maybe you can still get that done in an inline accessor, but it will make the optimizer sad.
How is de-duping of vtables accomplished? It doesn't look like the API exposes any details about vtables, meaning you'd have to do it magically behind the scenes, which seems expensive?
It looks like the authors may have been trying to maintain the concept of "optional fields" from Protobufs, where if you don't set an optional field, it takes zero bytes on the wire. But the main use of optional fields in practice is to implement unions -- i.e. a list of fields where only one should be filled in at a time.
Cap'n Proto's solution to this was just to build unions into the language, something Protobufs should have done a long time ago.
> "FlatBuffers relies on new field declarations being added at the end, and earlier declarations to not be removed, but be marked deprecated when needed. We think this is an improvement over the manual number assignment that happens in Protocol Buffers."
That's not an improvement at all. Declarations should be ordered semantically, so that related things go together. More importantly, even if you say that they can't be, developers will still do it, and will accidentally introduce incompatibilities. To make matters worse, it tends to be hard to detect these issues in testing, because your tests are probably build from the latest code. Simply adding field numbers as in Protobufs and Cap'n Proto has proven an effective solution to this in practice.
Very surprised to see Google get this wrong, since they have more experience with this than anyone.
> Benchmarks
I couldn't find the benchmark code in the github repo, so it's hard to say what they may be measuring here. FlatBuffers presumably shares Cap'n Proto's property that encodes and decodes take zero time, which makes it hard to create a meaningful benchmark in the first place. Indeed the numbers they put up kind of look like they're timing a noop loop. I can't see how the "traverse" time could be so much better than with protobufs, which makes me think the code might be written in such a way that the compiler was able to optimize away all the traverse time for FlatBuffers because it thought the results were unused anyway -- a mistake that's easy to make.
But I'd love to see the code to verify. Hopefully I just missed it when digging through the repo.
Hi. I designed most of FlatBuffers, so let me see if I can clarify:
Clearly, FlatBuffers seeks a different design tradeoff than Cap'n Proto. We wanted to retain all the flexibility of Protobufs, while having the advantages of the "zero parsing / zero allocation" approach. That brought us to the vtable design.
In use cases where performance matters, i.e. you are churning through a large amount of data, you will be accessing the same vtable repeatedly, and the cost of the extra indirection (and the code associated with it) will be masked by the memory latency of the main data you're going through. It will be (close to) "for free". Even in cases where that isn't the case, whether or not it is able to match Cap'n Proto's accessor code is less interesting than it being endlessly faster than Protobuf and most other serializers out there, given that it is just as flexible.
vtables are de-duped on the fly during construction. It is typically not expensive.
Note this tiny overhead holds for "tables", we also have naked "structs", which do not have the indirection overhead, but have no forwards/backwards compatability. Useful for things like 2d/3d vector types, colors, and other small POD types that are used a lot.
FlatBuffers also has unions, which you should use instead of optionals when appropriate. We feel optionals have a lot of uses beyond just mere unions and forwards/backwards compatibility, however. Game objects can have a LOT of fields, many of which are often at their default value, and thus not stored on the wire. This gives significant compression. The zero-byte compression in Cap'n Proto is cool, but we prefer to not have to use additional buffers when reading. Optionals also give a lot of design freedom, i.e. you can add a field that you know is only needed for very few instances without fear of bloating your binaries, as an alternative to "subclassing", or indeed unions.
On field declaration order: to each his own, I guess. That said, the omission of explicit field id's is something that's handled at the level of the schema compiler, we could easily allow optional id declarations in the schema for people who want the flexibility of declaring things in an arbitrary order. Noted.
We should clean up the benchmark code, so it can be included, yes. It is not running a noop loop, we made sure of that. I haven't verified what makes Protobuf traversal that much slower, I am guessing it's causing a memory allocation somewhere.
Structs are a welcome improvement. Using Protobuf, I was forced to move from a 3d vector type to an array of doubles to get packed=true performance. Small loss of expressiveness, no big deal, but I can imagine it getting a lot worse for heterogeneous and/or more complex structs. Was a shock to see a dynamic allocation for every vector in my 1000-element array. (I understand why this is necessary given the spec of Protobuf though.)
OK, AFAICT there is no bounds checking. When you want to read a message, you give FlatBuffers a bare pointer to the start of the message -- no size. So you can't use this to read data you don't trust I guess.
Which is an OK trade-off for certain situations (like reading your game data from disk). But... not for any kind of secure network protocol.
Maybe I'm missing something, though. I've only been looking at this for a few minutes.
> Most readers of binary file formats can be made to read memory outside the buffer by corrupting the data
I'm pretty sure that would be considered a serious security bug for any format likely to be displayed in a web browser. For instance, an image format with such a bug would allow you to implement a heartbleed-like attack on a user's browser by displaying a malicious image and then reading back the pixel values. That would be very, very bad.
But I can believe that your statement applies to formats used by games for their own assets, where those assets come directly from the game developer.
> OK, AFAICT there is no bounds checking. When you want to read a message, you give FlatBuffers a bare pointer to the start of the message -- no size. So you can't use this to read data you don't trust I guess.
I think the key use case for FlatBuffers is mostly for very-high-performance communication between a set of processes that you control to scale out high-performance systems into distributed systems while keeping the communication overhead minimal, not for, e.g., communicating between untrusted machines over a public network. So, I don't see that as a huge problem in the key use case.
Would be a bad tradeoff even for trusted on-disk data, since disks are so glacially slow and serialization systems always run into bad data from time to time (due to fs data corruption, io errors, memory corrupting bugs, etc).
Incidentally, the first time I saw the Cap'n Proto homepage I thought the whole thing was a joke because I saw the "cerealization protocol" tagline and "infinitely faster" badge and stopped reading after the first paragraph...
EDIT: Oh yeah, and when I skimmed the rest of the page this "confirmed" my suspicions: "Time-traveling RPC: Cap’n Proto features an RPC system implements time travel such that call results are returned to the client before the request even arrives at the server!". I'm familiar with promise pipelining but that sentence (and diagram) made it sound like a complete joke.
FWIW it had the opposite effect on me: it sounded like this wasn't some run of the mill corporate middleware project with bland and enterprisey web pages, but made by someone who actually puts their personality into the game.
Go has a similar vibe although more subdued, with the funny mascot and opinionated culture.
If everyone always asks the same question, can you not respond correctly before they speak? If question B is always followed by question A, wouldn't it be prudent to answer both at the same time?
I don't understand your question -- the choice of language for what?
The blog article you linked is about my own protobuf implementation "upb", which is separate from anything else we've been talking about in this thread.
Many of SBE's unusual properties (such as permitting new fields only at the end of a structure) are designed to maximize performance during streaming decoding.
Hmm. The vtable approach used by FlatBuffer seems harmful for streaming. I don't think the SBE people would have used such a design.
> permitting new fields only at the end of a structure
Be careful not to conflate the order of fields as written in the schema with the order of fields on the wire. Cap'n Proto (like SBE) more or less requires new fields to go on the end on the wire, but lets you put them in any order in the schema. FlatBuffer apparently requires them to go on the end in the schema but lets you put them in any order on the wire.
(I say "more or less" because Cap'n Proto can actually stick a new field into what was previously padding space between two existing fields. But usually new fields go on the end.)
> The vtable approach used by FlatBuffer seems harmful for
> streaming. I don't think the SBE people would have used
> such a design.
If the vtable and fixed-size fields are placed together with variable-length fields (e.g. byte blobs) at the end, then decoding can make better use of the processor cache than if the fields are all mixed together.
But the idea with the vtable is that many instances of the same "table" (record) would share a vtable. So the vtable would presumably never appear inside the table instance, I think.
It's always surprised me that zero-copy serialization is worth it at all. Given how much slower memory is than CPU, once you've taken the effort to stream in data from somewhere, why not transform it into a more convenient format, especially if that means your on-the-wire format might be smaller.
Is the key here that the input gets copied into memory directly without CPU intervention?
Well, first of all, there is mmap(), which does indeed make data available in memory without streaming it through the CPU first. If you have a 10GB file and really just want to read one data point out of the middle, with Cap'n Proto (and FlatBuffers) it's perfectly reasonable to mmap() the whole thing and traverse to that one item.
Meanwhile, in networking, there is RDMA, which indeed allows memory to be transferred over the network without involving the CPU on either end.
But even ignoring those things, memory bandwidth is actually a big reason why you should not want to transform things upfront. You see, if you have an upfront transformation, you're streaming the data into the CPU, then back out in the new format, and then reading it back in again later on when you actually use it. If your data is small enough, perhaps it all stays in cache and this isn't an issue. But if it's large, then you're doubling or tripling your memory bandwidth usage by having a decode step.
In any case, the fact is that there are lots of real servers out there that are CPU bound and spend double-digit percentages of their time encoding and decoding Protobufs.
"In any case, the fact is that there are lots of real servers out there that are CPU bound and spend double-digit percentages of their time encoding and decoding Protobufs."
Having written two such servers (C++, fully async, single thread), this is true. At the time I chose Protobufs, it was the best option, but avoiding the encode/decode is a big win on an async. server that is handling 50,000 connections. Saving a few bytes here and there on the wire is much less important than blasting messages out and consuming them as fast as possible. This can also mean queueing as much of the work to be done by another process on the server side after the consumption of a message (i.e., work queues).
Just because I'm a fan of his, I'd like to point out that the author, Wouter van Oortmerssen aka Aardappel (strlen.com) is creator of the Cube 3d engine and lots of other interesting stuff.
Oh I'm also a fan of Kenton's in case it matters :-)
The problem with FlatBuffers and Cap'n Proto is that while in-memory rep matching serialization formats are much faster (infinitely faster as Cap'n Proto says tongue in cheek) for languages with unsafe direct memory access (like C/C++), they're actually slower and more cumbersome to use in any language without unsafe direct memory access (like Java, Rust, Python, Go, JavaScript/Node, Swift and... well, most languages).
The other problem is that Google tends to release random stuff as open source and then modify it (breaking BC) or abandon it without regard for the people who are using it.
Java has ByteBuffer, Javascript has TypedArrays, Python has the "struct" module, and other languages have other fine ways of reading raw data in this kind of format. Some languages may lack the ability to inline calls well enough for true zero-copy to be efficient, but the worst case is you fall back to parsing the message upfront like you would with any other encoding. That upfront parse is still going to be faster, because it's just easier to read fixed-width values than variable-width.
ByteBuffer has methods for reading primitive values of all sizes -- ints, floats, etc. -- from arbitrary offsets within the buffer. So you just need a wrapper object with nice generated methods for each field which in turn call the ByteBuffer getters.
None that the C++ code works this way too. We don't cleverly generate a struct that just happens to have the right layout; we generate a class that wraps a pointer and provides inline accessors that read/write values from the correct offset relative to that pointer. After inlining, it's just as fast.
If any Pittsburgh HNers are interested, dwrensha is likely to be giving a presentation regarding his Rust-based Cap'n Proto library sometime in the next few months.
[1] http://kentonv.github.io/capnproto/