Protobuffers Are Wrong

alecbenzer · on Dec 24, 2019

This was like 2 or 3 sentences worth of good points sprinkled in between a bunch of stuff that I'm pretty sure isn't an actual problem for anyone. If anything, the things pointed out are issues for people trying to write code that manipulates protos generically, which is not what most people spend their time writing and is probably exactly the wrong thing to optimize.

The main good point: Google's problems are probably not your problems, don't just blindly adopt Google tech for no reason.

Also: calling people amateurs without really substantiating is a huge smell IMO. The average Google engineer isn't a genius or particularly amazing by any stretch, but especially for something as core/foundational as protobuf, the answer is much more likely something like "these decisions made more sense for Google internally, especially when weighing against the cost of significantly re-architecting how proto works". The ad-hominem at the beginning reeks of someone who had an email chain that went like:

"You guys are doing proto wrong, don't you realize protos should obviously be like XYZ?"

"Well actually we'd like to do X but it would've been too hard, I'm not actually sure Y is a net-win, ..."

(omg what amateurs...)

kentonv · on Dec 24, 2019

> calling people amateurs without really substantiating is a huge smell IMO. The average Google engineer isn't a genius or particularly amazing by any stretch

Note that Jeff Dean and Sanjay Ghemawat -- the original creators of Protobuf -- aren't average Google engineers. They are literally the highest-ranked engineers at Google (Level 11, "Senior Fellow", a title assigned only to the two of them last I heard), and they basically invented MapReduce, BigTable, Spanner, and a variety of other foundational distributed systems technologies. Jeff now leads the AI division while Sanjay continues to focus on systems infrastructure.

So yeah, "amateurs".

(Disclosure: I wrote Protobuf v2, but it was just a fresh implementation of the same design.)

deanCommie · on Dec 24, 2019

While I agree with alecbenzer's original point, I don't think that Dean and Sanjay created protobuf is a signal that they made the best decisions. Everyone has strengths and weaknesses even within the same job family.

I've encountered plenty of genius-level engineers when it comes to algorithms, architectures, and distributed system design that wrote GOD-AWFUL unreadable and unmaintainable code. I wouldn't trust them to design an API or a common framework optimized for usability.

In fact I almost wonder if intelligence is a hinderence in such cases. When you're too much like Cypher, and "don't even see the code", all code and all library choices feel equivalent.

coldtea · on Dec 24, 2019

Sure, but you didn't address any of the points in TFA.

Whether Dean and Ghemawat are high ranking or not, do the points stand? Is the design of protobuf solid?

Buge · on Dec 26, 2019

It's possible to criticize one part of the article (calling people amateurs) without criticizing other parts of the article.

kentonv · on Dec 25, 2019

See: https://news.ycombinator.com/item?id=21873418

wollsvagwn · on Dec 24, 2019

[flagged]

dang · on Dec 25, 2019

We've banned your accounts in this thread. Please don't create accounts to break HN's guidelines with; it eventually gets your main account banned as well.

https://news.ycombinator.com/newsguidelines.html

Also, please don't create accounts for every few comments you post. We ban accounts that do that. This is in the site guidelines too.

HN is a community. Users needn't use their real name, but do need some identity for others to relate to. Otherwise we may as well have no usernames and no community, and that would be a different kind of forum. https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme...

20160216 · on Dec 25, 2019

You want HN to ge a community, start obeying the rules yourself.

HN is not a community, it’s a cargo cult where only certain thoughts are allowed and you ban people who think differently, no matter how well they present their argument or how polite they are.

Further, you engage in ad hominem and dishonest attacks, make calling and violation of every other rule in your rule book you capriciously enforce against everyone else.

Because of this, you have no standing to expect anyone to ever respect your rules.

This is why this site is considered a joke to the rest of the world and the tech community.

It’s the epitome of Bay Area smelling-ones-own-farts, and you’re oblivious.

jvanderbot · on Dec 24, 2019

You need to seperate publicity statements or interview answers from technical contributions. They are so incompatible as to be meaningless comparisons and are intended for totally different uses. I doubt he or any senior engineer at any company would make a statement like that in a design discussion or product review, but they'd all turn around wave hands at the future like that over beers with a journalist.

lepoint · on Dec 25, 2019

What is the point of your justification? That it's ok to spout nonsense during interviews? If so that's a weird position to take.

jvanderbot · on Dec 28, 2019

Alright, I'll condense: High level statements about vague possibilities are perfectly fine in some settings, but not in otheres. They sure smell like bullshit in a highly technical setting, or to people who always take the most technical angle.

fizx · on Dec 24, 2019

You don't get to lead thousands of people by making publicly offensive statements, even accidentally. So yes, that does lead to more boring answers to newspapers.

Politicians don't swear in public, but they sure do in private. I'm sure a smaller conversation with Jeff would be fascinating.

justinclift · on Dec 25, 2019

> You don't get to lead thousands of people by making publicly offensive statements ...

That's not correct. There are (current) examples of people leading thousands (or more) people by making publicly offensive statements.

It's just the statements are polarising (on purpose), to create feelings of inclusion among the supporters + and blame [social problems?] on the people being pointed at.

packetslave · on Dec 24, 2019

If you're going to be an edgelord and call Jeff Dean an "idiot", have the guts not to do it on a throwaway account.

otabdeveloper2 · on Dec 24, 2019

Why? So you can tar, feather and throw him into the fire for crimethink?

strbean · on Dec 24, 2019

TIL filling time in an interview instead of being as concise as possible makes you an idiot. I thought it was just called 'being a good interview subject'.

tedunangst · on Dec 24, 2019

Does that retroactively make protobuffers stupid too?

mironic · on Dec 24, 2019

It doesn't but I don't get why saying someone is smart makes their work better when smart people obviously make stupid decisions and give stupid answers. I've used protobuffs, they're fine. I was riffing on Kenton's justification.

kentonv · on Dec 24, 2019

My only point was that it's absurd to call them "amateurs", and that the author damages his own credibility by doing so.

lobo_tuerto · on Dec 24, 2019

Because being smart is related to making less stupid decisions? Think about it in terms of probability.

Smarter -> less probability of making stupid decisions and giving stupid answers.

Dumber -> more probability of making stupid decisions and giving stupid answers.

bradhe · on Dec 24, 2019

That's like 100 words to say "Yes."

cmrdporcupine · on Dec 24, 2019

As a Googler who has seen gPRC and Protobufs continually rammed into places where they clearly don't belong (ahem, embedded systems), I actually have a lot of sympathy with this article. I wouldn't call the authors of protobufs amateurs, but I do think there's flaws there, many of them around the type system, as this article points out, but also a lot around the language APIs.

My biggest concern with protobuf though is that it ends becoming the proverbial "I have a hammer, now everything looks like a nail" scenario. Every Googler goes through orientation with protobufs and gRPC, and then they proceed to stamp it everywhere... including places it may not belong. And then take it with them when they leave Google.

I think the article tone is inflammatory but most of the points are solid.

alecbenzer · on Dec 24, 2019

> many of them around the type system, as this article points out, but also a lot around the language APIs.

I think there's a lot not to love about gRPC/protobufs (used them at Google and again now at a startup) but I don't feel like this did a good job highlighting those issues.

jhallenworld · on Dec 25, 2019

I wrote my own serialization library for embedded systems recently. It serializes data that can be defined and initialized natively in C. The intention is for data section only data, not heap data. So for example, you can have variable length arrays, but you must have a maximum length to match the preallocated C array declaration.

The serialized format mimicks JSON, except that it supports tables- arrays of structs. This saves space compared with actual JSON, since the column names only have to be given once. Also tables where the columns are primitive types can be loaded directly into common spreadsheet applications. This is useful for my specific application.

The application has a CLI that allows you to browse the data with xpath like expressions. You can set or get any field of the hierarchy (using the serialization format), and since the type is known it can parse and schema check the user input.

I use the serialized format for storing a copy of the data in flash memory, so that it can be restored on boot up.

The metadata with the type information is all marked as const. In embedded systems, this data is placed in flash memory instead of precious RAM.

mhdhn · on Dec 25, 2019

Sounds cool, is it open source?

kudokatz · on Dec 24, 2019

> sprinkled in between a bunch of stuff that I'm pretty sure isn't an actual problem for anyone

I'd push back at this. After starting to use protos (even at Google) these issues smack pretty much everybody right in the face.

joatmon-snoo · on Dec 24, 2019

Also worth calling out: protos have evolved a lot since they were originally created, and there are very few people that are actually aware of the deep, dark corners of protos (think extensions, JS, Android...). (Got a complaint about what Google's open sourced for protos? You don't even want to imagine some of the batshit crazy stuff we did with it before that...)

colanderman · on Dec 25, 2019

> extensions

As someone who has done extensive work with Protobuf extension fields in Python, I can only echo that the API is nightmarish.

msla · on Dec 24, 2019

Using the word "amateur" to mean "inexperienced" or "unskilled" is a smell, or, more precisely, an equivocation: It lumps not being paid to do something with doing that thing poorly, which definitely aren't the same thing.

coldtea · on Dec 24, 2019

>This was like 2 or 3 sentences worth of good points sprinkled in between a bunch of stuff that I'm pretty sure isn't an actual problem for anyone. If anything, the things pointed out are issues for people trying to write code that manipulates protos generically, which is not what most people spend their time writing and is probably exactly the wrong thing to optimize.

Handling a serialization scheme "generically" is the "wrong thing to optimize"?

Sounds like the #1 thing anybody would want from it...

mooted1 · on Dec 24, 2019

Have you ever worked in a polyglot ecosystem with rapidly evolving schemas?

Tools like protobuf and thrift were designed to facilitate schema evolution since interfaces in these ecosystems evolve quickly and independently. Generics undermine this by creating strict dependencies on a few types, making it difficult to evolve a single type without breaking things.

Poorly implemented generics would undermine one of the design goals of this project. In addition, there aren't nearly as many opportunities for generics in an IDL as in a programming language, so what would the upside even be?

alecbenzer · on Dec 25, 2019

Obviously you need certain core pieces of infra that handle protos generically (like serializing and deserializing) but

a) total SLOC for that logic is much, much less than SLOC for code that works with _specific_ protocol buffers

b) "consumers" of protocol buffers as a tool/technology mostly don't worry about what's going on under the hood of the generic serialization, etc. code

So it will often make sense to to make that core generic logic even significantly more complicated if it means making the stuff that everyone has to write over and over again even a little bit easier.

mattnewton · on Dec 24, 2019

The author lost me at “make all fields required” for some nice type system properties.

required was a mistake, in my opinion and in the proto 3 spec’s opinion. Capn’ proto has a nice write up here too With essentially the same points I would make but written better: https://capnproto.org/faq.html#how-do-i-make-a-field-require...

I think protos might just be being used for the wrong thing in the author’s example. You shouldn’t replace your application’s data structures with protos everywhere, in my experience protobufs are for when you want to serialize and you write a bunch of backwards compatible serialization code by hand. This code is hard to generate because it encapsulates all the changing requirements needed to work across different versions, so the lack of general type system tools doesn’t really offer opportunity to cut down on the schlep. If you don’t have these problems, don’t think you will have these problems, evaluate whether the tech is right for you. I’ve worked on projects before at Google that have made this mistake and threw away the nice data model expressable in a language to use proto interfaces where there was no need for serialization. I don’t think the solution is to have protos expanded to be comparable to that in every language.

Disclaimer: Googler who is forced to use a lot of protos, my opinions are my own and and I didn’t design or ever work on them directly. Probably also just an amateur :D

davedx · on Dec 24, 2019

> I think protos might just be being used for the wrong thing in the author’s example. You shouldn’t replace your application’s data structures with protos everywhere

Yeah, I really scratched my head at that part. Even in my small line-of-business TypeScript apps, I often have separate interfaces for the "API models" (what I get as JSON from the API) and the "app models" (the models my app uses to represent the internal state of things).

In the big .NET platform I work on, we have multiple namespaces of Models and Dto's to represent different things at different interface boundaries.

Why on earth would you try to re-use a network-layer interface in levels several layers higher? That's crazy talk!

kentonv · on Dec 24, 2019

> Why on earth would you try to re-use a network-layer interface in levels several layers higher? That's crazy talk!

It's theoretically ugly but in practice it can be really convenient. Translating large structures between different formats is tedious and error-prone... sometimes it's just a lot faster and easier to leave it as a Protobuf.

joatmon-snoo · on Dec 24, 2019

Worth calling out that Google literally compiled a list of outages caused by `required` fields before killing that field parameter.

seriesf · on Dec 24, 2019

Yes but proto 3 was also a mistake. Throwing away presence entirely was wrong and not preserving unknowns was also wrong. Proto 2 forever, imho.

xyzzyz · on Dec 24, 2019

I agree. Preserving unknowns fortunately was recognized as mistake, and fixed in some more recent version, and presence can be worked around: it is still preserved for message types, so you only need to wrap your primitives into objects[1], Java style.

[1] - https://developers.google.com/protocol-buffers/docs/referenc...

lonelappde · on Dec 24, 2019

That's not "agree" with "not preserving unknowns was also wrong."

xyzzyz · on Dec 25, 2019

Sorry, I meant not preserving unknowns was also wrong. Proto2 did preserve them, then proto3 didn’t, and then they realized it was a bad idea and went back to preserving them.

abernard1 · on Dec 24, 2019

This gets to the article's issue but from the dynamic typing angle as well.

By not allowing presence checks, one has to use convention _in every single class_ to determine basic things like PATCH semantics (https://github.com/protocolbuffers/protobuf/issues/359). This makes it impossible to treat protobuf as a general data format and requires object-specific logic to properly composite data structures. In some cases it's impossible to even do PATCH correctly without excluding sentinel values from the allowed range and having the application developer know about it.

There are so many other problems with Proto v3, but this one is glaring.

rcfox · on Dec 24, 2019

It can be approximated with a single-item `oneof` field. It's ugly and boilerplatey, but at least it's binary-compatible with proto2 and gives the original behaviour.

My main problem with proto2 these days is that I needed to interface with some C# code, and there is no proto2 library for C#!

kentonv · on Dec 24, 2019

> there is no proto2 library for C#!

Looks like that's getting fixed!

https://github.com/protocolbuffers/protobuf/blob/master/docs...

https://github.com/protocolbuffers/protobuf/pulls?q=is%3Apr+...

mattnewton · on Dec 24, 2019

Personally I even like this more since this feels much more explicit. The idea of depending on being set to the null value or unset at all gets my spider sense shivering and shuddering; someone is going to miss this, probably me.

saberience · on Dec 24, 2019

Dude, there are tons of packages for doing Protobuf in C#, I've been writing systems in C# that used proto for over three years I think at this point. Here's one for starters: https://www.nuget.org/packages/protobuf-net

tlarkworthy · on Dec 24, 2019

This is the wrong argument. Who cares about the type system of a binary packaging format? The joy is how these messages can be used as rows in storage systems as well as RPC. Complicating the type system limits the domain applicability and increases the porting cost. No.

Protobuffers are shit coz they don't support zero copy and you have to deserialize the whole thing even if you are interested in one field or an outer envelope, causing memory churn in your JVMs. Cap'n'proto and flat buffers attack this real problem. The expressivity of the type system is a minor issue, hence no credible competition.

Note grpc abandoned required fields! Nothing is required over a decade, backward compatibility is important! Required should be enforced at application later not the binary packing layer. It is a property of the version of the code processing the blob, not the blob representation itself.

Ex googler and equally happy openAPI spec user.

mixedCase · on Dec 24, 2019

> Who cares about the type system of a binary packaging format?

The people who have to write them and map them to actual domain data structures. Monomorphizing by hand and working around oneof+repeated's crap is an absolute joke. Tools in widespread use should do better.

doublement · on Dec 24, 2019

This is going to sound sarcastic but it's not: Can we get back to just putting the members of C structures into network byte order and sending that over the wire in binary, à la 1995?

btown · on Dec 24, 2019

This is more or less what https://capnproto.org/ does.

davedx · on Dec 24, 2019

I think this is also what Rust's bincode crate does. It's very sane and you can just open up files in a hex editor and see what's there.

https://crates.io/crates/bincode

omginternets · on Dec 24, 2019

From the capnproto docs:

>Isn’t this all horribly insecure?

>No no no! To be clear, we’re NOT just casting a buffer pointer to a struct pointer and calling it a day.

Isn't this a direct contradiction to your claim? Or have I misunderstood them?

jtolmar · on Dec 24, 2019

IIRC: capnproto generates messages that you could deserialize by casting them to the right struct, but refrains from actually doing it that way. Instead it generates a bunch of accessor methods that parse the data, as if you were reading something that's not basically a c-struct, like a protobuff.

kentonv · on Dec 24, 2019

That's basically correct. Cap'n Proto generates classes with inline accessor methods that do roughly the same pointer arithmetic that the compiler would generate for struct access.

There's a couple subtle differences:

* The struct is allowed to be shorter than expected, in which case fields past the end are assumed to have their schema-defined default values. This is what allows you to add new fields over time while remaining forwards- and backwards-compatible.

* Pointers are in a non-native format. They are offset-based (rather than absolute) and contain some extra type information (such as the size of the target, needed for the previous point). Following a pointer requires validating it for security.

(Disclosure: I'm the author of Cap'n Proto.)

asveikau · on Dec 24, 2019

Re-read the comment I think. It doesn't say casting a struct pointer. It says putting the members of the struct into network byte order over the wire. I read that as individually serializing each member in a portable, safe way.

Anyway even if you do choose the struct pointer hack (which I do not see advocated here) it can be done relatively well albeit requiring language extensions and a bit of care. Pragmas and attributes to ensure zero padding and alignment between members. No pointer members. Checking sizes and offsets after a read (the hardest part).

Animats · on Dec 24, 2019

"As of this writing, Cap’n Proto has not undergone a security review, therefore we suggest caution when handling messages from untrusted sources."

Something like that has to be rigorously tested or proven to be free of buffer overflows. It's so easy to attack with malformed messages. Parsers for remote messages are a classic source of vulnerabilities. It's hard to test this, because it's a code generator.

This looks promising as an attack vector for a big system built on microservices. If you can find an exploit in this that lets you overwrite memory, and can break into some service of a set of microservices by other means, you can leverage that into a break-in of other services that thought their input was a trusted source.

The "zero overhead" claim goes away as soon as you send variable length items. Then there has to be some marshaling.

kentonv · on Dec 24, 2019

> As of this writing, Cap’n Proto has not undergone a security review

This is outdated, I should remove it. Cap'n Proto has been reviewed by multiple security experts, though not in a strictly formal setting. I trust it enough to rely on it for security in my own projects, but yeah, I am cautious about making promises to others...

> Something like that has to be rigorously tested or proven to be free of buffer overflows.

I've done a bunch of fuzz testing with AFL and by hand. I've also employed static analysis via template metaprogramming to catch some bugs. See:

https://capnproto.org/news/2015-03-02-security-advisory-and-...

(That was... almost five years ago.)

> The "zero overhead" claim goes away as soon as you send variable length items. Then there has to be some marshaling.

Space for messages is allocated in large blocks. The contents of the message are allocated sequentially in that space and constructed in-place. So once built, the message is already composed of a small number of contiguous memory segments (usually, one segment), which can then be written out easily. Or, if you're mmaping a file, you can have the blocks point directly into the memory-mapped space and avoid copying at all -- hence, zero-copy.

So no, there is no marshaling.

lidHanteyk · on Dec 24, 2019

Capn is better than C at struct layout. We are not, under any circumstances, going back to the 90s. We are moving forward and learning from mistakes.

salgernon · on Dec 24, 2019

I would like to submit apples archaic “Rez”[1] as a great language for declaring binary formats. It was designed to be able to describe c and pascal structures.

[1] http://preserve.mactech.com/articles/mactech/Vol.14/14.09/Re...

sagarm · on Dec 25, 2019

The wire encoding for protos is much more compact than the in-memory representation, especially for sparsely populated messages (very common especially in mature systems).

You'd still have to figure out some way to serialize nested messages. Note that you can have recursive message definitions.

rbanffy · on Dec 24, 2019

> Can we get back to just putting the members of C structures into network byte order and sending that over the wire in binary, à la 1995?

I hope not (I know you are being sarcastic). We should use something that's trivial to implement correctly, as well as easy to read and to debug.

mikece · on Dec 24, 2019

Is that less of a configuration mess than WCF was? JSON isn't "The Magical Elixir" of data exchange and I'm more than open to something better but at least we (in the .NET community) have moved past the WCF configuration nightmares.

bob1029 · on Dec 24, 2019

WCF is an unmitigated dumpster fire. We have actually written a non-WCF client that uses a raw HttpClient implementation with StringBuilder to compose SOAP envelopes around cached XMLSerializers in order to talk to other WCF services. First request delay went from 1-2 seconds down to a few milliseconds. Memory overhead is negligible now. Prior, you could watch task manager and immediately recognize when WCF is "warming up". Additionally, the XML serializer in .NET seems almost pathologically determined to ruin everything you seek to accomplish.

By comparison, JSON contracts are an absolute joy to work with. We still practice strong-typing on both sides of the wire (we control both ends), and have pretty much nothing to complain about. If you are concerned with space overhead w/ JSON, simply passing it through gzip can get you down to a very reasonable place for 99% of use cases. I understand that there are arguments to be made against JSON for extremely performance sensitive applications, but I would counter-argue that these are extremely rare in practice.

heavenlyblue · on Dec 26, 2019

Isn’t the problem with WCF rather than XML?

innagadadavida · on Dec 24, 2019

Protobufs have a pretty nice variable encoding integer wire format. This gives you the flexibility of saving space without doing compression.

While zero copy is nice, you cannot make it work when using compression.

rapsey · on Dec 24, 2019

That variable integer encoding is however slow for encoding/decoding. The space savings are of questionable worth.

seriesf · on Dec 24, 2019

They are the same size as UTF-8 numbers but much slower to decode. I think the more-bit format is the only glaring mistake in proto that can never be fixed.

pantalaimon · on Dec 24, 2019

This gets hairy the moment you want to add new fields.

CoolGuySteve · on Dec 24, 2019

Both protobuf and plain C structs are append only formats if you put the message-type and size at the start of the C-struct.

marcan_42 · on Dec 24, 2019

C structs do not compose extensively. Protobufs do. You can't put variable length data into a struct, and hence you can't put extensible structs into it either.

doublement · on Dec 24, 2019

You can put variable length data into a struct:

https://en.wikipedia.org/wiki/Flexible_array_member

xyzzyz · on Dec 24, 2019

Only one field can be variable length, and it must be last. I'll pass.

CoolGuySteve · on Dec 24, 2019

You definitely can but it's not as obvious, make a separate message type for list elements and append them on the wire. If you only have one list at the tail, you can use a flexible array[] at the end but it's finicky to deal with if you need more than one.

You can build large hierarchical structures of messages with lists contained therein. It's pretty much how .mov/.mp4/many, many media container formats work. The technique dates back to the Amiga days.

caffeine · on Dec 24, 2019

https://github.com/real-logic/simple-binary-encoding is a good way to do this

izacus · on Dec 24, 2019

This is practically exactly what Protobuffers are. Except that they actually are defined clearly enough for multiple services written in multiple languages can work with them.

CoolGuySteve · on Dec 24, 2019

Definitely not, protobuf's strange wire format becomes apparent if you ever look at the hexdump of one or the profiler output of your favourite protobuffer-decoding C/C++ application.

They're actually kind of performance heavy for no benefit.

imtringued · on Dec 24, 2019

I have once looked at a benchmark that compared protobuffer, message pack, json and a variety of other serialization formats. In terms of reducing bytes per message gzipped json was ahead of all of them at the cost of increased CPU time for gzip. Protobuffer did pretty poorly, the only benefit was decreased CPU usage. I'm sure you could use some other compression algorithm like LZMA to get both good compression and good performance for JSON messages.

kentonv · on Dec 24, 2019

> In terms of reducing bytes per message gzipped json was ahead of all of them

Try gzipping the protobuf. Binary encoding and compression are different things which can be stacked. Gzipped protobuf should be smaller and faster than gzipped json in basically all cases.

CoolGuySteve · on Dec 24, 2019

I use LZ4 (with "best" compression) for packet captures and replay with great results.

I get about a 37% compression ratio with extremely fast decoding, like 10 million packets per second off an SSD.

It was better than snappy, gzip, and bz2 for the trade-off of compression time, decompression time and file size.

As for protobuf: flatbuffers, capn proto, HDF5, and plain C structs all deliver much, much faster decoding time. It's really not the best answer for any serialization at this point but it's still inexplicably popular.

touisteur · on Dec 24, 2019

Where you thinking of https://www.lucidchart.com/techblog/2019/12/06/json-compress... ?

cma · on Dec 24, 2019

I thought that's what capt'n proto was, not protobuffers.

grandmczeb · on Dec 24, 2019

> This is practically exactly what Protobuffers are.

Not really. Encoding/decoding protobufs is straightforward, but not nearly as simple as you’re suggesting.

izacus · on Dec 24, 2019

Sure, but anything you're trying to transport between languages which don't even agree on endianess will end up like this.

Dumping a struct on a wire is just a wishful dream that turns into a nightmare as soon as you need to send that to a service written in another language or running on another architecture.

Don't get me wrong - there's plenty of insanity in protobufs. But trying to cover the same use-case will not create a simple protocol.

grandmczeb · on Dec 24, 2019

Cap'n proto didn't "end up like this" and works across languages.

heavenlyblue · on Dec 26, 2019

Cap’n’proto isn’t well supported apart from C or Rust.

Python library is an absolute nightmare. Their tests used to catch Exception, and what they ended up testing was basically whether their test try to access nonexistant attributes.

The issue is that capnproto is relatively more complex, and as such is harder to implement well.

ping_pong · on Dec 24, 2019

XDR and ONC/RPC for the win!

nabla9 · on Dec 24, 2019

The memory layout of a C struct is ABI and compiler dependent.

Some compilers conform to same ABI in same system or similar system and work almost exactly the same, so you may grow old thinking that's how it is until it's too late. I think gcc, clang and Intel work almost the same in Linux and OSX.

doublement · on Dec 24, 2019

Indeed, that's why I specified putting the members of the C structure on the wire, not the structure as a whole, so it's just basic types in network byte order (i.e. consistent endian-ness) being sent.

itronitron · on Dec 24, 2019

I've worked on an application where that was the standard data transfer scheme, and then while working with protobuf on another project felt that after looking under protobuf's covers it was doing something very similar but wrapping an entire API around it.

CoolGuySteve · on Dec 24, 2019

No, not really. #pragma pack and/or __attribute__((packed)) have been supported for eons now and guarantee the alignment of struct members between compilers.

In newer C++ specs, you can also static assert that the struct is a POD type to statically ensure that there's no accidental vtable pointer.

This argument pops up every time someone mentions this and every time it's completely uninformed.

cyphar · on Dec 24, 2019

Though it should be noted that packed structures cause compilers to produce absolutely garbage code when accessing them (because most of the accesses become unaligned) and it becomes incredibly memory-unsafe (as in "your program may crash or corrupt memory") to take pointers of fields inside the struct because they are (usually) presumed to be aligned by the compiler.

Explicit alignment doesn't suffer from this problem nearly as badly (yeah, you might have to add some padding but that's hardly the end of the world -- and if you have explicit padding fields you can reuse them in the future).

amacbride · on Dec 24, 2019

XDR FTW!

BubRoss · on Dec 24, 2019

Why even put them in network byte order? Every modern system is little endian, if you standardize on that, only exotic systems would have to deserialize anything.

syncsynchalt · on Dec 24, 2019

If you force the most common system to translate byte order, then you'll have some confidence that your code is performing the translation correctly. If instead you rely on hoping that everyone added the correct no-op translation calls everywhere, you'll find your code doesn't work as soon as you port it to another CPU.

This is a nice side effect of network byte order being the opposite of the dominant cpu order, though obviously it was never intended.

doublement · on Dec 24, 2019

Because when someone builds a hugely popular exotic system in the future, because it is one (1) cent cheaper, you'd end up with code that has to check to see if it's running on such a system.

BubRoss · on Dec 24, 2019

This doesn't make any sense for multiple reasons, but especially because you wouldn't be checking anything in the first place. A big endian system would would reorder bytes and a little endian system would just use it directly from memory without another copy or reordering anything.

toast0 · on Dec 24, 2019

There's not a library pattern for host to little endian, or little endian to host, like we have with hton and ntoh. Which makes it more likely to be messed up.

bserfaty · on Dec 24, 2019

Ha - I just had this exact argument yesterday. Why indeed?

m0th87 · on Dec 24, 2019

I maintain a couple of protobuf-based libraries, and the issue I've seen with its anemic type system is that it inevitably creates an impedance mismatch between itself and the host language's type system. To make the library usable, you end up having to wrap the autogenerated code in a bunch of boilerplate, which defeats one of the major selling points of gRPC/protobuf in the first place.

apeace · on Dec 24, 2019

Exactly. I use Typescript with strict null checks. I would love to directly use the Proto objects I get from gRPC calls, but since the type of every string field is actually string|null, I have to do some validation and then turn it into an object with a regular string field (or else check that it’s not null every time I use it).

I get that forcing this validation is a good thing, especially for bigger/distributed teams. But I’m the guy who wrote the backend and I know it will return an error if it cannot set some value in that string field. Therefore I consider it “required”, and I resent the Protobuf authors’ insistence that I am wrong to ever use that concept.

kentonv · on Dec 24, 2019

That sounds like a quirk (I'd argue, a misfeature) of the specific Protobuf implementation you are using.

In most implementations, the getter methods for optional fields will return a default value if the field isn't set. If you want to explicitly check for presence, you call a separate "has" method.

There are several reasons for this design:

* Convenience of avoiding null checks when you know the field is always set.

* (Sometimes) Easy backwards-compatibility -- new fields can be declared with a default value that is appropriate when dealing with older senders.

* Security: It shouldn't be trivially easy for a malicious client to omit a field that the server is expecting will be there, causing the server to crash or throw an exception.

The C++ and Java implementations of Protobuf, at least, have always worked this way. It sounds like the TypeScript implementation you are using does not, unfortunately.

(Dislosure: I wrote the C++ and Java implementations of proto2.)

yegle · on Dec 24, 2019

If you are looking for 0 copy alternative, look at https://google.github.io/flatbuffers/

kortex · on Dec 24, 2019

Do you know if flatbuffers can be used as data structures in applications? That seems to be a shortcoming of protobuf, all the ser/de code makes it suboptimal for in-app data transport.

kentonv · on Dec 24, 2019

Zero-copy formats usually turn out to be worse for use as in-app data structures, because they have to carefully control memory allocation and layout in a way that makes it hard to mutate an already-constructed message.

E.g. in Cap'n Proto, all objects in a message are allocated sequentially within the larger message buffer. If you remove an object or change its size (e.g. overwrite a string field with a new value of different length), the new value needs to be allocated on the end and the memory space for the old value cannot be reused -- it is wasted.

I'm not super-familiar with FlatBuffers, but I believe it uses a model where messages must strictly be constructed in bottom-up order, such that all pointers point in the same direction. This seems to imply that you can't modify a message at all after construction, but I haven't actually played with it so I could be mistaken.

(I'm the author of Cap'n Proto.)

kortex · on Dec 30, 2019

> Zero-copy formats usually turn out to be worse for use as in-app data structures, because they have to carefully control memory allocation and layout in a way that makes it hard to mutate an already-constructed message.

Oh, I never thought of it that way, but that makes perfect sense. I guess I just assumed messages would be collections of pointers and buffers living "somewhere" in memory, but of course the actual layout can make a ton of difference.

I guess there is an implicit rule that if you are dealing with inbound structures in a read-only fashion, passing around the serial structure is OK for when the field access cost is minor compared to the copy cost, but if you want to mutate it, or doing lots of access operations where that isn't trivial, it makes sense to copy into your own data structure.

memling · on Dec 24, 2019

> Protobuffers are shit coz they don't support zero copy and you have to deserialize the whole thing even if you are interested in one field or an outer envelope, causing memory churn in your JVMs. Cap'n'proto and flat buffers attack this real problem. The expressivity of the type system is a minor issue, hence no credible competition.

I'm not terribly familiar with protobuffers, but I'm a little surprised by this. Some ASN.1 encodings (BER, CER, DER), by contrast, use nested tag-length-value triads. This allows you to skip parts of the message that aren't interesting. (This is, by the way, not that uncommon.)

> Required should be enforced at application later not the binary packing layer. It is a property of the version of the code processing the blob, not the blob representation itself.

This might depend a bit. Suppose you wanted to make sure that messages could be read (if not exactly decoded) without a schema. The structure of the message could, in principle, include this information in the form of bit-field preambles that indicate the number of fields, extensibility, and so forth.

I don't suppose that's strictly necessary for most applications: embedding message structure in message content seems like a bit of an anti-pattern, but I bet you could come up with a use case that makes sense in some context.

kentonv · on Dec 24, 2019

> nested tag-length-value triads. This allows you to skip parts of the message that aren't interesting.

Protobuf does that too. But in order to seek horizontally through an array, you need to inspect the tag/length of each element in order to skip to the next element. So you can't rapidly seek through a massive array.

In order to allow such seeking, you either need statically-sized elements (which is often too restrictive), or you need pointers to data represented out-of-line. Cap'n Proto uses pointers. Using pointers is pretty uncommon among serialization formats, which is strange considering how ubiquitous they are for in-memory data structures.

(Disclosure: I'm the author of Cap'n Proto, and also Protobuf v2.)

cryptonector · on Dec 24, 2019

"ASN.1/DER sucks -- everyone knows that. Let's build a new, better thing. How shall we encode things? Oh oh oh! I know! Let's have a type tag, a length, and a value encoding. Perfect! So so much better than ASN.1/DER!!!"

Those who don't study the past...

Look, ASN.1 is a crappy (but not awful!) schema language, but it supports many many encodings, and some of them are dumb, stupid, and bad, like BER, DER, and CER, and some are clever (PER), and some are awesome (OER), and it even supports things like XML (XER) and even JSON. So what's so bad about ASN.1? Not much, really, just that the first generation of encoding rules (BER/DER/CER) for it were.

But no, people don't look. They jump without looking, and then they re-create things, and do it badly.

The only thing since ASN.1 that doesn't suck is XDR. That's because XDR very much resembles ASN.1's PER/OER, but with 4-byte units, so XDR is very ergonomic. EDIT: I should also mention flatbuffers as not sucking.

kentonv · on Dec 24, 2019

> Look, ASN.1 is a crappy (but not awful!) schema language,

TBH I think it's awful. The type system is woefully overcomplicated, and the syntax is totally unrelated to any popular programming language, making it hard to learn.

The fact that there are so many encodings creates confusion for the average developer who frankly cares much more about the schema language accessibility, tooling, API, language support, and documentation than the actual encoding. Protobuf is way, way ahead on all of those compared to ASN.1.

> The only thing since ASN.1 that doesn't suck is XDR.

Sun XDR, from the 80's? Is that actually newer than ASN.1?

cryptonector · on Dec 24, 2019

> > Look, ASN.1 is a crappy (but not awful!) schema language,

> TBH I think it's awful. The type system is woefully overcomplicated, and the syntax is totally unrelated to any popular programming language, making it hard to learn.

Is it? How? What is an alternative you like better?

> > The only thing since ASN.1 that doesn't suck is XDR.

> Sun XDR, from the 80's? Is that actually newer than ASN.1?

The first ASN.1 specs are from 1984 (I guess it goes back a bit further). XDR/NFS are from 1986-1987. ASN.1 probably did not inform XDR in the least. Other RPC technologies from that time probably did (thinking of Apollo and such). What's interesting is that even if XDR is completely uninformed by ASN.1, it's essentially a subset of ASN.1 with different syntax and a PER-like encoding with 4-octet unit size and alignment -- the similarities are striking! And even more interesting is that the ASN.1 crowd in 1984 felt that TLV encodings were easy and non-TLV encodings like PER difficult, but XDR shows that non-TLV is not very difficult at all.

The lesson of the ASN.1 experience is that TLV == bad, and PER-like == good, though flatbuffers is probably the best. And more than that, the real lesson is that open source tooling is essential. It took too many decades for ASN.1 to have decent open source tooling.

Another lesson of the ASN.1 experience is that non-free standards really suck for pervasive and essential technologies. It took way to long for the ITU-T to make the ASN.1 specs available for free downloads. Now, I do understand that the ASN.1 specs are extremely well-written -- it's clear that it cost quite a lot of money to produce them, and somehow that has to be paid for, for the IETF model is much more accessible, and that is much more important than the high quality of specifications that the ITU-T is able to produce (much better than IETF RFCs, IMO).

kentonv · on Dec 24, 2019

> Is it? How?

There's like a million built-in types and too many unnecessary options for specifying constraints.

> What is an alternative you like better?

I think Protobuf has a simpler, more practical type system and more accessible syntax compared to ASN.1.

Cap'n Proto is very close to Protobuf in terms of type system and syntax, but adds some polish on both. (But Cap'n Proto is my own design, so obviously I think it's the best.)

In terms of encoding, Cap'n Proto is, of course, completely different from Protobuf. I guess it is closer to PER and OER... but not particularly close.

cryptonector · on Dec 24, 2019

Protobufs is a TLV encoding no better than DER. If it's just the syntax, then it's not much of an upgrade. Syntax is not a big deal, but semantics is, and there hasn't been much, if anything at all that's new semantics-wise since ASN.1.

(Of course, that the ASN.1 syntax is difficult to parse with LALR(1) parsers is a problem. But the syntax doesn't have to change much to be easier to parse.)

You don't have to know about and use all the built-in universal types in ASN.1 -- they're there if you need them.

Protobufs is just a history-repetition disaster.

IDK about Cap'n Proto, but I'm glad it's closer to PER/OER, if it is.

kentonv · on Dec 24, 2019

I'm saying both the syntax and semantics (i.e. type system design) of protobuf are much superior to ASN.1 (mostly, by being much simpler and easier to understand).

I believe syntax and type system are, in fact, much more important than the encoding details. To most application developers, there is no difference whatsoever between protobuf encoding, BER, DER, PER, OER, etc., because they never see the encoding. The library and tools handle that part. As long as the data gets through end-to-end with acceptable performance, nobody cares how it is represented in the interim.

Cap'n Proto's encoding is different in that by being zero-copy it actually enables new use cases, like mmap()ing a very large file for random access. Still, I'd certainly choose protobuf's syntax and encoding over ASN.1's syntax paired with Cap'n Proto's encoding.

seriesf · on Dec 24, 2019

You can just skip messages, strings, and byte arrays you don’t need or can’t decode. They are length-prefixed. Also there’s nothing about proto that prevents it from being aliased to its network buffers (zero copy). The GPs complaint stems from their own ignorance, nothing to do with the nature of protos.

tlarkworthy · on Dec 26, 2019

https://github.com/protocolbuffers/protobuf/issues/1896

Ok some of the things I said couldn't be done, can be done, but it's situational

erik_seaberg · on Dec 24, 2019

> these messages can be used as rows in storage systems

Storage needs a strong schema language, much more so than RPC does, because every mistake becomes permanent. A schema should only allow messages that make sense within the problem domain, and Maguire is right that protobuf is not good at this.

ves · on Dec 24, 2019

I agree with a lot of this post, although the tone isn’t great. The problems we ran into with protobufs at my job include:

1. The schema evolution claims don’t really hold water for our systems.

2. The type system isn’t very expressive (e.g. no generics means you have to write the same error wrapper for all your endpoints) and lots of our devs found it unintuitive, especially oneofs.

3. The “default value”/nullable field feature turns out to be a recipe for postmortems and data quality degradation. Making everything nullable isn’t good.

4. The python library doesn’t have mypy typing and the generated objects aren’t... super pythonic.

I (along with some colleagues) built a library to paper over protobuf and address these issues. Notably, it includes a very well-specified algorithm to automatically assign version numbers to schemas during development, as well as provide operational instructions to avoid bumping a version without causing downtime if possible. And all the codegenned models have mypy types!

You can read more about it here:

https://tech.affirm.com/defining-data-models-with-idol-a3109...

It’s so far turned out really really well for us.

In particular, “schema evolution” is a property of a particular distributed system and there aren’t universally safe rules; schemas for historical machine learning datasets and rpc services, say, have to evolve differently cos the data flow is different. Also, there’s no version bumping algorithm built in, and nullable/optional fields are a pain to program against for data scientists and client devs alike.

kortex · on Dec 24, 2019

You can generate stubs for mypy with this tool but yes this should be something supported out of the box.

https://github.com/dropbox/mypy-protobuf

GeneralMayhem · on Dec 24, 2019

re: (3) - nullability is more or less required for backwards compatibility. If you have existing data and add a new field going forward, your options are to make the old data invalid until you backfill, or give your code a way to detect "this field doesn't exist" and deal with it accordingly.

ves · on Dec 25, 2019

I opted to go for “pinning” based on the version number, so if you make a breaking change, like adding a required field, IDOL copies your schema into a v2 (say) namespace and then applies the change, leaving v1 untouched.

At this point we just have separate types for separate versions and tools in the host language can help you deal with that.

This turns out to be much better for data quality and client code than adding lots of nullable fields, at the cost of making breaking changes to APIs a bit more work. It seems to have been worth it so far.

Going forward, the service author has to support the “old” versions until we can determine that there’s no old data sitting around (so all clients are on the new version, all serialized data has been backfilled or dropped, or whatever’s appropriate), at which point they can delete the old schema. And we have some simple tools to verify this, since we stick the version number onto the models / serialized data.

vivekseth · on Dec 24, 2019

Couldn’t find IDOL on Affirm’s github. Is any of it open source? I’d love to take a look if so.

ves · on Dec 25, 2019

It isn’t... yet!

it would be a bit of work to open source it (rip out any lingering affirm bits, spruce it up some) and my team is real pressed for time at the moment.

but I definitely want to do it sooner rather than later!

hardwaresofton · on Dec 24, 2019

I finally feel safe to suggest that I think the cargo-culting of gRPC on to projects these days is also wrong. One of the best (and to be fair, worst) parts about http is it's flexibility, and it's like people just completely skipped over `Content-Type` and other simple options.

Throwing out standards-compliant HTTP (whether 1,2 or 3) with the bathwater that is JSON decoding was a mistake. JSON + jsonschema + swagger/hyperschema should be good enough for most projects, and for those where it isn't good enough, swap out the content type (but keep the right annotations) and call it a day! Use Avro, use capnproto, use whatever without tying yourself into the grpc ecosystem.

Maybe gRPC's biggest contribution is the more polished and coherent tooling -- in combining three solutions (schema enforcement, binary representation and convenient client/server code generation), they've created something that's just easier to use. I personally would have preferred this effort to go towards tools that work on standards-compliant HTTP1/2/3.

lkramer · on Dec 24, 2019

I'm not necessarily saying gRPC is the solution to everything, but I don't see why HTTP is so great? It's a protocol for transferring, primarily text over networks. Most backend systems operates in binary, so serializing binary data into a text format seems to be unnecessary overhead.

unlinked_dll · on Dec 24, 2019

One pro of HTTP is that the methods are barebones and error codes standardized, while there are plenty of battle tested front ends for your tx/rx endpoints that might touch the service. Basically works everywhere.

The con is that you can do that with the protocol of your choice directly and you don't need to bolt HTTP to whatever you're building.

cogman10 · on Dec 24, 2019

gRPC works over HTTP.

That said, the http body and response are perfectly fine being binary. It's only the headers that are text based (in http 1. Http 2 turns those headers into binary as well.)

rapsey · on Dec 24, 2019

It is great because it has quality implementations in every language. Much like protobuffs.

hinkley · on Dec 24, 2019

HTTP also has a vast range of proxies, transport encodings, cryptographic layers, solutions for client/server clock skew, tracing and a whole bunch of other things like rerouting and aliasing baked in.

CuriouslyC · on Dec 24, 2019

The processor usage of serialization is almost never the bottleneck, usually it's bandwidth. Despite that, unless you're sending floats or large integers over the wire, the difference probably isn't usually worth the engineering investment over gzipped json until you're "web-scale".

imtringued · on Dec 24, 2019

>It's a protocol for transferring, primarily text over networks.

Who told you that? You can specify an arbitrary content type. Not just text.

barnabee · on Dec 24, 2019

Whilst I tend to agree, the fact that a gRPC service is very unlikely to be designed to be ‘RESTful’ to the point of obtuseness is a huge plus. It might not be the best tool for the job but it’s a lot better than the other most cargo-culted option.

_skel · on Dec 24, 2019

The biggest problem with HTTP is the way developers tie themselves into knots with their HTTP clients. I've seen a lot of bad decisions, including nonsensical timeout and retry logic, nonstandard use of headers, bodies on GET requests, query strings over a megabyte in size, and performance bottlenecks caused by manual management of HTTP connections and threads.

The biggest advantage of an RPC is that it takes most of that out of the hands of the developers. Developers can just focus on business logic and leave the connection and request management to the standard library.

dub · on Dec 24, 2019

gRPC is literally just calling conventions with HTTP2/3

For people who prefer JSON to protobuf, gRPC is serialization-agnostic. For folks who prefer REST verbs to gRPC methods, proto3 has native support for encoding REST mappings and tools like envoy and grpc web can do the REST <-> gRPC proxy translation automatically

skybrian · on Dec 24, 2019

A good research paper would first explain what the protobuffer design goals are before explaining why they are misguided, inapplicable, or aren't achieved. But I guess this is just a blog post.

As it is, it's unclear whether the author of the blog post even understood the reasons behind protobuffer design decisions.

Ozzie_osman · on Dec 24, 2019

100%. But it's easier to just say "it was designed by amateurs" than actually explore why design decisions were made in a particular way, so shrug.

izacus · on Dec 24, 2019

I'm a bit confused about the type system rant - and someone correct me if I'm wrong: The whole point of protobuffs is that they're easily usable in multiple programming languages, so it seems to me that they kinda have to end up being the smallest common subset of typing features. If you try to do it strongly, they'll be hard to use in some languages (e.g. Java, the favorite beating horse of the OP and other language purists) or they'd have to restrict the amount of programming language targets.

Where am I wrong?

tjpnz · on Dec 24, 2019

>The whole point of protobuffs is that they're easily usable in multiple programming languages,

While that's true in theory there are issues in practice. It's especially true if you've had the misfortune of working with the Python or PHP compilers. Documentation isn't particularly great and I do recall a time when the Python compiler was generating code that was broken and required manual tweaks. Again in the case of Python things go even further downhill if you're trying to get everything working in a Docker container.

Things are of course significantly better if you're working in Go or Java.

kortex · on Dec 24, 2019

Python3 relative import syntax was flat out broken for years [0][1]. It's workable finally (not sure which PR fixed it) but it's still a monster to try to get protoc/grpc plugin to emit _pb2.py files which a) have correct import syntax b) have a top-level package name c) are readily packageable for pip install d) do all the above in a reproducible and uninstallable manner e) also be able to import 3rd party protos f) all the above without any post-processing.

Like, yes python constrains package names to the folder names. But why not check a flag so I can let the emitted python structure dominate the folder structure? Or, easier, just let me ignore the folder structure and specify a dang top-level-package in the emitted _pb2.py and let me wrangle it with setup.py?

[0] - https://github.com/protocolbuffers/protobuf/issues/90

[1] - https://github.com/protocolbuffers/protobuf/issues/1491

sosodev · on Dec 24, 2019

I had the displeasure of working with gRPC and Python as an intern. I ended up writing a makefile that would generate the files and then immediately run sed on them to fix their imports. It felt like a terrible hack and I hated having to do it.

What’s worse is my overall task was to come up with ways of doing this type of thing reproducibly in a bunch of different languages that had poor gRPC support so that the team could distribute consistent (and verifiably working) API bindings to other teams. At that point it felt like we probably should have conceded that gRPC sucks and not used it at all. I’m 99% sure it was just resume driven development by the lead dev.

zomglings · on Dec 25, 2019

Yep, that make + sed combination is exactly what I do every time I use protobuf in Python.

ves · on Dec 24, 2019

You could have the type model mentioned in this post in nearly all languages as well.

izacus · on Dec 24, 2019

Yes, you could and that would certanly improve things (I'm not a fan of these restrictions either).

But you'd still have a rather Java-ish type system right?

colanderman · on Dec 24, 2019

Don't forget also the Protobuf C++ compiler's failure to properly namespace user-level identifiers vs. library-level identifiers.

For example, if your Protobuf has both a "foo" and "has_foo" field (which is perfectly legal by the Protobuf language definition! and works fine with e.g. the Python binding!), you will get a C++ compiler error due to a "has_foo()" method being generated on behalf of both "foo" and "has_foo".

This naming clash could have been avoided simply by prepending all generated method names with a defined prefix, but the implementors either didn't recognize this issue, or chose not to do anything about it.

(Everything else in the article rings true for me. I've been hoping years for someone to write this article.)

kentonv · on Dec 24, 2019

Yes, we were very much aware of this, and chose not to do anything about it.

This problem almost never manifests in practice. The issue is raised all the time, but it's basically always observed only as a theoretical problem (by someone who invariably thinks they are sooooo smart for discovering it), not as a real problem preventing compilation of a real schema.

Prepending all generated method names with a prefix would be a rather extreme solution that no one would like. Have you ever tried to read libstdc++'s STL implementation, where absolutely everything is prefixed with __? It's really quite awful. I wouldn't want to use a serialization framework that did that.

The right solution, in my opinion, is to provide annotations that allow the developer to rename a particular field for the purpose of a particular target language, so that e.g. you can say that "has_foo" should be renamed to "has_foo_" (or whatever) in C++ generated code. Yeah, it's an ugly hack, but it gets the job done.

I can't remember if this ever got implemented in Protobuf, because, again, it's almost never actually needed. Cap'n Proto does have such annotations, though.

(Disclosure: I'm the author of Protobuf v2 and Cap'n Proto.)

colanderman · on Dec 25, 2019

> (by someone who invariably thinks they are sooooo smart for discovering it), not as a real problem preventing compilation of a real schema

That's a pretty dismissive view of your users. This has actually bitten me in practice, so consider their foresight vindicated.

(Notably, it was actually the inability to easily distinguish between a missing and empty array, which caused us to resort to using "has_foo" fields, only later to hit the issue with the C++ compiler.)

If you dismiss this as a valid concern, how can I be confident that there are not other similar issues you simply dismissed as unimportant?

Say what you will about STL, but the level of attention to detail there assures me that I'm not likely to get bitten by some weird issue the developers chose to turn a blind eye to.

> you can say that "has_foo" should be renamed to "has_foo_" (or whatever) in C++ generated code.

This is fine, even if it's a transformation predetermined by the language.

kentonv · on Dec 26, 2019

Sorry for the snark. This issue is a sore spot for me because so many people have reported it without having actually been affected by it, and because they tend to assume the designers were stupidly unaware of the issue, rather than that the issue is actually rather hard to solve in a satisfying way.

However, if you actually were affected by it, then you are right to be annoyed by it.

The particular case where someone developed a protocol mostly in one language and then later on started targeting a new language is indeed a case that I do worry about. The idea of language-specific annotations defining language-specific renames was designed for that use case.

I haven't worked on Protobuf in almost a decade, but Cap'n Proto does address this issue as I said -- without making everyone's code horribly ugly.

xyzzyz · on Dec 24, 2019

> This naming clash could have been avoided simply by prepending all generated method names with a defined prefix, but the implementors either didn't recognize this issue, or chose not to do anything about it.

I'd much rather live with not being able to have fields name "has_foo" in my protobuf than to have to prepend a prefix to every single access method.

colanderman · on Dec 24, 2019

While arbitrary, that restriction would be fine if it were part of the language definition, and enforced by the Protobuf compiler.

(I say "arbitrary" because "has_" is just the nomenclature the C++ bindings happen to use. The syntactic peculiarities of other host languages may dictate a different prefix. Which then forces the question of whether to amend the list of "prohibited" field names, and potentially break existing code.)

Regardless, the restriction you propose is not currently (to my knowledge) part of the language, so you can get into the situation where you've been developing with Protobufs in Python for years, and then decide to add some C++ code, and everything breaks because now you have naming clashes which force you to rename the field and go through and edit all the existing use sites of said field.

jonbronson · on Dec 24, 2019

"The solution is as follows:

Make all fields in a message required. This makes messages product types."

Except it also breaks backwards compatibility, one of the most powerful and sought-after features of protobufs.

naasking · on Dec 24, 2019

> Except it also breaks backwards compatibility, one of the most powerful and sought-after features of protobufs.

It doesn't have to. Just add row types to handle unknown content, ie. if an intermediary knows only of fields foo and bar, then they can process any data with such fields if given a type like "type SomeRecord = { foo : int, bar : string | r }", where 'r' represents the remainder of the record.

The article's criticisms are valid and there are typed solutions to most of the objections that have been raised against it.

SpicyLemonZest · on Dec 24, 2019

I'm not sure that's simple enough to be a "just", but in any case the primary problem is the other direction. If I add `required baz: int` to my service's definition of a protobuf, all protobufs that have ever been generated before become invalid because they don't contain a value for baz.

naasking · on Dec 24, 2019

That fact doesn't change if you eschew types. Backward-compatible schema evolution has rules.

SpicyLemonZest · on Dec 24, 2019

Right, that's the point. The article's suggestion to "make all fields in a message required" fundamentally misunderstands the issues at hand, because no matter how appealing it is from a type theory perspective, following that suggestion would make it impossible to ever add a field in a backwards compatible manner.

naasking · on Dec 24, 2019

> The article's suggestion to "make all fields in a message required" fundamentally misunderstands the issues at hand, because no matter how appealing it is from a type theory perspective, following that suggestion would make it impossible to ever add a field in a backwards compatible manner.

You absolutely could in multiple ways:

1. You make every accepted product type have a row type at your service interface if you expect schema evolution.

2. If you have to add a field unexpectedly, ie. where you did not have a row type, then you must deprecate the old API. If this seems onerous to you, then your service infrastructure is probably insufficiently flexible.

SpicyLemonZest · on Dec 25, 2019

Option 1 seems like it defeats the point. If you're going to declare a field with a more permissive type than currently allowed, aren't you just hacking weak types back into your strong type system?

Option 2... look. I've seen a lot of API deprecations, across multiple teams in multiple companies, and every one of them was very onerous in ways that had little to do with the service infrastructure. If you've done easy API deprecations, more power to you, but I don't think your experience is representative.

humbledrone · on Dec 24, 2019

Protocol buffers already do that; serialized fields that are not recognized by an older message definition are parsed and can be accessed via the "unknown fields" API, exactly as "r" above. Intermediaries can pass these through trivially, or inspect them to see what they didn't understand.

The problem with making fields required is that older serialized protocol buffers parsed by newer message definitions may be missing newly added required fields, which will break things.

naasking · on Dec 24, 2019

Protobuf does not do this via a typed interface, but via runtime checking.

joshuamorton · on Dec 25, 2019

You can't statically typecheck deserialized data. You must validate that deserislized value matches the schema, and you can only do so at runtime.

In other words, proto has a typed interface, but you must runtime check that a given bag of bytes conforms to that typed interface.

This is true for any io.

naasking · on Dec 25, 2019

> You must validate that deserislized value matches the schema, and you can only do so at runtime

I assume you mean serialised data, not deserialized. And yes, deserializing includes type checking. The point is that this happens once and the need for a separate API for dynamic data shouldn't be needed.

joshuamorton · on Dec 25, 2019

What do you mean by a separate api for dynamic data?

The data under discussion isn't "dynamic", it's still static, it just isn't known to the schema in question at runtime (since it's only known to a different schema). That means you can't access it by name, since the field names aren't known.

nabla9 · on Dec 24, 2019

The lesson is: when you start wrong, you stay wrong.

Ozzie_osman · on Dec 24, 2019

Unfortunately I was turned off by the angry and obnoxious tone. Seems to be getting more common to get traction on HN homepage. But yeah, even though author makes some good points, the argument loses effectiveness in my book because of things like calling people amateurs.

mikestew · on Dec 24, 2019

The angry, pissed-off coder rant is occasionally pulled off well, but in general it grew tiresome for me fifteen years ago. Not everyone is Hunter S Thompson (well, no one is, now), and not every technical annoyance is the Kentucky Derby and thus worthy of such treatment.

To this day, I’ll still forgive a well-crafted MongoDB rant, though.

evmar · on Dec 24, 2019

Previous discussion, leading with a comment from one of the protobuf authors: https://news.ycombinator.com/item?id=18188519

finnthehuman · on Dec 24, 2019

The way dweis never responded to batmansmk was disappointing to say the least.

kentonv was willing to engage on the points and came out much more reasonable in the whole thing.

joshuamorton · on Dec 25, 2019

dweis isn't a designer, so he'd be the wrong person to answer those things.

Sanjay, Jeff, and Kenton are probably the three best to answer such questions.

Presumably the top few concerns for protos are wire performance (decode/encode speed and cost, wire size), compatibility for changes (what this suggestion just totally breaks), and cross language usability.

Some other tradeoffs might be non-wire perf (I believe protos beat flatbuffers here, at the cost of worse on wire perf), but it's not clear that that was intentional.

kentonv · on Dec 24, 2019

I guess I'll copy/paste the comment I made last time this was posted: https://news.ycombinator.com/item?id=18190005

--------

Hello. I didn't invent Protocol Buffers, but I did write version 2 and was responsible for open sourcing it. I believe I am the author of the "manifesto" entitled "required considered harmful" mentioned in the footnote. Note that I mostly haven't touched Protobufs since I left Google in early 2013, but I have created Cap'n Proto since then, which I imagine this guy would criticize in similar ways.

This article appears to be written by a programming language design theorist who, unfortunately, does not understand (or, perhaps, does not value) practical software engineering. Type theory is a lot of fun to think about, but being simple and elegant from a type theory perspective does not necessarily translate to real value in real systems. Protobuf has undoubtedly, empirically proven its real value in real systems, despite its admittedly large number of warts.

The main thing that the author of this article does not seem to understand -- and, indeed, many PL theorists seem to miss -- is that the main challenge in real-world software engineering is not writing code but changing code once it is written and deployed. In general, type systems can be both helpful and harmful when it comes to changing code -- type systems are invaluable for detecting problems introduced by a change, but an overly-rigid type system can be a hindrance if it means common types of changes are difficult to make.

This is especially true when it comes to protocols, because in a distributed system, you cannot update both sides of a protocol simultaneously. I have found that type theorists tend to promote "version negotiation" schemes where the two sides agree on one rigid protocol to follow, but this is extremely painful in practice: you end up needing to maintain parallel code paths, leading to ugly and hard-to-test code. Inevitably, developers are pushed towards hacks in order to avoid protocol changes, which makes things worse.

I don't have time to address all the author's points, so let me choose a few that I think are representative of the misunderstanding.

> Make all fields in a message required. This makes messages product types.

> Promote oneof fields to instead be standalone data types. These are coproduct types.

This seems to miss the point of optional fields. Optional fields are not primarily about nullability but about compatibility. Protobuf's single most important feature is the ability to add new fields over time while maintaining compatibility. This has proven -- in real practice, not in theory -- to be an extremely powerful way to allow protocol evolution. It allows developers to build new features with minimal work.

Real-world practice has also shown that quite often, fields that originally seemed to be "required" turn out to be optional over time, hence the "required considered harmful" manifesto. In practice, you want to declare all fields optional to give yourself maximum flexibility for change.

The author dismisses this later on:

> What protobuffers are is permissive. They manage to not shit the bed when receiving messages from the past or from the future because they make absolutely no promises about what your data will look like. Everything is optional! But if you need it anyway, protobuffers will happily cook up and serve you something that typechecks, regardless of whether or not it's meaningful.

In real world practice, the permissiveness of Protocol Buffers has proven to be a powerful way to allow for protocols to change over time.

Maybe there's an amazing type system idea out there that would be even better, but I don't know what it is. Certainly the usual proposals I see seem like steps backwards. I'd love to be proven wrong, but not on the basis of perceived elegance and simplicity, but rather in real-world use.

> oneof fields can't be repeated.

(background: A "oneof" is essentially a tagged union -- a "sum type" for type theorists. A "repeated field" is an array.)

Two things:

1. It's that way because the "oneof" pattern long-predates the "oneof" language construct. A "oneof" is actually syntax sugar for a bunch of "optional" fields where exactly one is expected to be filled in. Lots of protocols used this pattern before I added "oneof" to the language, and I wanted those protocols to be able to upgrade to the new construct without breaking compatibility.

You might argue that this is a side-effect of a system evolving over time rather than being designed, and you'd be right. However, there is no such thing as a successful system which was designed perfectly upfront. All successful systems become successful by evolving, and thus you will always see this kind of wart in anything that works well. You should want a system that thinks about its existing users when creating new features, because once you adopt it, you'll be an existing user.

2. You actually do not want a oneof field to be repeated!

Here's the problem: Say you have your repeated "oneof" representing an array of values where each value can be one of 10 different types. For a concrete example, let's say you're writing a parser and they represent tokens (number, identifier, string, operator, etc.).

Now, at some point later on, you realize there's some additional piece of data you want to attach to every element. In our example, it could be that you now want to record the original source location (line and column number) where the token appeared.

How do you make this change without breaking compatibility? Now you wish that you had defined your array as an array of messages, each containing a oneof, so that you could add a new field to that message. But because you didn't, you're probably stuck creating a parallel array to store your new field. That sucks.

In every single case where you might want a repeated oneof, you always want to wrap it in a message (product type), and then repeat that. That's exactly what you can do with the existing design.

The author's complaints about several other features have similar stories.

> One possible argument here is that protobuffers will hold onto any information present in a message that they don't understand. In principle this means that it's nondestructive to route a message through an intermediary that doesn't understand this version of its schema. Surely that's a win, isn't it?

> Granted, on paper it's a cool feature. But I've never once seen an application that will actually preserve that property.

OK, well, I've worked on lots of systems -- across three different companies -- where this feature is essential.

xyzzyz · on Dec 24, 2019

> But I've never once seen an application that will actually preserve that property.

I wonder if author uses Chrome, which depends heavily on this in its Sync feature.

kentonv · on Dec 24, 2019

Yeah, most big Google services -- including Search -- rely pretty heavily on unknown field retention. Google has been building large services out of microservices since a decade before anyone ever said the word "microservice". When one service is updated to emit a new field, and another service is updated to consume it, it's important that the feature can then work, without updating all the middlemen.

akalin · on Dec 25, 2019

When I worked on Chrome Sync, we spent some time making sure that unknown fields were preserved properly. Glad to see that someone noticed, cheers!

xyzzyz · on Dec 25, 2019

I did notice that when I was an owner of protobuf in Chromium :) Custom patches to support unknown field preservation in lite mode sure brought me some hassle when updating to version 3 of the library.

tgsovlerkhgsel · on Dec 25, 2019

"Make all fields in a message required" would defeat one of the main benefits of protobufs: The ability to retroactively add/remove fields while still keeping the message compatible with implementations using the previous version of the proto definition.

The other issues (e.g. that you cannot make a repeated oneof) are annoying, but many of them are consequences of upgrading the "language" (if you want to call it that) without introducing incompatibilities and/or changing the wire format. Having a new, incompatible version would likely be a lot more annoying. Simply not having these features at all and having to write your own ugly hack as a workaround would definitely be a lot more annoying.

traverseda · on Dec 24, 2019

I'd be interested to hear their thoughts on capnproto.

kentonv · on Dec 24, 2019

I would expect he has the same issues with Cap'n Proto. Aside from some aesthetic cleanups, Cap'n Proto's type system is extremely similar to Protobuf -- because, frankly, Protobuf got that part right. Cap'n Proto's main difference from Protobuf is the encoding, which it doesn't seem like this guy cares too much about.

(I'm the author of Cap'n Proto, and Protobuf v2, though I did not design Protobuf's type system.)

daxorid · on Dec 24, 2019

The author isn't wrong about protobuf's shortcomings, but to say:

and solve a problem that nobody but Google really has

Is pretty absurd. There are plenty of projects that serialize a LOT of data between different runtimes/platforms (e.g. Go and Java) such that built-in serialization is not possible and JSON/XML is 3-10 times slower.

wellpast · on Dec 24, 2019

> The dynamic typing guys complain about it being too stifling, while the static typing guys like me complain about it being too stifling without giving you any of the things you actually want in a type-system. Lose lose.

Type system purists are blinded by their commitment to purity. All context is thrown out the window — it’s purism or bust.

The absurdity here is profound; it’s “Lose Lose” unless you go all typing or none.

And yet I completely understand the lament here. I think what the (smarter) type purists realize is that if they lose the purism position, static types do become much less of a tyrant tool and more like any other tool in our toolkit: a nominally useful one to be applied judiciously.

Then they’d have to turn their attention to the unforgivingly dynamic outside world and market.

based2 · on Dec 24, 2019

https://www.reddit.com/r/programming/comments/eezvhp/protobu...

ronnier · on Dec 24, 2019

> Fields with scalar types are always present. Even if you don’t set them. Did I mention that (at least in proto31) all protobuffers can be zero-initialized with absolutely no data in them? Scalar fields get false-y values—uint32 is initialized to 0 for example, and string is initialized as "".

> It’s impossible to differentiate a field that was missing in a protobuffer from one that was assigned to the default value. Presumably this decision is in place in order to allow for an optimization of not needing to send default scalar values over the wire.

I believe there’s a trick you can do if you mark it as a “oneof” with only one field.

humbledrone · on Dec 24, 2019

> It’s impossible to differentiate a field that was missing in a protobuffer from one that was assigned to the default value. Presumably this decision is in place in order to allow for an optimization of not needing to send default scalar values over the wire.

Isn't this just flat incorrect? You can tell the difference between set-to-default and not-set with buffer.has_some_field().

ronnier · on Dec 24, 2019

Not for things like ints and strings.

kentonv · on Dec 24, 2019

It depends on which version you're using.

In proto1 and proto2, every field had a "has" method and "explicitly set to default value" was different from "absent".

In proto3, they tried to remove this feature, and instead said that for basic types, "set to default" and "absent" are the same thing.

(I wrote proto2. I left Google before proto3 came about.)

humbledrone · on Dec 25, 2019

Thanks. I must be only used to proto2.

altmind · on Dec 24, 2019

Its 2019 and protobuf js compiler still only support commonjs modules and google-developed closure imports. No AMD/UMD and no ES6 modules.

How are we supposed to use it in a browser environment if we are not using browserify or webpack?

an_d_rew · on Dec 24, 2019

The sad thing is that, rather than forward this to the small "decision team" at work, where we can ponder the merits of the author's points...

... I'm going to just close my browser tab due to the puerile ranting at the beginning (and sprinkled throughout). A few good points, and perhaps a great basis for "proto4" or whatnot, but that my "OMG they're so dumb" ranting?

If that was a peer-reviewed paper, I'd have rejected it after reading the first paragraph, if I even made it that far. That's just not how you make a technical argument or win people over.

choppaface · on Dec 24, 2019

One important thing missing from the current criticism is Protobuf’s lack of a facility for serializing a sequence of messages to a file. There’s RecordIO internally at Google, yet they markedly declined to open-source the C++ lib for it. There’s hints of it in Protobuf Java and then Amazon has open-sourced their own implementation of it with the same name.

Lack of public RecordIO is partially to blame for creation of TFRecords, which are in many ways inferior to (for example) tar archives of string-serialized protobuf messages. (tar supports index, streaming, compression, etc).

dekhn · on Dec 25, 2019

I requested that the RecordIO format (bytes on the disk and codfe implementation) be opensourced (for ease of interoperability between Google datasets and open source/scientific work). It wasn't because there were some 'flaws' in the design, but it was pointed out that leveldb open sourced a format very similar to it (but which never got used outside of leveldb).

stabbles · on Dec 24, 2019

I only have experience with flatbuffers in C++ (it seemed easier to be integrated in a project back then). Can anyone comment on the pros and cons of flatbuffers vs protobuffers?

seriesf · on Dec 24, 2019

I worked on trying to make flatbuffers work at google and it just never was as fast as proto2/c++. I guess the author of this piece would describe me as an amateur because like the authors of protocol buffers I only have about thirty years of industry experience. AMA.

kentonv · on Dec 24, 2019

I'd be really interested in hearing why it wasn't faster! I expect the answer is along the lines of: "Well theoretically the zero-copy design should be faster, but in practice factors X and Y dominate performance and Protobuf wins on those." I'd love to know exactly what X and Y are...

(I'm the original author of proto2/c++, but I'm mostly interested for any lessons that might apply to my work on Cap'n Proto...)

seriesf · on Dec 24, 2019

The C++ proto implementation is just already tuned to an absurd degree and it is hard to beat. Any place where copying was an important problem has already been eradicated with aliasing (ctype declarations) so flatbuffers' supposed advantage isn't there to begin with. It's much more important to eliminate branches, stores, and other costs in generated code.

kentonv · on Dec 24, 2019

I'm guessing you were trying to use it with Stubby?

Admittedly the networked-RPC use case is not a particularly compelling one for zero-copy (the mmaped-file case is much more so, and maybe even shared-memory RPC).

Still, I'd expect that not having to parse varints nor allocate memory would count for something. Wish I could see the test setup.

acvny · on Dec 24, 2019

"but unfortunately, literally nobody considers Java to have a well-designed type-system". What? That is mildly put a lie.

toolslive · on Dec 24, 2019

Indeed. I imagine some people do think Java has a well-designed type system. However, you probably don't consider those people to be authorities on the subject.

isopede · on Dec 24, 2019

What would be an appropriate replacement for embedded systems? I've looked at the "tiny" versions of protobuf (nanopb, etc), but haven't tried them yet.

Are protobuf competitors (flatbuffers, capnproto) appropriate for small embedded systems (microntrollers, mostly <64K RAM).

kentonv · on Dec 24, 2019

I think an implementation of Cap'n Proto that's actually optimized for embedded systems would likely be smaller than any implementation of Protobuf could be. However, I'd have to admit that the current Cap'n Proto C++ library is not so optimized.

Here's a GitHub comment where I outlined what we'd need to do to fix that, FWIW: https://github.com/capnproto/capnproto/issues/844#issuecomme...

Caveat: I don't have any personal experience with embedded systems.

rhacker · on Dec 24, 2019

My main problem with protobuf isn't the actual serialization or the proto files. It's the use case. They actually pitted this up against REST. REST is slowly going out of favor, so of course it makes sense to start gap filling. But when we look at two major competing technologies: GraphQL or Protobuf to fill that "I don't want to use REST anymore feeling" GraphQL actually solved something useful and pushed the notch forward. Protobuf really just said, hmmm, let's put TONS of constraints down on top of REST to make it more reliable and faster. Basically Swagger 4.0 maybe?

I keep seeing people saying things like well, protobuf can be used to make your GraphQL faster, etc... So you're actually trying to argue for "some" usefulness for protobuf for someone that made it to the next level. That might last, what, 1 month? The only thing that should be responsible for adding a binary encapsulation format would be something built into Http specs, not some kind of custom Rest->graphql->protobuf stack.

/rant over

edem · on Dec 24, 2019

It is either Protobuf or Protocol Buffers, not Protobuffers. This kinda upsets my OCD.

shadowgovt · on Dec 24, 2019

It's an interesting article. I was hoping for some alternative suggestions, because proto is "just good enough" at structure and wire to become the one tool a project will reach for so it doesn't need two tools.

jamesu · on Dec 24, 2019

I've been working on a project which requires writing ~40 different packet types in a custom protocol, but always thought something like protobuf would be a great fit for standardizing the packet serialization routines.

atombender · on Dec 24, 2019

You could look at ASN.1, which was pretty much created for that purpose.