
> Despite map fields being able to be parameterized, no user-defined types can be. This means you'll be stuck hand-rolling your own specializations of common data structures.

What a pain! Well, at least Google won't make that mistake again!




My objection to the quoted text is that you probably shouldn't be using elaborate data structures in streams/files. Using DTOs as heap/stack data has bitten me enough times that I'm fairly certain it's an anti-pattern.

It doesn't matter if you're using a quantum binomial tree in a black hole: save it as a 'stupid' map<k,v> when it hits the network. That way everyone who interacts with your service can decide how they want to represent that structure. You can compose any in-memory data structure you can dream of, you can validate the data with more richness than "missing" or "present", and Protobuf doesn't contaminate your codebase.
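Roughly what I mean, sketched in Go (all names hypothetical; the PB struct is a stand-in for generated code, not real protoc output): the wire carries a flat map<string, string>, and the consumer rebuilds and validates whatever richer in-memory type it wants.

    package main

    import (
    	"fmt"
    	"strconv"
    )

    // Hypothetical stand-in for a generated message with `map<string, string> attributes = 1;`.
    type UserRecordPB struct {
    	Attributes map[string]string
    }

    // Richer in-memory representation, owned by the application rather than the wire format.
    type User struct {
    	Name string
    	Age  int
    }

    // userFromWire rebuilds and validates the application's own type from the flat wire map.
    func userFromWire(pb *UserRecordPB) (*User, error) {
    	name, ok := pb.Attributes["name"]
    	if !ok || name == "" {
    		return nil, fmt.Errorf("name missing or empty")
    	}
    	age, err := strconv.Atoi(pb.Attributes["age"])
    	if err != nil || age < 0 {
    		return nil, fmt.Errorf("age invalid: %q", pb.Attributes["age"])
    	}
    	return &User{Name: name, Age: age}, nil
    }

    func main() {
    	wire := &UserRecordPB{Attributes: map[string]string{"name": "ada", "age": "36"}}
    	u, err := userFromWire(wire)
    	fmt.Println(u, err)
    }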

Option 3 is the correct choice. Serialization libraries should be amateurish by design.


> That way everyone who interacts with your service can decide how they want to represent that structure.

Protobufs aren't a message format for publicly specced standard wire protocols. They're a serialization layer for polyglot RPC request and response messages. The whole point of them is that you're transporting the same typed data around between different languages, rather than there having to be two conversions (source language -> "standard format"; "standard format" -> dest language).

In this sense, they're a lot like, say, Erlang's External Term Format—driven entirely by the needs of an RPC-oriented distributed-computation wire protocol, doing interchange between nodes that aren't necessarily running the same version of the same software. Except, unlike ETF, Protobufs can be decoded without the backing of a garbage-collected language runtime (e.g. in C++ nodes.)

I'm not saying Protobufs are the best, but you have to understand the problem they're solving—the "rivals" to Protobufs are Cap'n Proto and Thrift, not JSON. JSON can't even be used to do a lot of the stuff Protobufs do (like talk to memory-constrained embedded kernels running on SDN infra.)

> and Protobuf doesn't contaminate your codebase

Just like in HTTP web-app codebases, a codebase properly designed to interact with an enterprise service bus (https://en.wikipedia.org/wiki/Enterprise_service_bus) will tend toward hexagonal architecture: you have your own logic, and then you have an "RPC gateway" that acts as an MVC system, exposing controllers that decode from the RPC format and forward requests into your business logic, and then build responses back into the RPC "view" format.

Once you have that, there's no point in having your application's parsed-out representation of the RPC format be different than the on-the-wire RPC format. The only thing that's touching those RPC messages is your RPC gateway's controller code anyway. So why not touch them with as little up-front design required as possible?
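A rough sketch of that gateway shape in Go (hypothetical names throughout; the PB structs stand in for generated types, not real protoc output): only the gateway touches the wire types, while the business logic sees its own domain types.

    package main

    import (
    	"context"
    	"fmt"
    )

    // Stand-ins for generated wire types.
    type OrderRequestPB struct {
    	Sku string
    	Qty int32
    }
    type OrderReplyPB struct {
    	Id string
    }

    // Domain layer: knows nothing about protobuf or gRPC.
    type Order struct {
    	SKU      string
    	Quantity int
    }
    type OrderService interface {
    	Place(ctx context.Context, o Order) (string, error)
    }

    // Gateway ("controller"): the only code that touches the wire types.
    type OrderGateway struct {
    	svc OrderService
    }

    func (g *OrderGateway) PlaceOrder(ctx context.Context, req *OrderRequestPB) (*OrderReplyPB, error) {
    	id, err := g.svc.Place(ctx, Order{SKU: req.Sku, Quantity: int(req.Qty)})
    	if err != nil {
    		return nil, err
    	}
    	return &OrderReplyPB{Id: id}, nil
    }

    // Trivial fake implementation so the sketch runs end to end.
    type fakeSvc struct{}

    func (fakeSvc) Place(_ context.Context, o Order) (string, error) {
    	return fmt.Sprintf("order-%s-%d", o.SKU, o.Quantity), nil
    }

    func main() {
    	g := &OrderGateway{svc: fakeSvc{}}
    	reply, _ := g.PlaceOrder(context.Background(), &OrderRequestPB{Sku: "ABC", Qty: 2})
    	fmt.Println(reply.Id)
    }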


In the context of gRPC, it's true that Protobuf is intended to be a wire protocol, but the argument would be more convincing if the code-generator toolchain Google fostered produced code that integrated better with the languages people actually use.

Typically, the data types and interfaces generated by these tools are so poor that you need to build another layer on top that translates between "Protobuf types" and "native types", as you describe in your comment, and shields the application from the nitty-gritty details of gRPC. So investing in gRPC means that when you've generated, say, Go files from your .proto files, you're only halfway done. Protobuf introduces an impedance mismatch of its own: a stupid inbred cousin whose limited vocabulary has to be translated back and forth into proper language.

So gRPC/Protobuf solves something important at the wire level, but developers really want to productively communicate via APIs, and so what you have is just half the solution.

I wish the Protobuf/gRPC toolchain were organized in such a way that the generated code could actually be used as first-class code. Maybe similar to how parser generators like Lex/YACC or Ragel work, where you provide implementation code that is threaded through the output code.


> you probably shouldn't be using elaborate data structures in streams/files.

I don't consider Option<T> a particularly "elaborate" data structure, but protobuf would benefit heavily from it. Instead, as the author alludes to, you're forced to do different things depending on which side of the message/scalar dichotomy T falls on. If it's a message, you get Option<T> essentially for free, since a submessage can always be omitted. (Which is bad in the case where you just want a T: protobuf will happily let you forget to encode that attribute, and happily decode the lack thereof at the other end, all without error, despite the message being malformed, because it lacks a way to express this rather common constraint.) If T is a scalar, you get to use one of the WKTs.

See this GitHub issue for more context: https://github.com/protocolbuffers/protobuf/issues/1606
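To make the dichotomy concrete, a small Go sketch with stand-in types approximating what protoc-gen-go emits (not actual generated code): a submessage field gives you an observable nil, a plain scalar conflates zero with "never set", and a wrapper WKT (or proto3 `optional`) restores presence.

    package main

    import "fmt"

    // Stand-ins roughly shaped like protoc-gen-go output (hypothetical).
    type Address struct{ City string }    // message field -> *Address; nil means absent
    type Int64Value struct{ Value int64 } // shape of the google.protobuf.Int64Value wrapper WKT

    type Person struct {
    	Home *Address    // message: presence is observable (nil vs. non-nil)
    	Age  int64       // plain scalar: 0 could mean "zero" or "never set"
    	Rank *Int64Value // wrapper WKT (or proto3 `optional`): presence observable again
    }

    func main() {
    	p := Person{Age: 0}
    	fmt.Println("home set?", p.Home != nil) // false: clearly absent
    	fmt.Println("age:", p.Age)              // 0, but was it ever sent?
    	fmt.Println("rank set?", p.Rank != nil) // false: absent, distinguishable from 0
    }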

I work with data where most attributes, outside of those that compose the key identifying an object, are quite possibly unknown. Our data comes from messy datasets, and while we clean it up as best we can, sometimes an attribute's value is unintelligible in the source data and it is more pragmatic to move on without it. In SQL we represent this as NULL, in JSON as null, but protobuf makes it rather difficult, and this frustration is evident from the other posters in that GitHub issue.


You can always write a code-generator to ease the pain.


That's what I hear from Go devs all the time; it doesn't sound convincing to me.


I think both comments are sarcastic jabs at Go.



