I recently spent some time comparing raw UDP and gRPC and learned a ton. This was for a hardware project, so it's different from what web devs typically work on.
The most important thing I found out while doing my research is that it's not the fastest bytes that win. Well, that does matter, but it's not the most important thing: what matters is reducing performance variance [1]. My project wasn't optimizing for speed so much as for zero-copy de/serialization, but that often ends up being the solution for high-speed transfers anyway. SBE, FlatBuffers, and Cap'n Proto all had their places, but I ended up not using any of them and just hand-rolling something similar to what SBE would do. If this were a $DAYJOB project I'd probably end up doing something with SBE.

[1]: https://speice.io/2019/07/high-performance-systems.html
At my last job I had a need for high-bandwidth, low-latency messaging over UDP - we started out with raw C structs over a home-built reliable UDP library. We ended up layering protobuf on top a year later because it was honestly a headache not having a serialization library as message types got more complex. It didn't end up causing enough slowdown to impact the run speed of the application (all serialization was done on the app thread rather than the very performance-sensitive network thread) since we had other bottlenecks, and the developer workflow (and the additional message validation) made it worth it. SBE looks neat, though; I didn't know about that one.
TCP has no concept of a message; you need to build that on top of TCP.
Which HTTP does, but HTTP has no standard concept of a message format; you need to build that on top of HTTP.
Which gRPC does, but it closes the connection after a successful exchange.
Which is avoided by streaming gRPC - makes sense if you know you'll be talking more over this channel.
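To make the framing point above concrete, here is a minimal length-prefixed message layer over a plain TCP socket in Python. The 4-byte big-endian length header is just one common convention, not something prescribed by any of the protocols discussed here.

```python
# TCP hands you a byte stream, not messages; a common fix is to prefix each
# message with its length so the receiver knows where one ends and the next begins.
import socket
import struct

def send_msg(sock: socket.socket, payload: bytes) -> None:
    # 4-byte big-endian length header, then the payload itself.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return buf

def recv_msg(sock: socket.socket) -> bytes:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```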
None of those protocols are really comparable, and most of the differences boil down to the serialisation protocol (binary/proto or textual, like JSON).
Usually gRPC connections are left open after a message is sent and received. And since it's based on HTTP/2, multiple message streams can be multiplexed over a single connection. That is part of the benefit of gRPC: establish a long-lived connection and send your messages as needed. Similar to using keep-alive with HTTP.
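A minimal sketch of that pattern with the Python grpc package: open one channel with keepalive options and reuse it for many calls. EchoServiceStub, PingRequest, and the echo_pb2* modules are hypothetical generated names, not part of gRPC itself.

```python
# One long-lived channel, many RPCs: roughly how gRPC clients are meant to be used.
# The keepalive options make the client ping the server periodically so dead
# connections get noticed, similar in spirit to HTTP keep-alive.
import grpc

from echo_pb2 import PingRequest            # hypothetical generated message
from echo_pb2_grpc import EchoServiceStub   # hypothetical generated stub

channel = grpc.insecure_channel(
    "localhost:50051",
    options=[
        ("grpc.keepalive_time_ms", 30_000),     # send a keepalive ping every 30s
        ("grpc.keepalive_timeout_ms", 10_000),  # drop the connection if no ack within 10s
    ],
)
stub = EchoServiceStub(channel)

# Both calls are multiplexed over the same HTTP/2 connection.
for i in range(2):
    print(stub.Ping(PingRequest(message=f"hello {i}")))
```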
I think it comes as no surprise that sending data through a plain TCP socket is faster than using gRPC. The only downside is that over time requirements will likely start to stack up:
- Suddenly you want to enable TLS between one or more components, meaning you need to wrap the socket inside a TLS channel (sketched below this list).
- You discover that your client behaves poorly when servers go offline, so you add your own logic for keepalives/pings.
- At some point you want to add metrics to all of this, so you decide to manually add Prometheus metrics to the client/server.
- Later on you want to attach OAuth2 tokens to requests as well, so that you can do credential passing.
- In order to get more insight into your setup, you decide that you want to use this in combination with OpenTracing/Jaeger.
Once all of those features are added to your Redis-like protocol, you discover that you've basically reinvented gRPC... poorly.
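For the first item on that list, wrapping an existing client socket with Python's standard ssl module looks roughly like this; the host name, port, and the mutual-TLS certificate path are placeholders.

```python
# Sketch: upgrading a plain client socket to TLS with the stdlib ssl module.
import socket
import ssl

context = ssl.create_default_context()       # verifies the server cert against system CAs
# context.load_cert_chain("client.pem")      # uncomment for mutual TLS (hypothetical path)

raw = socket.create_connection(("internal-service.example", 9000))  # placeholder endpoint
tls = context.wrap_socket(raw, server_hostname="internal-service.example")
tls.sendall(b"hello over TLS")
```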
In the conclusion the author ends up recommending the Redis protocol, and the plain TCP option comes out least attractive of all, so I think you and the article are in agreement that plain TCP is not the way to go.
Cool! I don't know if a comparison of these really makes sense. You'd be conflating transport, serialization, and stream format (unary/stream).
For fullstack dev, I've been immensely happy with grpc/protobuf over http because of the type safety I get communicating between Golang and Typescript. This eliminates a whole class of bugs but is only a serialization benefit.
I'd love to learn more about your stack, use cases, and development workflow. My employer is starting a golang, grpc/protobuf, and sveltejs web application from scratch and they are new technologies for almost everyone on my team!
I want to write my integration test suite for the back-end service in typescript to hopefully be able to reuse some of the test suite code for the front-end. But I'm struggling to set up a functional CI pipeline.
Sure! We are actually building a platform on this stack (for eventual open sourcing) called Core [1]. Protobuf definitely needs some tooling and environment-management work to make things smooth, but it is achievable, and once you get there things work better a lot more often.
Drop me a line at parham@cloudsynth.com and happy to give guidance where I can.
There is no type safety in grpc or any other protobuf RPC scheme, full stop. The recipient of a message makes an assumption about the meaning of the message and decodes it accordingly. Any encoded protobuf might successfully decode as any other protobuf.
Give me whatever terminology you prefer for: "I can ensure I maintain backwards compatibility (with a spec checker). Also with most of the generated client code you can ensure to a high degree of safety that the data I'm accessing exists and is of the correct type".
Happy to consider using that term instead. For most folks, the one I chose is good enough.
Assuming that two endpoints are using the same or forward-compatible schemas, there's no "assumption" involved: if you sent a message of type T, it gets decoded as type T. There isn't protection against inauthentic messages, admittedly.
Would you like to cite an RPC scheme that meets your definition of type safety?
If you just assume that all the participating programs are correct, then type safety has no meaning. Part of type safety is being able to check, statically or dynamically, that a program isn't going to interpret a value of type t as an incompatible type T. gRPC has no such facilities. It does not pass around or assert anything about the schema that was used to encode the message such that the decoder can check safety. Nothing prevents you from sending a wire-compatible EmployeeVacationMessage to the ReactorSelfDestructService.
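A runnable illustration of that point, using two well-known protobuf types that happen to be wire-compatible: Timestamp and Duration both define int64 seconds = 1 and int32 nanos = 2, so bytes from one parse cleanly as the other.

```python
# Wire-compatible protobuf messages decode as each other without complaint.
from google.protobuf.duration_pb2 import Duration
from google.protobuf.timestamp_pb2 import Timestamp

ts = Timestamp(seconds=1_700_000_000, nanos=42)
data = ts.SerializeToString()

d = Duration()
d.ParseFromString(data)      # no error: the decoder has no idea these bytes "are" a Timestamp
print(d.seconds, d.nanos)    # 1700000000 42
```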
Having just looked into this recently, I can also safely say that moving to REST over HTTP/2 on its own already provides a significant speed benefit. Closing the gap to gRPC (or even substantially beating it) is possible by switching to a fast serialization format (e.g. FlatBuffers, MessagePack, Cap'n Proto).
While gRPC has its place, it also comes with headaches like pretty lousy generated interfaces, horrible debuggability, and unpredictable scaling.
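As a small illustration of the serialization point, MessagePack (via the msgpack-python package, assumed to be installed) round-trips a payload into a compact binary form with very little code:

```python
# Compare msgpack's binary encoding with JSON for the same payload.
import json

import msgpack  # assumes the msgpack-python package is installed

payload = {"id": 123, "name": "sensor-7", "samples": [1.5, 2.25, 3.0]}

packed = msgpack.packb(payload)          # compact binary encoding
as_json = json.dumps(payload).encode()

print(len(packed), len(as_json))         # msgpack is typically the smaller of the two
print(msgpack.unpackb(packed))           # round-trips back to the original dict
```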
Came here to say something similar. The interfaces and tooling around gRPC in the public sphere are pretty bad. If you maintain a code base with multiple languages and want to compile gRPC stubs for it, you can choose between:
1. Having a complex build system that is aware of gRPC
2. Massive migraines
Unfortunately, build system integrations, IDE integrations, and generated code are uniformly awful for gRPC.
Every company I've seen use gRPC has unfortunately adopted the practice of basically manually running protoc and committing the generated code into the repo - sometimes modifying the code manually to make it import successfully (Python).
I hope Bazel evolves into a state where it's usable by the average engineer and has first class support for gRPC.
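One way to at least make the "manually running protoc" step reproducible is a small checked-in script that drives protoc through grpcio-tools. This is just a sketch; the protos/ and generated/ directories and service.proto are hypothetical.

```python
# Hypothetical regen script: rebuilds Python stubs from .proto files via grpcio-tools
# so codegen is a reproducible command rather than an ad-hoc manual protoc run.
from grpc_tools import protoc

PROTO_DIR = "protos"       # assumed source layout
OUT_DIR = "generated"      # assumed output package

exit_code = protoc.main([
    "grpc_tools.protoc",                 # argv[0] placeholder, ignored by protoc
    f"-I{PROTO_DIR}",
    f"--python_out={OUT_DIR}",
    f"--grpc_python_out={OUT_DIR}",
    f"{PROTO_DIR}/service.proto",        # hypothetical proto file
])
if exit_code != 0:
    raise SystemExit("protoc failed")
```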
It's kinda weird to compare REST to RPC protocols though. It's a completely different paradigm. Maybe you're just building an HTTP API without the REST part? (JSON-RPC is a thing.)
Actually, traditional RPC is one of the paradigms considered in Fielding's thesis coining the term. I don't know why people seem to insist there are no models other than REST if you happen to have a browser on one end. Sending JavaScript down the asynchronous pipe designed for XML was, at the very least, a step back away from REST toward something more traditional (moving code, not data).
Section 3.5 "Moving code styles" (and 3.4 for RPC):
Indeed; true REST interactions tend to be quite coarse-grained; essentially requests to update a remote state machine to match the described state. Most of what gets called “REST” isn’t; it’s just ad-hoc RPC sent over HTTP with arguments encoded as XML/JSON, which is probably what parent really means. Take casual claims of RESTfulness with the requisite bucket of salt. (If the term “REST API” is used, you can toss the bucket entirely as the very phrase is itself an oxymoron.)
ZeroMQ and its bindings are well-proven, extremely well written, and the ZeroMQ guide is a work of art. That said, the bindings are not created equal. We've seen some unreliability with the Java package even though the official C/C++ and Python bindings work perfectly (the Go package too).
The thin layer on TCP leaves a lot of things to be desired, such as knowing whether the other end has disappeared and won't ever come back.
Last time I used ZeroMQ was ~7 months ago at my previous job, so maybe things have changed since then, but having to write hacks to see if the other side is still there or not is absolutely terrible.
Also, the Python bindings did not like fork() without exec, which yes is bad and all, but is still something that is done every so often in large real-world programs.
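For what it's worth, the usual non-hack answer from the ZeroMQ guide is the "Lazy Pirate" pattern: poll with a timeout instead of blocking on recv, and treat a timeout as a dead or unreachable peer. A rough pyzmq sketch (the endpoint is a placeholder):

```python
# Minimal liveness check for a REQ client, roughly the zguide's Lazy Pirate pattern.
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect("tcp://localhost:5555")   # assumed endpoint
sock.send(b"ping")

poller = zmq.Poller()
poller.register(sock, zmq.POLLIN)
if poller.poll(timeout=2000):          # milliseconds
    print("reply:", sock.recv())
else:
    print("no reply: assume the other end is gone")
    sock.setsockopt(zmq.LINGER, 0)     # don't hang around trying to flush
    sock.close()
```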
I second this recommendation. I use MsgPack over ZeroMQ with a custom RPC protocol. Protobuf, Cap'n Proto, and FlatBuffers are good serialization mechanisms as well. Can't wait until the new socket types are out of the draft stage.
I believe gRPC is pluggable, so if one wanted to invest time in building a gRPC ZeroMQ transport, that is a feasible route. gRPC brings really good RPC mechanisms to the table.
Marketing, I guess: message queueing and RPC are conceptually quite different, although it looks like zeromq has a request-response mode designed for this.
We did a similar analysis in the early days at Elementary after rolling something custom based on ZMQ. We wound up creating Atom: https://github.com/elementary-robotics/atom
Atom is an easy, Redis Streams-based RPC that also emphasizes Docker containerization of microservices. We support plug-and-play serialization, with msgpack and Apache Arrow currently supported and more on the roadmap. You can also send raw binary if you please.
Another nice thing about Redis is that if you're running microservices on the same instance, you can connect to Redis through a Unix socket (e.g. on tmpfs) and bypass the TCP stack to get even better performance.
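With redis-py that looks roughly like the following, assuming the Redis server is configured with a unixsocket path in its config; the socket path here is a placeholder.

```python
# Sketch: talking to a local Redis over a Unix domain socket instead of TCP.
import redis

r = redis.Redis(unix_socket_path="/run/redis/redis.sock")  # assumed socket path
r.set("greeting", "hello")
print(r.get("greeting"))   # b'hello'
```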
This isn't a proper comparison because the writer compares a few things that aren't really the same...
Anyway, the problem with comparing RPC protocols is that you need to do a per-case benchmark if you really care about performance. Pretty much every decent solution will be better than another equally good solution depending on the use case.
Years ago, one of my customers told me he used to work doing high frequency trading in a bank, and the bank had several tailor made solutions for data serialization and RPC made for very specific cases, and they were just better than any generic solution.
I performed the exact opposite migration at $DAYJOB recently - swapping an internal text-based protocol for HTTP (over a local IPC pipe).
The main benefit was that we could suddenly reuse all the code-generated routers/docs/authentication from the HTTP ecosystem. It significantly simplified/standardised our IPC layer and reduced the "weirdness" in the codebase.
Yep. Unless you're doing thousands of requests per second+, the right RPC protocol is likely just HTTP due to the massive improvement in mature tooling, debugging, etc.
Re proton: I tried to use it in point-to-point mode but haven't been able to figure out how; the Javadoc reference is useless for that. There are only Python examples, and the Python APIs don't map 1:1 to the Java APIs.
For Java, have a look at vertx-proton which builds on top of proton-j and is a bit more intuitive (still not great) than proton-j for creating servers and clients. :)
There's no such thing as an "RPC protocol"; there's a huge, insurmountable gulf between "I want my Python and JavaScript services to pass ad-hoc messages between themselves" and "I want to send C++ structs over the network with type safety and minimal overhead".
The failure to use the same vertical scale for the graphs or provide an overlaid one is infuriating. So all the bar charts look identical and you have to read the scale to pick up subtle factor-of-ten differences?
I did see the big warning labels everywhere. However, there is simply no replacement that is equally fast (protocol 5), easy to use (copyreg) and imports necessary modules when deserializing. So tradeoffs were made.
Most of the security issues are mitigated if you are only running the software internally. But it would be interesting to see a hacker who managed to get into the production systems somehow figure out your RPC scheme and try to craft packets to exploit it instead of going directly for the user/password database.
You beat me to it. Sounds like you would be opening yourself up to a variant of CSRF: one user could upload untrusted data that would then be fed to an unsuspecting user. You should never feed or consume an untrusted pickle.
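The classic demonstration of why: __reduce__ lets any pickle run an arbitrary callable at load time, so loading untrusted bytes is effectively executing untrusted code.

```python
# Why untrusted pickles are dangerous: unpickling can execute arbitrary code.
import pickle

class Evil:
    def __reduce__(self):
        import os
        return (os.system, ("echo arbitrary code ran during unpickling",))

payload = pickle.dumps(Evil())
pickle.loads(payload)   # runs the shell command, no questions asked
```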
Pickles can also be time bombs, especially around python upgrades. Sometimes (ok, rarely) the serialization / deserialization of some types changes between versions of python.
Another issue is painting yourself into a corner: when you use pickles, you make it harder to either switch away from python in the future or consume the same serialized objects from any non-python (micro)service. This can delay or prevent transitions away from python that would otherwise make sense.
Right. Those are methods for IPC. IPC doesn't have to deal with all the same issues as RPC, and so those techniques are going to be faster as a result. You can't really compare them.
It's like comparing communication between people physically located in the same room and people located on different continents.
No mention of authentication or encryption? I can't believe that everyone is using either HTTPS (with client certificates or plain HTTP authentication) or IPsec (or some other form of IP tunneling) to make encryption transparent to the application.
RPC over Rabbit can work (and does), but you do have to remember that, in the end, it is a queue and a couple of bad messages can stop your entire system.
RPC over Rabbit is fine if you don't care about the result, or you can guarantee that each message gets processed in a short constant time.
I think RPC solutions should be able to gain performance by using host byte order / little-endian as the wire format. I mean, spending all those cycles on byte swapping is pointless if both endpoints are already in host byte order.
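A quick way to see what's at stake, using Python's struct module: the same integers packed in little-endian (what most hosts use natively) versus big-endian network order differ only in the per-field byte swap.

```python
# Little-endian wire format vs. network byte order for the same values.
import struct
import sys

values = (1, 2, 3, 4)
le = struct.pack("<4I", *values)   # little-endian: no swap needed on x86 / little-endian ARM
be = struct.pack(">4I", *values)   # big-endian "network order": every int gets byte-swapped

print(sys.byteorder)               # 'little' on most common hardware
print(le.hex())
print(be.hex())
```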