
Agreed. The headline leads people towards a deceptive conclusion. I've worked with both JSON and Protocol Buffers and there is no way that merely changing from JSON to PB would reduce latency by 60%; not on an average request. Clearly other refactorings were undertaken to achieve that speedup. I've done some raw performance comparison tests between JSON and PB and the speedup with PB was insignificant for most of the payloads I tried.

Does any developer here on HN really believe that JSON parsing (plus schema validation) is what adds the most latency to a request? It just doesn't add up that just switching to PB would deliver that speedup.

It reminds me of another headline I read a few years ago about a company thanking TypeScript for reducing bugs by 25% (I can't remember what the exact number was) after a rewrite... Come on. You rewrote the whole thing from scratch. What actually cut the bugs by 25% was the rewrite itself. People don't usually write it buggier the second time...

It's a pattern with these big tech consultants. They keep pushing specific tech solutions as silver bullets, then deliver flawed analyses of the results to propagate the myth that they solved the problem in that particular way... when in fact the problem was solved by coincidence, due to completely different factors.




Parsing overhead is sensitive to the amount of work the service is doing. If the service is doing relatively little, and there's a lot of junk in your request, you are forced to parse a lot of data which you then throw away. JSON is particularly nasty because there's no way to skip over data you don't care about. To parse a string, you have to look at every byte. This can be parallelized (via SIMD), but there are limits. In contrast, protobuf sprays length-encoded tags all over the place, which allows you to quickly handle large sequences of strings and bytes.
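
To make that concrete, here's a rough sketch (assumed wire layout, not code from any actual protobuf library) of why a length-delimited protobuf field is cheap to skip, while a JSON string has to be scanned byte by byte:

    // Protobuf encodes length-delimited fields as:
    //   <tag varint> <length varint> <payload bytes>
    // so a parser can jump over a payload it doesn't care about.
    function readVarint(buf, pos) {       // buf: Uint8Array, pos: byte offset
      let result = 0, shift = 0, byte;
      do {
        byte = buf[pos++];
        result |= (byte & 0x7f) << shift; // fine for modest lengths
        shift += 7;
      } while (byte & 0x80);
      return [result, pos];
    }

    function skipLengthDelimited(buf, pos) {
      const [len, next] = readVarint(buf, pos);
      return next + len;                  // O(1) in the payload size
    }

    // JSON has no length prefix: skipping a string still means inspecting
    // every byte, looking for the closing quote and handling escapes.
    function skipJsonString(str, pos) {   // pos is just past the opening quote
      while (str[pos] !== '"') {
        if (str[pos] === '\\') pos++;     // step over an escaped character
        pos++;
      }
      return pos + 1;                     // O(n) in the string length
    }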

If your cache is hot and has a good hit-rate, the majority of your overhead is likely parsing. If you microbatch 100 requests, you have to parse 100 requests before you can ship them to the database for lookup (or the machine learning inference service). If the service is good at batch-processing, then the parsing becomes the latency-sensitive part.

Note the caveat: the 60% is for large payloads. JSON contains a lot of repetition in the data, so you often see people add compression to JSON unknowingly, because their webserver is doing it behind their back. A fairly small request on the wire decompresses to a large request in memory, and takes way more processing time.

That said, the statistician in me would like to have a distribution or interval rather than a number like "60%", because it is likely to vary. It's entirely possible that 60% is on the better end of what they are seeing (it's plausible in my book), but there are likely services where the improvement in latency is more mellow. If you want to reduce latency in a system, you should sample the distribution of processing latency. At least track the maximum latency over the last minute or so, preferably a couple of percentiles as well (95, 99, 99.9, ...).
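
As a minimal sketch of that kind of tracking (illustrative names only, nothing from the article), a sliding one-minute window of samples is enough to report a max plus a few percentiles:

    // Keep the last minute of latency samples; report max, p95, p99, p99.9.
    const WINDOW_MS = 60_000;
    const samples = [];                    // [{ t, latencyMs }]

    function record(latencyMs) {
      const now = Date.now();
      samples.push({ t: now, latencyMs });
      while (samples.length && samples[0].t < now - WINDOW_MS) samples.shift();
    }

    function report() {
      const sorted = samples.map(s => s.latencyMs).sort((a, b) => a - b);
      const pct = p =>
        sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))];
      return { max: sorted[sorted.length - 1],
               p95: pct(0.95), p99: pct(0.99), p999: pct(0.999) };
    }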


Either way, the conclusion made by the headline is cherry-picked and is misleading for anyone who is concerned about average situations.

From the article:

> The result of Protocol Buffers adoption was an average increase in throughput by 6.25% for responses and 1.77% for requests. The team also observed up to 60% latency reduction for large payloads.

It's very sneaky to describe throughput improvements using average requests/responses (which is what most people are interested in) but then switch to the 'worst case' request/response when describing latency... And doubly sneaky to then use that as the headline of the article.


I agree.

There are also a lot of alarm bells going off when you have reports of averages without reports of medians (quartiles, percentiles) and variance. Or even better: some kind of analysis of the distribution. A lot of data will be closer to a Poisson process or have multi-modality, and the average generally hides that detail.

What can happen is that you typically process requests around 10ms but you have a few outliers at 2500ms. Now the average is going to be somewhere between 10ms and 2500ms. If you have two modes, then the average can often end up in the middle of nowhere, say at 50ms. Yet you have 0 requests taking 50ms. They take either 10 or 2500.
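
A quick illustration with made-up numbers:

    // 98 fast requests and 2 slow outliers (illustrative numbers only)
    const latencies = [...Array(98).fill(10), 2500, 2500];
    const mean = latencies.reduce((a, b) => a + b, 0) / latencies.length;
    console.log(mean); // 59.8 -- yet no single request took anywhere near 60ms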


Tail latency is really important. Usually more than average, because it's what drives timeouts in your system.


> Does any developer here on HN really believe that JSON parsing (plus schema validation) is what adds the most latency to a request? It just doesn't add up that just switching to PB would deliver that speedup.

I've absolutely had times when json serialising/deserialising was the vast majority of the request time.


100%. In fact, I've usually found in my career that serialization/deserialization is the largest latency contributor, except when the API call makes remote database requests or something equally expensive. For the critical path of, say, game state updates, it's best to keep as much as you can in-memory on-box, so ridiculously expensive operations like json deserialization really stand out by comparison. Even for enterprise web crud calls, it's quite important when you're part of a giant ensemble of web calls, some of which may be serially dependent, to worry about things like this.


Other sources say protobuf is 6x as fast as JSON though. I mean, with REST / JSON you get the overhead of converting data to JSON objects, JSON objects to a text representation, text to gzip or whichever, then the transfer over TCP / HTTP (establish connection, handshake, etc), the actual transfer, and then gzip -> text -> json -> data again. Protobuf is a lot more direct and compact, you get a lot of those things for free, and that's even before considering the schema and code generation, which is still far from ideal in the REST/JSON world.

I don't believe that for the purposes of inter-service communication, REST/JSON can compete with protobuf.

Caveat: I only have experience with REST/JSON and a little GraphQL, I haven't had the opportunity to work with protobuf yet. I'm more of a front end developer unfortunately, and I try to talk people out of doing microservices as much as possible.


I don't buy these arguments about 'This cheap computation is 6x faster than this other alternative cheap computation'.

For example, in JavaScript, using standard 'for loops' with index numbers is a LOT faster (over 16 times faster for basic use cases) than using Array.forEach(), yet all the linters and consultants recommend using Array.forEach() instead of the standard for loop... What about latency??? Suddenly nobody cares about latency or performance.

The reason is that these operations which use marginal amounts of resources are pointless to refactor. If a function call which uses 1% of CPU time* (to service a standard request) has its performance improved by 'a whopping' 50%, then the program as a whole will use only 0.5% less CPU time than it did before.

* (where all the other operations use the remaining 99%)


> For example, in JavaScript, using standard 'for loops' with index numbers is a LOT faster (over 16 times faster for basic use cases) than using Array.forEach(), yet all the linters and consultants recommend using Array.forEach() instead of the standard for loop... What about latency???

Do you have proof of this claim? It smells like bs.


Not the person you are asking, but 16 times seems like a lot; however, it's going to depend heavily on how good the VM or JIT is at inlining.

`Array.forEach` makes one method call per iteration, whereas a `for` loop stays in the same frame. If the compiler can't inline that, it's a "major" overhead compared to simply jumping back to the top of the loop.

Googling for it I find some benchmark showing a 3x performance difference: https://leanylabs.com/blog/js-forEach-map-reduce-vs-for-for_..., but they call `Array.push` in the loop, so it's possible the difference is even bigger in practice.


If you have Node.js installed, you can run it yourself. I did run the experiment; that's how I came up with the 16x number. I just had the two kinds of loops iterating about 10 million times and assigned a variable inside them (in the same way). I recorded the start and end times before and after the 10 million iterations (in each case) and found that Array.forEach took 16x longer to execute. The difference could be even higher if the operation inside the loop was cheaper, since much of the cost of the loop depends on what logic is executed inside it. I kept it light (variable assignment is a cheap operation) but it's still an overly generous comparison. The real difference in performance of just the two kinds of loops is almost certainly greater than 16x.

Ideally you should compare running the loops without any logic inside them, but I was worried that optimizations in the JavaScript engine would cause it to just skip over the loops if they performed no computations and had no memory side effects. Anyway, this was beside the point I was trying to make. My point is already proven; it makes no sense to optimize cheap operations.
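
For what it's worth, a sketch of that kind of micro-benchmark in Node.js looks something like this (not the exact code described above; the ratio varies a lot by engine, array size and loop body):

    const N = 10_000_000;
    const arr = new Array(N).fill(1);
    let sink = 0;                           // assignment target, as described

    let t0 = process.hrtime.bigint();
    for (let i = 0; i < arr.length; i++) { sink = arr[i]; }
    let t1 = process.hrtime.bigint();
    console.log('for loop:', Number(t1 - t0) / 1e6, 'ms');

    t0 = process.hrtime.bigint();
    arr.forEach(v => { sink = v; });
    t1 = process.hrtime.bigint();
    console.log('forEach :', Number(t1 - t0) / 1e6, 'ms');

    console.log(sink);                      // keep the assignment observable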


forEach() incurs a function-call overhead unless you optimize it away in a compiler. So the idea a loop would be faster by some multiplier is quite plausible.

To get an order of magnitude in difference, you'd have to construct a special case. I've seen a multiplier of about 3, but not 16.

As an aside: you use forEach() over for loops because most array processing operates on small arrays, where the marginal improvement of using a loop is limited. If you have a large array, you will eventually switch to a for loop if the loop body is relatively small. Likewise, when the request size is small, JSON works fine. But when your requests grows in size, the JSON overhead will eventually become a problem which needs a solution.

The underlying consideration is Amdahl's law.


forEach is faster than indexing for me on Firefox Android. Tried it with this: https://www.measurethat.net/Benchmarks/Show/13207/0/performa...


Devil’s advocate: I would generally expect that a rewrite would increase bugs, at least at first.


I would expect that the target for a rewrite would be better known than it was during the first write, and that morphing to hit a moving target during the first write would have provided more opportunities for bugs.

OTOH if there was a lot of pressure to get the rewrite done, that would be conducive to producing buggy code. I think management would be a bigger factor than any technical issues.


As per TFA, the 60% was for huge payloads, maybe p99 or something. The mean benefit was much lower, but it's arguable that it's the slow ones that you care about speeding up the most.


> Does any developer here on HN really believe that JSON parsing (plus schema validation) is what adds the most latency to a request? It just doesn't add up that just switching to PB would deliver that speedup.

50k API endpoints means that probably a lot of them are pretty simple. The simpler the API, the higher the percentage of its time that's spent in call overhead, which with JSON is parsing the entire input.


> I've worked with both JSON and Protocol Buffers and there is no way that merely changing from JSON to PB would reduce latency by 60%; not on an average request. Clearly other refactorings were undertaken to achieve that speedup.

The article notes that this is only for "large payloads", likely an edge case, and the average performance improvement is 6.25% for responses and 1.77% for requests. 1.77%!

I get that this is "at scale" but is the additional complexity of all this engineering worth that? How much more difficult is the code to work with now? If it's at all more difficult to reason about, that is going to add more engineering hours down the road.

I assume tradeoffs like this were taken into account, and it was deemed that a <7% response improvement was worth it.


The 60% was just one request, not the average; it could have been a huge payload that tended to go between data centers or something.


There's no good reason to use JSON for internal communication long-term, although I'm not a big fan of Protobufs either (too fragile and inflexible).


So what's your third option, if any?


There are many options, and I'd be uncomfortable suggesting any specific one without knowing the project's details (I'll list a few below). The reason switching from JSON to Protobuf makes me raise an eyebrow is that it represents a switch from one extreme to another. A mainstream, flexible, text-based protocol, to a specialized, rigid, binary protocol. When people make sudden moves like these, it's often misguided. I can almost hear the evangelists going on about how everything LinkedIn did with JSON is wrong, or something.

In terms of formats, you'd get an easier transition and a better balance between flexibility and efficiency out of BSON, Avro, Thrift, or MessagePack. There are also alternatives to Protobuf like FlatBuffers and Cap'n Proto. There's also CBOR, which is interesting.

There are also other ways of looking at the problem. How does Erlang serialize messages? It doesn't because it messages itself, so the message format is native to itself. And in fact I mostly lean in that direction, but it's not for everyone. Erlang is also dynamically typed, not the kind of language Protobuf and Cap'n Proto are aimed at, I suppose.


> How does Erlang serialize messages? It doesn't because it messages itself, so the message format is native to itself.

I don't get the difference you're drawing... the in-memory and on-the-wire representations of terms are different, so there's still serialization involved (term_to_binary/1). The format is documented and there are libraries for other languages.


They're different, but not that different, as the functional, message-oriented nature of Erlang means your entire state is typically in transferable terms, and the serialization format maps terms back and forth directly.

Technically Erlang could go much further, but much like multicore support took forever, I guess due to lack of funding, it doesn't. Things like:

1. When transferring between two compatible nodes, or between processes on the same node, 'virtually' serialize/deserialize: skip both operations and transfer pointer ownership to the other process instead.

2. When transferring between compatible nodes on different servers, use internal formats closer to the in-memory representation rather than fully serializing to standard ETF/BERT



