I have never seen it removed from a server. It is an option that some website operators may disable. However most sites enable it, or the httpd they use has it enabled by default.
Ok, sure, but if the clients don't implement it... does it matter that some servers do?
EDIT: Well, maybe it does matter. E.g., if it creates request smuggling vulnerabilities, say, in the presence of reverse proxies that don't interpret the spec the same as the servers.
I like pipelining as a 1.1 feature. I have successfully used TCP/TLS clients like netcat and openssl to retrieve text/html in bulk for many years. The servers always worked great for me.
However, I never liked chunked encoding as a feature. In theory it sounds reasonable, but in practice it is a hassle. As TE: chunked became more widespread, I eventually had to write a filter to process it. It is not perfect but it seems to work.
Not surprised chunked encoding can cause problems on the server side. NetBSD has an httpd that tries to be standards-compliant but still has not implemented chunked requests.
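For reference, de-chunking is mechanical enough to sketch in a few lines. A minimal Node.js sketch (illustrative only, not the filter mentioned above): each chunk is a hex length line, CRLF, that many bytes of data, CRLF, ending with a zero-length chunk.

    // minimal sketch of a TE: chunked decoding filter (illustrative only):
    // reads a chunked body on stdin, writes the de-chunked bytes to stdout
    function dechunk(buf) {
      const out = []
      let pos = 0
      while (true) {
        const lineEnd = buf.indexOf('\r\n', pos)
        if (lineEnd === -1) break
        // chunk header: hex size, possibly followed by ";extension"
        const size = parseInt(buf.slice(pos, lineEnd).toString().split(';')[0], 16)
        if (!size) break                      // zero-size chunk marks the end
        out.push(buf.slice(lineEnd + 2, lineEnd + 2 + size))
        pos = lineEnd + 2 + size + 2          // skip chunk data and its trailing CRLF
      }
      return Buffer.concat(out)
    }

    const chunks = []
    process.stdin.on('data', c => chunks.push(c))
    process.stdin.on('end', () => process.stdout.write(dechunk(Buffer.concat(chunks))))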
HTTP/2 only has chunked encoding (it's not called that, but it's what it is). Chunked is much much nicer than not, because sometimes you don't know the length a priori.
Chunked is also unnecessary sometimes. For me, that happens to be most of the time. Sometimes I can avoid it by specifying HTTP/1.0 but not all servers respect that.
As an Opera user you were not very likely to be behind some middlebox. Middleboxes interfering with pipelined traffic was the reason it was never safe to enable by default.
Why would being an Opera user make me less likely to be behind a middlebox? I installed Opera wherever I was accessing the Internet. I think the only place with heavy restrictions was a community college that restricted the browser to IE, but I was able to use Opera from a thumb drive. Never had issues.
Has anybody ever given any even halfway compelling evidence of pipelining breaking websites?
Google didn't, Mozilla didn't.
I believe there was one small banking site using a ten year old IIS that didn't load - but maybe pipelining was the least of the problems there. Another that sent the wrong images to mobile Safari, or mobile Safari displayed them wrong, but was fixed.
Pipelining may not have been the technically best solution, but it certainly would have taken the impetus away from SPDY. If Mozilla had shown the courage to default to pipelining maybe the industry as a whole could have had some input on HTTP/2 instead of just rubber-stamping SPDY.
This probably could have been solved for HTTPS though: if you negotiate http/1.1 via ALPN, then pipelining could be OK? Otherwise, only http/1.1 with keep-alive should be used?
But now you're going to say that's a bug in a specific brain dead server, and it should have been fixed. I'm sure there are other bugs tracking other server's issues with pipelining, but that's the only one I remember seeing (mostly because of the amazingly terrible nature of the bug).
The point of the Bugzilla link is that enabling pipelining in Firefox would trigger bad behavior in servers. It's not some idle fear of unknown problems, it's fear informed by actual problems -- if you turn on the feature mostly things will work fine, but some stuff is going to break, and user interpretation when Firefox gets a broken page and IE (or Chrome, whatever) works, is that it must be Firefox's fault. There's not necessarily a way to detect an error in this case, so it's hard to gracefully degrade. The exciting world of (mostly) transparent proxies for plaintext http makes this even worse.
There were hundreds of millions of Opera and mobile Safari users with pipelining enabled, and the only problems you can point to are a few small things, in this case in something that doesn't even seem to be normally accessed from a browser.
I wrote a somewhat performance oriented web server some years ago, and wondered whether I should implement pipelining.
My conclusion was that pipelining is unusable in HTTP 1.x, and a potentially harmful behaviour for the server.
See, the problem is that there is no way in HTTP 1.x to map server responses to client requests. That is why the server has to send responses in the same order it received the requests. This is disastrous for the server, because it means each response must be buffered until it can be sent in its correct order.
If you send 100 requests to the server, and the first one is slower to compute than the others, then you effectively need to buffer the other 99 responses until the first one finishes before you can send anything.
I'm not even sure the client would benefit much from pipelining versus concurrent requests, because the client would have to know how to order requests so that a GET for a big image does not block the whole pipeline because it was made before the small images.
If there was a way in HTTP 1.x to identify which response is for which request, then the server would be able to send responses in any order it can, and it would have been usable.
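A minimal sketch of that forced buffering (illustrative only; handler() here is a hypothetical async request handler, not a real API):

    // pipelined HTTP/1.x forces FIFO responses, so a slow request at the head
    // of the queue keeps later, already-finished responses buffered in memory
    const queue = []                   // response slots, in request arrival order

    function onPipelinedRequest(socket, handler) {
      const slot = { done: false, body: null }
      queue.push(slot)
      handler().then(body => {         // hypothetical async request handler
        slot.done = true
        slot.body = body
        // only the head of the queue may be written; everything behind a
        // still-running request has to wait (and be held in memory)
        while (queue.length && queue[0].done) {
          socket.write(queue.shift().body)
        }
      })
    }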
> I'm not even sure the client would benefit much from pipelining versus concurrent requests
Microsoft studied this when they evaluated SPDY, and they found that pipelining does benefit, scoring nearly identically to SPDY (which became HTTP/2). So you don't have to guess; it does.
> If you send 100 requests to the server, and the first one is slower
That's not how pipelining worked. You still have your 6 concurrent connections, and you spread out the first 18 or so resources across them. If the first is slow then the other 82 are spread out among the other 5 connections, the page overall loads much faster, except for maybe a couple resources loading later than otherwise.
> because the client would have to know how to order requests
It's funny because Google found out that a priority inversion caused Maps to display much slower with HTTP/2 than with HTTP/1.1 because a key resource was put behind other less important data. Their proposed solution was to embed priorities into the JavaScript and let it fix the browser's guess, but I think they ended up just tweaking something specific to Maps.
But guess what, that would have worked for pipelining as well.
The problems with pipelining like the head-of-line blocking you described were largely overblown and easily fixable -- but Google didn't want to fix them, possibly because Chrome being extra fast only on google.com favored both their browser and their services. Firefox was losing to Chrome, and Mozilla couldn't even take a minuscule risk for fear of losing more ground.
HTTP pipelining isn't intended to provide concurrency (and doesn't, unlike, say, IMAP pipelining, which allows for out-of-order responses). It helps with queuing, for when client-to-server round trip time is significant with respect to overall response time (including server response generation time). This can reduce the amount of connection idle time, at the risk of running into broken implementations.
Multiplexing (either with multiple TCP connections, HTTP/2's streams-within-one-TCP-connection construction, or QUIC's streams-over-UDP construction) addresses concurrency, and can address idle channels, depending on the number of channels available.
As an HTTP server, it doesn't make a lot of sense to run pipelined processing, unless you're implementing a proxy and the origin servers are pretty far away from the intermediate. That way, you can keep requests flowing to the origin.
I call BS on the benchmarks AND the theoretical analysis. Every time I read those HTTP/X benchmarks, people don't mention TCP's congestion control and seem to just ignore it in their analysis. Well, you can't, at least not under "realistic conditions". Congestion control will introduce additional round trips, depending on your link parameters (bandwidth, latency, OS configuration like initcwnd etc.) and limit your bandwidth on the transport layer. And depending on your link parameters, 6 parallel TCP connections might achieve a higher bandwidth on a cold start, because the window scaling during TCP slow start is superior to that of the single TCP connection used by HTTP/2.
Additionally, the most common error people make while benchmarking (and I assume the author did too) is to ignore the congestion control's caching of the cwnd in the TCP stack of your OS. That is, once the cwnd is raised above the usual initial 10 MSS (~14.6 kB for most OSes and setups), your OS will cache the value and re-use the larger cwnd as the initcwnd when you open a new TCP socket to the same host. So if you do 100 benchmark runs, you will have one "real" run, and the other 99 will re-use the cwnd cache and produce unrealistic results. Given that the author didn't mention TCP congestion control at all and didn't mention any tweaks to the TCP metrics cache (you can disable or reset it by changing /proc/sys/net/ipv4/tcp_no_metrics_save on Linux), I assume all the measured numbers are BS.
I have used HTTP/2, and HTTP/1 with domain sharding, extensively, and the advantage of HTTP/2 multiplexing cannot be denied. TCP connection overhead is negligible (other than the TLS negotiation maybe) and is not even really a factor.
TCP congestion window scaling is not negligible, especially not in high-latency environments. The initcwnd is usually 14.6 kB (10× an MSS of 1460 bytes in common DSL setups), meaning a server can only send those 14.6 kB before the connection stalls until an ACK is received (= 1 round trip). Google realized this and tried to get a kernel patch into Linux where the browser (or any userspace program) can manipulate the initcwnd. The patch got rejected, and that's basically why they came up with QUIC: to be able to build on UDP and implement congestion control and other stuff TCP usually does (ordering, checksumming etc.) in userspace.
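A rough back-of-the-envelope sketch of that effect (my own simplification: initcwnd of 10 MSS, cwnd doubling every RTT, no loss or delayed ACKs):

    // estimate round trips needed to deliver `bytes` from a cold TCP connection
    // under slow start; assumes initcwnd = 10 * MSS (~14.6 kB) and that the
    // congestion window doubles each RTT (no loss, no delayed ACKs)
    function slowStartRoundTrips(bytes, mss = 1460, initcwnd = 10) {
      let cwnd = initcwnd * mss
      let sent = 0
      let rtts = 0
      while (sent < bytes) {
        sent += cwnd     // one RTT's worth of data in flight
        cwnd *= 2        // slow start roughly doubles the window per RTT
        rtts += 1
      }
      return rtts
    }

    // e.g. a ~100 kB response takes 3 round trips cold (14.6 + 29.2 + 58.4 kB),
    // but only 1 if the kernel re-used a cached, already-large cwnd
    console.log(slowStartRoundTrips(100000))   // -> 3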
Would you be interested in helping me do another run of this test, but with more realistic settings? I care about the results and getting this right but you clearly have a better understanding of how to set this up
Sorry to disappoint you, but my own opinion is: it can't really be done. There are dozens of variables in such a benchmark (latency, bandwidth, TCP settings on each node such as TCP fast open, selective ACK support etc., HTTP & TCP connection caches etc.) and unclear optimization targets.
It's not just a matter of sending data to the client and measuring when it arrived; there is a ton of other stuff to consider regarding how the HTML and its linked resources (CSS, scripts etc.) are structured. There are even random effects at play here (what if you have packet loss in important packets that carry ACKs etc.).
Some 6 years ago I did this for a study thesis (ungraded undergraduate essay) at university (back then HTTP/2 was still SPDY) and it was an eye-opener for me: by carefully picking your parameters you can basically conclude and simulate whatever you want the outcome to be. If you are interested, the study thesis is here: https://www.dropbox.com/s/6vu91yvfzxbzdue/studythesis.pdf?dl...
The theoretical analysis is still valid today, but the resulting benchmarks should be taken with a grain of salt as they are very test-setup and implementation specific.
Out of curiosity, I implemented the same using websockets. You'll obviously need a real implementation of a request-response interface, but this will do for the test:
// server
ws.on('message', raw => {
  const msg = JSON.parse(raw)
  switch (msg.type) {
    case 'collection':
      // reply with the list of 500 item links
      return ws.send(JSON.stringify({ type: 'collection', items: [/* ...generate 500 links... */] }))
    case 'item':
      // reply with a single generated item
      return ws.send(JSON.stringify({ type: 'item', item: {/* ...generate item... */} }))
  }
})

// client
const items = []
ws.send(JSON.stringify({ type: 'collection' }))
ws.onmessage = e => {
  const msg = JSON.parse(e.data)
  switch (msg.type) {
    case 'collection':
      // request each item listed in the collection
      msg.items.forEach(link => ws.send(JSON.stringify({ type: 'item', link })))
      break
    case 'item':
      items.push(msg.item)
      break
  }
}
It finishes in 140-150ms, around the same time as the best HTTP version. That's without any caching and the client requesting each of the 500 items, when in fact the server could just push them. I'm surprised WS is not used more often for data exchange.
EDIT: actually ~50ms after removing the console.log() I added for each message...
Load balancing websockets can be tricky because an individual socket could last many hours. That means when you do a software update, you need to leave the old version running for many hours.
Same applies with http, but an individual request tends not to take many hours.
Possible issue: have I misread it, or is the explanation of HTTP/3's advantages entirely missing from this section[1]? It's in the section header, but HTTP/3 isn't mentioned again until "The perfect world"[2] and that section doesn't clarify how HTTP/3 matters to the discussion.
"Compound requests are by far the fastest. This indicates that my original guess was wrong. Even when caching comes into play, it still can’t beat just re-sending the entire collection again in a single compounded response."
TCP is heavily optimized for shoving a single stream of bits across the wire. The other options have more overhead from multiple requests.
Many of my clients are behind MITM proxies that force a downgrade to 1.1, so I end up having to optimize for 1.1 and minimize total requests anyway. We've started down the path of GraphQL to aggregate requests, but it feels so counter productive (with respect to performance/effort).
I think the most interesting thing about this test is that it shows HTTP/2 isn't always faster than 1.1 --- in the first test, h1 compound is just as fast as h2, and in the second, even slightly faster. Considering all the additional complexity that HTTP/2 introduces, that doesn't seem like any improvement.
Also, "relative speeds" should really be "relative times" because otherwise the percentages imply the exact opposite. I was a little confused for a bit by that.
> in the first test, h1 compound is just as fast as h2
That's because HTTP/2 automatically compounds multiple requests onto the same connection, so there is no surprise there.
The main advantage of HTTP/2 is that it ensures that we can get the same benefit of compound data transfers in a completely transparent manner, without bothering to rearchitect how clients and servers operate.
> Considering all the additional complexity that HTTP/2 introduces, that doesn't seem like any improvement.
This assertion completely misses the whole point of HTTP/2, and why it's a huge improvement over HTTP/1.1.
> The main advantage of HTTP/2 is that it ensures that we can get the same benefit of compound data transfers in a completely transparent manner, without bothering to rearchitect how clients and servers operate.
Yea this assertion is spot-on, but sadly HTTP/2 still has enough overhead that if speed really is the most important requirement, compounding data is still better. I was hoping the difference would be at least smaller.
Also of note: terminating a single TCP/TLS connection per client vs. one for every concurrent client request! Even if you are using a middleman like a CDN doing HTTP/1.1 pooling, this still reduces the established TCP/TLS connections by an order of magnitude or more.
When this demo first came out, it didn't have the right reply header to enable pipelining in the browser. Now it does, but browsers have removed pipelining support. Pipelining is a part of HTTP/1.1.
If you download a Firefox version from before they removed pipelining, you can enable it and you'll see that there's essentially no advantage to HTTP/2 for this demo.
This demo could measure connection establishment time and bandwidth, and simulate pipelines. But I think they don't actually want you to know that even a 6-connection 1-deep pipeline is just about as good as HTTP/2.
It's interesting to see the impact that having the developer tools open has on performance, and whether the network tab is being shown or not. While HTTP2 is generally faster for me, opening the developer tools makes both significantly slower, and seems to have a larger impact on HTTP2, usually making it slower than 1.1, in both Firefox and Chrome.
The developer tools feature internally proxies every request. The proxy interface it uses is a custom devtools protocol and can't handle every part of HTTP -- for example, it assumes all POST request bodies are delivered in one go rather than streamed.
It also uses the same single JavaScript thread to proxy the requests as is used for rendering the devtools UI, which can't be great for performance.
HTTP/2 is ~0.1s faster on first load (1.02s for /2 vs 1.11s for /1.1) and ~0.2s faster on future loads. I'm tempted to say there are diminishing returns as your internet gets faster (latency is ~11ms atm).
Very interesting article! I wonder how compound requests scale as body size increases. Would we see individual HTTP2 requests become faster after items become "large enough"?
The thing that jumps out at me about HTTP/2: even if your browser can send N parallel requests for different entities, you're still making N database queries and incurring N request overheads.
This means that every layer of your stack has to be reorganized and optimized around HTTP/2 traffic patterns...or you can just batch requests as before, and save that overhead. It makes me think that the N entities problem isn't actually the most promising use case for HTTP/2...
I don't think most resources in today's web are database-dependent. It is mostly static content (images, videos, scripts, stylesheets etc.) that is large in filesize. A single resource, however, might trigger a multitude of db queries (think calling a REST service endpoint).
Anyway, HTTP/2 allows multiplexing, so waiting on a single resource won't stall any other requests, which can be sent whenever they are ready.
Yes there is such a thing as an HTTP connection. So now you learned something valuable. Each connection consists of one or more HTTP requests. In HTTP/1.1 in practice you must complete an entire request and response before beginning another.
(Edited to add, since you did) There are two things you might mean by QUIC. Google's test protocol QUIC (these days often called 'gQUIC') was developed and deployed in-house and has all sorts of weird Google-isms; it's from the same period as their SPDY, which inspired HTTP/2. gQUIC is no longer under further development. They handed the specification over to the IETF years ago, and the IETF's QUIC Working Group is working on an IETF standard QUIC which will replace TCP for applications that want a robust, high-performance encrypted protocol.
HTTP over QUIC will be named HTTP/3 and will offer most of the same benefits as HTTP/2 (which is HTTP over TLS, thus over TCP/IP), but improve privacy and optimise some performance scenarios where head-of-line blocking in TCP was previously a problem -- probably some time in 2020 or 2021. The HTTPbis working group (bis is a French word that serves a similar purpose in addresses as the suffix letter "a" does in English; e.g., instead of 45a you might live at 45bis) is simultaneously fixing things in HTTP/2 and preparing for HTTP/3.
"... where head-of-line blocking in TCP was previously a problem..."
Has anyone ever shared a demo where we can see this occurring with pipelined HTTP/1.1 requests?
I have been using HTTP/1.1 pipelining -- using TCP/TLS clients like netcat or openssl -- for many years and I have always been very satisfied with the results.
Similar to the demo in the blog post, I am requesting a series of pages of text (/articles/1, /articles/2, etc.)
I just want the text of the article to read, no images or ads. Before I send the requests I put the URLs in the order in which I want to read them. With pipelining, upon receipt I get automatic catenation of the pages into one document. Rarely, I might want to split this into separate documents using csplit.
HTTP/1.1 pipelining gives me the pages in the order I request them and opens only one TCP connection. It's simple and reliable.
If I requested the pages in separate connections, in parallel, then I would have to sort them out as/after I receive them. One connection might succeed, another might fail. It just becomes more work.
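For anyone curious, a minimal sketch of the same idea with Node's tls module (example.com and the paths are placeholders; the workflow described above uses netcat/openssl instead): all requests are written up front on one connection, and the responses come back concatenated in the same order.

    // minimal HTTP/1.1 pipelining sketch over TLS (host and paths are placeholders)
    const tls = require('tls')

    const host = 'example.com'
    const paths = ['/articles/1', '/articles/2', '/articles/3']

    const socket = tls.connect(443, host, { servername: host }, () => {
      // write every request back-to-back on the same connection; HTTP/1.1
      // requires the server to answer them in the order they were sent
      const requests = paths.map((p, i) =>
        `GET ${p} HTTP/1.1\r\n` +
        `Host: ${host}\r\n` +
        `Connection: ${i === paths.length - 1 ? 'close' : 'keep-alive'}\r\n\r\n`
      ).join('')
      socket.write(requests)
    })

    // responses arrive concatenated, already in request order
    socket.on('data', chunk => process.stdout.write(chunk))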
TCP blocking is seen the most on mobile data connections. The LTE could be delivering whole megabytes of correct data but if that TCP connection is missing just one packet it must buffer all the rest until it can get that SACK / ACK back to the server and get the missing packet.
If there's congestion, there is a random chance of any packet being dropped, because that's how you signal congestion reliably. If there's neither congestion nor a wireless link on the route between you and the server, then neither this nor most other performance considerations matter to you; that's nice, you're already getting the best possible outcome.
Possibly so obvious it's not worth pointing out, but client side caching is also important for server side scaling. If your clients cache X% of responses, you can generally get away with ~X% fewer API servers.
So even if client performance doesn't improve much you should probably still cache.
[1] https://en.wikipedia.org/wiki/HTTP_pipelining
Edit: the article also doesn't mention keep-alive; in fact it explicitly states that HTTP/1.1 opens one connection per request, which is not true. Makes me think that the HTTP/1.1 demo disables KA to make the effect even more dramatic.
Edit2: demos don't make actual network requests.