HTTP/3 and QUIC

August 2020

The jump from HTTP/1 to HTTP/2 brought a noticeable improvement in performance and reliability. It took over 15 years to get there.

Some five years later, the draft spec for HTTP/3 exists, and a few top players have already rolled out initial support.

To understand what changes HTTP/3 will bring, let's look back at the problems HTTP/2 was trying to solve.

But first, a few definitions:

HTTP

HTTP stands for HyperText Transfer Protocol, an application-layer protocol of the Internet Protocol Suite. It defines how information is exchanged on the web: everything that happens between the moment the client requests a resource and the moment that resource is delivered and processed. HTTP is a high-level protocol that relies on the transport and network layers to deliver the data.
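
To make this concrete, here is a minimal sketch of an HTTP/1.1 exchange using only Python's standard library (example.com is a public test host):

```python
# A minimal HTTP/1.1 exchange over a raw TCP socket, showing that the
# protocol itself is just structured text sent back and forth.
import socket

request = (
    "GET / HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Connection: close\r\n"
    "\r\n"
)

with socket.create_connection(("example.com", 80)) as sock:
    sock.sendall(request.encode("ascii"))
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

# The status line and headers arrive as plain text, before the body.
print(response.split(b"\r\n\r\n")[0].decode("ascii", errors="replace"))
```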

Transport protocols

TCP

TCP stands for Transmission Control Protocol. It's a sophisticated, reliable, and robust transport-layer protocol. TCP does not transmit any data between the client and the server before a connection is confirmed and a handshake performed. TCP guarantees delivery order, so if one of the packets in the transmission is lost, everything behind it waits until the lost packet is retransmitted. This overhead is an obvious downside of the protocol.

UDP

UDP stands for User Datagram Protocol. It is a transport-layer protocol that sends datagrams (packets) without confirming that a connection exists, without confirming that packets were received by the server, and without caring about the order of segments in the transmission. The typical use cases for UDP value speed over accuracy (for example, VoIP, where a small glitch is not detrimental to functionality). The downsides (aside from the obvious) are susceptibility to flooding attacks and the fact that UDP is commonly blocked for security reasons. Because UDP is connectionless, it's hard to create a secure firewall policy that allows only specific UDP packets.
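
The contrast is easy to demonstrate with Python's standard library (192.0.2.1 is a reserved documentation address, so the datagram below goes nowhere, which is precisely the point):

```python
import socket

# TCP: connect() blocks until the three-way handshake completes (or
# fails); only then can data flow, reliably and in order.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.connect(("example.com", 80))  # the handshake happens here
tcp.sendall(b"hello over TCP")    # delivery and ordering are guaranteed
tcp.close()

# UDP: no handshake, no delivery or ordering guarantees. sendto()
# "succeeds" locally even though nothing is listening at this address.
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp.sendto(b"hello over UDP", ("192.0.2.1", 9999))
udp.close()
```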

From HTTP/1 to HTTP/2

From the late '90s to 2015, the web changed a lot. Wider acceptance and adoption of the Internet resulted in more sites and substantially more content.

In the beginning, HTTP/1.0 was a plain-text request/response protocol that delivered one piece of content at a time. If a page contained an image and a text block, the client would request the image, wait until it was received, close the connection - and only then initiate a new request for the text. This approach worked great on tiny web pages, but as page sizes grew, the latency became a noticeable problem.

In an attempt to address the performance lag, HTTP/1.1 introduced the concept of pipelining, where requests for more resources could be sent immediately, without waiting for the previous response to come back. Pipelining never gained wide adoption, and responses still had to come back synchronously, in FIFO order. It took one slow response to block the entire queue behind it.

This behavior is known as HOL (head-of-line) blocking, and it occurs because the packets are processed in FIFO order.
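
A toy timeline shows the effect. Suppose three pipelined requests, where the second resource is slow to generate (the names and timings below are invented for illustration):

```python
# Pipelined HTTP/1.1 responses must come back in request order, so one
# slow response delays everything queued behind it.
requests = [("style.css", 0.1), ("slow-report.html", 2.0), ("logo.png", 0.1)]

elapsed = 0.0
for name, server_time in requests:
    elapsed += server_time  # FIFO: each response waits for the previous one
    print(f"{name}: delivered at t={elapsed:.1f}s")

# logo.png takes 0.1s to produce but isn't delivered until t=2.2s,
# because slow-report.html sits ahead of it in the queue.
```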

A few more popular workarounds for HTTP/1 HOL blocking were resource bundling and using multiple connections.

Spriting is bundling multiple small images into a single larger file. Spriting improved performance because instead of requesting numerous tiny resources, the client requested and downloaded a single file, saving round trips.

The downside of spriting was wasted resources: the entire file had to be downloaded even if just one tiny image was needed or changed. The increased complexity of creating and maintaining graphic resources didn't help either.

Domain sharding was another creative solution, taking advantage of loading resources over multiple connections. Because browsers limited the number of connections per domain, sharding (i.e., serving static resources from multiple subdomains) allowed sites to work around the limit.

The downside of sharding was latency. It took a while to initiate a connection (multiple round trips for the initial handshakes and data exchange), so opening several of them introduced significant overhead.

Both of these approaches felt like hacks: instead of working with the technology, you had to fight against it. There were simply no reasonable solutions to the shortcomings of HTTP/1.0 and HTTP/1.1.

HTTP/2 was intended to resolve these issues after the experimental SPDY protocol, developed at Google, showed that such performance improvements were possible.

HTTP/2 Performance Improvements

While HTTP/1 processes requests synchronously, HTTP/2 is capable of multiplexing. It uses the same connection to spawn a stream for each request/response pair, processing responses as they come in.

How does the receiver know which request a packet belongs to? By moving to a binary protocol and labeling each frame with a stream ID.
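
As a sketch of how this works on the wire, here is a parser for the fixed 9-byte HTTP/2 frame header (RFC 7540, section 4.1), fed two hand-built DATA frames that belong to different streams:

```python
import struct

def parse_frame_header(data: bytes):
    """Parse the 9-octet HTTP/2 frame header."""
    length_hi, length_lo, frame_type, flags, stream_id = struct.unpack(
        ">BHBBI", data[:9]
    )
    length = (length_hi << 16) | length_lo  # 24-bit payload length
    stream_id &= 0x7FFFFFFF                 # the top bit is reserved
    return length, frame_type, flags, stream_id

# Two interleaved DATA frames (type 0x0), one for stream 3 and one for
# stream 5. The receiver routes each payload by its stream ID.
frames = [
    struct.pack(">BHBBI", 0, 5, 0x0, 0x0, 3) + b"hello",
    struct.pack(">BHBBI", 0, 5, 0x0, 0x0, 5) + b"world",
]
for frame in frames:
    length, ftype, flags, stream_id = parse_frame_header(frame)
    print(f"stream {stream_id}: {frame[9:9 + length]!r}")
```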

Using a single connection also removes the connection-count limitation that domain sharding was meant to address.

Stream prioritization is another significant improvement: instead of FIFO, more important requests are given priority when bandwidth limits are reached. Prioritization helps prevent a situation where the least important resource blocks more important ones from being rendered.
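
HTTP/2's real prioritization scheme is a dependency tree with weights, but the core idea fits in a toy scheduler (the resources and priority values below are invented):

```python
import heapq

queue = []
# (priority, arrival order, resource) - a lower value means more urgent.
heapq.heappush(queue, (3, 0, "analytics.js"))
heapq.heappush(queue, (1, 1, "render-blocking.css"))
heapq.heappush(queue, (2, 2, "hero-image.jpg"))

# When bandwidth is scarce, frames for important streams go out first.
while queue:
    _, _, resource = heapq.heappop(queue)
    print("sending", resource)
# render-blocking.css is sent first despite arriving after analytics.js.
```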

Server push saves repeated traffic by allowing the server to send multiple responses to a single request. For example, when a page starts loading, scripts and CSS are required to render it correctly, so it doesn't make sense to wait for a new request for each of them. Instead, the server can push them to the client, saving a trip.

The last major difference between HTTP/2 and HTTP/1 is cross-request header compression (HPACK), as header information is often redundant between requests.
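
The sketch below is not the actual HPACK algorithm (which uses static and dynamic tables plus Huffman coding), just a toy version of the underlying idea: both sides remember headers they have already exchanged, so a repeated header costs only a tiny index:

```python
# Toy cross-request header compression: headers already seen are sent as
# an index into a shared table instead of the full text.
class ToyEncoder:
    def __init__(self):
        self.table = {}  # (name, value) -> index

    def encode(self, headers):
        out = []
        for name, value in headers:
            key = (name, value)
            if key in self.table:
                out.append(("indexed", self.table[key]))  # a few bits
            else:
                self.table[key] = len(self.table)
                out.append(("literal", name, value))      # full text once
        return out

enc = ToyEncoder()
common = [("user-agent", "Mozilla/5.0 ..."), ("accept-encoding", "gzip")]
print(enc.encode(common))  # first request: all literals
print(enc.encode(common))  # second request: all tiny index references
```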

HTTP/2 design flaw

Under normal conditions, HTTP/2 is faster than HTTP/1. On a lossy connection - where a certain percentage of packets is lost - HTTP/2 can be slower.

How could that be?

HTTP/2 solves the problems in the application layer but still relies on TCP as the underlying transport protocol.

Congestion Control

Congestion collapse occurs when the amount of traffic sent exceeds the network's bandwidth. As a result, packets are dropped, which triggers retransmission. In turn, that increases congestion even more, eventually causing the network to collapse.

TCP is designed to prevent this condition through congestion avoidance. Instead of bursting all packets as soon as the connection is established, the algorithm slowly ramps up the amount of transmitted data until full bandwidth capacity is reached. The congestion window is scaled back if packets start to drop - or if the connection goes idle.

Congestion avoidance is a good thing. Aside from preventing collapse, it ensures a fair distribution of available resources. But it can also cause inefficiencies by not utilizing the full available bandwidth for the first few exchanges. Because web traffic comes in bursts (the user finishes reading the current page and clicks a link), the problem recurs. The dial-back also occurs if packets were lost due to a bad data-link connection, for example, glitchy Wi-Fi.
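
Here is a simplified model of that ramp-up and dial-back (real algorithms such as Reno or CUBIC are considerably more involved; the numbers are illustrative):

```python
def simulate(rounds, loss_at, ssthresh=32):
    """Slow start + additive increase, multiplicative decrease on loss."""
    cwnd = 1  # congestion window, in segments
    for rtt in range(rounds):
        if rtt in loss_at:
            ssthresh = max(cwnd // 2, 2)  # back off: halve the window
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd *= 2   # slow start: exponential growth each round trip
        else:
            cwnd += 1   # congestion avoidance: linear growth
        print(f"RTT {rtt:2d}: can send {cwnd} segments")

# Bandwidth sits unused early (1, 2, 4... segments) and again after the
# loss at round 8 - exactly the inefficiency described above.
simulate(rounds=12, loss_at={8})
```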

HOL Blocking, but different

While the application layer of HTTP/2 solved the HOL problem of HTTP/1.x by utilizing streams, the underlying TCP is not aware of the stream separation. TCP's contract is to deliver bytes in order, which means that if a single packet is lost, everything behind it is held back until the retransmission arrives - even if those packets belong to a different stream. Aside from the wasted resources, that recreates the same situation as with HTTP/1, where successfully delivered packets cannot be processed until the missing one is resent.

HTTP/3 - Solving Old Problems

The next logical step in improving web performance would be to take a look at TCP.

However, TCP is hard to change: its implementations live in operating system kernels, so any update would be slow to roll out. It is easier to build something on top of the existing transport layer.

Enter QUIC. HTTP over QUIC is called HTTP/3.

QUIC

QUIC originally stood for Quick UDP Internet Connections, although officially QUIC is not an acronym, and its reliance on UDP may change in the future. It's a relatively new, still-developing transport protocol intended to find the middle ground between slow and reliable (TCP) and quick and unpredictable (UDP).

QUIC is currently based on UDP. Google developed the initial version, and the IETF is currently working on standardizing it. QUIC introduces the following concepts:

Connection Startup

QUIC almost halves the number of round trips required to establish a connection by combining the transport and cryptographic (TLS 1.3) handshakes into a single exchange (iQUIC). With TCP, the transport handshake has to finish before the TLS handshake can even begin; QUIC performs both at once, and when resuming a previously seen session, it can send application data in the very first flight (0-RTT).

Connection migration

Once a connection is established, TCP treats any change to the client's IP address as a new connection, triggering further setup delay. Of course, that's no longer practical in a world of mobile devices moving between Wi-Fi and cellular networks.

QUIC allows for connection migration by introducing a connection ID and allowing a peer to announce a new path.
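
The sketch below shows the difference in how a server finds "the connection" for an incoming packet. The packet layout (an 8-byte connection ID prefix) is invented for the example and is not the real QUIC wire format:

```python
connections = {}  # connection ID -> connection state

def handle_datagram(payload: bytes, source_addr):
    conn_id = payload[:8]  # hypothetical fixed-size connection ID prefix
    conn = connections.get(conn_id)
    if conn is None:
        conn = connections[conn_id] = {"addr": source_addr}
    elif conn["addr"] != source_addr:
        # Same connection ID from a new address (e.g., Wi-Fi -> cellular):
        # the session survives; only the path changes. Real QUIC validates
        # the new path first, which is omitted here.
        conn["addr"] = source_addr
    return conn

# TCP would have treated the second packet as a brand-new connection.
handle_datagram(b"\x01" * 8 + b"data", ("198.51.100.7", 40000))  # Wi-Fi
handle_datagram(b"\x01" * 8 + b"data", ("203.0.113.20", 51000))  # cellular
print(connections)
```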

Solving HOL at the transport layer

QUIC supports the multiplexing concept of HTTP/2 but can tell the streams apart, meaning the loss of one packet does not stall the entire connection. While the lost frame is being retransmitted, the rest of the streams can continue.

Improved congestion control and loss recovery

QUIC never reuses a packet number: when lost data has to be retransmitted, it is sent in a new packet with a new, higher packet number (packet numbers grow monotonically from the beginning of a transmission to the end).

That provides useful data points: the receiver can always tell an initial packet from a retransmitted one, which makes round-trip time measurements unambiguous.
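
A small sketch of why this matters for RTT sampling: with TCP, a retransmission reuses the original sequence number, so an ACK cannot say which transmission it acknowledges; with monotonic packet numbers, every ACK maps to exactly one send time (timestamps invented):

```python
# QUIC-style monotonic packet numbers: a retransmission gets a brand new
# number, so an ACK always maps to exactly one transmission time.
send_times = {}

def on_send(packet_number, now):
    send_times[packet_number] = now

def on_ack(packet_number, now):
    # Unambiguous: this packet number was sent exactly once.
    return now - send_times[packet_number]

on_send(1, now=0.00)  # original transmission of some data
on_send(2, now=0.20)  # the same data, retransmitted as packet number 2
print(f"RTT sample: {on_ack(2, now=0.35):.2f}s")  # 0.15s, no guessing
```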

Why HTTP/3?

QUIC requires modifications to the HTTP header compression format: HPACK is replaced with a QUIC-aware variant, QPACK. That makes HTTP/3 not backward compatible with HTTP/2. The modification is necessary to avoid reintroducing HOL blocking: HPACK assumes headers arrive strictly in order, which QUIC's independent streams no longer guarantee.

Other differences are in the handshake protocol and negotiation. For example, to determine whether a host supports HTTP/3, the client needs to examine the Alt-Svc response header, specifically its <protocol-id> values.

A value of h2 indicates HTTP/2, and h3-xx means draft xx of the HTTP/3 protocol.
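
For example, a quick check with Python's standard library (cloudflare.com is just one well-known early adopter):

```python
# Inspect the Alt-Svc header to see whether a host advertises HTTP/3.
import http.client

conn = http.client.HTTPSConnection("cloudflare.com")
conn.request("HEAD", "/")
print(conn.getresponse().getheader("Alt-Svc"))
conn.close()
# A host with HTTP/3 enabled answers with something like:
#   h3-29=":443"; ma=86400, h3-28=":443"; ma=86400
```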

Is this the ultimate solution?

No, it's just the next step, and not without its challenges. The QUIC/HTTP/3 adoption rate will depend on many factors.

UDP is commonly blocked (or throttled), which may slow down adoption; when UDP is not accessible, the connection falls back to TCP. It's also possible that in the future, QUIC will move to a transport other than UDP.

Performance gains for the client are noticeable, but some argue they're not enough.

UDP is not as efficient as TCP under certain conditions, causing higher CPU load on the server, although careful implementation can mitigate this.

Per usage trends, there is a slow but steady adoption of QUIC and HTTP/3.

Preparing for HTTP/3

Developers had their plates full with the HTTP/2 rollout: standard practices like domain sharding and spriting became antipatterns that cause performance regressions. Fortunately, from a development standpoint, HTTP/2 and HTTP/3 are similar.

However, it never hurts to test and see how your site will perform when HTTP/3 is enabled.

Currently, server support is limited to LiteSpeed and Nginx.