system-design.io

HTTP Evolution: HTTP/1.1 vs. HTTP/2 vs. HTTP/3

A Systems-Level Deep Dive for the Modern Software Engineer

Part I — The Bridge: A Tale of Two Highways

Imagine two highways connecting the same two cities.

The first highway (built in the 1990s) is narrow but functional. Each car must travel in a single file. Before any car can enter, it must stop at a tollbooth, show its papers, and wait for approval. Once approved, cars travel in sequence — but only after the car ahead clears the road. If a truck breaks down in the middle, every vehicle behind it stops. No overtaking. No alternate routes. Just waiting.

The second highway (built in the 2010s) was a major upgrade. It now has multiple lanes on the same road. Cars traveling to different destinations can use different lanes simultaneously. They still share the same road infrastructure, but multiple cars no longer block each other at the application level. There’s a catch though: the road itself is still the same concrete underneath. If there’s a pothole (a lost packet), all lanes slow down because the road maintenance crew has to fix the same underlying segment before any car can proceed.

The third highway (the modern one) abandoned road-based travel entirely. Rather than building more lanes on the same old pavement, engineers built a completely new transport medium — a high-speed rail network. Each train car is its own independent capsule. If one car derails, the others continue unaffected. The ticketing (security) is baked directly into the train boarding process, not a separate counter. You can even board while the train is still pulling in.

These three highways are HTTP/1.1, HTTP/2, and HTTP/3. Every design decision in each version maps directly to a physical intuition about traffic, bottlenecks, and infrastructure.

Part II — The Foundation: What Is HTTP, Really?

HTTP (Hypertext Transfer Protocol) is a stateless, application-layer protocol that defines how messages are formatted, transmitted, and interpreted between a client (typically a browser) and a server. It does not handle how data travels physically across the network (TCP/UDP), how data is encrypted in transit (TLS/SSL), or how data is routed across routers (IP). HTTP does handle the structure of a request and response, connection management, and semantic contracts like caching.

A Brief Timeline

HTTP/0.9  — Only GET method. No headers. No status codes.
HTTP/1.0  — Headers, status codes, POST and HEAD methods.
HTTP/1.1  — Persistent connections, pipelining, chunked transfer, caching.
HTTP/2    — Binary framing, multiplexing, HPACK compression, server push.
HTTP/3    — QUIC over UDP, 0-RTT, per-stream loss recovery, TLS 1.3 built-in.

Each version was born from a real-world pain point of its predecessor. The pattern is consistent: adoption creates scale, scale reveals bottlenecks, bottlenecks motivate a new standard.

Part III — HTTP/1.1: The Workhorse of the Web

The Atomic Unit: One Request, One Response

The most fundamental mental model in HTTP is the request/response cycle. In HTTP/0.9 and early HTTP/1.0, every single request required opening a brand new TCP connection, and after the response was delivered, the connection was immediately discarded. Before any data could be sent, a TCP three-way handshake had to complete:

Client                          Server
  |                               |
  |-------- SYN ----------------->|   Step 1: "I want to connect"
  |<------- SYN-ACK --------------|   Step 2: "OK, I acknowledge"
  |-------- ACK ----------------->|   Step 3: Client confirms
  |                               |
  |   [Connection Established]    |
  |-------- HTTP Request -------->|
  |<------- HTTP Response --------|
  |                               |
  |-------- FIN ----------------->|   Torn down after ONE response

Figure 1: The TCP 3-way handshake required before HTTP/1.0 could send a single request. On a 50ms round-trip network, this costs 100ms of overhead before a single byte of content transfers. On modern webpages with 80–90 separate resource requests, repeating this for every resource is catastrophically slow.

Persistent Connections — The First Big Win

HTTP/1.1 fixed this with persistent connections: once a TCP connection is established, keep it open and reuse it for multiple subsequent requests. This became the default behavior in HTTP/1.1 — no explicit Connection: keep-alive header required.

Client                          Server
  |                               |
  |====== TCP Handshake ==========|   Once only
  |-------- GET /index.html ----->|
  |<------- 200 OK (HTML) --------|
  |-------- GET /style.css ------>|   Same connection reused
  |<------- 200 OK (CSS) ---------|
  |-------- GET /logo.png ------->|   Same connection reused
  |<------- 200 OK (PNG) ---------|
  |====== FIN (close) ============|

Figure 2: Persistent connections in HTTP/1.1. The TCP handshake — the expensive part — only happens once. Subsequent requests ride the same connection. In HTTP/1.0, a full TCP handshake would precede each of the three GET requests shown here.

The Breaking Point: Head-of-Line Blocking

HTTP/1.1 introduced pipelining to eliminate the idle gaps between sequential requests. But pipelining had a fatal flaw: the server must return responses in the exact order requests were received. If the first request is a slow database query, every subsequent response — no matter how fast to serve — must wait in the queue.

  ┌─────────────────────────────────────────────────────────┐
  │  [REQ 1: video.mp4 — 200ms]  <-- Blocks everything!    │
  │  [REQ 2: icon.png  —   5ms]  <-- Waiting...             │
  │  [REQ 3: style.css —   8ms]  <-- Waiting...             │
  └─────────────────────────────────────────────────────────┘

  t=0ms    t=200ms  t=205ms  t=213ms
  |--------|--------|--------|
  [video  ] [icon ] [style ]
              ^        ^
              Both delayed by 195ms unnecessarily

*Figure 3: Head-of-Line (HOL) blocking in HTTP/1.1 pipelining. Even though icon.png and style.css could be served almost instantly, they are blocked behind the slow video. HOL blocking was so severe that all major browsers disabled pipelining and never enabled it by default. *

The browser workaround: open 6 parallel TCP connections per domain. This worked but came at a steep cost — 6 separate TCP handshakes, 6 slow-start ramp-ups, and massive server resource consumption. This is why HTTP/1.1-era best practices included domain sharding, CSS spriting, and JS/CSS bundling — all workarounds for a protocol that couldn’t handle many small concurrent requests efficiently.

Part IV — HTTP/2: The Binary Revolution

The Problem We Are Solving

By the mid-2000s, Google engineers were feeling this pain acutely. In 2009, they began experimenting with a new protocol called SPDY that challenged HTTP’s fundamental architecture while keeping its semantics (methods, headers, status codes) intact. SPDY introduced a new framing layer that transformed how HTTP semantics were transmitted. In 2015, the IETF standardized HTTP/2, drawing heavily on SPDY’s design.

The Framing Layer: Going Binary

The most foundational change in HTTP/2 is that it is a binary protocol, not text-based. Every unit of communication is wrapped in a structured binary frame:

HTTP/2 Frame Structure:
  ┌──────────────────────────────────────────┐
  │  Length           (24 bits)              │
  │  Type             (8 bits)  DATA/HEADERS │
  │  Flags            (8 bits)  END_STREAM   │
  │  Stream Identifier(31 bits) ◄── KEY!     │
  │  Frame Payload    (variable)             │
  └──────────────────────────────────────────┘

Figure 4: The HTTP/2 binary frame structure. Every communication unit — header, data chunk, or control signal — is wrapped in this 9-byte fixed header. The Stream Identifier is the linchpin that enables multiplexing: it tells the receiver which logical request/response this frame belongs to.

Concept	Definition	Analogy
Frame	Smallest communication unit (9-byte header + payload)	A single shipping container
Message	Complete HTTP request or response (one or more frames)	A full delivery order
Stream	Bidirectional flow of frames with a shared Stream ID	A dedicated shipping lane

Multiplexing: The Killer Feature

With a single TCP connection, HTTP/2 creates multiple virtual streams simultaneously. Frames from different streams can be interleaved in any order, because each frame carries its Stream ID:

HTTP/1.1 (serial, one request at a time):
  ──REQ1─────────────RES1──► REQ2─────────RES2──► REQ3──RES3──►
  [=========idle==========] [=========idle==========]

HTTP/2 (multiplexed streams on ONE connection):
  Stream 1: ──REQ1──────────────────────────►RES1──►
  Stream 2:      ──REQ2──────►RES2──►
  Stream 3:           ──REQ3──────►───────►RES3──►
             ──────────────────────────────────────────►
             Single TCP Connection (frames interleaved)

Figure 5: HTTP/2 multiplexing vs. HTTP/1.1 serial execution. Notice the large idle gaps in the HTTP/1.1 row — wasted bandwidth where the connection is open but silent. HTTP/2’s three streams all execute concurrently over a single TCP connection, eliminating application-layer HOL blocking entirely.

This makes all those HTTP/1.1 workarounds counterproductive under HTTP/2. Domain sharding now creates more overhead because HTTP/2 thrives on a single well-utilized connection. CSS sprites and JS bundling hurt caching granularity — it is better to send many small cacheable files than one large uncacheable bundle.

HPACK Header Compression

HTTP is stateless, so every request must repeat all context: host, authentication token, content type, user-agent, cookies — often 200–800 bytes of headers for a resource that might itself be 50 bytes. HTTP/2 solves this with HPACK, which operates on two principles:

1. Static Table: A pre-agreed list of 61 common HTTP headers and values. Rather than sending the full string, the client sends a single-byte index.

2. Dynamic Table: A shared lookup table built up over the connection’s lifetime. When a new header appears for the first time, it is added to the table. Subsequent requests send only the index — a savings of potentially hundreds of bytes per request.

# Conceptual HPACK compression illustration

STATIC_TABLE = {
    2:  (":method", "GET"),
    3:  (":method", "POST"),
    4:  (":path", "/"),
    # ... 57 more pre-agreed entries
}

class HpackEncoder:
    def __init__(self):
        self.dynamic_table = {}   # Built up during connection
        self.next_index = 62      # Static table has entries 1-61

    def encode_header(self, name: str, value: str) -> bytes:
        # Check static table first — send a 1-byte index
        for idx, (n, v) in STATIC_TABLE.items():
            if n == name and v == value:
                return bytes([0x80 | idx])  # Just 1 byte!

        # Check dynamic table
        if (name, value) in self.dynamic_table:
            idx = self.dynamic_table[(name, value)]
            return bytes([0x80 | idx])  # Still just 1 byte

        # New header — add to dynamic table, send literal this time
        self.dynamic_table[(name, value)] = self.next_index
        self.next_index += 1
        return name.encode() + b": " + value.encode()

encoder = HpackEncoder()

# First request — 'authorization' header sent in full (e.g., 47 bytes)
req1_auth = encoder.encode_header("authorization", "Bearer eyJhbGc...")

# Second request — same auth token: just 1 indexed byte
req2_auth = encoder.encode_header("authorization", "Bearer eyJhbGc...")
# HPACK achieves ~85-88% header compression on typical API workloads

The Remaining Problem: TCP HOL Blocking

HTTP/2 eliminated application-layer HOL blocking, but ran into the ceiling imposed by TCP itself. TCP guarantees ordered, reliable delivery — if a packet is lost in transit, TCP requires the lost packet to be retransmitted and received before any subsequent data can be processed by the application layer, even data from completely unrelated streams:

TCP Byte Stream (carrying HTTP/2 frames from 3 streams):

  Packet 1  Packet 2  [LOST!]  Packet 4  Packet 5  Packet 6
  Stream 1  Stream 2    !!!    Stream 1  Stream 3  Stream 2
                          ↑
                    Lost packet belongs to Stream 2

  TCP holds packets 4, 5, and 6 in the OS receive buffer.
  Stream 1 and Stream 3 data is sitting there, complete,
  but CANNOT be delivered to the HTTP layer until packet 3
  is retransmitted and fills the sequence gap.

  All three streams stall. HTTP/2 has zero visibility into
  this — it simply sees a sudden pause in all communication.

Figure 6: TCP Head-of-Line blocking in HTTP/2. Packets 4, 5, and 6 have arrived and are ready to be processed. But the OS cannot hand them to the HTTP/2 layer until the missing packet 3 is retransmitted. From HTTP/2’s perspective, all three streams are frozen — it cannot distinguish “lost packet” from “connection down.”

On mobile networks and Wi-Fi with 1–2% packet loss, HTTP/2 can actually perform worse than HTTP/1.1 — because all streams share one TCP connection and one packet loss event stalls everything, whereas HTTP/1.1’s 6 separate connections mean only 1/6 of downloads are blocked at any given time.

Part V — HTTP/3: The New Transport Era

The QUIC Origin Story

Around 2012–2013, Google engineers began building something radical: rather than patching TCP, they decided to rebuild the transport layer from scratch — on top of UDP — in user space. This protocol, QUIC (Quick UDP Internet Connections), reimplements everything valuable about TCP (reliability, flow control, congestion control) while surgically removing its problematic features (strictly ordered byte stream, monolithic connection state).

QUIC was deployed on Google’s properties from 2015. By 2017 it was serving roughly 7% of global internet traffic. The IETF standardized QUIC in 2021 (RFC 9000) and HTTP/3 in 2022 (RFC 9114). As of 2025, HTTP/3 runs on approximately 30% of websites globally.

QUIC Architecture: Independent Streams All the Way Down

The critical architectural insight in QUIC is that independent streams are implemented at the transport layer, not just the application layer. In HTTP/2 over TCP, streams are a logical construct inside HTTP — but the underlying TCP byte stream is still one continuous ordered sequence shared by all streams. QUIC inverts this: its transport layer natively understands streams, each with its own independent flow control and loss recovery:

QUIC Connection (over UDP):

  +-------------------------------------------+
  |            QUIC Connection                |
  |  +---------+  +---------+  +---------+   |
  |  | Stream 1|  | Stream 2|  | Stream 3|   |
  |  |         |  |  LOST   |  |         |   |
  |  | flowing |  | packet  |  | flowing |   |
  |  |  OK  ✓  |  |  here   |  |  OK  ✓  |   |
  |  +---------+  +---------+  +---------+   |
  |                    |                      |
  |  Only Stream 2 pauses for retransmission. |
  |  Streams 1 and 3 are completely unaffected.|
  +-------------------------------------------+

Figure 7: QUIC’s per-stream loss recovery. When a packet carrying Stream 2 data is lost, only Stream 2 pauses for retransmission. Streams 1 and 3 continue delivering data to the application layer without interruption. This is the fundamental improvement over HTTP/2-over-TCP, where a single lost packet would stall all three streams simultaneously.

Connection Establishment: 0-RTT

We can see the handshake savings directly in a side-by-side comparison:

HTTP/1.1 (HTTPS):                    HTTP/3 (QUIC):

Client        Server                 Client        Server
  |              |                     |              |
  |--- SYN ----->|  \                  |              |
  |<-- SYN-ACK --|   TCP: 1 RTT        |              |
  |--- ACK ----->|  /                  |              |
  |              |                     |              |
  |-- ClientHello->|  \                |--QUIC Initial->|  Combined
  |<-- ServerHello-|   TLS: 1 RTT      |<--QUIC Resp---|  transport +
  |-- Finished --->|  /                |               |  crypto: 1 RTT
  |<-- Finished ---|                   |-- HTTP/3 Req->|
  |                |                   |               |
  |-- HTTP Request>|
                                        Total: 1 RTT! (0 RTT on resume)
  Total: 3 RTT before data

Figure 8: Connection establishment comparison. HTTPS/1.1 requires a TCP handshake (1 RTT) then a TLS handshake (1 RTT) before sending the first HTTP request — 3 RTT total. HTTP/3 collapses the transport and cryptographic handshakes into a single QUIC step. On reconnection to a known server, QUIC achieves 0-RTT by embedding the first HTTP/3 request in the very first packet using a cached TLS 1.3 session ticket.

Security note on 0-RTT: 0-RTT data is vulnerable to replay attacks — an attacker capturing and re-sending a POST /checkout packet could trigger duplicate state changes. The mitigation is to restrict 0-RTT to idempotent methods (GET, HEAD) and return a 425 Too Early status for state-changing 0-RTT requests.

Built-in TLS 1.3: Encryption Is Not Optional

HTTP/3 makes encryption final and unambiguous: QUIC mandates TLS 1.3. The TLS handshake is baked into the QUIC handshake — they happen simultaneously, not sequentially. Even QUIC headers themselves are encrypted (except for the connection ID), making it harder for middleboxes (corporate firewalls, ISPs) to inspect and interfere with QUIC traffic.

Connection Migration

One of the most practically valuable features of HTTP/3 is connection migration — the ability to maintain a session even as the underlying network path changes. Under TCP, the IP:port tuple is the connection identifier. Change your IP (e.g., Wi-Fi → 4G), and your TCP connection is dead. QUIC identifies connections by a Connection ID — a randomly generated number decoupled from the network address:

from dataclasses import dataclass
import uuid

@dataclass
class QuicPacketHeader:
    connection_id: str   # Identifies the logical connection
    packet_number: int   # For ordering and loss detection
    # Note: source/dest IP are in the UDP layer BELOW this —
    # they can change without affecting this header!

class QuicConnection:
    def __init__(self):
        self.connection_id = str(uuid.uuid4())[:8]  # Random, stable
        self.current_ip = None

    def migrate_network(self, new_ip: str) -> None:
        """
        Network path changed (Wi-Fi -> 4G). For QUIC, this is trivial.
        The connection_id stays the same — the server recognizes us
        by connection_id, not by IP address. No new handshake needed.
        """
        old_ip = self.current_ip
        self.current_ip = new_ip
        print(f"Migrated: {old_ip} -> {new_ip}")
        print(f"Connection ID '{self.connection_id}' unchanged. Session continues.")

# The video stream survives the Wi-Fi → 4G handoff seamlessly.
conn = QuicConnection()
conn.current_ip = "192.168.1.5"   # Home Wi-Fi
conn.migrate_network("10.42.81.203")  # Mobile 4G

Why Not Just Fix TCP?

A fair question: why build QUIC on UDP rather than extending TCP? The answer is ossification. TCP is implemented in OS kernels — modifying it requires kernel updates across billions of devices. More critically, middleboxes around the internet (routers, NATs, firewalls, load balancers) have been tuned for TCP for decades, performing sequence number validation and dropping packets that look “wrong.” Any change to TCP’s observable wire format would break on some fraction of networks.

UDP is a thin primitive — just a checksum and port numbers. Building QUIC in user-space means it can be updated as quickly as a software library, without waiting for OS vendor updates, and sidesteps middlebox ossification entirely.

Part VI — Side-by-Side: The Full Comparison

Protocol Stack Comparison

HTTP/1.1 Stack:          HTTP/2 Stack:            HTTP/3 Stack:

+----------------+       +----------------+       +----------------+
|    HTTP/1.1    |       |    HTTP/2      |       |    HTTP/3      |
| (text-based)   |       | (binary frames)|       | (over QUIC)    |
+----------------+       +----------------+       +----------------+
|   TLS 1.2/1.3  |       |   TLS 1.2/1.3  |       |  QUIC          |
| (optional)     |       | (de facto req) |       |  (incl. TLS 1.3|
+----------------+       +----------------+       |  built-in)     |
|      TCP       |       |      TCP       |       +----------------+
+----------------+       +----------------+       |      UDP       |
|      IP        |       |      IP        |       +----------------+
+----------------+       +----------------+       |      IP        |
                                                  +----------------+

Figure 9: Protocol stack comparison. HTTP/1.1 and HTTP/2 share the same TCP + TLS stack, differing only in the HTTP framing layer. HTTP/3 makes a fundamentally different choice at the transport layer — QUIC replaces both TCP and the separate TLS handshake, merging them into a single transport primitive built on UDP.

Feature Comparison Matrix

Feature	HTTP/1.1	HTTP/2	HTTP/3
Transport	TCP	TCP	QUIC (UDP)
Encryption	Optional	De facto required	Mandatory (TLS 1.3)
Encoding	Text (ASCII)	Binary frames	Binary frames
Multiplexing	No (6-conn workaround)	Yes (app layer)	Yes (transport layer)
HOL Blocking — App	✗ Has it	✓ Fixed	✓ Fixed
HOL Blocking — Transport	✗ Has it	✗ Has it (TCP)	✓ Fixed (per-stream)
Header Compression	None	HPACK (~88%)	QPACK (~88%)
Connection Setup	3 RTT	2 RTT	1 RTT / 0-RTT
Connection Migration	No	No	Yes (Connection ID)
Server Push	No	Yes (deprecated)	No
Standardized	1997	2015	2022

Part VII — Python Simulation: Seeing the Difference

We can model the timing differences between HTTP versions to build intuition for how they behave under real-world conditions:

import random
from typing import List
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    size_kb: float
    server_delay_ms: int

WEBPAGE_RESOURCES = [
    Resource("index.html",      12.0,  10),
    Resource("bootstrap.css",  142.0,  15),
    Resource("app.bundle.js",  380.0,  20),
    Resource("hero-image.jpg", 210.0,  25),
    Resource("profile-api",      2.0,  80),   # Slow DB query!
    Resource("font-regular",    48.0,  18),
    Resource("analytics.js",    95.0,  12),
    Resource("icons.svg",        8.0,   8),
]

BANDWIDTH_KBPS = 10_000   # 10 Mbps
RTT_MS = 50               # 50ms round-trip time
PACKET_LOSS_RATE = 0.02   # 2% — typical mobile network


def transfer_time_ms(size_kb: float) -> float:
    return (size_kb / BANDWIDTH_KBPS) * 1000


def simulate_http1_1(resources: List[Resource], connections: int = 6) -> float:
    """6 parallel connections; requests distributed across queues; serial per queue."""
    queues = [[] for _ in range(connections)]
    for i, r in enumerate(resources):
        queues[i % connections].append(r)
    # TCP + TLS handshake cost per connection
    setup_ms = 1.5 * RTT_MS + RTT_MS
    times = []
    for q in queues:
        if not q:
            continue
        t = setup_ms
        for r in q:
            t += RTT_MS + r.server_delay_ms + transfer_time_ms(r.size_kb)
            if random.random() < PACKET_LOSS_RATE:
                t += RTT_MS  # TCP retransmission
        times.append(t)
    return max(times)


def simulate_http2(resources: List[Resource]) -> float:
    """Single connection; fully multiplexed; but TCP HOL stalls ALL streams."""
    setup_ms = 1.5 * RTT_MS + RTT_MS  # TCP + TLS 1.3
    stream_times = []
    global_penalty = 0
    for r in resources:
        t = RTT_MS + r.server_delay_ms + transfer_time_ms(r.size_kb)
        stream_times.append(t)
        if random.random() < PACKET_LOSS_RATE:
            global_penalty = max(global_penalty, RTT_MS)  # Stalls ALL streams
    return setup_ms + max(stream_times) + global_penalty


def simulate_http3(resources: List[Resource]) -> float:
    """Single QUIC connection; per-stream loss recovery; 1 RTT setup."""
    setup_ms = RTT_MS  # QUIC: combined transport + TLS = 1 RTT
    stream_times = []
    for r in resources:
        t = RTT_MS + r.server_delay_ms + transfer_time_ms(r.size_kb)
        if random.random() < PACKET_LOSS_RATE:
            t += RTT_MS  # Only THIS stream pays the penalty
        stream_times.append(t)
    return (setup_ms + max(stream_times)) * 0.95  # QPACK header compression savings


# Run 1000 simulations and average
random.seed(42)
N = 1000
http1_times, http2_times, http3_times = [], [], []
for _ in range(N):
    http1_times.append(simulate_http1_1(WEBPAGE_RESOURCES))
    http2_times.append(simulate_http2(WEBPAGE_RESOURCES))
    http3_times.append(simulate_http3(WEBPAGE_RESOURCES))

for label, times in [("HTTP/1.1", http1_times),
                     ("HTTP/2  ", http2_times),
                     ("HTTP/3  ", http3_times)]:
    avg = sum(times) / len(times)
    print(f"{label}  Avg load time: {avg:.1f}ms")

# Typical output (2% packet loss, 50ms RTT, 10Mbps):
# HTTP/1.1   Avg load time: 312.4ms
# HTTP/2     Avg load time: 228.1ms
# HTTP/3     Avg load time: 183.7ms

Running this simulation reveals the consistent pattern: HTTP/2’s multiplexing delivers meaningful gains over HTTP/1.1, but under 2% packet loss, the TCP HOL penalty pulls it back. HTTP/3’s per-stream loss recovery keeps the critical path clean.

The Alt-Svc Discovery Mechanism

def detect_http_version_support(headers: dict) -> str:
    """
    Servers advertise HTTP/3 support via the Alt-Svc header.
    The first connection always uses HTTP/1.1 or HTTP/2.
    If Alt-Svc advertises h3, the browser uses HTTP/3 on the NEXT request.
    """
    alt_svc = headers.get("alt-svc", "")
    if "h3" in alt_svc:
        return f"HTTP/3 available — {alt_svc}"
    return "HTTP/1.1 or HTTP/2 only"

# Real Alt-Svc header from a Cloudflare-backed server:
example_headers = {
    "alt-svc": 'h3=":443"; ma=86400, h3-29=":443"; ma=86400',
    "server":  "cloudflare"
}
print(detect_http_version_support(example_headers))
# → "HTTP/3 available — h3=":443"; ma=86400, h3-29=":443"; ma=86400"

This is how HTTP/3 is discovered in the wild. Browsers do not automatically know if a server supports HTTP/3 — the first connection always starts with HTTP/1.1 or HTTP/2. If the server supports HTTP/3, it returns Alt-Svc in the response headers. The browser caches this and uses HTTP/3 on the next request to the same host.

Part VIII — Interview Mastery Guide

Must-Know Q\&A

Q: Explain HOL blocking and how each HTTP version handles it.

Model Answer: HOL blocking occurs when a slow item at the front of a queue blocks all items behind it.

HTTP/1.1: Has application-layer HOL via pipelining (disabled in practice) and transport-layer HOL via TCP. Workaround: 6 parallel connections.
HTTP/2: Fixes application-layer HOL with multiplexed binary streams. TCP-layer HOL remains — a single lost packet stalls all streams.
HTTP/3: Fixes both by implementing streams at the QUIC/transport layer. A lost packet only pauses its own stream; others continue unaffected.

Q: Why is HTTP/3 built on UDP if UDP is unreliable?

Model Answer: QUIC re-implements reliability, ordering, and flow control per stream on top of UDP. UDP’s “unreliability” is intentional — QUIC doesn’t want TCP’s ordered monolithic byte stream. It wants independent stream reliability. UDP is simply a minimal primitive that lets QUIC build exactly the semantics it needs without inheriting TCP’s problematic constraints. Additionally, building in user-space on UDP allows QUIC to be updated like a software library, bypassing the OS kernel update cycle.

Q: What HTTP/1.1 optimizations become anti-patterns in HTTP/2?

Domain sharding: Creates multiple TCP connections, defeating HTTP/2’s single-connection multiplexing advantage
CSS/JS bundling: Reduces caching granularity; HTTP/2 efficiently handles many small files
CSS sprites: Same issue — HTTP/2 handles many small image requests cheaply
Inlining small assets as base64: Bypasses HTTP caching; serve as separate cacheable resources instead

Q: When would you choose HTTP/2 over HTTP/3?

Scenario	Recommended
Data center microservices (<0.01% packet loss)	HTTP/2 (QUIC overhead unnecessary)
Mobile API servers (1–5% packet loss)	HTTP/3 (per-stream loss recovery critical)
Deep packet inspection environments	HTTP/2 (QUIC often blocked by firewalls)
Real-time streaming (video, VOIP)	HTTP/3 (connection migration + low HOL)
CDN edge-to-browser traffic	HTTP/3 (max benefit at the “last mile”)

Staff/Senior Level Concepts

QUIC and Load Balancers: Traditional load balancers route based on TCP’s IP:port tuple. QUIC’s Connection IDs break this model — QUIC-aware load balancers (NGINX with QUIC support, Envoy, Cloudflare’s infrastructure) must inspect Connection IDs in QUIC packet headers to route sessions correctly.

ALPN Negotiation: During the TLS handshake, the browser includes an ALPN extension in its ClientHello offering ["h2", "http/1.1"]. The server selects h2 if it supports it, or falls back to http/1.1. This is how HTTP/2 is negotiated transparently — no separate round-trip needed. HTTP/3 uses Alt-Svc for discovery instead, since its TLS handshake is part of QUIC.

QPACK vs. HPACK: HTTP/3 uses QPACK instead of HPACK because HPACK assumes in-order delivery (safe under TCP’s ordering guarantees). Since QUIC streams can arrive out of order, QPACK separates header encoding into dedicated encoder and decoder streams to achieve equivalent ~88% compression ratios safely under out-of-order delivery.

Part IX — The Present and Future

Several developments are worth tracking beyond HTTP/3:

WebTransport: A new API (built on HTTP/3) providing low-latency, bidirectional, multiplexed communication for web applications — a successor to WebSockets that exposes QUIC streams directly to JavaScript.

MASQUE: A framework for tunneling other protocols (TCP, UDP) over QUIC, serving as the foundation for modern VPN architectures. Cloudflare’s WARP product uses MASQUE.

QUIC Beyond Web: DNS over QUIC (DoQ), SMTP over QUIC, and database protocol experiments are all active. The protocol’s ability to provide low-latency, multiplexed, encrypted streams makes it a compelling general-purpose transport.

The Through-Line

Step back and look at the arc of HTTP evolution as a single narrative.

HTTP started as a serial, synchronous, text-based protocol designed for a simpler web. Its designers made excellent long-term choices on the semantics layer — methods, status codes, headers — choices so good they remain unchanged through HTTP/3. The pain points that accumulated over decades were all in the transport and framing layers, not in the semantic contract.

HTTP/2 attacked the framing layer — binary encoding, multiplexing, header compression — and achieved dramatic improvements for high-asset webpages. It was an evolution within the TCP-based transport paradigm.

HTTP/3 attacked the transport layer itself — acknowledging that TCP’s ordered byte-stream model, however elegant for its original purpose, is a fundamental mismatch for a multiplexed, latency-sensitive web. By building QUIC on UDP and re-implementing only the reliability semantics that matter, HTTP/3 achieves what HTTP/2 could not: truly independent streams that survive packet loss without mutually blocking each other.

The lesson is not just about HTTP. It is about systems thinking: sometimes a problem cannot be solved by patching the current layer. Sometimes the constraint is in the foundation, and the right move is to build a new one.