system-design.io


P2P Networking and WebRTC

From Pigeon Post to Peer-to-Peer: The Internet’s Most Underrated Architecture


Part I: The Bridge — A Tale of Two Post Offices

Imagine two towns separated by a mountain range. In the first model — the centralized model — every letter from Town A to Town B must pass through a grand Central Post Office in the capital. The capital sorts it, stamps it, and routes it onward. This works beautifully when the capital is running smoothly. But when the capital goes down during a storm, no letters move. Everyone is cut off.

In the second model — the peer-to-peer model — the towns establish a direct courier line between them. No central office. No capital approval needed. Alice in Town A hands her letter directly to a trusted courier who knows exactly where Bob in Town B lives. The communication is direct, faster, and does not depend on any third party staying alive.

This is precisely the architectural difference between Client-Server networking (the first model) and Peer-to-Peer (P2P) networking (the second model). Most of the internet you use daily runs on the Client-Server model — your browser requests resources from a server. But some of the most powerful applications in computing history — BitTorrent, Napster, Skype, Zoom, and Google Meet — are built on the P2P model.

In this article, we will build our understanding from the ground up: starting with the philosophy of P2P, confronting the “mountain range” that blocks direct connections (the NAT problem), learning the clever tricks engineers use to cross that mountain (STUN, TURN, ICE), and finally arriving at WebRTC — the modern, browser-native framework that makes real-time peer-to-peer communication accessible to every developer.


Part II: The Anatomy of a P2P Network

Client-Server vs. P2P: The Core Trade-off

Before we dive into P2P’s architecture, we need to understand what problem it is solving. In a classic Client-Server model, a central server holds all resources and clients request them.

  ┌──────────┐       request       ┌──────────────┐
  │  Client  │ ──────────────────► │    Server    │
  │   (you)  │ ◄────────────────── │  (centralized│
  └──────────┘       response      │   authority) │
                                   └──────────────┘

This model is simple, predictable, and easy to secure. Its weakness, however, is fundamental: the server is both the bottleneck and the single point of failure. The more clients connect, the more bandwidth and compute the server needs. Scaling becomes expensive.

P2P flips this model. Every participant — called a node or peer — is both a client and a server simultaneously. Peers share resources (files, bandwidth, compute) directly with one another.

      ┌──────┐         ┌──────┐
      │ Peer │◄───────►│ Peer │
      │  A   │         │  B   │
      └──┬───┘         └───┬──┘
         │                 │
         │    ┌──────┐     │
         └───►│ Peer │◄────┘
              │  C   │
              └──────┘

Notice in the diagram above that there is no central node. Each peer communicates directly with its neighbors. This network is decentralized and resilient — removing any single peer does not destroy the network.

Structured vs. Unstructured P2P

Not all P2P networks are created equal. We can categorize them by how peers find each other:

Unstructured P2P (e.g., early Gnutella): Peers connect randomly with no predetermined topology. Finding a file requires “flooding” the network with queries — every peer asks its neighbors, who ask their neighbors. This works but is wildly inefficient at scale.

Structured P2P (e.g., BitTorrent with DHT): Peers are organized into a predictable topology using a Distributed Hash Table (DHT). Data is stored and retrieved using a consistent hashing scheme, making lookups $O(\log N)$ efficient.

The Kademlia DHT: How BitTorrent Finds Your Files

The most influential structured P2P algorithm is Kademlia, developed in 2002 by Petar Maymounkov and David Mazières. It powers the BitTorrent DHT, IPFS, and the Ethereum discovery protocol.

The core insight of Kademlia is elegant: assign every node and every piece of data a unique 160-bit ID. The “distance” between two IDs is not geographic — it is the XOR of their binary values. This XOR distance has magical mathematical properties: it is symmetric (distance(A, B) = distance(B, A)) and satisfies the triangle inequality.

 Node IDs on a logical "keyspace" ring:

   0000                    1111
    ├──────────────────────────┤
    │                          │
    ●  Node A (ID: 0010)       │
    │          │               │
    │          │ XOR distance  │
    │          ▼               │
    ●  Node B (ID: 0110)       │
    │                          │
    ●  Node C (ID: 1100)      ●  Node D (ID: 1110)

Each node maintains a routing table called a k-bucket. When Node A wants to find who owns a key, it asks the k nodes in its routing table that are closest (by XOR) to that key. Those nodes point to even closer nodes, converging on the answer in $O(\log N)$ hops.

Here is a simplified Python simulation of how a DHT node would store and look up a key:

import hashlib

def kademlia_id(value: str) -> int:
    """Generate a 160-bit node/key ID using SHA-1."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

def xor_distance(id_a: int, id_b: int) -> int:
    """Kademlia's XOR metric as a distance function."""
    return id_a ^ id_b

class DHTNode:
    def __init__(self, node_id: str):
        self.node_id = kademlia_id(node_id)
        self.storage: dict[int, str] = {}  # key -> value
        self.k_buckets: list[tuple[int, "DHTNode"]] = []  # (id, peer)

    def store(self, key: str, value: str):
        hashed_key = kademlia_id(key)
        self.storage[hashed_key] = value
        print(f"Stored '{key}' with hash {hex(hashed_key)[:10]}...")

    def find(self, key: str) -> str | None:
        hashed_key = kademlia_id(key)
        return self.storage.get(hashed_key)

    def closest_peers(self, target_key: str, k: int = 3) -> list:
        """Return k peers closest (by XOR) to the target key."""
        hashed_key = kademlia_id(target_key)
        sorted_peers = sorted(
            self.k_buckets,
            key=lambda x: xor_distance(x[^0], hashed_key)
        )
        return sorted_peers[:k]

The key takeaway here: DHTs allow P2P networks to locate any piece of data — or any peer — without a central directory, in logarithmic time. This is why BitTorrent can survive even if every tracker server in the world goes offline.


Part III: The Mountain Range — The NAT Problem

Why Peers Can’t Just “Call Each Other”

Here is where our courier analogy gets interesting. Alice wants to send a letter directly to Bob. But Bob doesn’t have a publicly visible address — he lives in a gated community (a private network), and the community has a single shared mailbox at the gate (the NAT device). Letters come in, but the gatekeeper only allows replies to mail that Bob initiated outward. Bob cannot receive unsolicited mail from Alice.

This is the Network Address Translation (NAT) problem — the single biggest technical challenge in P2P networking.

NAT was invented in the 1990s as a clever workaround for the fact that IPv4 only supports ~4 billion unique addresses. By allowing millions of private devices to share a single public IP, NAT extended the life of IPv4 by decades. But it fundamentally broke the end-to-end connectivity principle of the original internet.

  Private Network A            Public Internet        Private Network B
  ─────────────────           ─────────────────      ─────────────────

  Alice                          NAT B                    Bob
  192.168.1.5  ──────►  NAT A  ──────────────►  ???   ◄── 10.0.0.8
                      203.0.113.1                       172.16.0.1

  Alice knows her IP (192.168.1.5) but NAT A assigns her 203.0.113.1
  Bob is behind NAT B and has NO public IP. How does Alice reach him?

The Four Faces of NAT

NAT routers are not all the same. Their behavior determines how hard — or impossible — it is to punch through:

NAT Type Behavior P2P Difficulty
Full Cone Maps internal → external once; accepts all inbound to that external port Easy
Restricted Cone Only accepts inbound from IPs Alice has previously contacted Moderate
Port-Restricted Cone Accepts inbound only from exact IP:port combination Alice contacted Hard
Symmetric Creates a new mapping for every different destination Very Hard

The nightmare scenario for P2P engineers is when both peers are behind Symmetric NAT. In this case, the external port is unpredictable and different for every destination — making hole-punching nearly impossible without a relay.


Part IV: Crossing the Mountain — NAT Traversal

UDP Hole Punching

The most elegant hack in networking is UDP hole punching. The insight: when a device sends an outbound UDP packet, the NAT creates a temporary mapping (a “hole”) that allows inbound packets from that same destination IP:port for a short window.

The strategy:

  1. Both peers (A and B) connect to a rendezvous server and report their public IP:port
  2. The server tells A about B’s public address, and B about A’s public address
  3. Both peers simultaneously send UDP packets to each other’s public addresses
  4. These outbound packets “punch holes” through each NAT
  5. Subsequent packets from the opposite side find an open hole and pass through
      Peer A                    Rendezvous Server              Peer B
      ──────                    ─────────────────              ──────
      │                              │                         │
      ├──── "My public addr?" ───────►│                         │
      │◄─── "203.0.113.1:5000" ──────┤                         │
      │                              │◄── "My public addr?" ───┤
      │                              ├─── "198.51.100.2:6000" ─►│
      │                              │                         │
      │◄─── "B is at 198.51.100.2:6000" ─────────────────────  │
      │  ─── "A is at 203.0.113.1:5000" ────────────────────►  │
      │                              │                         │
      ├──── UDP to 198.51.100.2:6000 ─────────────────────────►│  (punches hole)
      │◄─── UDP to 203.0.113.1:5000  ──────────────────────────┤  (punches hole)
      │                              │                         │
      │◄═══════ Direct P2P Connection Established! ════════════►│
    

STUN: Your Public Address Mirror

STUN (Session Traversal Utilities for NAT, RFC 5389) is a standardized protocol that automates the “What is my public IP?” step. A peer sends a STUN request to a public STUN server, which simply reflects the peer’s public IP and port back.

import socket
import struct

STUN_SERVER = ("stun.l.google.com", 19302)
STUN_BINDING_REQUEST = 0x0001
MAGIC_COOKIE = 0x2112A442

def send_stun_request() -> tuple[str, int]:
    """Send a STUN binding request and parse the public IP:port response."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(3)

    # Build STUN message header: type(2) + length(2) + magic_cookie(4) + transaction_id(12)
    transaction_id = b'\x00' * 12  # Simplified: normally random bytes
    header = struct.pack(">HHI", STUN_BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id

    sock.sendto(header, STUN_SERVER)
    response, _ = sock.recvfrom(1024)
    sock.close()

    # Parse XOR-MAPPED-ADDRESS attribute from response
    public_ip = "203.0.113.45"   # Extracted from STUN response
    public_port = 54321
    return public_ip, public_port

STUN is cheap — it uses a tiny public server that simply echoes your address back. Most WebRTC deployments use Google’s free STUN servers (stun.l.google.com). The limitation: STUN only works when hole-punching is possible. When both peers are behind Symmetric NAT, STUN cannot establish a direct connection.

TURN: The Relay Fallback

When hole-punching fails, we need a relay. TURN (Traversal Using Relays around NAT, RFC 5766) servers act as explicit media relays — all traffic flows through them.

  Peer A         TURN Server          Peer B
  ──────         ───────────          ──────
     │                │                  │
     ├── Allocate ───►│                  │
     │◄── Relayed     │                  │
     │    Address ────┤                  │
     │                │◄── Allocate ─────┤
     │                ├─── Relayed ──────►│
     │                │    Address        │
     │                │                  │
     ├─── Data ──────►│────── Data ──────►│
     │◄────────────── │◄──────────────────┤

  All media flows THROUGH the TURN server — no direct P2P.
  Higher latency, but guaranteed connectivity.

TURN is expensive to operate (all bandwidth passes through it), but it is the safety net that makes WebRTC work even in the most restrictive enterprise networks. Typically, only 15–20% of WebRTC connections need TURN.

ICE: The Grand Unifier

ICE (Interactive Connectivity Establishment, RFC 8445) is the algorithm that orchestrates STUN and TURN into a coherent strategy. It systematically tests all possible connection paths and selects the best one. ICE gathers a list of candidates — potential connection addresses — and ranks them:

  1. Host candidates — Direct local network addresses (lowest latency, ideal)
  2. Server-reflexive candidates — Public IP:port discovered via STUN
  3. Relay candidates — TURN relay addresses (fallback)

Once both peers exchange their candidate lists, ICE runs connectivity checks — sending STUN binding requests over every candidate pair. The first pair that succeeds becomes the nominated pair, and media flows through it.

from dataclasses import dataclass
from enum import Enum

class CandidateType(Enum):
    HOST = "host"          # Priority 1: local IP addresses
    SRFLX = "srflx"        # Priority 2: server-reflexive (STUN)
    RELAY = "relay"        # Priority 3: TURN relay

@dataclass
class IceCandidate:
    foundation: str
    component_id: int
    transport: str
    priority: int
    ip: str
    port: int
    candidate_type: CandidateType

def calculate_ice_priority(
    candidate_type: CandidateType,
    local_pref: int = 65535,
    component_id: int = 1
) -> int:
    """
    ICE priority formula from RFC 5245:
    priority = (2^24) * type_pref + (2^8) * local_pref + (256 - component_id)
    """
    type_preferences = {
        CandidateType.HOST: 126,
        CandidateType.SRFLX: 100,
        CandidateType.RELAY: 0
    }
    type_pref = type_preferences[candidate_type]
    return (2**24) * type_pref + (2**8) * local_pref + (256 - component_id)

Part V: WebRTC — The Browser’s P2P Engine

What WebRTC Is (and What It Is Not)

WebRTC (Web Real-Time Communication) is an open standard (originally from Google, standardized by the W3C and IETF) that gives browsers and native applications the ability to establish direct P2P connections for audio, video, and arbitrary data — without any plugins.

WebRTC is not a single protocol. It is a collection of protocols and APIs wired together into a coherent system:

┌─────────────────────────────────────────────────────────┐
│                      WebRTC Stack                       │
├─────────────────────────────────────────────────────────┤
│  Application Layer                                       │
│  ┌──────────────┐  ┌────────────────┐  ┌─────────────┐ │
│  │ getUserMedia │  │RTCPeerConnection│  │RTCDataChannel│ │
│  │  (capture)   │  │ (connectivity) │  │ (data xfer) │ │
│  └──────────────┘  └────────────────┘  └─────────────┘ │
├─────────────────────────────────────────────────────────┤
│  Transport Layer                                         │
│  ┌──────────┐  ┌──────────┐  ┌──────────────────────┐  │
│  │   SRTP   │  │   SCTP   │  │         ICE          │  │
│  │ (media)  │  │ (data)   │  │(connectivity/routing)│  │
│  └──────────┘  └──────────┘  └──────────────────────┘  │
├─────────────────────────────────────────────────────────┤
│  Security Layer                                          │
│  ┌───────────────────────────────────────────────────┐  │
│  │                DTLS (key exchange)                │  │
│  └───────────────────────────────────────────────────┘  │
├─────────────────────────────────────────────────────────┤
│  Network Layer: UDP / TCP / STUN / TURN / ICE           │
└─────────────────────────────────────────────────────────┘

The Signaling Gap

Here is the famous paradox of WebRTC: two peers cannot negotiate a connection without first having a connection. You need a communication channel to set up the communication channel.

This is the signaling problem, and WebRTC deliberately does not solve it. It is intentionally left to the developer. You can use WebSockets, HTTP long-polling, carrier pigeon — anything that can carry text messages between two peers before the P2P connection is established.

A typical signaling server is surprisingly simple:

import asyncio
import json
import websockets

rooms: dict[str, list] = {}

async def handle_peer(websocket, path):
    room_id = None
    try:
        async for raw_message in websocket:
            message = json.loads(raw_message)
            msg_type = message.get("type")

            if msg_type == "join":
                room_id = message["room"]
                rooms.setdefault(room_id, []).append(websocket)
                print(f"Peer joined room: {room_id}")

            elif msg_type in ("offer", "answer", "ice-candidate"):
                # Relay the SDP or ICE candidate to all OTHER peers in the room
                peers_in_room = rooms.get(room_id, [])
                recipients = [p for p in peers_in_room if p != websocket]
                relay_msg = json.dumps(message)
                await asyncio.gather(
                    *[peer.send(relay_msg) for peer in recipients]
                )
    finally:
        if room_id and websocket in rooms.get(room_id, []):
            rooms[room_id].remove(websocket)

Notice that the signaling server is a thin relay — it only passes messages between peers during the handshake. Once the P2P connection is established, the signaling server is no longer involved.


Part VI: The WebRTC Handshake — A Step-by-Step Drama

SDP: The Negotiation Blueprint

When two WebRTC peers are about to connect, they need to agree on a common set of capabilities — codecs, resolution, bitrate, and connection addresses. This negotiation uses SDP (Session Description Protocol, RFC 4566) — a text format that describes a multimedia session.

v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2     ← Use Opus codec at 48kHz stereo
a=ice-ufrag:abc123            ← ICE credentials
a=ice-pwd:xyz789
a=fingerprint:sha-256 AA:BB:...  ← DTLS certificate fingerprint
a=candidate:1 1 UDP 2130706431 192.168.1.5 50000 typ host
a=candidate:2 1 UDP 1694498815 203.0.113.45 54321 typ srflx

The Offer/Answer Dance

The complete WebRTC connection sequence is a beautifully choreographed exchange:

  Alice (Caller)              Signaling Server           Bob (Callee)
  ──────────────              ────────────────           ────────────
       │                             │                        │
       │  1. createOffer()           │                        │
       │  ← SDP Offer generated      │                        │
       │                             │                        │
       │  2. setLocalDescription()   │                        │
       │  (ICE gathering begins)     │                        │
       │                             │                        │
       │──── 3. Send SDP Offer ─────►│──── Relay to Bob ─────►│
       │                             │                        │ 4. setRemoteDescription()
       │                             │                        │ 5. createAnswer()
       │                             │                        │ ← SDP Answer generated
       │                             │                        │ 6. setLocalDescription()
       │◄────────────────────────────│◄─── Send SDP Answer ───┤
       │  7. setRemoteDescription()  │                        │
       │                             │                        │
       │◄══ 8. ICE candidates exchanged (trickle ICE) ═══════►│
       │                             │                        │
       │◄══ 9. DTLS handshake ═══════════════════════════════►│
       │                             │                        │
       │◄══ 10. SRTP/SCTP media/data flows ══════════════════►│
       │           (signaling server no longer involved)       │

Steps 8 onward use Trickle ICE — candidates are sent as they are discovered, rather than waiting for all candidates before starting. This dramatically reduces connection time.

The Python Perspective: aiortc

While WebRTC is primarily a browser technology, the Python library aiortc implements the full WebRTC stack — invaluable for testing, server-side processing (like a recording bot), and understanding the internals:

import asyncio
from aiortc import RTCPeerConnection, RTCSessionDescription

async def create_offer() -> tuple[RTCPeerConnection, str]:
    """Alice creates a WebRTC offer. In a real app, this SDP is sent to Bob via signaling."""
    pc = RTCPeerConnection()
    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)
    return pc, pc.localDescription.sdp

async def accept_offer(sdp_offer: str) -> tuple[RTCPeerConnection, str]:
    """Bob receives Alice's offer and generates an answer."""
    pc = RTCPeerConnection()

    @pc.on("track")
    def on_track(track):
        print(f"Receiving {track.kind} track from Alice")

    remote_offer = RTCSessionDescription(sdp=sdp_offer, type="offer")
    await pc.setRemoteDescription(remote_offer)
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    return pc, pc.localDescription.sdp

Part VII: The Three APIs of WebRTC

WebRTC exposes three primary browser APIs. Each one solves a distinct problem.

1. getUserMedia — Capturing the World

getUserMedia is the entry point for accessing the user’s camera and microphone. It requires a secure context (HTTPS) and returns a MediaStream — a container of MediaStreamTrack objects.

# Python equivalent using OpenCV (simulates camera capture)
import cv2
import numpy as np

class MediaStreamTrack:
    def __init__(self, source: str | int = 0):
        self.kind = "video"
        self._capture = cv2.VideoCapture(source)

    def read_frame(self) -> np.ndarray | None:
        ret, frame = self._capture.read()
        return frame if ret else None

    def apply_constraint(self, width: int, height: int):
        self._capture.set(cv2.CAP_PROP_FRAME_WIDTH, width)
        self._capture.set(cv2.CAP_PROP_FRAME_HEIGHT, height)

    def stop(self):
        self._capture.release()

# Browser equivalent:
# const stream = await navigator.mediaDevices.getUserMedia({
#     video: { width: 1280, height: 720 },
#     audio: { echoCancellation: true }
# });

2. RTCPeerConnection — The Connection Engine

RTCPeerConnection is the heart of WebRTC. It manages the entire ICE negotiation, DTLS handshake, and media transport. It is a state machine that moves through distinct phases:

  RTCPeerConnection State Machine:

  new ──► gathering ──► complete       ← ICE gathering states
   │
   ├──► checking ──► connected ──► completed  ← ICE connection states
   │               └──► failed
   │               └──► disconnected
   └──► closed

3. RTCDataChannel — P2P Data Pipes

RTCDataChannel is WebRTC’s most underappreciated feature. It allows peers to send arbitrary binary or text data directly over the P2P connection — no server involved — making it ideal for collaborative apps, multiplayer games, and file transfers.

Data channels support flexible delivery modes:

Mode Protocol Use Case
Reliable, ordered SCTP Chat messages, file transfers
Unreliable, unordered SCTP (partial) Game state, sensor data
Unreliable, ordered SCTP (partial) Game position updates
import asyncio
from collections import deque
from dataclasses import dataclass

@dataclass
class DataChannelMessage:
    label: str
    data: bytes | str
    reliable: bool = True

class RTCDataChannelSimulator:
    def __init__(self, label: str, ordered: bool = True, reliable: bool = True):
        self.label = label
        self.ordered = ordered
        self.reliable = reliable
        self.state = "connecting"
        self._message_queue: deque = deque()
        self._on_message = None

    async def open(self):
        await asyncio.sleep(0.01)  # Simulate DTLS handshake
        self.state = "open"

    async def send(self, data: str | bytes):
        if self.state != "open":
            raise RuntimeError("DataChannel not open")
        import random
        if not self.reliable and random.random() < 0.1:
            return  # Drop packet (unreliable mode simulation)
        msg = DataChannelMessage(label=self.label, data=data, reliable=self.reliable)
        self._message_queue.append(msg)
        if self._on_message:
            await self._on_message(msg)

Part VIII: Security — Why WebRTC Is Encrypted by Default

The Non-Negotiable: Mandatory Encryption

Unlike many networking protocols where security is optional, WebRTC mandates encryption. You cannot turn it off. Every WebRTC connection uses two security protocols working in tandem:

DTLS (Datagram Transport Layer Security) is the UDP-friendly cousin of TLS. It handles the initial key exchange between peers. Before any media flows, peers perform a DTLS handshake to establish shared encryption keys.

SRTP (Secure Real-Time Transport Protocol, RFC 3711) uses the keys established by DTLS to encrypt every single media packet. Even if an attacker intercepts packets in transit, they are meaningless without the decryption keys.

  DTLS-SRTP Key Exchange Flow:

  Alice                                              Bob
  ─────                                              ───
    │                                                  │
    │──── ClientHello (supported cipher suites) ──────►│
    │◄─── ServerHello + Certificate ──────────────────┤
    │ (verify fingerprint against SDP!) ←── critical   │
    │──── ClientKeyExchange ──────────────────────────►│
    │◄─── Finished ───────────────────────────────────┤
    │──── Finished ───────────────────────────────────►│
    │                                                  │
    │ Both sides derive:                               │
    │   - SRTP master key (for media encryption)       │
    │   - SRTP master salt                             │
    │                                                  │
    │◄════ Encrypted SRTP Media Flows ════════════════►│

Certificate Fingerprint Verification: Defeating MITM

The genius of WebRTC’s security is this: the DTLS certificate fingerprint is embedded in the SDP. When peers exchange SDPs through the signaling server, each SDP contains a hash like a=fingerprint:sha-256 AB:CD:EF:.... During the DTLS handshake, each peer verifies that the certificate presented matches the fingerprint in the SDP. If a man-in-the-middle tries to intercept and re-encrypt the traffic, they would present a certificate that does not match — and the connection is immediately rejected.

import hashlib, ssl

def simulate_fingerprint_verification(
    certificate_pem: bytes,
    expected_fingerprint: str
) -> bool:
    cert_der = ssl.PEM_cert_to_DER_cert(certificate_pem.decode())
    digest = hashlib.sha256(cert_der).hexdigest()
    computed = ":".join(digest[i:i+2].upper() for i in range(0, len(digest), 2))

    if computed == expected_fingerprint.upper():
        print("✓ Certificate fingerprint VERIFIED — no MITM detected")
        return True
    else:
        print("✗ Fingerprint MISMATCH — connection REJECTED (possible MITM!)")
        return False

Part IX: Putting It All Together — The Full Connection Flow

Let us now trace the complete journey from “Alice clicks Call” to “Alice hears Bob’s voice”:

  PHASE 1: Signaling (via WebSocket server)
  ══════════════════════════════════════════
  Alice                 Signal Server              Bob
    │                        │                      │
    ├─ create offer ─────────►│────── relay ────────►│
    │                        │◄─── create answer ───┤
    │◄─── relay answer ──────┤                      │

  PHASE 2: ICE Negotiation (STUN/TURN)
  ══════════════════════════════════════════════════
  Alice          STUN Server      TURN Server       Bob
    │                │                │              │
    ├─ discover ────►│                │              │
    │◄── public IP ─┤                │              │
    │                                │◄─ allocate ──┤
    │                                ├── relay addr ►│
    │◄════════════ exchange candidates (via signal) ►│
    │◄════════════ connectivity checks (STUN pings) ►│
    │      (select best path: host > srflx > relay)  │

  PHASE 3: DTLS Handshake
  ════════════════════════
    │◄════════════ DTLS ClientHello / ServerHello ══►│
    │     (verify SDP fingerprints — reject if bad)   │
    │◄════════════ SRTP keys derived ════════════════►│

  PHASE 4: Media Flows
  ═════════════════════
    │◄════════════ SRTP audio/video packets ═════════►│
    │◄════════════ SCTP data channel messages ════════►│

Part X: Interview Cheat Sheet — What Examiners Actually Ask

Core Concepts You Must Articulate

Q: What is the difference between STUN and TURN? STUN is a discovery protocol that helps a peer learn its own public IP:port. TURN is a relay protocol that forwards media through a server when direct P2P is impossible. STUN is cheap and used first; TURN is the expensive fallback.

Q: What does ICE actually do? ICE gathers all possible connection candidates (host, STUN-reflexive, TURN-relay), exchanges them with the remote peer, systematically tests every candidate pair with STUN binding requests, and selects the highest-priority working path. It handles the entire NAT traversal strategy automatically.

Q: Why does WebRTC use UDP instead of TCP for media? TCP’s retransmission-on-loss guarantee is actually harmful for real-time media. If an audio packet is lost, retransmitting it 200ms later is useless — the codec would rather fill the gap with comfort noise. UDP’s “fire and forget” model tolerates packet loss gracefully, mapping perfectly to real-time audio/video codecs like Opus and VP8.

Q: What is the signaling server’s role, and why doesn’t WebRTC define one? The signaling server relays SDP offers/answers and ICE candidates between peers before the P2P connection is established. WebRTC deliberately leaves this undefined because different applications have vastly different signaling needs — SIP, custom JSON over WebSockets, message queues — and the W3C committee wisely avoided standardizing something that varied too widely.

Q: Explain the WebRTC security model. WebRTC mandates DTLS for key exchange and SRTP for media encryption. The DTLS certificate fingerprint is embedded in the SDP. When peers exchange SDPs, they commit to accepting only a connection whose DTLS certificate matches that fingerprint — making man-in-the-middle attacks detectable even if the signaling channel is compromised.

Q: How does a DHT scale better than a centralized tracker? A centralized tracker is a single point of failure and a bandwidth bottleneck. A DHT distributes the responsibility across all nodes — each node maintains only $O(\log N)$ routing entries, and any lookup completes in $O(\log N)$ hops. Removing any node only disrupts lookups that routed through it, and the network self-heals.

Quick-Reference Architecture Map

Component Protocol Purpose
Peer discovery Kademlia DHT Finding peers without a central server
Address discovery STUN (RFC 5389) Discovering public IP:port behind NAT
Media relay TURN (RFC 5766) Fallback when direct P2P fails
Path selection ICE (RFC 8445) Choosing the best connection path
Session negotiation SDP + Offer/Answer Agreeing on codecs and capabilities
Key exchange DTLS Securing the handshake
Media encryption SRTP Encrypting audio/video packets
Data transport SCTP over DTLS Reliable/unreliable data channels
Media capture getUserMedia API Accessing camera/microphone
Connection management RTCPeerConnection Orchestrating the entire WebRTC lifecycle