Imagine two towns separated by a mountain range. In the first model — the centralized model — every letter from Town A to Town B must pass through a grand Central Post Office in the capital. The capital sorts it, stamps it, and routes it onward. This works beautifully when the capital is running smoothly. But when the capital goes down during a storm, no letters move. Everyone is cut off.
In the second model — the peer-to-peer model — the towns establish a direct courier line between them. No central office. No capital approval needed. Alice in Town A hands her letter directly to a trusted courier who knows exactly where Bob in Town B lives. The communication is direct, faster, and does not depend on any third party staying alive.
This is precisely the architectural difference between Client-Server networking (the first model) and Peer-to-Peer (P2P) networking (the second model). Most of the internet you use daily runs on the Client-Server model — your browser requests resources from a server. But some of the most influential applications in computing history — BitTorrent, Napster, and the original Skype — were built on the P2P model, and the same direct-connection idea lives on in WebRTC, the technology underpinning browser-based calling in products like Google Meet.
In this article, we will build our understanding from the ground up: starting with the philosophy of P2P, confronting the “mountain range” that blocks direct connections (the NAT problem), learning the clever tricks engineers use to cross that mountain (STUN, TURN, ICE), and finally arriving at WebRTC — the modern, browser-native framework that makes real-time peer-to-peer communication accessible to every developer.
Before we dive into P2P’s architecture, we need to understand what problem it is solving. In a classic Client-Server model, a central server holds all resources and clients request them.
┌──────────┐        request         ┌──────────────┐
│  Client  │ ─────────────────────► │    Server    │
│  (you)   │ ◄───────────────────── │ (centralized │
└──────────┘        response        │  authority)  │
                                    └──────────────┘
This model is simple, predictable, and easy to secure. Its weakness, however, is fundamental: the server is both the bottleneck and the single point of failure. The more clients connect, the more bandwidth and compute the server needs. Scaling becomes expensive.
P2P flips this model. Every participant — called a node or peer — is both a client and a server simultaneously. Peers share resources (files, bandwidth, compute) directly with one another.
┌──────┐ ┌──────┐
│ Peer │◄───────►│ Peer │
│ A │ │ B │
└──┬───┘ └───┬──┘
│ │
│ ┌──────┐ │
└───►│ Peer │◄────┘
│ C │
└──────┘
Notice in the diagram above that there is no central node. Each peer communicates directly with its neighbors. This network is decentralized and resilient — removing any single peer does not destroy the network.
Not all P2P networks are created equal. We can categorize them by how peers find each other:
Unstructured P2P (e.g., early Gnutella): Peers connect randomly with no predetermined topology. Finding a file requires “flooding” the network with queries — every peer asks its neighbors, who ask their neighbors. This works but is wildly inefficient at scale.
Structured P2P (e.g., BitTorrent with DHT): Peers are organized into a predictable topology using a Distributed Hash Table (DHT). Data is stored and retrieved using a consistent hashing scheme, making lookups $O(\log N)$ efficient.
The most influential structured P2P algorithm is Kademlia, developed in 2002 by Petar Maymounkov and David Mazières. It powers the BitTorrent DHT, IPFS, and the Ethereum discovery protocol.
The core insight of Kademlia is elegant: assign every node and every piece of data a unique 160-bit ID. The “distance” between two IDs is not geographic — it is the XOR of their binary values. This XOR distance behaves like a true metric: distance(A, A) = 0, it is symmetric (distance(A, B) = distance(B, A)), and it satisfies the triangle inequality. It is also unidirectional: from any starting ID, each distance corresponds to exactly one other ID, so lookups for the same key converge along the same path no matter where they begin.
Node IDs on a logical "keyspace" (4-bit IDs for illustration):

0000                                              1111
 ├───●──────────●──────────────────────●─────●────┤
     Node A     Node B                 Node C Node D
     (0010)     (0110)                 (1100) (1110)

XOR "distance" example: d(A, B) = 0010 ⊕ 0110 = 0100 (i.e. 4)
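These metric properties are easy to verify directly. A tiny sketch using the four node IDs from the diagram above:

```python
# Node IDs from the diagram above (4-bit IDs for illustration)
a, b, c, d = 0b0010, 0b0110, 0b1100, 0b1110

def dist(x: int, y: int) -> int:
    return x ^ y

assert dist(a, b) == dist(b, a) == 0b0100        # symmetric
assert dist(a, a) == 0                           # identity
assert dist(a, c) <= dist(a, b) + dist(b, c)     # triangle inequality
# Unidirectional: from a fixed node, each distance names exactly one peer
assert len({dist(a, n) for n in (a, b, c, d)}) == 4
```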
Each node maintains a routing table called a k-bucket. When Node A wants to find who owns a key, it asks the k nodes in its routing table that are closest (by XOR) to that key. Those nodes point to even closer nodes, converging on the answer in $O(\log N)$ hops.
Here is a simplified Python simulation of how a DHT node would store and look up a key:
```python
import hashlib

def kademlia_id(value: str) -> int:
    """Generate a 160-bit node/key ID using SHA-1."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

def xor_distance(id_a: int, id_b: int) -> int:
    """Kademlia's XOR metric as a distance function."""
    return id_a ^ id_b

class DHTNode:
    def __init__(self, node_id: str):
        self.node_id = kademlia_id(node_id)
        self.storage: dict[int, str] = {}                 # key -> value
        self.k_buckets: list[tuple[int, "DHTNode"]] = []  # (id, peer)

    def store(self, key: str, value: str):
        hashed_key = kademlia_id(key)
        self.storage[hashed_key] = value
        print(f"Stored '{key}' with hash {hex(hashed_key)[:10]}...")

    def find(self, key: str) -> str | None:
        hashed_key = kademlia_id(key)
        return self.storage.get(hashed_key)

    def closest_peers(self, target_key: str, k: int = 3) -> list:
        """Return k peers closest (by XOR) to the target key."""
        hashed_key = kademlia_id(target_key)
        sorted_peers = sorted(
            self.k_buckets,
            key=lambda x: xor_distance(x[0], hashed_key)
        )
        return sorted_peers[:k]
```
The key takeaway here: DHTs allow P2P networks to locate any piece of data — or any peer — without a central directory, in logarithmic time. This is why BitTorrent can survive even if every tracker server in the world goes offline.
Here is where our courier analogy gets interesting. Alice wants to send a letter directly to Bob. But Bob doesn’t have a publicly visible address — he lives in a gated community (a private network), and the community has a single shared mailbox at the gate (the NAT device). Letters come in, but the gatekeeper only allows replies to mail that Bob initiated outward. Bob cannot receive unsolicited mail from Alice.
This is the Network Address Translation (NAT) problem — the single biggest technical challenge in P2P networking.
NAT was invented in the 1990s as a clever workaround for the fact that IPv4 only supports ~4 billion unique addresses. By allowing millions of private devices to share a single public IP, NAT extended the life of IPv4 by decades. But it fundamentally broke the end-to-end connectivity principle of the original internet.
Private Network A           Public Internet           Private Network B
─────────────────           ───────────────           ─────────────────
 Alice                                                             Bob
 192.168.1.5 ──► NAT A ───────────► ??? ◄─────────── NAT B ◄── 10.0.0.8
                 203.0.113.1                         172.16.0.1

Alice knows her IP (192.168.1.5), but NAT A presents her to the internet
as its own public address, 203.0.113.1.
Bob is behind NAT B and has NO public IP. How does Alice reach him?
NAT routers are not all the same. Their behavior determines how hard — or impossible — it is to punch through:
| NAT Type | Behavior | P2P Difficulty |
|---|---|---|
| Full Cone | Maps internal → external once; accepts all inbound to that external port | Easy |
| Restricted Cone | Only accepts inbound from IPs Alice has previously contacted | Moderate |
| Port-Restricted Cone | Accepts inbound only from exact IP:port combination Alice contacted | Hard |
| Symmetric | Creates a new mapping for every different destination | Very Hard |
The nightmare scenario for P2P engineers is when both peers are behind Symmetric NAT. In this case, the external port is unpredictable and different for every destination — making hole-punching nearly impossible without a relay.
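The difference is easiest to see as a mapping table. Below is a toy sketch (hypothetical port numbers, not a real NAT implementation): a cone NAT reuses one external port per internal socket, while a symmetric NAT mints a fresh port for every destination, which is precisely why a STUN-discovered port is useless against it.

```python
import itertools

class ConeNAT:
    """One external port per internal endpoint, reused for ALL destinations."""
    def __init__(self):
        self.mapping: dict = {}
        self.ports = itertools.count(50000)   # hypothetical port range

    def translate(self, internal, dest):
        return self.mapping.setdefault(internal, next(self.ports))

class SymmetricNAT:
    """A distinct external port for every (internal, destination) pair."""
    def __init__(self):
        self.mapping: dict = {}
        self.ports = itertools.count(50000)

    def translate(self, internal, dest):
        return self.mapping.setdefault((internal, dest), next(self.ports))

cone, sym = ConeNAT(), SymmetricNAT()
src = ("192.168.1.5", 4444)

# Cone: the port a STUN server observes is the same one a peer can use
assert cone.translate(src, "stun-server") == cone.translate(src, "peer")
# Symmetric: the STUN-observed port tells you nothing about the peer mapping
assert sym.translate(src, "stun-server") != sym.translate(src, "peer")
```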
The most elegant hack in networking is UDP hole punching. The insight: when a device sends an outbound UDP packet, the NAT creates a temporary mapping (a “hole”) that allows inbound packets from that same destination IP:port for a short window.
The strategy:
Peer A Rendezvous Server Peer B
────── ───────────────── ──────
│ │ │
├──── "My public addr?" ───────►│ │
│◄─── "203.0.113.1:5000" ──────┤ │
│ │◄── "My public addr?" ───┤
│ ├─── "198.51.100.2:6000" ─►│
│ │ │
│◄─── "B is at 198.51.100.2:6000" ───────────────────── │
│ ─── "A is at 203.0.113.1:5000" ────────────────────► │
│ │ │
├──── UDP to 198.51.100.2:6000 ─────────────────────────►│ (punches hole)
│◄─── UDP to 203.0.113.1:5000 ──────────────────────────┤ (punches hole)
│ │ │
│◄═══════ Direct P2P Connection Established! ════════════►│
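The two simultaneous sends at the bottom of the diagram can be sketched with plain UDP sockets. On loopback there is no NAT in the path, so this demonstrates only the send-then-listen pattern, not actual hole punching:

```python
import socket

# Two UDP endpoints standing in for Peer A and Peer B
a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
a.bind(("127.0.0.1", 0))
b.bind(("127.0.0.1", 0))
a.settimeout(1)
b.settimeout(1)

# Each side sends FIRST (this outbound packet is what opens the NAT hole
# in the real scenario), then listens for the other side's packet.
a.sendto(b"hello from A", b.getsockname())
b.sendto(b"hello from B", a.getsockname())

assert b.recvfrom(1024)[0] == b"hello from A"
assert a.recvfrom(1024)[0] == b"hello from B"
a.close()
b.close()
```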
STUN (Session Traversal Utilities for NAT, RFC 5389) is a standardized protocol that automates the “What is my public IP?” step. A peer sends a STUN request to a public STUN server, which simply reflects the peer’s public IP and port back.
```python
import os
import socket
import struct

STUN_SERVER = ("stun.l.google.com", 19302)
STUN_BINDING_REQUEST = 0x0001
MAGIC_COOKIE = 0x2112A442
XOR_MAPPED_ADDRESS = 0x0020

def send_stun_request() -> tuple[str, int]:
    """Send a STUN binding request and parse the public IP:port response."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(3)
    # STUN header: type(2) + length(2) + magic_cookie(4) + transaction_id(12)
    transaction_id = os.urandom(12)
    header = struct.pack(">HHI", STUN_BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id
    sock.sendto(header, STUN_SERVER)
    response, _ = sock.recvfrom(1024)
    sock.close()
    # Walk the attributes after the 20-byte header for XOR-MAPPED-ADDRESS
    pos = 20
    while pos + 4 <= len(response):
        attr_type, attr_len = struct.unpack(">HH", response[pos:pos + 4])
        if attr_type == XOR_MAPPED_ADDRESS:
            # Value layout: reserved(1) + family(1) + x-port(2) + x-address(4),
            # where port and address are XORed with the magic cookie
            xport, xaddr = struct.unpack(">HI", response[pos + 6:pos + 12])
            public_port = xport ^ (MAGIC_COOKIE >> 16)
            public_ip = socket.inet_ntoa(struct.pack(">I", xaddr ^ MAGIC_COOKIE))
            return public_ip, public_port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    raise RuntimeError("No XOR-MAPPED-ADDRESS in STUN response")
```
STUN is cheap — it uses a tiny public server that simply echoes your address back. Most WebRTC deployments use Google’s free STUN servers (stun.l.google.com). The limitation: STUN only works when hole-punching is possible. When both peers are behind Symmetric NAT, STUN cannot establish a direct connection.
When hole-punching fails, we need a relay. TURN (Traversal Using Relays around NAT, RFC 5766) servers act as explicit media relays — all traffic flows through them.
Peer A TURN Server Peer B
────── ─────────── ──────
│ │ │
├── Allocate ───►│ │
│◄── Relayed │ │
│ Address ────┤ │
│ │◄── Allocate ─────┤
│ ├─── Relayed ──────►│
│ │ Address │
│ │ │
├─── Data ──────►│────── Data ──────►│
│◄────────────── │◄──────────────────┤
All media flows THROUGH the TURN server — no direct P2P.
Higher latency, but guaranteed connectivity.
TURN is expensive to operate (all bandwidth passes through it), but it is the safety net that makes WebRTC work even in the most restrictive enterprise networks. Typically, only 15–20% of WebRTC connections need TURN.
ICE (Interactive Connectivity Establishment, RFC 8445) is the algorithm that orchestrates STUN and TURN into a coherent strategy. It systematically tests all possible connection paths and selects the best one. ICE gathers a list of candidates — potential connection addresses — and ranks them: host candidates (local interface addresses), server-reflexive (srflx) candidates (public addresses discovered via STUN), and relay candidates (addresses allocated on a TURN server), in that order of preference.
Once both peers exchange their candidate lists, ICE runs connectivity checks — sending STUN binding requests over every candidate pair. The first pair that succeeds becomes the nominated pair, and media flows through it.
```python
from dataclasses import dataclass
from enum import Enum

class CandidateType(Enum):
    HOST = "host"      # Priority 1: local IP addresses
    SRFLX = "srflx"    # Priority 2: server-reflexive (STUN)
    RELAY = "relay"    # Priority 3: TURN relay

@dataclass
class IceCandidate:
    foundation: str
    component_id: int
    transport: str
    priority: int
    ip: str
    port: int
    candidate_type: CandidateType

def calculate_ice_priority(
    candidate_type: CandidateType,
    local_pref: int = 65535,
    component_id: int = 1
) -> int:
    """
    ICE priority formula (introduced in RFC 5245, carried over to RFC 8445):
    priority = (2^24) * type_pref + (2^8) * local_pref + (256 - component_id)
    """
    type_preferences = {
        CandidateType.HOST: 126,
        CandidateType.SRFLX: 100,
        CandidateType.RELAY: 0
    }
    type_pref = type_preferences[candidate_type]
    return (2**24) * type_pref + (2**8) * local_pref + (256 - component_id)
```
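As a quick sanity check, the formula with its default preferences yields the familiar ordering host > srflx > relay. A standalone restatement:

```python
# Standalone restatement of the ICE priority formula with RFC default type
# preferences (126 host, 100 srflx, 0 relay)
def priority(type_pref: int, local_pref: int = 65535, component_id: int = 1) -> int:
    return (2**24) * type_pref + (2**8) * local_pref + (256 - component_id)

host, srflx, relay = priority(126), priority(100), priority(0)
assert host > srflx > relay          # direct paths always outrank relays
assert host == 2130706431            # the value seen on "typ host" SDP lines
assert srflx == 1694498815           # the value seen on "typ srflx" SDP lines
```

These are the same numbers that appear in real a=candidate lines, so the formula is directly visible in captured SDP.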
WebRTC (Web Real-Time Communication) is an open standard (originally from Google, standardized by the W3C and IETF) that gives browsers and native applications the ability to establish direct P2P connections for audio, video, and arbitrary data — without any plugins.
WebRTC is not a single protocol. It is a collection of protocols and APIs wired together into a coherent system:
┌─────────────────────────────────────────────────────────┐
│ WebRTC Stack │
├─────────────────────────────────────────────────────────┤
│ Application Layer │
│ ┌──────────────┐ ┌────────────────┐ ┌─────────────┐ │
│ │ getUserMedia │ │RTCPeerConnection│ │RTCDataChannel│ │
│ │ (capture) │ │ (connectivity) │ │ (data xfer) │ │
│ └──────────────┘ └────────────────┘ └─────────────┘ │
├─────────────────────────────────────────────────────────┤
│ Transport Layer │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────────┐ │
│ │ SRTP │ │ SCTP │ │ ICE │ │
│ │ (media) │ │ (data) │ │(connectivity/routing)│ │
│ └──────────┘ └──────────┘ └──────────────────────┘ │
├─────────────────────────────────────────────────────────┤
│ Security Layer │
│ ┌───────────────────────────────────────────────────┐ │
│ │ DTLS (key exchange) │ │
│ └───────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────┤
│ Network Layer: UDP / TCP / STUN / TURN / ICE │
└─────────────────────────────────────────────────────────┘
Here is the famous paradox of WebRTC: two peers cannot negotiate a connection without first having a connection. You need a communication channel to set up the communication channel.
This is the signaling problem, and WebRTC deliberately does not solve it. It is intentionally left to the developer. You can use WebSockets, HTTP long-polling, carrier pigeon — anything that can carry text messages between two peers before the P2P connection is established.
A typical signaling server is surprisingly simple:
```python
import asyncio
import json
import websockets

rooms: dict[str, list] = {}

async def handle_peer(websocket):
    # Note: recent versions of the websockets library pass only the
    # connection object; older versions also passed a `path` argument.
    room_id = None
    try:
        async for raw_message in websocket:
            message = json.loads(raw_message)
            msg_type = message.get("type")

            if msg_type == "join":
                room_id = message["room"]
                rooms.setdefault(room_id, []).append(websocket)
                print(f"Peer joined room: {room_id}")

            elif msg_type in ("offer", "answer", "ice-candidate"):
                # Relay the SDP or ICE candidate to all OTHER peers in the room
                peers_in_room = rooms.get(room_id, [])
                recipients = [p for p in peers_in_room if p != websocket]
                relay_msg = json.dumps(message)
                await asyncio.gather(
                    *[peer.send(relay_msg) for peer in recipients]
                )
    finally:
        if room_id and websocket in rooms.get(room_id, []):
            rooms[room_id].remove(websocket)
```
Notice that the signaling server is a thin relay — it only passes messages between peers during the handshake. Once the P2P connection is established, the signaling server is no longer involved.
When two WebRTC peers are about to connect, they need to agree on a common set of capabilities — codecs, resolution, bitrate, and connection addresses. This negotiation uses SDP (Session Description Protocol, RFC 4566) — a text format that describes a multimedia session.
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2 ← Use Opus codec at 48kHz stereo
a=ice-ufrag:abc123 ← ICE credentials
a=ice-pwd:xyz789
a=fingerprint:sha-256 AA:BB:... ← DTLS certificate fingerprint
a=candidate:1 1 UDP 2130706431 192.168.1.5 50000 typ host
a=candidate:2 1 UDP 1694498815 203.0.113.45 54321 typ srflx
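SDP's line-oriented format makes it easy to inspect by hand. Here is a minimal (deliberately naive) parse of the rtpmap and candidate lines from a snippet like the one above:

```python
# A trimmed-down SDP fragment, mirroring the annotated example above
sdp = """v=0
m=audio 9 UDP/TLS/RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
a=candidate:1 1 UDP 2130706431 192.168.1.5 50000 typ host
a=candidate:2 1 UDP 1694498815 203.0.113.45 54321 typ srflx"""

codecs: dict[int, str] = {}
candidates: list[dict] = []
for line in sdp.splitlines():
    if line.startswith("a=rtpmap:"):
        payload, codec = line[len("a=rtpmap:"):].split(" ", 1)
        codecs[int(payload)] = codec
    elif line.startswith("a=candidate:"):
        # fields: foundation comp transport priority ip port "typ" type
        fields = line[len("a=candidate:"):].split()
        candidates.append({"ip": fields[4], "port": int(fields[5]),
                           "type": fields[7]})

assert codecs[111] == "opus/48000/2"
assert [c["type"] for c in candidates] == ["host", "srflx"]
```

A real parser must handle far more attribute types, but the structure is exactly this: one `key=value` line at a time.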
The complete WebRTC connection sequence is a beautifully choreographed exchange:
Alice (Caller) Signaling Server Bob (Callee)
────────────── ──────────────── ────────────
│ │ │
│ 1. createOffer() │ │
│ ← SDP Offer generated │ │
│ │ │
│ 2. setLocalDescription() │ │
│ (ICE gathering begins) │ │
│ │ │
│──── 3. Send SDP Offer ─────►│──── Relay to Bob ─────►│
│ │ │ 4. setRemoteDescription()
│ │ │ 5. createAnswer()
│ │ │ ← SDP Answer generated
│ │ │ 6. setLocalDescription()
│◄────────────────────────────│◄─── Send SDP Answer ───┤
│ 7. setRemoteDescription() │ │
│ │ │
│◄══ 8. ICE candidates exchanged (trickle ICE) ═══════►│
│ │ │
│◄══ 9. DTLS handshake ═══════════════════════════════►│
│ │ │
│◄══ 10. SRTP/SCTP media/data flows ══════════════════►│
│ (signaling server no longer involved) │
Steps 8 onward use Trickle ICE — candidates are sent as they are discovered, rather than waiting for all candidates before starting. This dramatically reduces connection time.
While WebRTC is primarily a browser technology, the Python library aiortc implements the full WebRTC stack — invaluable for testing, server-side processing (like a recording bot), and understanding the internals:
```python
import asyncio
from aiortc import RTCPeerConnection, RTCSessionDescription

async def create_offer() -> tuple[RTCPeerConnection, str]:
    """Alice creates a WebRTC offer. In a real app, this SDP is sent to Bob via signaling."""
    pc = RTCPeerConnection()
    # aiortc needs at least one media track or data channel before it can offer
    pc.createDataChannel("chat")
    offer = await pc.createOffer()
    await pc.setLocalDescription(offer)
    return pc, pc.localDescription.sdp

async def accept_offer(sdp_offer: str) -> tuple[RTCPeerConnection, str]:
    """Bob receives Alice's offer and generates an answer."""
    pc = RTCPeerConnection()

    @pc.on("track")
    def on_track(track):
        print(f"Receiving {track.kind} track from Alice")

    remote_offer = RTCSessionDescription(sdp=sdp_offer, type="offer")
    await pc.setRemoteDescription(remote_offer)
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    return pc, pc.localDescription.sdp
```
WebRTC exposes three primary browser APIs. Each one solves a distinct problem.
getUserMedia — Capturing the World

getUserMedia is the entry point for accessing the user’s camera and microphone. It requires a secure context (HTTPS) and returns a MediaStream — a container of MediaStreamTrack objects.
```python
# Python equivalent using OpenCV (simulates camera capture)
import cv2
import numpy as np

class MediaStreamTrack:
    def __init__(self, source: str | int = 0):
        self.kind = "video"
        self._capture = cv2.VideoCapture(source)

    def read_frame(self) -> np.ndarray | None:
        ret, frame = self._capture.read()
        return frame if ret else None

    def apply_constraint(self, width: int, height: int):
        self._capture.set(cv2.CAP_PROP_FRAME_WIDTH, width)
        self._capture.set(cv2.CAP_PROP_FRAME_HEIGHT, height)

    def stop(self):
        self._capture.release()

# Browser equivalent:
# const stream = await navigator.mediaDevices.getUserMedia({
#     video: { width: 1280, height: 720 },
#     audio: { echoCancellation: true }
# });
```
RTCPeerConnection — The Connection Engine

RTCPeerConnection is the heart of WebRTC. It manages the entire ICE negotiation, DTLS handshake, and media transport. It is a state machine that moves through distinct phases:
RTCPeerConnection State Machine:

  ICE gathering states:    new ──► gathering ──► complete

  ICE connection states:   new ──► checking ──► connected ──► completed
                                      │              │
                                      └──► failed    └──► disconnected

  (any state) ──► closed
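One way to internalize the connection states is as a transition table. This is a simplified sketch; real browser implementations permit a few transitions beyond those drawn above (for example, recovering from disconnected back to connected):

```python
# Simplified ICE connection-state transitions (a sketch, not the full spec)
TRANSITIONS: dict[str, set[str]] = {
    "new": {"checking", "closed"},
    "checking": {"connected", "failed", "closed"},
    "connected": {"completed", "disconnected", "closed"},
    "completed": {"disconnected", "closed"},
    "disconnected": {"connected", "failed", "closed"},
    "failed": {"closed"},
    "closed": set(),
}

def walk(path: list[str]) -> bool:
    """True if every consecutive pair in `path` is a legal transition."""
    return all(b in TRANSITIONS[a] for a, b in zip(path, path[1:]))

assert walk(["new", "checking", "connected", "completed"])   # happy path
assert not walk(["new", "completed"])                        # can't skip checks
```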
RTCDataChannel — P2P Data Pipes

RTCDataChannel is WebRTC’s most underappreciated feature. It allows peers to send arbitrary binary or text data directly over the P2P connection — no server involved — making it ideal for collaborative apps, multiplayer games, and file transfers.
Data channels support flexible delivery modes:
| Mode | Protocol | Use Case |
|---|---|---|
| Reliable, ordered | SCTP | Chat messages, file transfers |
| Unreliable, unordered | SCTP (partial) | Game state, sensor data |
| Unreliable, ordered | SCTP (partial) | Game position updates |
```python
import asyncio
import random
from collections import deque
from dataclasses import dataclass

@dataclass
class DataChannelMessage:
    label: str
    data: bytes | str
    reliable: bool = True

class RTCDataChannelSimulator:
    def __init__(self, label: str, ordered: bool = True, reliable: bool = True):
        self.label = label
        self.ordered = ordered
        self.reliable = reliable
        self.state = "connecting"
        self._message_queue: deque = deque()
        self._on_message = None

    async def open(self):
        await asyncio.sleep(0.01)  # Simulate DTLS handshake
        self.state = "open"

    async def send(self, data: str | bytes):
        if self.state != "open":
            raise RuntimeError("DataChannel not open")
        if not self.reliable and random.random() < 0.1:
            return  # Drop packet (unreliable mode simulation)
        msg = DataChannelMessage(label=self.label, data=data, reliable=self.reliable)
        self._message_queue.append(msg)
        if self._on_message:
            await self._on_message(msg)
```
Unlike many networking protocols where security is optional, WebRTC mandates encryption. You cannot turn it off. Every WebRTC connection uses two security protocols working in tandem:
DTLS (Datagram Transport Layer Security) is the UDP-friendly cousin of TLS. It handles the initial key exchange between peers. Before any media flows, peers perform a DTLS handshake to establish shared encryption keys.
SRTP (Secure Real-Time Transport Protocol, RFC 3711) uses the keys established by DTLS to encrypt every single media packet. Even if an attacker intercepts packets in transit, they are meaningless without the decryption keys.
DTLS-SRTP Key Exchange Flow:
Alice Bob
───── ───
│ │
│──── ClientHello (supported cipher suites) ──────►│
│◄─── ServerHello + Certificate ──────────────────┤
│ (verify fingerprint against SDP!) ←── critical │
│──── ClientKeyExchange ──────────────────────────►│
│◄─── Finished ───────────────────────────────────┤
│──── Finished ───────────────────────────────────►│
│ │
│ Both sides derive: │
│ - SRTP master key (for media encryption) │
│ - SRTP master salt │
│ │
│◄════ Encrypted SRTP Media Flows ════════════════►│
The genius of WebRTC’s security is this: the DTLS certificate fingerprint is embedded in the SDP. When peers exchange SDPs through the signaling server, each SDP contains a hash like a=fingerprint:sha-256 AB:CD:EF:.... During the DTLS handshake, each peer verifies that the certificate presented matches the fingerprint in the SDP. If a man-in-the-middle tries to intercept and re-encrypt the traffic, they would present a certificate that does not match — and the connection is immediately rejected.
```python
import hashlib
import ssl

def simulate_fingerprint_verification(
    certificate_pem: bytes,
    expected_fingerprint: str
) -> bool:
    cert_der = ssl.PEM_cert_to_DER_cert(certificate_pem.decode())
    digest = hashlib.sha256(cert_der).hexdigest()
    computed = ":".join(digest[i:i+2].upper() for i in range(0, len(digest), 2))
    if computed == expected_fingerprint.upper():
        print("✓ Certificate fingerprint VERIFIED — no MITM detected")
        return True
    else:
        print("✗ Fingerprint MISMATCH — connection REJECTED (possible MITM!)")
        return False
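The fingerprint string format itself is simple to reproduce: a SHA-256 digest rendered as colon-separated uppercase hex pairs. The certificate bytes below are a stand-in, not a real DER-encoded certificate:

```python
import hashlib

der = b"stand-in certificate bytes"   # hypothetical, NOT a real DER certificate
digest = hashlib.sha256(der).hexdigest()
fingerprint = ":".join(digest[i:i+2].upper() for i in range(0, len(digest), 2))

# SHA-256 produces 32 bytes, hence 32 colon-separated hex pairs
assert len(fingerprint.split(":")) == 32
```

This is exactly the string that lands after `a=fingerprint:sha-256` in the SDP.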
Let us now trace the complete journey from “Alice clicks Call” to “Alice hears Bob’s voice”:
PHASE 1: Signaling (via WebSocket server)
══════════════════════════════════════════
Alice Signal Server Bob
│ │ │
├─ create offer ─────────►│────── relay ────────►│
│ │◄─── create answer ───┤
│◄─── relay answer ──────┤ │
PHASE 2: ICE Negotiation (STUN/TURN)
══════════════════════════════════════════════════
Alice STUN Server TURN Server Bob
│ │ │ │
├─ discover ────►│ │ │
│◄── public IP ─┤ │ │
│ │◄─ allocate ──┤
│ ├── relay addr ►│
│◄════════════ exchange candidates (via signal) ►│
│◄════════════ connectivity checks (STUN pings) ►│
│ (select best path: host > srflx > relay) │
PHASE 3: DTLS Handshake
════════════════════════
│◄════════════ DTLS ClientHello / ServerHello ══►│
│ (verify SDP fingerprints — reject if bad) │
│◄════════════ SRTP keys derived ════════════════►│
PHASE 4: Media Flows
═════════════════════
│◄════════════ SRTP audio/video packets ═════════►│
│◄════════════ SCTP data channel messages ════════►│
Q: What is the difference between STUN and TURN? STUN is a discovery protocol that helps a peer learn its own public IP:port. TURN is a relay protocol that forwards media through a server when direct P2P is impossible. STUN is cheap and used first; TURN is the expensive fallback.
Q: What does ICE actually do? ICE gathers all possible connection candidates (host, STUN-reflexive, TURN-relay), exchanges them with the remote peer, systematically tests every candidate pair with STUN binding requests, and selects the highest-priority working path. It handles the entire NAT traversal strategy automatically.
Q: Why does WebRTC use UDP instead of TCP for media? TCP’s retransmission-on-loss guarantee is actually harmful for real-time media. If an audio packet is lost, retransmitting it 200ms later is useless — the codec would rather fill the gap with comfort noise. UDP’s “fire and forget” model tolerates packet loss gracefully, mapping perfectly to real-time audio/video codecs like Opus and VP8.
Q: What is the signaling server’s role, and why doesn’t WebRTC define one? The signaling server relays SDP offers/answers and ICE candidates between peers before the P2P connection is established. WebRTC deliberately leaves this undefined because different applications have vastly different signaling needs — SIP, custom JSON over WebSockets, message queues — and the W3C committee wisely avoided standardizing something that varied too widely.
Q: Explain the WebRTC security model. WebRTC mandates DTLS for key exchange and SRTP for media encryption. The DTLS certificate fingerprint is embedded in the SDP. When peers exchange SDPs, they commit to accepting only a connection whose DTLS certificate matches that fingerprint — making man-in-the-middle attacks detectable even if the signaling channel is compromised.
Q: How does a DHT scale better than a centralized tracker? A centralized tracker is a single point of failure and a bandwidth bottleneck. A DHT distributes the responsibility across all nodes — each node maintains only $O(\log N)$ routing entries, and any lookup completes in $O(\log N)$ hops. Removing any node only disrupts lookups that routed through it, and the network self-heals.
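The logarithmic claim is worth making concrete with quick arithmetic (base-2, worst case):

```python
import math

# Worst-case lookup hops grow with log2 of the network size
for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} nodes -> ~{math.ceil(math.log2(n))} hops")

assert math.ceil(math.log2(1_000_000)) == 20
```

A thousandfold growth in the network adds only about ten hops to a lookup, which is why DHTs scale where central trackers cannot.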
| Component | Protocol | Purpose |
|---|---|---|
| Peer discovery | Kademlia DHT | Finding peers without a central server |
| Address discovery | STUN (RFC 5389) | Discovering public IP:port behind NAT |
| Media relay | TURN (RFC 5766) | Fallback when direct P2P fails |
| Path selection | ICE (RFC 8445) | Choosing the best connection path |
| Session negotiation | SDP + Offer/Answer | Agreeing on codecs and capabilities |
| Key exchange | DTLS | Securing the handshake |
| Media encryption | SRTP | Encrypting audio/video packets |
| Data transport | SCTP over DTLS | Reliable/unreliable data channels |
| Media capture | getUserMedia API | Accessing camera/microphone |
| Connection management | RTCPeerConnection | Orchestrating the entire WebRTC lifecycle |