system-design.io

The Bridge: Think of a Restaurant

Imagine you walk into a restaurant. There are three ways you might interact with your waiter:

  1. You order one dish, the waiter brings it back. Simple. Done. This is the classic request-response.
  2. You order a tasting menu. The waiter takes your single order, disappears into the kitchen, and then sends out plate after plate over the next two hours. You don’t re-order — you just receive.
  3. You’re at a sushi conveyor belt. You continuously reach out and grab plates while the kitchen keeps adding more. There is constant, bidirectional flow.

Now imagine coordinating hundreds of thousands of these restaurants — across different cities, countries, and time zones — with a need for every order to be understood perfectly, instantly, and efficiently. That is the problem gRPC was designed to solve.

Welcome to gRPC: Google’s battle-hardened framework for building fast, reliable, cross-language communication between distributed services. By the end of this article, you’ll understand not just what gRPC does, but why every design decision exists — and how to confidently explain any of it in a senior engineering interview.


Part 1: The Problem gRPC Was Built to Solve

The World Before gRPC: REST and Its Limits

For most of the 2000s and early 2010s, REST over HTTP/1.1 with JSON was king. And for good reason — it was simple, human-readable, and worked in any browser. But as companies grew into microservice architectures (Netflix, Google, Amazon), cracks began to show:

Pain Point 1: Text serialization is expensive.
JSON is a human-readable text format. Every time Service A calls Service B, the data must be serialized to a string on one end and deserialized back to an object on the other. At millions of requests per second, this overhead adds up dramatically.

Pain Point 2: HTTP/1.1 is inefficient.
HTTP/1.1 is a text-based protocol with a fundamental limitation: it only allows one request per TCP connection at a time. To work around this, browsers and services open multiple connections. This is wasteful. It also suffers from head-of-line blocking — if one request is slow, everything else waits.

Pain Point 3: No enforced contract.
With REST + JSON, there is no formal contract between services. Documentation can drift. A field can be renamed. The server can start returning a string where a client expected an int. These breaking changes are silent until production.

Pain Point 4: No native streaming.
REST has no native model for continuous data flows. Workarounds like long-polling, WebSockets, or Server-Sent Events exist, but they are bolted-on, inconsistent, and non-trivial to implement across languages.

Google faced all four of these problems at massive scale internally. Their internal solution, called Stubby, handled billions of inter-service calls per second for over a decade. In 2015, they rebuilt Stubby on open standards and released it as gRPC — the “g” standing for a different theme each major release (including “good”, “green”, and “glorious”).


Part 2: What is RPC, and What Does gRPC Add?

The Atomic Unit: Remote Procedure Call

At its core, RPC (Remote Procedure Call) is a deceptively simple idea: make calling a function on a remote machine feel like calling a local function.


Local Call:
result = calculate_total(items)

RPC Call (same developer experience):
result = inventory_service.calculate_total(items)

The RPC framework hides all the networking complexity — serialization, connection management, error handling — behind a clean function call. From the developer’s perspective, it feels like calling a method in the same process.

gRPC takes this concept and layers on four key innovations:

Dimension Old RPC (e.g., CORBA, XML-RPC) gRPC
Serialization XML / custom binary Protocol Buffers (binary)
Transport HTTP/1.1, TCP HTTP/2
Streaming None (request-response only) 4 native streaming modes
Code generation Language-specific, fragile protoc generates for 10+ languages

Let’s unpack each layer.


Part 3: Protocol Buffers — The Language of gRPC

The Contract-First Philosophy

Before any code is written, gRPC requires you to define your API in a .proto file. This is not documentation — it is the source of truth. From this single file, tools generate client and server code in Go, Java, Python, C++, and more. Client and server can never disagree about the API shape because they are both generated from the same contract.

// analytics_service.proto
syntax = "proto3";

package analytics.v1;

service AnalyticsService {
  rpc GetReport(ReportRequest) returns (ReportResponse);
  rpc StreamEvents(EventFilter) returns (stream Event);
  rpc UploadBatch(stream RawEvent) returns (BatchSummary);
  rpc ProcessPipeline(stream RawEvent) returns (stream ProcessedEvent);
}

message ReportRequest {
  string report_id = 1;
  int64 start_timestamp = 2;
  int64 end_timestamp = 3;
}

message Event {
  string event_id = 1;
  string event_type = 2;
  int64 timestamp = 3;
  map<string, string> properties = 4;
}

Notice the stream keyword in the service definition. Those four rpc declarations map directly to gRPC’s four streaming modes — we’ll cover each in depth shortly.

Why Binary Beats JSON

Protocol Buffers serialize data into a compact binary format. Compared to JSON:

The practical result: Protobuf payloads are typically 3–10x smaller than JSON, and encoding/decoding is proportionally faster.

JSON representation:
{"event_id": "abc123", "event_type": "click", "timestamp": 1710000000}
= 67 bytes

Protobuf representation:
0a 06 61 62 63 31 32 33 12 05 63 6c 69 63 6b 18 80 f8 cb 9a 06
= 21 bytes

In Figure above, notice that the Protobuf encoding replaces the verbose JSON field names with compact numeric tags, and uses variable-length integer encoding for the timestamp — a technique that packs small numbers into fewer bytes than their fixed-size counterparts.


Part 4: HTTP/2 — The Engine Under the Hood

Why HTTP/1.1 Wasn’t Enough

HTTP/1.1 has a fundamental sequencing problem: one request per connection, one response per request, in order. This means if your client fires 10 requests, request #10 waits for requests #1–9 to complete (or you open 10 separate TCP connections, which wastes memory and time).

HTTP/2 was designed specifically to eliminate this constraint. gRPC runs entirely on HTTP/2 and leverages four of its critical features:

Feature 1: Multiplexing

HTTP/2 introduces the concept of streams — logical, independent channels within a single TCP connection. Multiple streams can be open simultaneously, their frames interleaved on the wire, and processed independently on the other end.

HTTP/1.1 — Sequential, One at a Time:
┌────────────────────────────────────────────┐
│  TCP Connection                            │
│  [Request 1] ──► [Response 1]              │
│  [Request 2] ──────────────► [Response 2]  │
│  [Request 3] ──────────────────► [Resp 3]  │
└────────────────────────────────────────────┘

HTTP/2 — Concurrent, Multiplexed:
┌─────────────────────────────────────────────┐
│  Single TCP Connection                      │
│  Stream 1: [Req]──►[Resp]                   │
│  Stream 3:    [Req]─────►[Resp]             │
│  Stream 5:       [Req]──►[Resp]             │
│  Stream 7:          [Req]────────►[Resp]    │
└─────────────────────────────────────────────┘

In the figure above, HTTP/1.1 forces requests into a single lane — like a one-lane road where everyone lines up. HTTP/2 turns it into a multi-lane highway where all cars travel simultaneously. Each gRPC call, regardless of type, maps to exactly one HTTP/2 stream.

Feature 2: Binary Framing

HTTP/1.1 sends headers and bodies as plain text strings. HTTP/2 switches to a binary framing layer: all communication is split into small, typed frames (HEADERS, DATA, SETTINGS, PING, etc.). This makes parsing faster and eliminates ambiguities.

Every gRPC call uses three frame types:

  1. HEADERS frame (call initiation): contains the gRPC path /package.Service/Method, metadata, and content-type application/grpc
  2. DATA frames: carry the actual Protobuf-encoded messages
  3. HEADERS frame (trailers): carries the final grpc-status code

Feature 3: Header Compression (HPACK)

HTTP/1.1 repeats headers on every single request. Authentication tokens, user-agent strings, content-types — all resent verbatim every time. HTTP/2’s HPACK compression maintains a header table on both client and server. Repeated headers are replaced with a small integer index. For high-frequency gRPC calls with consistent metadata, this yields dramatic bandwidth savings.

Feature 4: Flow Control

HTTP/2 has built-in flow control at both the connection and stream level. This prevents a fast sender from overwhelming a slow receiver — a critical property for gRPC’s streaming modes, where a server might stream thousands of messages per second to a client that can only process hundreds.


Part 5: The gRPC Architecture — Stubs and Channels

Before we dive into streaming modes, we need to understand how a gRPC call is physically set up.

┌──────────────────────────────────────────────────────────────┐
│                         CLIENT SIDE                          │
│                                                              │
│   Application Code                                           │
│         │                                                    │
│         ▼                                                    │
│   [Generated Stub]  ← protoc generates this                 │
│         │  "Looks like a local method call"                  │
│         ▼                                                    │
│   [gRPC Channel]                                             │
│         │  Manages connection pool, retries, LB              │
│         ▼                                                    │
│   [HTTP/2 Transport]                                         │
│         │                                                    │
└─────────┼────────────────────────────────────────────────────┘
          │  TCP/TLS
          │
┌─────────┼────────────────────────────────────────────────────┐
│         ▼                         SERVER SIDE                │
│   [HTTP/2 Transport]                                         │
│         │                                                    │
│         ▼                                                    │
│   [gRPC Server]                                              │
│         │  Routes by path /package.Service/Method            │
│         ▼                                                    │
│   [Service Implementation]  ← Your business logic           │
└──────────────────────────────────────────────────────────────┘

Figure: The gRPC stack. Notice that the Stub on the client side is a generated class that hides all networking. When your application code calls stub.GetReport(request), it feels like a local function call, but underneath, a full HTTP/2 negotiation, Protobuf serialization, and frame exchange is happening.

Stubs are generated by protoc (the Protobuf compiler) from your .proto file. They implement the exact same interface as your server-side service, so client code reads like any other object-oriented code.

Channels manage the underlying connection to a server. They handle:


Part 6: The Four Streaming Modes

This is the heart of gRPC’s design superiority. We’ll explore each mode with the problem it solves, the .proto definition, a lifecycle diagram, and a Python implementation.

Mode 1: Unary RPC — The Classic Function Call

The Problem it Solves: Most operations in a system are simple request-response: fetch a user, create an order, authenticate a token. Unary RPC is the bedrock pattern. It’s the gRPC equivalent of a REST GET or POST.

Lifecycle:

Client                    Server
  │                          │
  │──── HEADERS (request) ──►│  ← Call initiated
  │──── DATA (message) ─────►│  ← Request payload
  │                          │  ← Server processes
  │◄─── HEADERS (response) ──│  ← Response headers
  │◄─── DATA (message) ──────│  ← Response payload
  │◄─── HEADERS (trailers) ──│  ← grpc-status: 0 (OK)
  │                          │

Proto Definition:

rpc GetUserProfile(UserRequest) returns (UserProfile);

Python Server Implementation:

import grpc
from concurrent import futures
import user_pb2
import user_pb2_grpc

class UserServiceServicer(user_pb2_grpc.UserServiceServicer):
    def GetUserProfile(self, request, context):
        # context carries metadata, deadline, cancellation
        user_id = request.user_id
        
        # Simulate a database lookup
        user_data = db.fetch_user(user_id)
        
        if user_data is None:
            context.set_code(grpc.StatusCode.NOT_FOUND)
            context.set_details(f"User {user_id} not found")
            return user_pb2.UserProfile()
        
        return user_pb2.UserProfile(
            user_id=user_data["id"],
            username=user_data["username"],
            email=user_data["email"]
        )

def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    user_pb2_grpc.add_UserServiceServicer_to_server(UserServiceServicer(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()

Python Client:

import grpc
import user_pb2
import user_pb2_grpc

with grpc.insecure_channel("localhost:50051") as channel:
    stub = user_pb2_grpc.UserServiceStub(channel)
    
    request = user_pb2.UserRequest(user_id="u_12345")
    response = stub.GetUserProfile(request)
    
    print(f"Username: {response.username}")
    print(f"Email: {response.email}")

Interview Insight: Notice that context.set_code() is how you signal errors in gRPC. Unlike REST where you return HTTP 404, gRPC always returns HTTP 200 at the transport level. The actual application status lives in the trailers (grpc-status), not the HTTP status. This separation allows a server to stream 100 items successfully and then report an error at item 101.


Mode 2: Server Streaming RPC — The Firehose

The Problem it Solves: Imagine requesting “all logs from the last 24 hours” — potentially millions of records. With Unary RPC, the server would need to buffer the entire result set in memory before sending a single byte. This is both slow and memory-explosive.

Server Streaming allows the server to push data incrementally, as it becomes available, while the client consumes it in real time.

Perfect Use Cases:

Lifecycle:

Client                         Server
  │                               │
  │──── HEADERS (request) ───────►│
  │──── DATA (single request) ───►│
  │                               │ ← Server begins generating
  │◄─── DATA (message 1) ─────────│
  │◄─── DATA (message 2) ─────────│
  │◄─── DATA (message 3) ─────────│
  │           ...                 │
  │◄─── DATA (message N) ─────────│
  │◄─── HEADERS (trailers) ───────│ ← grpc-status: 0
  │                               │
  └───── Client reads until EOF ──┘

Proto Definition:

// The `stream` keyword before the return type is the only change
rpc StreamLogs(LogFilter) returns (stream LogEntry);

Python Server Implementation:

import time

class LogServiceServicer(log_pb2_grpc.LogServiceServicer):
    def StreamLogs(self, request, context):
        """
        This is a Python generator — `yield` turns it into a stream.
        gRPC will send each yielded message as a separate DATA frame.
        """
        log_cursor = db.get_log_cursor(
            service=request.service_name,
            start_time=request.start_timestamp
        )
        
        for log_entry in log_cursor:
            # Check if the client has cancelled
            if context.is_active() is False:
                break
            
            yield log_pb2.LogEntry(
                log_id=log_entry["id"],
                level=log_entry["level"],
                message=log_entry["message"],
                timestamp=log_entry["ts"]
            )
            
            # Optionally throttle — respect flow control
            time.sleep(0.001)

Python Client:

with grpc.insecure_channel("localhost:50051") as channel:
    stub = log_pb2_grpc.LogServiceStub(channel)
    
    filter_req = log_pb2.LogFilter(
        service_name="payment-service",
        start_timestamp=1710000000
    )
    
    # The stub returns an iterator — we just loop
    log_stream = stub.StreamLogs(filter_req)
    
    for log_entry in log_stream:
        print(f"[{log_entry.level}] {log_entry.message}")

The Elegant Design: Notice that on the server side, we use Python’s yield keyword to produce a generator. The gRPC framework takes care of framing each yielded message into a DATA frame and sending it over the HTTP/2 stream. The developer never thinks about TCP, HTTP/2, or byte framing — they just yield.


Mode 3: Client Streaming RPC — The Collector

The Problem it Solves: Now flip the firehose around. What if the client has a massive amount of data to send, and only needs a single summary back? Uploading a large file, ingesting a batch of sensor readings, or submitting bulk records for processing — these are all cases where the client streams to the server.

The alternative (unary with huge payloads) would require the client to buffer the entire dataset in memory before sending, and would time out on large inputs.

Perfect Use Cases:

Lifecycle:

Client                         Server
  │                               │
  │──── HEADERS (request) ───────►│
  │──── DATA (chunk 1) ──────────►│
  │──── DATA (chunk 2) ──────────►│
  │──── DATA (chunk 3) ──────────►│
  │        ...                    │ ← Server accumulates
  │──── DATA (chunk N) ──────────►│
  │──── END_STREAM ──────────────►│ ← Client signals done
  │                               │ ← Server finalizes
  │◄─── HEADERS (response) ───────│
  │◄─── DATA (single response) ───│
  │◄─── HEADERS (trailers) ───────│
  │                               │

Proto Definition:

// The `stream` keyword is now on the INPUT type
rpc UploadSensorData(stream SensorReading) returns (UploadSummary);

Python Server Implementation:

class SensorServiceServicer(sensor_pb2_grpc.SensorServiceServicer):
    def UploadSensorData(self, request_iterator, context):
        """
        request_iterator is a Python iterator over incoming messages.
        We collect them all and return a single summary.
        """
        total_count = 0
        error_count = 0
        readings_buffer = []
        
        for reading in request_iterator:
            total_count += 1
            
            # Validate each reading as it arrives
            if reading.value < -273.15:  # Below absolute zero
                error_count += 1
                continue
            
            readings_buffer.append(reading)
            
            # Flush to database in batches of 100
            if len(readings_buffer) >= 100:
                db.bulk_insert_readings(readings_buffer)
                readings_buffer.clear()
        
        # Final flush
        if readings_buffer:
            db.bulk_insert_readings(readings_buffer)
        
        return sensor_pb2.UploadSummary(
            total_received=total_count,
            total_stored=total_count - error_count,
            error_count=error_count
        )

Python Client:

def generate_sensor_readings(device_id: str):
    """A generator that produces sensor readings."""
    for i in range(10_000):
        yield sensor_pb2.SensorReading(
            device_id=device_id,
            value=random.uniform(-10.0, 50.0),
            timestamp=int(time.time())
        )

with grpc.insecure_channel("localhost:50051") as channel:
    stub = sensor_pb2_grpc.SensorServiceStub(channel)
    
    # Pass the generator directly — gRPC streams it automatically
    summary = stub.UploadSensorData(generate_sensor_readings("sensor_042"))
    
    print(f"Uploaded: {summary.total_stored} / {summary.total_received}")
    print(f"Errors: {summary.error_count}")

The Key Pattern: The client passes a Python generator function directly to the stub. gRPC iterates the generator and sends each item as a separate DATA frame. This lazy evaluation means even a 10-million-item upload never loads all items into memory simultaneously — items are generated, sent, and freed one at a time.


Mode 4: Bidirectional Streaming RPC — The Full Duplex

The Problem it Solves: The previous three modes all have one “stream direction” that is unidirectional. Bidirectional streaming opens two independent streams — one client-to-server, one server-to-client — that operate simultaneously. Neither side waits for the other to finish. They read and write in whatever order the application requires.

This is the most powerful and complex streaming mode. It enables genuinely interactive, real-time communication patterns that REST simply cannot replicate.

Perfect Use Cases:

Lifecycle:

Client                              Server
  │                                    │
  │──── HEADERS (method init) ────────►│
  │                                    │
  │──── DATA (message A1) ────────────►│
  │◄─── DATA (response R1) ────────────│ ← Server responds to A1
  │──── DATA (message A2) ────────────►│
  │──── DATA (message A3) ────────────►│ ← Client doesn't wait
  │◄─── DATA (response R2) ────────────│
  │◄─── DATA (response R3) ────────────│
  │──── END_STREAM ───────────────────►│ ← Client signals done
  │◄─── DATA (response R4) ────────────│ ← Server can still send
  │◄─── HEADERS (trailers) ────────────│ ← Server closes its stream
  │                                    │

In the figure above, notice the interleaved pattern. Message A2 and A3 are sent before the server has even responded to A1. The two streams are completely independent. This is the defining property of bidirectional streaming — and the core reason gRPC requires HTTP/2 (HTTP/1.1 has no concept of independent concurrent streams over the same connection).

Proto Definition:

// Both input and output are streams
rpc ProcessPipeline(stream AnalyticsEvent) returns (stream ProcessedResult);

Python Server Implementation:

import queue
import threading

class AnalyticsServiceServicer(analytics_pb2_grpc.AnalyticsServiceServicer):
    def ProcessPipeline(self, request_iterator, context):
        """
        Both arguments are iterables/iterators.
        We process input and yield output concurrently.
        """
        for event in request_iterator:
            # Process each event as it arrives
            result = ml_model.classify_event(event)
            
            # Yield a response for each input (ping-pong pattern)
            yield analytics_pb2.ProcessedResult(
                event_id=event.event_id,
                category=result.category,
                confidence=result.confidence,
                processing_time_ms=result.latency_ms
            )
            
            # Note: The server could also batch, delay, or
            # yield multiple results per input — completely flexible

Python Client:

def event_generator():
    """Simulates a stream of live analytics events."""
    events = [
        {"id": "e001", "type": "purchase", "amount": 99.99},
        {"id": "e002", "type": "page_view", "url": "/home"},
        {"id": "e003", "type": "click",    "element": "buy_btn"},
    ]
    for e in events:
        yield analytics_pb2.AnalyticsEvent(
            event_id=e["id"],
            event_type=e["type"]
        )
        time.sleep(0.1)  # Simulate real-time event pace

with grpc.insecure_channel("localhost:50051") as channel:
    stub = analytics_pb2_grpc.AnalyticsServiceStub(channel)
    
    # Pass generator, get back an iterator of results
    result_stream = stub.ProcessPipeline(event_generator())
    
    for result in result_stream:
        print(
            f"[{result.event_id}] → {result.category} "
            f"(confidence: {result.confidence:.2f})"
        )

Part 7: The Wire Format — What Actually Travels on the Network

Length-Prefixed Framing

At the byte level, every gRPC message (in any streaming mode) is wrapped with a simple 5-byte header before being placed into an HTTP/2 DATA frame:

┌──────────┬──────────────────────────────────────┬──────────────────────┐
│  Byte 0  │        Bytes 1-4                     │   Bytes 5 onward     │
│          │                                      │                      │
│  Compress│   Message Length                     │   Protobuf Payload   │
│  Flag    │   (4-byte big-endian integer)         │                      │
│  0 or 1  │   e.g., 0x0000000F = 15 bytes        │   Binary encoded     │
└──────────┴──────────────────────────────────────┴──────────────────────┘

This length-prefixed framing is the secret sauce that enables streaming. Because each message carries its own length prefix, the receiver always knows exactly how many bytes to read for the next complete message. It can decode it, process it, and immediately begin reading the next prefix — creating a fluid stream of independent, self-describing messages.

A concrete example: sending a weight: 150, name: "Apple" Protobuf message (which encodes to 10 bytes) produces this 15-byte gRPC message on the wire:

00  00 00 00 0a  08 96 01 12 05 41 70 70 6c 65
│   └───┬────┘  └──────────────────────────┘
│       │                │
│       │           Protobuf payload (10 bytes)
│       │
│  Message length = 0x0A = 10
│
Compression flag = 0 (no compression)

Status in Trailers, Not HTTP Status

This is one of gRPC’s most counterintuitive properties — and a common interview question.

In REST: A failed request returns HTTP 404 Not Found or 500 Internal Server Error.

In gRPC: The HTTP status is almost always 200 OK — even on errors. The actual result is in the HTTP/2 trailers (a final HEADERS frame sent after all DATA frames):

grpc-status: 0        ← 0 = OK, 5 = NOT_FOUND, 13 = INTERNAL
grpc-message: OK

Why? Because in streaming, the HTTP status is determined at connection establishment — before any data flows. A server might successfully stream 500 log entries and then fail on entry 501. If the error were in the HTTP status, there would be no way to signal success-then-failure. Trailers solve this: they arrive after all data, carrying the final verdict.

The gRPC Status Codes

gRPC defines 16 status codes, distinct from HTTP status codes:

┌────────┬─────────────────────┬────────────────────────────────────────┐
│ Code   │ Name                │ Meaning                                │
├────────┼─────────────────────┼────────────────────────────────────────┤
│ 0      │ OK                  │ Success                                │
│ 1      │ CANCELLED           │ Call was cancelled (by client or server)│
│ 2      │ UNKNOWN             │ Unknown error from server              │
│ 3      │ INVALID_ARGUMENT    │ Client sent bad arguments              │
│ 4      │ DEADLINE_EXCEEDED   │ Timed out before completing            │
│ 5      │ NOT_FOUND           │ Resource doesn't exist                 │
│ 7      │ PERMISSION_DENIED   │ Auth check failed                      │
│ 8      │ RESOURCE_EXHAUSTED  │ Rate limit hit, quota exceeded         │
│ 13     │ INTERNAL            │ Server-side error                      │
│ 14     │ UNAVAILABLE         │ Server is down or overloaded           │
└────────┴─────────────────────┴────────────────────────────────────────┘

Part 8: Cross-Cutting Concerns

Deadlines and Timeouts

gRPC has first-class support for deadlines — a client can declare “I’m willing to wait at most 5 seconds for this call, total”:

import grpc
from datetime import datetime, timedelta

with grpc.insecure_channel("localhost:50051") as channel:
    stub = user_pb2_grpc.UserServiceStub(channel)
    
    # Set a deadline: 5 seconds from now
    deadline = datetime.utcnow() + timedelta(seconds=5)
    
    try:
        response = stub.GetUserProfile(
            user_pb2.UserRequest(user_id="u_001"),
            timeout=5.0  # equivalent: timeout in seconds
        )
    except grpc.RpcError as e:
        if e.code() == grpc.StatusCode.DEADLINE_EXCEEDED:
            print("Call timed out!")

On the server side, the deadline propagates through all downstream calls. If Service A calls Service B with a 5-second deadline, and B calls C, C automatically inherits a budget equal to whatever time remains. This deadline propagation prevents cascading hangs across microservice chains — a problem that plagued early REST microservice architectures.

Metadata — gRPC’s Header System

Metadata is gRPC’s equivalent of HTTP headers: key-value pairs attached to a call (not part of the business logic payload). Common uses:

# Client attaching authentication metadata
metadata = [
    ("authorization", "Bearer eyJhbGci..."),
    ("x-request-id", "req_abc123"),
    ("x-trace-id",   "trace_xyz789"),
]

response = stub.GetUserProfile(
    user_pb2.UserRequest(user_id="u_001"),
    metadata=metadata
)

Metadata keys must not start with grpc- (reserved for internal use). Binary values must use keys ending in -bin, and the framework handles base64 encoding automatically.

Interceptors — The gRPC Middleware Layer

Interceptors are gRPC’s equivalent of middleware (like Express.js middleware or Django middleware). They let you inject cross-cutting logic — logging, auth, metrics — without polluting business logic:

class AuthInterceptor(grpc.ServerInterceptor):
    def intercept_service(self, continuation, handler_call_details):
        metadata = dict(handler_call_details.invocation_metadata)
        token = metadata.get("authorization", "")
        
        if not self._validate_token(token):
            def abort(ignored_request, context):
                context.abort(
                    grpc.StatusCode.UNAUTHENTICATED,
                    "Invalid or missing auth token"
                )
            return grpc.unary_unary_rpc_method_handler(abort)
        
        return continuation(handler_call_details)
    
    def _validate_token(self, token: str) -> bool:
        return token.startswith("Bearer ") and len(token) > 50

# Register interceptor at server creation
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=10),
    interceptors=[AuthInterceptor()]
)

Part 9: gRPC vs REST — A Structured Comparison

We’ve covered a lot of ground. Let’s crystallize the key tradeoffs in a structured comparison that you can use directly in system design interviews:

Dimension REST + JSON gRPC + Protobuf
Protocol HTTP/1.1 or HTTP/2 HTTP/2 (required)
Payload format JSON (text) Protobuf (binary)
Payload size Larger (verbose keys) 3–10x smaller
Schema enforcement Optional (OpenAPI) Required (.proto)
Code generation Optional (Swagger codegen) Built-in (protoc)
Streaming support Via WebSocket/SSE (bolted on) Native (4 modes)
Browser support Full native Via gRPC-Web proxy only
Human readability JSON is readable Binary not readable
Latency Higher (text parsing overhead) Lower (binary, multiplexed)
Ecosystem maturity Very mature (decades) Mature (2015–present)
Best for Public APIs, CRUD, browser clients Microservices, internal APIs, streaming

The gRPC-Web Caveat

One important limitation: standard browsers cannot speak gRPC directly. Web browsers do not expose the low-level HTTP/2 framing controls that gRPC requires — specifically, reading HTTP/2 trailers. The solution is gRPC-Web: a protocol adaptation layer that encodes trailers inside the response body and uses a proxy (like Envoy) between the browser and the backend gRPC service. For microservice-to-microservice communication, this is irrelevant — but it matters for any gRPC endpoint a browser must call directly.


Part 10: Choosing the Right Streaming Mode

Here’s a decision framework for selecting the appropriate gRPC pattern:

                    Does the client send
                    multiple messages?
                          │
              ┌───────────┴───────────┐
              │ NO                    │ YES
              ▼                       ▼
   Does server send         Does server send
   multiple messages?       multiple messages?
         │                          │
    ┌────┴────┐               ┌─────┴──────┐
    │ NO │ YES│               │ NO  │  YES │
    ▼         ▼               ▼            ▼
  UNARY    SERVER          CLIENT     BIDIRECTIONAL
  RPC      STREAMING       STREAMING  STREAMING
           RPC             RPC        RPC

 (CRUD,   (Dashboard,  (File upload, (Chat, gaming,
  auth,    stock feed,  bulk ingest, real-time ML
  lookup)  log stream)  IoT batch)   pipelines)

Quick Reference Summary

Pattern Request Response Best For
Unary Single Single CRUD, auth, lookups
Server Streaming Single Multiple Live feeds, large exports
Client Streaming Multiple Single File uploads, bulk ingestion
Bidirectional Multiple Multiple Chat, gaming, collaborative tools

Part 11: Interview Readiness — Key Concepts to Know Cold

Senior engineering interviews at companies like Google, Meta, Stripe, and Nvidia often test gRPC knowledge at multiple levels. Here’s what you should be able to discuss fluently:

System Design Questions

“Design a real-time analytics pipeline for 1M events/second.” → Use bidirectional streaming between event collectors and the processing engine. gRPC’s HTTP/2 multiplexing means thousands of event producers share a single TCP connection per backend. Each producer is a stream.

“How would you handle a client uploading a 10GB file?” → Client streaming RPC with chunked bytes messages. The server processes and stores each chunk on arrival. No memory buffering of the full file on either side.

“Why does gRPC always return HTTP 200 even on errors?” → Because in streaming, HTTP status is committed at stream open time — before data flows. Application errors in the middle of a stream couldn’t be expressed via HTTP status. gRPC uses trailers (grpc-status) to carry the final application-level result after all data has been exchanged.

Deep Technical Questions

“What is length-prefixed framing and why does gRPC need it?” → HTTP/2 DATA frames can carry arbitrary bytes. Without framing, the receiver can’t tell where one Protobuf message ends and the next begins. The 5-byte prefix (1 byte compression flag + 4 bytes length) gives the receiver exactly the right byte count to decode each message.

“How does deadline propagation work across services?” → The deadline is sent as a grpc-timeout header in the initial HEADERS frame. Downstream services receive this deadline and can check context.time_remaining(). Well-designed services propagate the remaining deadline to their own downstream calls, creating a cascade of coordinated timeouts.

“What is the difference between a channel and a stub?” → A channel represents the logical connection to a server (managing TCP connections, TLS, load balancing state). A stub is the generated client object that exposes your service’s methods. You create one channel and can create multiple stubs from it. The stub translates method calls into gRPC frames; the channel transmits them.


Closing Thoughts

gRPC is not just a performance optimization — it is a communication architecture philosophy. Its design embodies three core convictions:

  1. Contracts are not optional: The .proto file as source of truth eliminates entire classes of integration bugs.
  2. Transport matters: HTTP/2’s multiplexing, binary framing, and flow control are not incidental. They are load-bearing pillars that make streaming possible at scale.
  3. Streaming is first-class: The four streaming modes are not bolt-ons. They are designed into the framework from the ground up, with full lifecycle management, flow control, and cancellation support.

When you see stream in a .proto file, you’re looking at a design decision — a deliberate choice about data flow, latency, and resource consumption. Understanding why that stream keyword exists is the difference between a developer who uses gRPC and an engineer who designs with it.

```