Before we write a single line of query syntax, let’s visit a scene most of us know well: a busy government office in the 1970s.
Picture rows upon rows of metal filing cabinets. Every citizen has a file. Every file lives in a specific drawer, in a specific cabinet, labeled by last name, then first name, then year of birth. The system is beautiful — it is orderly, predictable, and perfectly consistent. Any clerk can walk up, follow the rules, and retrieve exactly the right file within seconds.
This is, at its core, a relational database.
But now imagine the city grows. The population triples. Different departments start collecting different kinds of data. The health department wants to attach X-rays. The DMV wants to attach vehicle photos. The tax office wants to attach spreadsheets, some with 3 columns, some with 300. The marriage registry wants to link files between citizens: “This person is related to that person, who is also related to that person.”
The filing cabinet breaks. Not because it was poorly designed (it was perfect for what it was designed to do), but because the nature of the data changed. It became varied, massive, interconnected, and fast-moving.
This is the origin story of NoSQL.
“Not Only SQL” is not a single database system. It is a family of database philosophies, each built to solve a specific failure mode of the relational model. By the end of this article, we will understand the four main members of this family, when to reach for each one, and how to reason about them in a system design interview with confidence.
To appreciate the NoSQL family, we need to be honest about where SQL databases struggle. We are not dismissing SQL — it remains the gold standard for many applications. But we are identifying the pressure points.
Pain Point 1: Schema Rigidity
In a relational database, the schema (the shape of your tables, the columns, their data types) must be defined before data is inserted. This is called schema-on-write. If your product evolves and you need to add a new field, you run a migration. On large tables with billions of rows, some migrations (for example, changing a column's type or backfilling a new column) can take hours or days and may require downtime or special online-migration tooling.
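Here is a minimal sketch of schema-on-write. It uses Python's built-in `sqlite3` module as a stand-in for any relational database, and the table and column names are illustrative. The database rejects a write that includes an undeclared column until a migration adds that column. In SQLite, `ALTER TABLE ... ADD COLUMN` on a tiny in-memory table is instant. What a migration costs on a large production table depends on the engine and on the kind of change.

```python
import sqlite3

# In-memory SQLite database: a convenient stand-in for a relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")

# Schema-on-write: a field the schema does not declare is rejected at write time.
try:
    conn.execute("INSERT INTO users (id, name, theme) VALUES (2, 'Bob', 'dark')")
    rejected = False
except sqlite3.OperationalError as exc:
    rejected = True
    print("Rejected:", exc)

# The fix is a migration: change the schema first, then write.
conn.execute("ALTER TABLE users ADD COLUMN theme TEXT")
conn.execute("INSERT INTO users (id, name, theme) VALUES (2, 'Bob', 'dark')")

# Existing rows get NULL (None in Python) for the new column.
rows = conn.execute("SELECT id, name, theme FROM users ORDER BY id").fetchall()
print(rows)  # [(1, 'Alice', None), (2, 'Bob', 'dark')]
```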
Pain Point 2: Horizontal Scaling Is Expensive
Relational databases were designed to run on a single, powerful machine. Scaling them horizontally (adding more machines) is possible but complex. Features like foreign keys, transactions, and JOINs assume that all the data a query needs lives in the same place. Once you split data across machines (called sharding), maintaining those guarantees becomes a distributed systems nightmare.
Pain Point 3: Not All Data is Tabular
A social network does not naturally look like a table. A user’s activity stream does not fit neatly into normalized rows. A product catalog where each product has wildly different attributes is painful to model in SQL. Forcing non-tabular data into rows and columns introduces impedance mismatch — the friction between how data lives in the real world and how it is stored in the database.
These three pain points are the precise problems that each NoSQL category was designed to address.
Before we dive deep into each type, here is the landscape. Think of these four database types as four different specialists. Each is the best in the world at one thing.
+----------------------------------------------------------+
| NoSQL Universe                                           |
|                                                          |
|  +---------------+   +------------------+                |
|  | Key-Value     |   | Document         |                |
|  | (Redis)       |   | (MongoDB)        |                |
|  |               |   |                  |                |
|  | key --> value |   | key --> {JSON}   |                |
|  |               |   |                  |                |
|  | SPEED         |   | FLEXIBILITY      |                |
|  +---------------+   +------------------+                |
|                                                          |
|  +---------------+   +------------------+                |
|  | Wide-Column   |   | Graph            |                |
|  | (Cassandra)   |   | (Neo4j)          |                |
|  |               |   |                  |                |
|  | rows x cols   |   | nodes --edges--> |                |
|  | (sparse)      |   | nodes            |                |
|  |               |   |                  |                |
|  | SCALE         |   | RELATIONSHIPS    |                |
|  +---------------+   +------------------+                |
+----------------------------------------------------------+
In this diagram, we can see that each NoSQL type occupies a different “specialty zone.” Key-value databases optimize for raw speed. Document databases optimize for schema flexibility. Wide-column databases optimize for massive-scale writes. Graph databases optimize for relationship traversal. Keep this mental map in mind as we go deeper.
You walk into a restaurant. You hand the attendant your coat. They give you a numbered token: #42. When you leave, you hand back #42 and you get your coat. The attendant doesn’t know or care what is inside your coat. They only know: token maps to item.
This is a key-value database. Blindingly simple. Blindingly fast.
A key-value store has only one concept: a key that maps to a value.
# Conceptual key-value model
store = {}
# Write
store["user:session:abc123"] = {
"user_id": 99,
"logged_in_at": "2025-03-21T08:00:00Z",
"cart_items": 3
}
# Read (O(1) lookup — no scan, no join)
session = store["user:session:abc123"]
print(session)
# {'user_id': 99, 'logged_in_at': '2025-03-21T08:00:00Z', 'cart_items': 3}
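The value can be anything: a string, a number, a binary blob, a JSON blob. In the pure key-value model, the database does not inspect or index the value. It just stores it and retrieves it. Redis goes somewhat further and offers typed values such as hashes, sets, and sorted sets, but you still reach them by key. This constraint is what makes key-value stores so fast: lookup by key is O(1) on average, meaning constant time regardless of how many records are in the database.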
Now let’s add a real-world constraint. You are building a web application with 10 million users. Every page load requires checking whether the user is authenticated. If you hit your SQL database for every single request, the latency mounts and your database becomes a bottleneck. This is the session management problem.
The fix: store active sessions in a key-value database like Redis, and only consult the SQL database for deep operations (reading user profiles, writing orders, etc.).
import redis
# Connect to Redis
r = redis.Redis(host='localhost', port=6379, db=0)
# Store a session with a 30-minute TTL (Time-To-Live)
r.setex(
name="session:abc123",
time=1800, # seconds
value="user_id:99"
)
# Retrieve session on each request
session = r.get("session:abc123")
if session:
    print(f"Authenticated: {session.decode('utf-8')}")
else:
    print("Session expired. Please log in again.")
Notice the setex call — this sets a key with an automatic expiration.
Most key-value stores support TTL natively. This makes them perfect for
caching, ephemeral data, and rate limiting.
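Redis handles expiry on its server, so the sketch below is only a toy in-memory Python model of the same idea. The `TTLStore` class, its methods, and the injectable clock are hypothetical, chosen so the example runs without a Redis server. The sketch shows TTL-based session expiry and the common fixed-window rate-limiting pattern: an increment followed by an expiry on the first hit, which in Redis corresponds to `INCR` then `EXPIRE`. Because this toy is single-threaded, it does not model the atomicity Redis provides.

```python
import time
from typing import Callable, Optional


class TTLStore:
    """Toy in-memory key-value store with per-key expiry (a model, not Redis)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, absolute expiry time, or None for "never expires")
        self._data: dict[str, tuple[object, Optional[float]]] = {}

    def setex(self, key: str, ttl_seconds: float, value: object) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict an expired key on read
            return None
        return value

    def incr(self, key: str) -> int:
        # Increment a counter (missing keys start at 0), keeping any existing expiry.
        current = self.get(key)  # also evicts the key if it has expired
        expires_at = self._data[key][1] if key in self._data else None
        new_value = int(current or 0) + 1
        self._data[key] = (new_value, expires_at)
        return new_value

    def expire(self, key: str, ttl_seconds: float) -> None:
        if key in self._data:
            value, _ = self._data[key]
            self._data[key] = (value, self._clock() + ttl_seconds)


def allow_request(store: TTLStore, client_ip: str, limit: int, window_seconds: float) -> bool:
    # Fixed-window rate limit: one counter key per client.
    key = f"rate:ip:{client_ip}"
    count = store.incr(key)
    if count == 1:
        store.expire(key, window_seconds)  # the first request opens the window
    return count <= limit


# A fake clock makes expiry deterministic for the example.
now = [0.0]
store = TTLStore(clock=lambda: now[0])

store.setex("session:abc123", 1800, "user_id:99")
now[0] = 1799.0
session_before_expiry = store.get("session:abc123")  # "user_id:99"
now[0] = 1800.0
session_after_expiry = store.get("session:abc123")   # None: the TTL has elapsed

now[0] = 2000.0
decisions = [allow_request(store, "203.0.113.7", limit=3, window_seconds=60) for _ in range(4)]
print(decisions)  # [True, True, True, False]
now[0] = 2060.0
allowed_next_window = allow_request(store, "203.0.113.7", limit=3, window_seconds=60)  # True
```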
+--------------------------------+
| Key-Value Store (Hash Table)   |
|                                |
| Key          Value             |
| +-----------+  +-------------+ |
| |session:a1 |->| {user:99}   | |
| +-----------+  +-------------+ |
| |cache:home |->| "<html>..." | |
| +-----------+  +-------------+ |
| |rate:ip:x  |->| 47          | |
| +-----------+  +-------------+ |
| |lock:order1|->| "LOCKED"    | |
| +-----------+  +-------------+ |
+--------------------------------+
                ^
                | O(1) hash lookup
                | (no table scan, no index walk)
In this diagram, notice that the store is essentially a hash table. Given a key, the database hashes it to find the bucket in memory where the value lives. There is no “WHERE” clause and no filtering, just a direct address lookup. That direct lookup, together with keeping the data in memory, is why Redis can serve very high request rates with sub-millisecond latency for simple operations.
| Use Case | Why Key-Value Works |
|---|---|
| Session storage | Fast R/W, natural TTL support |
| Caching (HTML, queries) | Offloads DB, sub-millisecond response |
| Rate limiting | Atomic increment operations |
| Leaderboards | Redis sorted sets provide ordered scores |
| Feature flags | Simple boolean values per user/environment |
The simplicity is a double-edged sword. You cannot query by value. You cannot say “give me all sessions created in the last 30 minutes” unless you designed specific keys for that. The key is your only access point. If your access patterns are complex — if you need to filter, sort, or aggregate — a key-value store alone is insufficient.
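One way to answer “which sessions were created in the last 30 minutes?” with a key-value store is to maintain extra keys yourself at write time. The sketch below uses a plain Python dict as the store. The key names (`session:<id>`, `sessions:created:<minute>`) and the helper functions are hypothetical. In Redis itself, the per-minute buckets would more likely be sets, or a single sorted set scored by timestamp.

```python
from typing import Any

# A plain dict stands in for the key-value store; it never looks inside values.
store: dict[str, Any] = {}


def create_session(session_id: str, user_id: int, created_minute: int) -> None:
    # 1. The primary record, reachable only by its key.
    store[f"session:{session_id}"] = {"user_id": user_id, "created_minute": created_minute}
    # 2. A hand-maintained "index" key: the set of session ids created in that minute.
    store.setdefault(f"sessions:created:{created_minute}", set()).add(session_id)


def sessions_created_since(now_minute: int, window_minutes: int) -> set[str]:
    # The store cannot filter by value, so we read the bucket keys designed for this query.
    result: set[str] = set()
    for minute in range(now_minute - window_minutes + 1, now_minute + 1):
        result |= store.get(f"sessions:created:{minute}", set())
    return result


create_session("a1", user_id=1, created_minute=100)
create_session("b2", user_id=2, created_minute=115)
create_session("c3", user_id=3, created_minute=129)

recent = sessions_created_since(now_minute=130, window_minutes=30)
print(sorted(recent))  # ['b2', 'c3']
```

The price is that every write now touches two keys. The application, not the database, must keep those keys consistent, including cleaning up index entries when sessions expire.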
Imagine a library where every book is unique. Some books have 3 chapters. Some have 30. Some include illustrations, footnotes, and multi-language appendices. Some are tiny pamphlets; some are multi-volume encyclopedias.
A relational database would force all of these into a single, rigid table structure. A document database says: “Store each book as it is. Let each one have its own shape.”
A document database stores self-contained records called documents. Each document is typically a JSON (or BSON) object — a nested, flexible data structure.
# A MongoDB-style document for a product catalog
product = {
"_id": "prod_001",
"name": "Wireless Headphones Pro",
"brand": "SoundWave",
"price": 149.99,
"tags": ["audio", "wireless", "noise-cancelling"],
"specs": {
"battery_life_hours": 30,
"driver_size_mm": 40,
"bluetooth_version": "5.2"
},
"reviews": [
{"user": "alice", "rating": 5, "text": "Best purchase ever"},
{"user": "bob", "rating": 4, "text": "Great, but heavy"}
]
}
Notice what just happened. This single document holds a nested object
(specs), an array of primitive values (tags), and an
array of embedded sub-documents (reviews). In SQL, you would need
at least three separate tables — products, product_specs,
product_reviews — and JOIN them at query time.
In a document database, there is no enforced schema at the database level (unless you choose to add validation). Documents in the same collection can have different shapes. This is called schema-on-read: the application interprets the structure of the data when it reads it, not when it writes it.
# Two documents in the same "users" collection — different shapes are fine
user_basic = {
"_id": "u001",
"name": "Alice",
"email": "alice@example.com"
}
user_full = {
"_id": "u002",
"name": "Bob",
"email": "bob@example.com",
"address": {
"street": "123 Oak Ave",
"city": "Milwaukee",
"zip": "53202"
},
"preferences": {
"theme": "dark",
"notifications": True
},
"subscription_tier": "pro"
}
# Both documents live in the same "users" collection without conflict
This is a massive win for fast-moving product teams. Adding a new field to your user model no longer requires a schema migration — you just start writing the new field to new documents.
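With schema-on-read, handling missing fields moves into application code. Here is a minimal sketch in plain Python (no database needed) that uses trimmed-down versions of the two user documents. The default values (`"unknown"`, `"light"`, `"free"`) are hypothetical application choices, not anything the database supplies.

```python
from typing import Any

users: list[dict[str, Any]] = [
    {"_id": "u001", "name": "Alice", "email": "alice@example.com"},
    {
        "_id": "u002",
        "name": "Bob",
        "email": "bob@example.com",
        "address": {"city": "Milwaukee"},
        "preferences": {"theme": "dark"},
        "subscription_tier": "pro",
    },
]


def render_user(doc: dict[str, Any]) -> dict[str, Any]:
    # Schema-on-read: the reader decides what a missing field means.
    return {
        "name": doc["name"],
        "city": doc.get("address", {}).get("city", "unknown"),
        "theme": doc.get("preferences", {}).get("theme", "light"),
        "tier": doc.get("subscription_tier", "free"),
    }


views = [render_user(u) for u in users]
for view in views:
    print(view)
```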
Database: "ecommerce"
|
+-- Collection: "products"
| |
| +-- Document: { _id: "p001", name: "...", specs: {...} }
| +-- Document: { _id: "p002", name: "...", tags: [...] }
| +-- Document: { _id: "p003", name: "...", variants: [...] }
|
+-- Collection: "orders"
| |
| +-- Document: { _id: "o001", user_id: "u001", items: [...] }
| +-- Document: { _id: "o002", user_id: "u002", items: [...] }
|
+-- Collection: "users"
|
+-- Document: { _id: "u001", name: "Alice", email: "..." }
+-- Document: { _id: "u002", name: "Bob", prefs: {...} }
In this diagram, we can see the hierarchy: one database contains multiple collections, and each collection contains multiple documents. Unlike SQL tables, collections do not enforce a shared schema across their documents. Each document is a self-contained, independently structured unit.
Document databases support rich queries — far beyond the simple key lookups of a key-value store.
from pymongo import MongoClient
client = MongoClient("mongodb://localhost:27017/")
db = client["ecommerce"]
products = db["products"]
# Insert a document
products.insert_one({
"_id": "p004",
"name": "Studio Monitor Speakers",
"price": 299.99,
"tags": ["audio", "studio", "professional"],
"specs": {"watts": 50, "frequency_response": "40Hz-20kHz"}
})
# Query: find all audio products under $300
results = products.find({
"tags": "audio",
"price": {"$lt": 300}
})
for doc in results:
    print(doc["name"], doc["price"])
# Query: find products and project only specific fields
results = products.find(
{"specs.watts": {"$gte": 40}},
{"name": 1, "price": 1, "_id": 0} # projection
)
We can filter on nested fields (specs.watts), arrays (tags),
apply range operators ($lt, $gte), and project only needed fields.
This is far more expressive than a key-value store.
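To make these query semantics concrete, here is a toy evaluator in plain Python. It handles only the features used above: dotted paths into nested documents, equality that also matches array elements, and the `$lt`/`$gte` operators. It is a simplified model for illustration, not how MongoDB is implemented, and it ignores many edge cases (for example, arrays of sub-documents along a dotted path). The third catalog item is hypothetical.

```python
from typing import Any, Callable

_MISSING = object()

# Only the two operators used above; any other operator raises KeyError.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "$lt": lambda value, operand: value < operand,
    "$gte": lambda value, operand: value >= operand,
}


def get_path(doc: dict[str, Any], path: str) -> Any:
    # Follow a dotted path such as "specs.watts" into nested documents.
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(value: Any, condition: Any) -> bool:
    if value is _MISSING:
        return False  # a missing field never satisfies these conditions
    if isinstance(condition, dict):
        # Operator form, e.g. {"$lt": 300}: every operator must hold.
        return all(OPERATORS[op](value, operand) for op, operand in condition.items())
    if isinstance(value, list):
        return condition in value  # equality on an array matches any element
    return value == condition


def find(docs: list[dict[str, Any]], query: dict[str, Any]) -> list[dict[str, Any]]:
    # Every field condition in the query must match (an implicit AND).
    return [d for d in docs if all(matches(get_path(d, f), c) for f, c in query.items())]


catalog = [
    {"_id": "prod_001", "name": "Wireless Headphones Pro", "price": 149.99,
     "tags": ["audio", "wireless", "noise-cancelling"], "specs": {"driver_size_mm": 40}},
    {"_id": "p004", "name": "Studio Monitor Speakers", "price": 299.99,
     "tags": ["audio", "studio", "professional"], "specs": {"watts": 50}},
    {"_id": "p005", "name": "Mechanical Keyboard", "price": 129.00,  # hypothetical
     "tags": ["peripherals"], "specs": {"switches": "brown"}},
]

cheap_audio = find(catalog, {"tags": "audio", "price": {"$lt": 300}})
powerful = find(catalog, {"specs.watts": {"$gte": 40}})
print([d["name"] for d in cheap_audio])  # ['Wireless Headphones Pro', 'Studio Monitor Speakers']
print([d["name"] for d in powerful])     # ['Studio Monitor Speakers']
```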
When modeling data in a document database, every engineer faces the embedding vs. referencing dilemma. This is a favorite interview question.
Embed when: sub-data is always read with the parent, it’s small, and it doesn’t need to be queried independently.
Reference when: sub-data is large, frequently updated independently, or shared across multiple parent documents.
# EMBEDDING (good for comments always loaded with a post)
post = {
"_id": "post_01",
"title": "Intro to NoSQL",
"comments": [ # <-- embedded
{"author": "alice", "text": "Great read!"},
{"author": "bob", "text": "Very helpful"}
]
}
# REFERENCING (good for authors used across many posts)
post = {
"_id": "post_01",
"title": "Intro to NoSQL",
"author_id": "author_42" # <-- reference (like a foreign key)
}
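With referencing, something has to perform the “join” at read time: either the application or an aggregation stage such as MongoDB's `$lookup`. Here is a minimal sketch in which plain dicts stand in for two collections. The author record, the second post, and the `with_author` helper are hypothetical. The sketch shows the trade-off: each read costs one extra lookup, but the author's data lives in exactly one place.

```python
from typing import Any, Optional

# Hypothetical in-memory "collections", keyed by _id.
authors: dict[str, dict[str, Any]] = {
    "author_42": {"_id": "author_42", "name": "Ada", "bio": "Writes about databases."},
}
posts: list[dict[str, Any]] = [
    {"_id": "post_01", "title": "Intro to NoSQL", "author_id": "author_42"},
    {"_id": "post_02", "title": "Modeling for Cassandra", "author_id": "author_42"},
]


def with_author(post: dict[str, Any]) -> dict[str, Any]:
    # The application resolves the reference with a second lookup at read time.
    author: Optional[dict[str, Any]] = authors.get(post["author_id"])
    return {**post, "author": dict(author) if author is not None else None}


resolved_before = [with_author(p) for p in posts]

# The author is stored once, so one update is visible to every post on the next read.
authors["author_42"]["name"] = "Ada L."
resolved_after = [with_author(p) for p in posts]

print([p["author"]["name"] for p in resolved_after])  # ['Ada L.', 'Ada L.']
```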
| Use Case | Why Document Works |
|---|---|
| Product catalogs | Products have wildly different attributes |
| CMS / blog platforms | Posts have variable structures |
| User profiles | Users accumulate different optional data |
| Real-time apps | Flexible schema supports rapid iteration |
| Mobile backends | JSON maps naturally to mobile data models |
Imagine a spreadsheet used by a weather agency to record sensor readings from 10,000 weather stations, every second of every day. At one row per station per second, the spreadsheet gains 864 million rows every day (10,000 × 86,400). It also has potentially thousands of columns (temperature, humidity, pressure, wind speed, visibility, UV index, etc.). Most cells are empty, because not every station measures every metric.
This is a wide-column database: a massive, sparse, multi-dimensional table optimized for enormous datasets and heavy write throughput.
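Many engineers confuse wide-column stores with columnar databases (like Amazon Redshift) and columnar file formats (like Apache Parquet). They are different. A columnar system stores each column's values together on disk, so analytical queries can scan a few columns across many rows efficiently. A wide-column store groups data by row key and lets each row carry its own set of columns, which may be very large and sparse.
Wide-column = wide flexibility, not “columns are the primary storage unit.”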
Table: "sensor_data"
Row Key         | Column Family: "readings"
----------------+------------------------------------------------
"station_001"   | ts:1711001600 -> 72.3F  | ts:1711001601 -> 72.4F
"station_002"   | ts:1711001600 -> 68.1F  | humidity:1711001600 -> 55%
"station_003"   | ts:1711001601 -> 101.0F | pressure:1711001601 -> 29.9
----------------+------------------------------------------------
Each row can have DIFFERENT columns.
Missing columns take up NO storage space (sparse).
In this diagram, notice that station_001, station_002, and station_003 each have different columns. There is no “NULL” penalty for missing columns, because they simply do not exist in that row. This sparse model, combined with partitioning rows across many nodes, is what lets wide-column databases grow to very large datasets without paying storage for empty cells.
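A nested dictionary is a reasonable mental model of this layout: row key → (column name → value). It models the logical layout only, not Cassandra's on-disk format. The minimal Python sketch below uses the values from the diagram. The helper names and the extra `uv_index` column are hypothetical. It compares the number of cells actually stored with the number a dense table holding every column would need.

```python
from typing import Any, Optional

# Row key -> {column name -> value}; a row holds only the columns it actually has.
sensor_data: dict[str, dict[str, Any]] = {
    "station_001": {"ts:1711001600": 72.3, "ts:1711001601": 72.4},
    "station_002": {"ts:1711001600": 68.1, "humidity:1711001600": 55},
    "station_003": {"ts:1711001601": 101.0, "pressure:1711001601": 29.9},
}


def read_cell(row_key: str, column: str) -> Optional[Any]:
    # A missing column is simply absent; nothing was ever stored for it.
    return sensor_data.get(row_key, {}).get(column)


def write_cell(row_key: str, column: str, value: Any) -> None:
    # Adding a column to one row does not change any other row.
    sensor_data.setdefault(row_key, {})[column] = value


write_cell("station_002", "uv_index:1711001600", 6)  # hypothetical new metric

stored_cells = sum(len(columns) for columns in sensor_data.values())
all_columns = {c for columns in sensor_data.values() for c in columns}
dense_cells = len(sensor_data) * len(all_columns)  # cells a fixed-schema table would need
print(stored_cells, dense_cells)  # 7 15
```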
Apache Cassandra was created at Facebook to handle the inbox search problem — storing billions of messages across millions of users, with write speeds that SQL databases could not sustain. It is now one of the most widely deployed NoSQL databases in the world, used by Netflix, Apple, and Instagram.
Cassandra models data around partition keys and clustering columns.
# Using the Cassandra Python driver (cassandra-driver)
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
cluster = Cluster(['127.0.0.1'])
session = cluster.connect()
# Create keyspace (like a database)
session.execute("""
CREATE KEYSPACE IF NOT EXISTS iot_platform
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
session.set_keyspace('iot_platform')
# Create table optimized for time-series sensor data
session.execute("""
CREATE TABLE IF NOT EXISTS sensor_readings (
station_id TEXT,
recorded_at TIMESTAMP,
temperature FLOAT,
humidity FLOAT,
pressure FLOAT,
PRIMARY KEY (station_id, recorded_at)
) WITH CLUSTERING ORDER BY (recorded_at DESC)
""")
# Write — Cassandra is optimized for fast writes
session.execute("""
INSERT INTO sensor_readings
(station_id, recorded_at, temperature, humidity)
VALUES ('station_001', toTimestamp(now()), 72.4, 55.3)
""")
# Read the latest 100 readings for a station
rows = session.execute("""
SELECT recorded_at, temperature, humidity
FROM sensor_readings
WHERE station_id = 'station_001'
LIMIT 100
""")
for row in rows:
    print(row.recorded_at, row.temperature)
When designing for Cassandra (a common interview topic), the partition key is the most critical decision. All rows with the same partition key are stored together on the same set of replica nodes. Queries that include the partition key are fast. Queries that don’t must be sent to every node, which is so expensive that Cassandra rejects them by default unless you explicitly add ALLOW FILTERING.
Cassandra Cluster (3 Nodes)
+-------------+  +-------------+  +-------------+
| Node A      |  | Node B      |  | Node C      |
|             |  |             |  |             |
| station_    |  | station_    |  | station_    |
| 001 data    |  | 002 data    |  | 003 data    |
| (all        |  | (all        |  | (all        |
| timestamps) |  | timestamps) |  | timestamps) |
+-------------+  +-------------+  +-------------+
Query: "Get all readings for station_001"
--> Goes ONLY to Node A. (FAST)
Query: "Get all readings above 90°F across all stations"
--> Must query ALL nodes. (SLOW - avoid this)
In this diagram, we can see how Cassandra routes queries. When we query by partition key, the request goes directly to the node(s) holding that partition. Cross-partition queries require querying every node in the cluster, which is why Cassandra’s data model must be designed around your read patterns — not the other way around.
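Cassandra decides which nodes hold a partition by hashing the partition key onto a token ring. The sketch below is a heavily simplified Python model of that idea. Each node gets a single token, derived from an MD5 hash of its name, and replicas are chosen by walking clockwise around the ring. Real Cassandra uses the Murmur3 partitioner by default, virtual nodes, and configurable replication strategies. Treat the `TokenRing` class and its method names as hypothetical.

```python
import bisect
import hashlib
from typing import Optional


class TokenRing:
    """Toy hash ring: each node owns the token range that ends at its own token."""

    def __init__(self, nodes: list[str], replication_factor: int = 1) -> None:
        # Simplification: one token per node, derived by hashing the node name.
        self.ring = sorted((self.token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]
        self.rf = min(replication_factor, len(nodes))

    @staticmethod
    def token(key: str) -> int:
        # Stable 64-bit hash; MD5 is only a stand-in for Cassandra's Murmur3.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def replicas_for(self, partition_key: str) -> list[str]:
        # First node clockwise from the key's token, then the next rf - 1 nodes.
        start = bisect.bisect_left(self.tokens, self.token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def nodes_to_contact(self, partition_key: Optional[str]) -> list[str]:
        if partition_key is None:
            return [node for _, node in self.ring]  # no partition key: ask every node
        return self.replicas_for(partition_key)


ring = TokenRing(["node_a", "node_b", "node_c"])
print(ring.nodes_to_contact("station_001"))  # a single node
print(ring.nodes_to_contact(None))           # all three nodes
ring_rf2 = TokenRing(["node_a", "node_b", "node_c"], replication_factor=2)
print(ring_rf2.replicas_for("station_001"))  # two distinct replicas
```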
This is a critical mindset shift. In SQL, we normalize data to eliminate redundancy. In Cassandra, we denormalize — we duplicate data so that each query pattern has its own optimally structured table.
-- Scenario: "Get all orders for a user" AND "Get all orders by date"
-- In Cassandra, we create TWO tables, one for each access pattern
-- Table 1: Optimized for "orders by user"
CREATE TABLE orders_by_user (
  user_id UUID,
  created_at TIMESTAMP,
  order_id UUID,
  total DECIMAL,
  -- order_id is a clustering column so two orders with the same timestamp
  -- do not overwrite each other (CQL INSERTs are upserts)
  PRIMARY KEY (user_id, created_at, order_id)
) WITH CLUSTERING ORDER BY (created_at DESC, order_id ASC);
-- Table 2: Optimized for "orders by date"
CREATE TABLE orders_by_date (
  order_date DATE,
  created_at TIMESTAMP,
  order_id UUID,
  user_id UUID,
  PRIMARY KEY (order_date, created_at, order_id)
) WITH CLUSTERING ORDER BY (created_at DESC, order_id ASC);
Yes, we store order data twice. Yes, this uses more disk space. But Cassandra is designed to run on commodity hardware in clusters of hundreds of nodes — disk space is cheap, cross-partition query performance is not.
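On the write side, denormalization means the application writes each logical order once per query table. The minimal Python sketch below uses in-memory dicts as hypothetical stand-ins for the two tables and keeps each partition sorted newest-first to mimic the clustering order. In real Cassandra, applications often group such writes in a logged `BATCH`, so that if one write is applied, the other eventually is too.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any
from uuid import uuid4

# In-memory stand-ins for the two tables: partition key -> rows, newest first
# (mimicking CLUSTERING ORDER BY (created_at DESC, ...)).
orders_by_user: dict[str, list[dict[str, Any]]] = defaultdict(list)
orders_by_date: dict[Any, list[dict[str, Any]]] = defaultdict(list)


def place_order(user_id: str, total: float, created_at: datetime) -> str:
    order_id = str(uuid4())
    # Denormalization: one logical order becomes one row in each query table.
    user_rows = orders_by_user[user_id]
    user_rows.append({"created_at": created_at, "order_id": order_id, "total": total})
    user_rows.sort(key=lambda row: row["created_at"], reverse=True)

    date_rows = orders_by_date[created_at.date()]
    date_rows.append({"created_at": created_at, "order_id": order_id, "user_id": user_id})
    date_rows.sort(key=lambda row: row["created_at"], reverse=True)
    return order_id


day = datetime(2025, 3, 21, tzinfo=timezone.utc).date()
place_order("u1", 40.00, datetime(2025, 3, 21, 8, 0, tzinfo=timezone.utc))
place_order("u2", 15.50, datetime(2025, 3, 21, 8, 30, tzinfo=timezone.utc))
place_order("u1", 99.99, datetime(2025, 3, 21, 9, 0, tzinfo=timezone.utc))

# Each access pattern is now a single-partition read.
print([row["total"] for row in orders_by_user["u1"]])  # [99.99, 40.0]
print(len(orders_by_date[day]))                         # 3
```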
| Use Case | Why Wide-Column Works |
|---|---|
| IoT sensor streams | Billions of writes/day, sparse columns |
| Time-series data | Natural partition by entity + clustering by time |
| Activity logs / audit trails | Append-heavy, massive scale |
| Message inboxes (Facebook scale) | High write throughput, by-user queries |
| Real-time analytics | Fast reads by known partition keys |
There is a famous theory that any two people on Earth are connected by no more than six social relationships. Kevin Bacon, the actor, became the center of a game: how many movies do you need to trace before you can connect any actor to Kevin Bacon?
This problem — “what is the shortest path between node A and node B through a network of relationships?” — is trivially natural in a graph database and brutally painful in a relational database.
Let’s make the pain concrete. Suppose we have a SQL database for a social network. We want to find all friends-of-friends for user Alice.
-- Level 1: Alice's direct friends
SELECT friend_id FROM friendships WHERE user_id = 'alice';
-- Level 2: Friends of Alice's friends
SELECT f2.friend_id
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
WHERE f1.user_id = 'alice';
-- Level 3: Friends of friends of friends
SELECT f3.friend_id
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
JOIN friendships f3 ON f2.friend_id = f3.user_id
WHERE f1.user_id = 'alice';
Each additional “hop” requires another self-JOIN: a 6-hop query chains the friendships table six times (five JOINs) over potentially billions of rows. Because each user has many friends, the intermediate result can grow exponentially with the number of hops. This is often called the JOIN explosion problem, and it is one of the main reasons graph databases exist.
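You can see the growth by simulating the chained self-JOIN on a toy friendships table. The Python sketch below is hypothetical. It builds a synthetic graph in which every user has exactly 10 friends, then counts the rows each additional JOIN produces without DISTINCT, as in the queries above. Real social graphs are less regular, but the multiplicative growth per hop is the point.

```python
from collections import defaultdict


def build_friendships(num_users: int, fanout: int) -> list[tuple[int, int]]:
    # Hypothetical synthetic graph: user u is friends with u+1, ..., u+fanout (mod num_users).
    return [(u, (u + k) % num_users) for u in range(num_users) for k in range(1, fanout + 1)]


def self_join_rows(friendships: list[tuple[int, int]], start_user: int,
                   hops: int) -> tuple[list[int], int]:
    """Rows produced at each level of the chained self-JOIN (no DISTINCT)."""
    by_user: dict[int, list[int]] = defaultdict(list)
    for user_id, friend_id in friendships:
        by_user[user_id].append(friend_id)  # plays the role of an index on user_id
    frontier = list(by_user[start_user])    # Level 1: WHERE f1.user_id = start_user
    row_counts = [len(frontier)]
    for _ in range(hops - 1):
        # One more JOIN: f(n-1).friend_id = f(n).user_id
        frontier = [friend for user in frontier for friend in by_user[user]]
        row_counts.append(len(frontier))
    return row_counts, len(set(frontier))


friendships = build_friendships(num_users=10_000, fanout=10)
row_counts, distinct_at_last_hop = self_join_rows(friendships, start_user=0, hops=6)
print(row_counts)            # [10, 100, 1000, 10000, 100000, 1000000]
print(distinct_at_last_hop)  # 55
```

In this synthetic graph, a million rows at hop 6 describe only 55 distinct people, because the same users are reached along many different paths. Adding DISTINCT removes the duplicates from the output, but the database still has to produce them and then deduplicate them.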
A graph database stores data as:
(Alice)-[:KNOWS {since: 2020}]->(Bob)
(Alice)-[:KNOWS {since: 2019}]->(Carol)
(Bob) -[:KNOWS {since: 2021}]->(Dave)
(Carol)-[:KNOWS {since: 2022}]->(Dave)
Visualized:
[Alice] ---KNOWS---> [Bob] ---KNOWS---> [Dave]
   |                                      ^
   +----KNOWS---> [Carol] ---KNOWS--------+
Find shortest path from Alice to Dave:
Alice -> Bob -> Dave (2 hops) ✓
In this diagram, notice that we can visually see the path between Alice and Dave. A graph database traverses this structure natively — starting at Alice’s node, following edges labeled KNOWS, and arriving at Dave in exactly 2 hops. No JOIN is required; the relationship itself is a first-class citizen in the database.
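The traversal itself is breadth-first search over an adjacency list. Here is a minimal Python sketch using the four-person graph above. The `knows` dictionary and the `shortest_path` function are illustrative, not Neo4j's internals. Each step reads a node's neighbors directly from that node's own entry, which is the intuition behind treating relationships as first-class data.

```python
from collections import deque
from typing import Optional

# Adjacency list: each node stores its outgoing KNOWS edges directly.
knows: dict[str, list[str]] = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave"],
    "Dave": [],
}


def shortest_path(graph: dict[str, list[str]], start: str, goal: str) -> Optional[list[str]]:
    # Breadth-first search: visit all 1-hop neighbors, then 2-hop, and so on.
    previous: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path: list[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:  # first visit is along a shortest path
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # goal not reachable along edge directions


print(shortest_path(knows, "Alice", "Dave"))  # ['Alice', 'Bob', 'Dave']
```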
Neo4j is the most widely deployed graph database. It uses a query language called Cypher, designed to be visually intuitive — queries look like the graph they describe.
from neo4j import GraphDatabase
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))
with driver.session() as session:
    # Create nodes and relationships
    # (CREATE is not idempotent: rerunning adds duplicates; MERGE avoids that)
    session.run("""
        CREATE (alice:Person {name: 'Alice', city: 'Milwaukee'})
        CREATE (bob:Person {name: 'Bob', city: 'Chicago'})
        CREATE (carol:Person {name: 'Carol', city: 'Denver'})
        CREATE (dave:Person {name: 'Dave', city: 'Milwaukee'})
        CREATE (alice)-[:KNOWS {since: 2020}]->(bob)
        CREATE (alice)-[:KNOWS {since: 2019}]->(carol)
        CREATE (bob)-[:KNOWS {since: 2021}]->(dave)
        CREATE (carol)-[:KNOWS {since: 2022}]->(dave)
    """)
    # Find all friends-of-friends of Alice (exactly 2 hops); compare to the
    # self-JOIN in the Level 2 SQL query. DISTINCT is needed because Dave is
    # reachable by two different 2-hop paths (via Bob and via Carol).
    results = session.run("""
        MATCH (alice:Person {name: 'Alice'})-[:KNOWS*2]->(friend_of_friend)
        RETURN DISTINCT friend_of_friend.name AS name
    """)
    for record in results:
        print(record["name"])  # --> Dave
    # Find a shortest path between Alice and Dave (edge direction ignored)
    result = session.run("""
        MATCH path = shortestPath(
          (alice:Person {name: 'Alice'})-[:KNOWS*]-(dave:Person {name: 'Dave'})
        )
        RETURN [node IN nodes(path) | node.name] AS path_names,
               length(path) AS hops
    """)
    for record in result:
        print(record["path_names"])  # e.g. ['Alice', 'Bob', 'Dave'] (the route via Carol is equally short)
        print(record["hops"])        # --> 2
driver.close()
The Cypher pattern (alice)-[:KNOWS*2]->(friend) reads exactly like
a diagram: “match a path from Alice, following KNOWS edges exactly 2 hops,
to a friend.” A bare * means “one or more hops” (a variable-length path), and shortestPath()
is built in. In SQL, the first query needs a fixed chain of self-JOINs, and the second typically needs a recursive query (WITH RECURSIVE).
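Here is the core performance insight, beloved in system design interviews.
In a relational database, a JOIN operation must look up the matching rows for every intermediate result, typically by searching an index whose cost grows with the size of the table.
In a graph database, a traversal operation follows relationships stored directly with each node. Its cost therefore depends on how much of the graph it visits, not on the total size of the dataset. This property is often called index-free adjacency.
The difference becomes dramatic at scale. A multi-hop traversal that touches only a small neighborhood of a very large social graph can return in milliseconds in a graph database, while the equivalent chain of JOINs in SQL may time out.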
There are two dominant graph data models. For most interviews and applications, the property graph (used by Neo4j, Amazon Neptune) is the relevant model. For semantic web, knowledge graphs, and AI ontologies, the RDF (Resource Description Framework) model is used. We focus on property graphs here.
| Use Case | Why Graph Works |
|---|---|
| Social networks | Friend recommendations, mutual connections |
| Fraud detection | Pattern matching across transaction networks |
| Recommendation engines | “Users who bought X also bought Y” |
| Knowledge graphs (AI/RAG) | Entity relationships for LLM reasoning |
| Network & IT topology | Map server/dependency relationships |
| Access control (RBAC/ABAC) | Role chains and permission inheritance |
This is the section that matters most in a system design interview. Knowing what each database is is table stakes. Knowing when to reach for which one is what separates a junior engineer from a senior engineer.
| Dimension | Key-Value | Document | Wide-Column | Graph |
|---|---|---|---|---|
| Data model | key → value | key → JSON doc | row × dynamic cols | nodes + edges |
| Query power | Key only | Rich (filters) | Partition + range | Traversal |
| Write throughput | Very high | High | Extremely high | Moderate |
| Scaling model | Horizontal | Horizontal | Linear horizontal | Vertical / Shard |
| Schema | None | Optional | Column families | Node/edge labels |
| Sweet spot | Caching, sessions | Content, APIs | IoT, logs, time-series | Social, fraud |
| Weakness | No value queries | No deep joins | No ad-hoc queries | Complex writes |
| Top example | Redis | MongoDB | Cassandra | Neo4j |
In production systems at scale, these databases are rarely used in isolation. A senior engineer thinks in terms of polyglot persistence: choosing the best storage engine for each specific access pattern.
User Request
|
v
[API Layer (Next.js / Node)]
|
+---> [Redis] - Session check, rate limiting, cache layer
|
+---> [MongoDB] - Fetch user profile, product details
|
+---> [Cassandra] - Write user activity event (1 of billions/day)
|
+---> [Neo4j] - Generate "you might also know" recommendations
|
+---> [PostgreSQL] - Process payment transaction (ACID required)
In this architecture diagram, we see five different databases serving five different concerns within a single application. At large scale this is not over-engineering: each of these databases runs on dedicated infrastructure, handling traffic that could overwhelm a single relational database. The API layer is the orchestrator that routes each operation to its ideal data store.
“When would you choose NoSQL over SQL, and which NoSQL type?”
Here is a framework for answering it with precision:
Step 1: Identify the data shape. Is the data tabular and relational? → SQL. Is it document-like with variable attributes? → Document DB. Is it a network of relationships? → Graph DB. Is it a massive stream of timestamped events? → Wide-Column. Is it ephemeral access data? → Key-Value.
Step 2: Identify the scale requirement. Millions of rows? SQL handles it fine. Billions of rows with high write throughput? Cassandra. Millions of graph hops per second? Neo4j. Millions of cache hits per second? Redis.
Step 3: Identify the consistency requirement. Financial transactions, inventory management? → ACID → SQL (or NewSQL). Eventually consistent activity feeds, logs? → NoSQL AP model. Session data that can be regenerated? → Key-Value with TTL.
Step 4: Identify the access pattern. In SQL, design around the data. In NoSQL, design around the query. Ask: “What are the top 3 queries this service must answer quickly?” Then choose and model accordingly.
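Step 4 also governs how you choose a partition key. The orders_by_date table from the Cassandra section partitions by order_date alone, so every order placed on the same day lands in one partition and therefore on the same replicas. A common remedy is to add a bucket to the partition key. In CQL, such a composite partition key is written with inner parentheses, for example `PRIMARY KEY ((order_date, bucket), created_at, order_id)`. The Python sketch below models only how keys are assigned. `NUM_BUCKETS` and the helper functions are hypothetical.

```python
import hashlib
from collections import Counter
from datetime import date

NUM_BUCKETS = 8  # hypothetical: spread each day's orders over 8 partitions


def bucket_for(order_id: str) -> int:
    # Stable hash (Python's built-in hash() of a str is randomized per process).
    digest = hashlib.sha256(order_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


def partition_key(order_date: date, order_id: str) -> tuple[date, int]:
    # Composite partition key (order_date, bucket) instead of order_date alone.
    return (order_date, bucket_for(order_id))


def partitions_to_read(order_date: date) -> list[tuple[date, int]]:
    # The cost: reading one day means querying every bucket and merging results.
    return [(order_date, b) for b in range(NUM_BUCKETS)]


day = date(2025, 3, 21)
partition_sizes = Counter(partition_key(day, f"order-{i}") for i in range(10_000))
print(len(partition_sizes))           # partitions written for this day
print(max(partition_sizes.values()))  # largest partition, instead of all 10,000 in one
print(len(partitions_to_read(day)))   # 8
```

The design trade-off is explicit: writes for a busy day spread across several partitions, but a “whole day” read now fans out to every bucket.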
A common pitfall is the hot partition: for example, using date as a partition key for a high-volume system, so all of today’s writes hit one node. The fix: composite partition keys or bucketing.
Let’s close with the most important takeaway. NoSQL is not a replacement for SQL. It is a toolkit of specialized instruments, each evolved to solve a problem that the relational model cannot solve elegantly at scale.
When an interviewer asks you to design a system, you should be thinking through the four steps above: data shape, scale, consistency, and access patterns.
And you should be comfortable saying: “In this design, I would use PostgreSQL for the financial core, MongoDB for the user profile service, Redis as the caching layer, and Cassandra for the activity event stream. This is polyglot persistence — each tool solving the problem it was built for.”
That sentence, spoken confidently with reasoning attached, is a senior-engineer answer.
KEY-VALUE : Redis | cache, session, rate-limit | O(1) by key
DOCUMENT : MongoDB | profiles, catalog, CMS | Rich JSON queries
WIDE-COLUMN : Cassandra | IoT, logs, time-series | Partition + range
GRAPH : Neo4j | social, fraud, recommender | Relationship traversal