Before we write a single line of query syntax, let’s think about a library.
Imagine a massive library with millions of books. The classic librarian system — a relational one — organizes books into tables: one shelf for authors, one for titles, one for publishers. To find “every book written by an author who was influenced by Kafka and later influenced Haruki Murakami,” the librarian has to consult three different ledgers, flip back and forth, cross-reference IDs, and staple the results together. This process is called a JOIN — and in a digital database, the cost of that stapling grows painfully as the number of connections multiplies.
Now imagine a different kind of library. In this one, each book has a physical string tied to every related book — by influence, by genre, by shared publishers. To answer the same question, you simply grab the Kafka book and follow the strings. No ledgers. No cross-referencing. Just traversal.
That is the fundamental promise of a graph database.
Graph databases are purpose-built for data where the relationships between things are as important as the things themselves. They don’t store relationships as a side effect of foreign keys — they store them as first-class citizens with their own identity, direction, and attributes.
This distinction seems subtle until you encounter data that’s deeply connected: social networks, fraud rings, recommendation engines, supply chains, knowledge graphs. In those domains, the relational model collapses under its own JOIN overhead, and graph databases begin to shine.
Let’s construct a graph database from scratch by identifying its atomic units. There are exactly three.
Nodes are the entities — the things in your domain. A person. A product. A city. A bank account. In the diagram below, circles represent nodes.
Edges (also called relationships or arcs) are the connections between nodes. Unlike foreign keys in SQL, edges in a graph database are stored as physical pointers. An edge always has:
FOLLOWS, PURCHASED, LIVES_IN)Properties are the key-value metadata attached to either a node or
an edge. A Person node might have {name: "Alice", age: 29}. A
FRIENDS_WITH edge might have {since: "2018-04-11"}.
name: "Alice" name: "Bob"
age: 29 age: 31
.-----------. .-----------.
| Person | | Person |
| (Alice) | | (Bob) |
'-----------' '-----------'
| ^
| FOLLOWS |
| {since: "2021-03-01"} |
'----------------------------'
In Figure 1, notice that the FOLLOWS relationship has a direction (Alice follows Bob) and carries its own metadata as a property. The nodes and the relationship are peers — each stored with equal importance.
Together, these three primitives form what is formally called a Labeled Property Graph (LPG) — the data model used by Neo4j, Amazon Neptune, and the emerging GQL ISO standard.
Before we go further, we need to understand exactly why the relational model struggles with highly connected data. This is the “broken state” that graph databases exist to fix.
Consider a social network. In a relational database, you store users in
a users table and friendships in a friendships table with two
foreign keys: user_id_1 and user_id_2. Querying “friends of friends”
requires a self-JOIN:
# Pseudocode representing a SQL-style approach in Python
# "Friends of Alice's friends"
query = """
SELECT DISTINCT u3.name
FROM users u1
JOIN friendships f1 ON u1.id = f1.user_id_1
JOIN users u2 ON u2.id = f1.user_id_2
JOIN friendships f2 ON u2.id = f2.user_id_1
JOIN users u3 ON u3.id = f2.user_id_2
WHERE u1.name = 'Alice'
"""
Now query “friends of friends of friends.” You need three JOINs. Four degrees of separation? Four JOINs. Each additional hop multiplies the query complexity. For a network with millions of nodes, this quickly becomes intractable. The relational database was never designed with traversal as a first-class operation.
This is the architectural breakthrough that separates a “real” graph database from a relational database pretending to be one.
In a relational database, when you want to find all friends of Alice,
the engine must consult a global index (typically a B-Tree) on the
friendships table. That lookup costs O(log N) where N is the total
number of friendships in the system. Every hop you take, you pay that
cost again.
In a native graph database like Neo4j, each node stores direct physical pointers to its neighboring nodes and edges — no global index needed. Finding Alice’s friends means starting at Alice’s node and following the pointers that are literally stored next to her. Each hop costs O(1) — constant time, regardless of how large the overall graph becomes.
RELATIONAL MODEL GRAPH MODEL (Index-Free Adjacency)
───────────────── ──────────────────────────────────
friendships table Alice node
┌──────────────────────┐ ┌──────────────────────────────┐
│ user_id_1 │ user_id_2│ │ name: "Alice" │
│ 1 │ 2 │ │ → ptr to Bob node │
│ 1 │ 3 │ │ → ptr to Carol node │
│ ... │ ... │ │ → ptr to Dave node │
└──────────────────────┘ └──────────────────────────────┘
↓ ↓
B-Tree global index lookup Direct pointer dereference
O(log N) per hop O(1) per hop
In Figure 2, the relational model must scan or index into a central table at every hop. The graph model stores relationship pointers directly on the node, making traversal a matter of following a physical address.
This design choice is why graph databases can answer “Who are Alice’s friends-of-friends?” on a billion-node social graph in milliseconds while a relational database is still waiting for its JOINs to complete.
There is one caveat worth noting: to find the starting node (Alice’s node), the graph engine does use an index — typically a B-Tree on node labels and properties. Once you have that starting node, however, all subsequent traversal hops are O(1).
So far we’ve been operating inside the Labeled Property Graph (LPG) model. But there’s a second model you need to know for interviews, particularly when knowledge graphs and semantic web technologies come up: RDF (Resource Description Framework).
This is the dominant model in production systems. The core philosophy is pragmatic: model your domain as nodes and edges with rich property bags.
| Feature | Description |
|---|---|
| Nodes | Entities with labels and properties |
| Edges | Directed, typed relationships with properties |
| Schema | Schema-optional (flexible by default) |
| Query Language | Cypher (Neo4j), Gremlin (Apache TinkerPop), GQL |
| Databases | Neo4j, Amazon Neptune, Memgraph, FalkorDB |
The LPG model’s greatest strength is its flexibility. You can add a new relationship type between two nodes without migrating a schema or touching existing data. This makes it ideal for domains that evolve quickly.
RDF comes from the Semantic Web tradition. Instead of nodes and
edges, everything is expressed as a triple: Subject → Predicate → Object.
<Alice> <follows> <Bob>
<Alice> <livesIn> <Milwaukee>
<Bob> <worksAt> <Acme Corp>
Every entity and relationship is a URI — a globally unique identifier. This makes RDF perfect for interoperability: two completely separate organizations can link their data using shared URIs without coordination.
| Feature | Description |
|---|---|
| Unit of storage | Subject-Predicate-Object triple |
| Identifiers | URIs (globally unique, web-compatible) |
| Schema | Enforced via OWL/RDFS ontologies |
| Query Language | SPARQL |
| Databases | Stardog, GraphDB (Ontotext), Amazon Neptune (RDF) |
The RDF model enables reasoning — using ontology rules to infer facts
not explicitly stated. If we assert that Alice is a Person and every Person is a Mammal, an RDF reasoner can infer that Alice is a Mammal
without you writing that triple explicitly.
Think of it this way: if you’re building an application-facing graph (social network, recommendation engine, fraud detection), choose LPG — it’s more performant and flexible. If you’re building a knowledge integration system (enterprise knowledge graph, biomedical data fusion, semantic search), choose RDF — its formalism and interoperability pay dividends.
Now we get to the heart of what you’ll be tested on in interviews. Let’s walk through the three dominant graph query languages, building complexity as we go.
Cypher was created by Neo4j and later open-sourced as OpenCypher, which has been adopted by multiple vendors. Its defining feature is its ASCII-art syntax — Cypher queries literally look like a diagram drawn on a whiteboard.
The grammar rule to remember:
(variable:Label {property: value})-[variable:TYPE {property: value}]->(a)-[:KNOWS]->(b)Let’s build Cypher queries from simple to complex.
Level 1 — Finding a Node:
# Equivalent Cypher (run via Neo4j Python driver)
from neo4j import GraphDatabase
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
def find_person(tx, name):
result = tx.run(
"MATCH (p:Person {name: $name}) RETURN p",
name=name
)
return [record["p"] for record in result]
with driver.session() as session:
persons = session.read_transaction(find_person, "Alice")
The MATCH keyword is Cypher’s SELECT FROM equivalent. The pattern
(p:Person {name: $name}) means “find any node labeled Person whose
name property equals the given parameter.”
Level 2 — Traversing a Relationship:
def find_friends(tx, name):
result = tx.run(
"""
MATCH (p:Person {name: $name})-[:FRIENDS_WITH]->(friend:Person)
RETURN friend.name AS friend_name
""",
name=name
)
return [record["friend_name"] for record in result]
Notice how the relationship [:FRIENDS_WITH] is written as an arrow
in the middle of the pattern. Reading Cypher is like reading a
whiteboard diagram: “Find a Person named Alice who has a FRIENDS_WITH
relationship pointing to another Person — return those persons.”
Level 3 — Variable-Length Traversal (Friends of Friends):
def find_friends_of_friends(tx, name, depth=2):
result = tx.run(
"""
MATCH (p:Person {name: $name})-[:FRIENDS_WITH*1..2]->(fof:Person)
WHERE fof.name <> $name
RETURN DISTINCT fof.name AS name
""",
name=name
)
return [record["name"] for record in result]
The syntax *1..2 is the variable-length path operator. It says:
“traverse this relationship type between 1 and 2 hops.” Change it to
*1..6 and you’re querying up to 6 degrees of separation — the kind of
query that brings a relational database to its knees but runs naturally
in Cypher.
Level 4 — Shortest Path:
def find_shortest_path(tx, source, target):
result = tx.run(
"""
MATCH path = shortestPath(
(a:Person {name: $source})-[:KNOWS*]-(b:Person {name: $target})
)
RETURN [node IN nodes(path) | node.name] AS path_names,
length(path) AS hops
""",
source=source, target=target
)
return result.single()
shortestPath() is a Cypher built-in that executes a BFS traversal
internally. The result gives both the path nodes and the hop count —
exactly what an interviewer might ask you to implement for a
“LinkedIn degrees of separation” problem.
Gremlin is part of the Apache TinkerPop framework and takes a fundamentally different approach: it is procedural and imperative rather than declarative. Instead of describing a pattern you want to match, you write a series of steps that walk the graph.
Think of Gremlin as a cursor navigating the graph. You “stand” at a node, take a step, filter, take another step, filter, and emit results.
# Using the Gremlin Python client (gremlinpython)
from gremlin_python.driver import client, serializer
gremlin_client = client.Client(
'ws://localhost:8182/gremlin',
'g',
message_serializer=serializer.GraphSONSerializersV2d0()
)
# Find friends of Alice
query = "g.V().has('Person', 'name', 'Alice').out('FRIENDS_WITH').values('name')"
result = gremlin_client.submit(query).all().result()
print(result) # ['Bob', 'Carol', 'Dave']
Let’s break down the Gremlin vocabulary:
| Step | Meaning |
|---|---|
g.V() |
Start from all vertices (nodes) in the graph |
.has(label, key, value) |
Filter to nodes with matching property |
.out(type) |
Traverse outgoing edges of the given type |
.in(type) |
Traverse incoming edges |
.both(type) |
Traverse edges in either direction |
.values(key) |
Extract a property value |
.repeat(step).times(n) |
Repeat a traversal step N times |
Variable-depth traversal in Gremlin:
# Friends of friends of Alice (2 hops)
query = """
g.V().has('Person', 'name', 'Alice')
.repeat(out('FRIENDS_WITH')).times(2)
.dedup()
.values('name')
"""
result = gremlin_client.submit(query).all().result()
The .repeat().times() pattern in Gremlin is the equivalent of
Cypher’s *1..N variable-length paths. The .dedup() step removes
duplicates — an important filter when traversal paths can intersect.
The key philosophical difference: Cypher tells the database what you want; Gremlin tells the database how to get it. This makes Gremlin more verbose but also more controllable for advanced traversal optimization.
SPARQL (pronounced “sparkle”) is the W3C standard query language for RDF data. Since RDF is built on triples, SPARQL queries are essentially requests to match triple patterns against the data.
Where Cypher uses ASCII-art patterns and Gremlin uses traversal steps, SPARQL uses variable binding over triple patterns.
# Using rdflib for SPARQL queries in Python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, FOAF
# Build an in-memory RDF graph
g = Graph()
EX = Namespace("http://example.org/")
g.add((EX.Alice, RDF.type, FOAF.Person))
g.add((EX.Alice, FOAF.name, Literal("Alice")))
g.add((EX.Alice, EX.follows, EX.Bob))
g.add((EX.Bob, FOAF.name, Literal("Bob")))
g.add((EX.Bob, EX.follows, EX.Carol))
# SPARQL: Find people that Alice follows
sparql_query = """
PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?followedName
WHERE {
ex:Alice ex:follows ?followed .
?followed foaf:name ?followedName .
}
"""
results = g.query(sparql_query)
for row in results:
print(row.followedName) # "Bob"
In SPARQL, ?followed is a variable — think of it as a SQL column
alias for a pattern binding. The two triple patterns joined in the
WHERE clause effectively traverse from Alice to Bob and then retrieve
Bob’s name.
SPARQL’s superpower — CONSTRUCT and inference:
# SPARQL CONSTRUCT creates new triples from existing ones
infer_query = """
PREFIX ex: <http://example.org/>
CONSTRUCT {
?person ex:indirectlyFollows ?indirect .
}
WHERE {
?person ex:follows ?intermediate .
?intermediate ex:follows ?indirect .
}
"""
new_graph = g.query(infer_query)
The CONSTRUCT form doesn’t just query — it *generates new triples
based on patterns. This is the foundation of semantic reasoning: deriving
implicit knowledge from explicit facts.*
TASK: "Find products bought by Alice's friends"
┌──────────────────────────────────────────────────────────────┐
│ CYPHER (Declarative Pattern) │
│ │
│ MATCH (a:Person {name:'Alice'})-[:FRIENDS_WITH]->(f:Person) │
│ -[:PURCHASED]->(p:Product) │
│ RETURN p.name │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ GREMLIN (Traversal Steps) │
│ │
│ g.V().has('Person','name','Alice') │
│ .out('FRIENDS_WITH') │
│ .out('PURCHASED') │
│ .values('name') │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ SPARQL (Triple Pattern Matching) │
│ │
│ SELECT ?productName WHERE { │
│ ex:Alice ex:friendsWith ?friend . │
│ ?friend ex:purchased ?product . │
│ ?product ex:name ?productName . │
│ } │
└──────────────────────────────────────────────────────────────┘
In Figure 3, notice that all three languages express the same semantic intent — traversing two hops from Alice — but with entirely different syntactic philosophies. Cypher reads like a diagram. Gremlin reads like a pipeline. SPARQL reads like a logic formula.
In April 2024, the database world reached a milestone that hadn’t happened since 1987: a new ISO database language standard was ratified. [web:27] GQL (Graph Query Language) — ISO/IEC 39075 — is now the official international standard for property graph querying. [web:22]
This is significant because it does for graph databases what SQL did for relational ones: it gives the ecosystem a common language that transcends individual vendors.
GQL builds on the foundations of Cypher, PGQL, GSQL, and G-CORE, and is developed by the same ISO working group that oversees SQL. [web:25] This means if you know SQL, GQL’s DDL and DML constructs will feel immediately familiar.
# GQL syntax example (conceptual, supported in Microsoft Fabric, Neo4j, etc.)
# Create a graph
# CREATE GRAPH SocialNetwork
# Insert data (DML)
# INSERT (:Person {name: 'Alice', age: 29}),
# (:Person {name: 'Bob', age: 31}),
# (:Person {name: 'Alice'})-[:FOLLOWS]->(:Person {name: 'Bob'})
# Query (pattern matching, familiar to Cypher users)
# MATCH (a:Person {name: 'Alice'})-[:FOLLOWS]->(b:Person)
# RETURN b.name
For interview purposes: GQL’s emergence signals that the graph database ecosystem is maturing into enterprise-grade standardization. Mentioning GQL in a senior-level system design discussion demonstrates awareness of the current state of the industry.
Understanding query models isn’t just about syntax — it’s also about knowing what happens underneath. When a graph database executes a traversal, it’s running well-known graph algorithms that you’ve likely studied in DSA.
BFS (shortest path, level-by-level)
─────────────────────────────────
[Alice]
/ \
[Bob] [Carol] ← Level 1
/ \ \
[Dave] [Eve] [Frank] ← Level 2
Visit order: Alice → Bob → Carol → Dave → Eve → Frank
DFS (deep traversal, path exploration)
──────────────────────────────────────
[Alice]
/
[Bob]
/
[Dave]
\
[Eve] ← Explores one branch fully before backtracking
In Figure 4, BFS explores level-by-level and is used for shortest-path
queries. DFS explores depth-first and is used for detecting cycles or
exploring all paths. Cypher’s shortestPath() uses BFS internally, while
Gremlin’s .repeat() traversal defaults to DFS unless configured otherwise.
| Operation | Graph DB (LPG, IFA) | Relational DB (with index) |
|---|---|---|
| Single node lookup | O(log N) — index | O(log N) — B-Tree index |
| Traverse 1 hop | O(1) per hop | O(log N) per JOIN |
| Traverse k hops | O(k · avg_degree) | O(k · log N) |
| Shortest path (BFS) | O(V + E) | Impractical beyond 3-4 hops |
| Connected components | O(V + E) | Requires recursive CTEs |
Here V = number of vertices, E = number of edges, N = total rows in the relevant table. The column “O(1) per hop” is what index-free adjacency buys you — that constant factor difference compounds dramatically over k hops.
Graph modeling is as much craft as science. Let’s walk through three canonical patterns.
The classic. Nodes are Person, edges are FRIENDS_WITH (undirected
conceptually, but stored as two directed edges or one with traversal
in both directions).
# Neo4j Cypher: Create social graph
create_query = """
MERGE (alice:Person {name: 'Alice', city: 'Milwaukee'})
MERGE (bob:Person {name: 'Bob', city: 'Chicago'})
MERGE (carol:Person {name: 'Carol', city: 'Milwaukee'})
MERGE (alice)-[:FRIENDS_WITH {since: '2020'}]->(bob)
MERGE (bob)-[:FRIENDS_WITH {since: '2021'}]->(carol)
MERGE (alice)-[:FRIENDS_WITH {since: '2019'}]->(carol)
"""
# Query: Mutual friends of Alice and Bob
mutual_friends_query = """
MATCH (alice:Person {name: 'Alice'})-[:FRIENDS_WITH]->(mutual)
<-[:FRIENDS_WITH]-(bob:Person {name: 'Bob'})
RETURN mutual.name AS mutual_friend
"""
Fraud rings work by sharing identifiers — the same phone number, email, or address across multiple fake accounts. Graph databases can detect these rings efficiently using link analysis.
# Cypher: Detect accounts sharing a phone number (potential fraud ring)
fraud_query = """
MATCH (a1:Account)-[:HAS_PHONE]->(phone:PhoneNumber)
<-[:HAS_PHONE]-(a2:Account)
WHERE a1 <> a2
WITH phone, collect(DISTINCT a1) + collect(DISTINCT a2) AS suspects
WHERE size(suspects) >= 3
RETURN phone.number AS shared_phone,
[s IN suspects | s.account_id] AS suspect_accounts,
size(suspects) AS ring_size
ORDER BY ring_size DESC
"""
Fraud Ring — Three accounts share one phone number
[Acc #1001] [Acc #1002]
\ /
\ /
HAS_PHONE HAS_PHONE
\ /
[+1-414-555-0101] ← Shared PhoneNumber node
/
HAS_PHONE
/
[Acc #1003]
In Figure 5, three Account nodes converge on a single PhoneNumber node. In a relational database, this pattern requires a GROUP BY with HAVING COUNT > 2 on a JOINed table. In a graph, it’s visible as a structural pattern — a “star” topology around a shared node — and a single Cypher MATCH finds it naturally.
The “customers who bought X also bought Y” pattern is a classic graph traversal: find items purchased by people with similar purchase history.
# Cypher: Content-based product recommendation
recommendation_query = """
MATCH (user:User {id: $user_id})-[:PURCHASED]->(item:Product)
<-[:PURCHASED]-(similar_user:User)
-[:PURCHASED]->(recommended:Product)
WHERE NOT (user)-[:PURCHASED]->(recommended)
AND user <> similar_user
WITH recommended, count(similar_user) AS score
RETURN recommended.name AS product,
score
ORDER BY score DESC
LIMIT 10
"""
This query is a two-hop traversal: from User → Products (via PURCHASED)
→ Similar Users (via the same products) → Other Products. The count(similar_user)
aggregation acts as the collaborative filtering signal — products purchased
by more similar users rank higher.
This is a decision you will be asked to defend in system design interviews. Here is a clear decision framework.
Common domains: social networks, fraud detection, knowledge graphs, recommendation systems, network/IT topology, identity resolution.
In practice, many production systems combine both. Transactional and reporting data lives in PostgreSQL or a similar RDBMS. The relationship graph — social connections, recommendation signals, fraud patterns — lives in Neo4j or Neptune. An application layer queries both and combines the results.
Application Layer
│
┌──────┴──────────────┐
│ │
▼ ▼
[PostgreSQL] [Neo4j / Neptune]
Users, Orders, Social graph,
Products, Payments Recommendations,
(Transactional) Fraud patterns
(Relationship traversal)
In Figure 6, the two databases serve fundamentally different query patterns. PostgreSQL handles ACID transactions and aggregation-heavy reporting. Neo4j handles multi-hop traversal and pattern matching. The application code decides which engine to query based on the type of question being asked.
Let’s anchor everything with the kind of design problem you’ll face.
Problem: “Design the backend for a LinkedIn-style ‘People You May Know’ feature.”
A strong answer using graph databases might unfold like this:
Step 1 — Model the Domain:
User nodes with properties {userId, name, headline}CONNECTED_TO edges with {connected_since}WORKS_AT edges to Company nodesATTENDED edges to School nodesStep 2 — Choose the Engine: Neo4j or Amazon Neptune (LPG, Cypher/Gremlin). RDF is unnecessary here — we don’t need semantic reasoning, just fast traversal.
Step 3 — Write the Core Query:
# Cypher: People You May Know
pymk_query = """
MATCH (me:User {userId: $user_id})-[:CONNECTED_TO]->(friend)
-[:CONNECTED_TO]->(suggestion:User)
WHERE NOT (me)-[:CONNECTED_TO]->(suggestion)
AND me <> suggestion
WITH suggestion, count(friend) AS mutual_connections
WHERE mutual_connections >= 2
RETURN suggestion.userId,
suggestion.name,
mutual_connections
ORDER BY mutual_connections DESC
LIMIT 20
"""
Step 4 — Address Scale:
User.userId for O(log N) entry point lookupStep 5 — Discuss Trade-offs: At truly massive scale (billions of users, Facebook-level), even native graph DBs hit partitioning challenges. Cross-partition hops add network latency. This is why Facebook historically used custom graph engines (TAO) rather than off-the-shelf graph databases. For a typical enterprise-scale system with tens to hundreds of millions of users, Neo4j or Neptune handles this comfortably.
We’ve come a long way since the early days of graph databases. A few context-setting notes for interviews:
2000-2007: Graph databases existed but were niche academic tools. The term “NoSQL” hadn’t been coined yet. Relationships were handled poorly by relational databases, and developers lived with the JOINs.
2007: Neo4j is founded, introducing the first production-grade native graph database with index-free adjacency. The property graph model becomes the dominant LPG standard.
2012-2019: Graph database adoption grows 605% as social networks, fraud detection, and recommendation systems become mainstream engineering challenges. [web:6] Apache TinkerPop and Gremlin standardize the traversal API across multiple vendors.
2024: ISO ratifies GQL (ISO/IEC 39075) — the first new ISO database language since SQL in 1987. [web:27] This marks graph databases’ formal graduation into the enterprise standards landscape. Amazon Neptune, Neo4j, and Microsoft Fabric are among the first adopters. [web:22]
2025-2026: The emergence of AI knowledge graphs and retrieval-augmented generation (RAG) systems creates a new wave of demand for graph databases. Connecting LLMs to structured relationship graphs enables richer, more accurate knowledge retrieval than vector stores alone.
Before we close, let’s crystallize the highest-signal concepts for interview contexts.
1. What is index-free adjacency? Each node stores direct physical pointers to its adjacent nodes and relationships. Traversal cost is O(1) per hop, compared to O(log N) for a B-Tree index lookup in a relational JOIN.
2. What is the difference between LPG and RDF? LPG (Labeled Property Graph) stores entities as nodes and relationships as typed, directed edges with properties — flexible and pragmatic. RDF stores everything as subject-predicate-object triples with URI identifiers — formal, interoperable, and supports logical inference.
3. What is Cypher’s shortestPath() doing under the hood?
It executes BFS (Breadth-First Search) starting from the source node,
traversing the graph level by level until it reaches the target. BFS
guarantees the shortest path in an unweighted graph.
4. When would you NOT use a graph database? When queries are primarily aggregation-heavy (SUM/GROUP BY), when data is naturally tabular, or when relationships are shallow (1-hop). In those cases, relational databases or columnar stores are more efficient.
5. What is GQL? ISO/IEC 39075, published April 2024, is the first ISO-standard query language for property graphs. It draws from Cypher, PGQL, GSQL, and G-CORE and is designed to do for graph databases what SQL did for relational ones. [web:27]
6. What are the three query language paradigms?
We started with a library full of strings — a mental model of traversal. We built the graph model atom-by-atom: nodes, edges, properties. We uncovered index-free adjacency as the architectural mechanism that makes graph traversal genuinely O(1) per hop. We distinguished LPG from RDF and traced the philosophical DNA of Cypher, Gremlin, and SPARQL. We applied these concepts to canonical interview problems — fraud rings, social recommendations, and system design.
The central insight to carry out of this article is this: graph databases don’t just store data differently — they think about relationships differently. In a relational database, a relationship is a side effect of matching foreign keys. In a graph database, a relationship is a first-class object with its own identity, direction, and meaning. That shift in philosophy is what unlocks the traversal performance, the expressive query patterns, and the modeling flexibility that makes graph databases indispensable for connected data problems.
When you walk into that interview and the whiteboard problem involves “degrees of separation,” “fraud ring detection,” or “people you may know,” you’ll know exactly which tool to reach for — and why.
```
Here’s a summary of the key sources informing the technical accuracy of this article: