
The Graph is the Index
Why traditional tree-based indexing fails for high-dimensional vectors and how the HNSW graph enables sub-millisecond similarity search.
I used to think that indexing was just a matter of sorting things efficiently. If you have a list of numbers, you sort them and use binary search. If you have a database of names, you build a B-Tree. It’s predictable, it’s logarithmic, and it’s been the backbone of computer science for decades. But then I started working with 1536-dimensional embeddings from OpenAI, and suddenly, every "logical" way of organizing data just... broke.
In three dimensions, we can visualize distance. In ten dimensions, things get weird. In a thousand dimensions, the very concept of "nearness" becomes a computational nightmare. Traditional tree-based indexing, the kind we’ve relied on since the 70s, falls victim to the "Curse of Dimensionality."
To solve this, we had to stop thinking about partitions and start thinking about neighborhoods. We had to realize that for high-dimensional vectors, the graph *is* the index.
Why your B-Tree is useless here
In a standard SQL database, if you want to find a value, the engine looks at a B-Tree. It asks: "Is this value greater than or less than X?" It narrows the search space by half at every step. This works because numbers exist on a one-dimensional line.
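The halving step is easy to see in a toy binary search (a sketch of the idea, not the actual B-Tree code a database engine runs):

```python
# A sorted one-dimensional key space: binary search halves it each step.
keys = list(range(1_000_000))

def lookups_needed(target: int) -> int:
    """Count the comparisons a binary search makes before finding target."""
    lo, hi, steps = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if keys[mid] == target:
            return steps
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps

# One million keys, yet at most ~20 comparisons: log2(1,000,000) ≈ 19.9.
print(lookups_needed(735_361))
```

That logarithmic narrowing is the entire trick, and it only works because "greater than" totally orders the keys.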
Vector embeddings don't live on a line. They live in a manifold. When you’re trying to find the "nearest neighbor" to a query vector in 1536-dimensional space, you can't just ask "is it greater than."
You might think, "Well, let's just use a KD-Tree." A KD-Tree (k-dimensional tree) tries to split space into boxes. In 2D or 3D, it’s beautiful. But as the number of dimensions ($D$) increases, the number of nodes you have to visit grows exponentially. Once you hit about 20 or 30 dimensions, a KD-Tree performs no better than a brute-force linear scan.
The math simply stops cooperating. In high-dimensional space, almost all points are "far" from each other, and the "corners" of your hypercubes contain almost all the volume. If you try to use a tree to index these, you’ll end up backtracking through every single branch. You aren't searching; you're just wandering.
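You can watch the curse kick in with a few lines of NumPy (a quick illustrative experiment; the point counts and dimensions are chosen arbitrarily). As dimensionality grows, the gap between the nearest and farthest random point collapses:

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(dim: int, n: int = 2000) -> float:
    """Ratio of farthest to nearest distance from a random query to n random points."""
    points = rng.random((n, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.max() / dists.min()

# In 2D the farthest point is many times farther than the nearest;
# in 512D the ratio collapses toward 1 and "nearest" loses its leverage.
for dim in (2, 32, 512):
    print(dim, round(distance_contrast(dim), 2))
```

When every point is roughly equidistant, a space-partitioning tree has no way to prune branches, which is exactly why it degenerates into a linear scan.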
The Intuition: Small Worlds and Navigable Graphs
If trees fail us, we need a different structure. Enter the Navigable Small World (NSW).
The idea is borrowed from social network theory: the "six degrees of separation" concept. In a small-world graph, most nodes aren't neighbors, but any node can be reached from any other in a small number of hops.
If I want to find a point in a graph, I start at a random entry point. I look at its neighbors and move to the one that is closest to my target. I repeat this until I can't get any closer. This is a "greedy" search.
The problem? You can get stuck in "local minima." You might find a point that is closer than all its immediate neighbors, but there’s a much better point on the other side of the graph that you can't "see" because there’s no direct edge.
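Here's a toy greedy search over a hand-built graph to make that failure concrete (illustrative only: the points, edges, and the unreachable node are contrived for the example):

```python
import numpy as np

# A tiny hand-built graph: node id -> list of neighbor node ids.
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0],
                   [0.0, 2.0], [3.0, 3.0]])
graph = {0: [1, 3], 1: [0, 2], 2: [1], 3: [0], 4: []}  # node 4 has no edges

def greedy_search(query: np.ndarray, entry: int = 0) -> int:
    """Hop to whichever neighbor is closest to the query; stop at a minimum."""
    current = entry
    while True:
        best = min(graph[current] + [current],
                   key=lambda n: np.linalg.norm(points[n] - query))
        if best == current:
            return current
        current = best

# The true nearest neighbor of (3, 3) is node 4 itself, but no edge leads
# there, so the greedy walk halts at node 3: a local minimum.
print(greedy_search(np.array([3.0, 3.0])))  # → 3, not 4
```

Real NSW construction tries to avoid dead ends like node 4, but local minima remain the fundamental risk of a single flat graph.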
HNSW: Adding the Hierarchy
Hierarchical Navigable Small Worlds (HNSW) solved the local minima problem by stealing a page from the Skip List handbook.
Imagine a building.
- Level 0 (The Basement): Every single data point is here, connected by many edges. It’s a dense thicket of nodes.
- Level 1: Only 10% of the points are here. The edges cover longer distances.
- Level 2: Only 1% of the points are here. The edges are massive, spanning across the entire "map" of your data.
When you search, you start at the very top layer. Because there are so few nodes, you can traverse the entire width of your dataset in two or three hops. Once you find the "closest" node on the top layer, you drop down to the next layer and start your search from that same point.
You aren't just searching; you're zooming in.
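The hierarchy comes from a geometric coin-flip at insert time, the same trick skip lists use: each node draws a top level from an exponential distribution, so every layer up holds exponentially fewer nodes. A minimal sketch (the 10%/1% figures above are illustrative; the actual fractions are governed by M):

```python
import math
import random

random.seed(0)
M = 32
mL = 1 / math.log(M)  # level multiplier, as in the HNSW paper

def sample_level() -> int:
    """Draw a node's top layer; P(level >= l) = M^(-l)."""
    return int(-math.log(random.random()) * mL)

levels = [sample_level() for _ in range(100_000)]
for lvl in range(4):
    share = sum(1 for l in levels if l >= lvl) / len(levels)
    print(f"nodes reaching level {lvl}: {share:.2%}")
```

With M = 32, roughly 1 node in 32 reaches level 1 and 1 in 1024 reaches level 2, which is what makes the top layers sparse enough to cross in a few hops.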
A Practical Example with Faiss
Let's see how this works in code. Faiss (Facebook AI Similarity Search) is the industry standard here. It's written in C++ but has excellent Python bindings.
```python
import numpy as np
import faiss
import time

# Let's generate 100,000 "documents" in 128-dimensional space
d = 128
nb = 100000
nq = 10  # We'll search for 10 vectors

np.random.seed(42)

# Generate some random data
xb = np.random.random((nb, d)).astype('float32')
xq = np.random.random((nq, d)).astype('float32')

# M is the number of neighbors for each node.
# Higher M = more accuracy, more memory, slower build.
M = 32

# Create the HNSW index
index = faiss.IndexHNSWFlat(d, M)

# Check if index is trained (HNSW doesn't need training like IVF)
print(f"Is index trained? {index.is_trained}")

# Add vectors to the index
start_time = time.time()
index.add(xb)
print(f"Time to build index: {time.time() - start_time:.4f}s")

# Search for the 5 nearest neighbors
k = 5
start_time = time.time()
distances, indices = index.search(xq, k)
print(f"Search time: {time.time() - start_time:.4f}s")
print(f"Top 5 neighbors for the first query:\n{indices[0]}")
```

In this snippet, IndexHNSWFlat builds the graph structure. Notice the M parameter. This is the maximum number of outgoing connections in the graph. If you set M too low, the graph is too sparse and search might fail to find the true nearest neighbor. Set it too high, and your memory usage explodes.
The Parameters That Actually Matter
When you're deploying HNSW in production (whether via hnswlib, Faiss, or a vector DB like Weaviate or Pinecone), you'll run into three specific parameters that feel like magic numbers until you understand the trade-offs.
1. M: As mentioned, the number of bi-directional links created for every new element during construction. 16–64 is the sweet spot for most applications.
2. ef_construction: This is how many neighbors the algorithm explores while *building* the index. A higher number means a more accurate graph but a much slower indexing time. I usually set this to 200 or 400.
3. ef_search: This is how many neighbors are explored during a *query*. This is a dynamic parameter. You can increase ef_search at query time to get more accuracy without rebuilding the index.
Here is a lower-level implementation using hnswlib, which I find gives a bit more "raw" control than Faiss for HNSW-specific tasks.
```python
import hnswlib
import numpy as np

dim = 128
num_elements = 10000

# Generating sample data
data = np.float32(np.random.random((num_elements, dim)))

# Declaring index
# space can be 'l2', 'cosine' or 'ip' (Inner Product)
p = hnswlib.Index(space='l2', dim=dim)

# Initializing index
# max_elements: max capacity of the index
# ef_construction: accuracy/speed tradeoff for construction
# M: connections per node
p.init_index(max_elements=num_elements, ef_construction=200, M=16)

# Adding data
p.add_items(data)

# Controlling the recall by setting ef:
# higher ef leads to better accuracy, but slower search
p.set_ef(50)

# Query the elements
labels, distances = p.knn_query(data[:1], k=3)
print("Result labels:", labels)
print("Result distances:", distances)
```

The "Flat" Problem: Memory Consumption
Here is the "gotcha" that no one tells you about HNSW initially: It is a memory hog.
Because HNSW needs to perform random access across the graph to find neighbors, it generally needs to keep the entire vector set in RAM. In the IndexHNSWFlat example above, "Flat" means the vectors are stored uncompressed.
If you have 10 million vectors, each 1536 dimensions (using 4-byte floats), the raw data alone is:

10,000,000 × 1536 × 4 bytes ≈ 61.4 GB
Add the graph overhead (the pointers for each node), and you’re looking at 70-80 GB of RAM. This gets expensive fast.
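A quick back-of-the-envelope helper makes this budgeting concrete (a rough sketch: it counts raw vectors plus level-0 graph links and ignores the sparser upper layers, so treat it as a floor rather than a quote):

```python
def hnsw_ram_estimate_gb(n_vectors: int, dim: int, M: int = 32,
                         bytes_per_float: int = 4) -> float:
    """Rough RAM floor for an uncompressed (Flat) HNSW index.

    Assumes ~2*M neighbor ids (4 bytes each) per node for the level-0
    graph and ignores the upper layers and allocator overhead.
    """
    vectors = n_vectors * dim * bytes_per_float
    graph = n_vectors * 2 * M * 4  # bi-directional links, 4-byte ids
    return (vectors + graph) / 1e9

# The 10M x 1536-dim scenario from above:
print(f"{hnsw_ram_estimate_gb(10_000_000, 1536):.1f} GB")  # → 64.0 GB
```

Run this before you size an instance: the graph links alone add a couple of gigabytes here, before any runtime overhead.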
To combat this, we often use Product Quantization (PQ). PQ compresses the vectors before sticking them into the HNSW graph. You lose a bit of accuracy (the distance calculations are approximations), but you can reduce the memory footprint by 90% or more.
```python
# Example of HNSW with Product Quantization in Faiss
m_quantizer = 8    # Number of sub-quantizers (must divide d evenly)
bits_per_code = 8  # 8 bits per sub-quantizer

# This index will be much smaller than HNSWFlat.
# Note the argument order: dimension, sub-quantizers, graph degree, bits.
index_pq = faiss.IndexHNSWPQ(d, m_quantizer, M, bits_per_code)

# PQ requires training to understand the data distribution
index_pq.train(xb)
index_pq.add(xb)
```

Why isn't HNSW "Exact"?
One thing that trips up developers coming from a SQL background is that HNSW is an Approximate Nearest Neighbor (ANN) algorithm.
In SQL, if you run SELECT * FROM table WHERE id = 10, you get exactly record 10. In HNSW, if you ask for the nearest neighbor, the algorithm says, "I'm 99% sure this is the closest point, but it's possible I missed one because I didn't check every single node."
For AI applications—like RAG (Retrieval-Augmented Generation) or image search—this is perfectly fine. The embeddings themselves are fuzzy representations of meaning. Seeking 100% precision in a search across 1000-dimensional "meaning-space" is usually overkill. You'd rather have a 2ms response time with 98% accuracy than a 2-second response time with 100% accuracy.
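In practice, you quantify that trade-off by measuring recall@k against a brute-force ground truth. A minimal, NumPy-only helper (in real use, the approximate ids would come from your ANN index; here a deliberately degraded copy of the exact result stands in for them):

```python
import numpy as np

def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    """Fraction of true nearest neighbors the ANN index actually returned."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / exact_ids.size

# Ground truth via brute force: fine offline, even if too slow for serving.
rng = np.random.default_rng(1)
xb = rng.random((2000, 32), dtype=np.float32)
xq = rng.random((5, 32), dtype=np.float32)
dists = np.linalg.norm(xb[None, :, :] - xq[:, None, :], axis=2)
exact = np.argsort(dists, axis=1)[:, :10]

# Simulate an ANN index that misses one true neighbor per query:
approx = exact.copy()
approx[:, -1] = -1
print(recall_at_k(approx, exact))  # → 0.9
```

Recall curves like this, swept over ef_search, are how you pick the operating point on the 98%-at-2ms versus 100%-at-2s spectrum.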
The Edge Cases: When HNSW Bites Back
It’s not all magic and sub-millisecond speeds. I’ve run into a few scenarios where HNSW was the wrong choice:
1. Low Latency, High Throughput, Small Data: If you only have 5,000 vectors, just use a flat index (brute force). Modern CPUs use SIMD instructions that can scan 5,000 vectors faster than the overhead of traversing a graph.
2. Frequently Changing Data: HNSW is great for adding data, but deleting data is a nightmare. Most implementations handle deletion by just "marking" a node as deleted, but the node stays in the graph because removing it would break the connections of its neighbors. Over time, your graph gets fragmented.
3. Low Memory Environments: If you're on a budget and can't afford massive RAM instances, you might need to look at DiskANN or other on-disk indexing strategies. HNSW is designed for the speed of memory.
Building for the Future
We are moving into an era where the "index" is no longer a sorted list, but a navigable map.
The Graph-based approach recognizes that data is relational, even when those relations aren't explicitly defined. By connecting points based on their proximity in high-dimensional space, HNSW allows us to navigate the vast landscape of human language and imagery in a way that feels almost organic.
If you're building a vector-heavy application, don't just treat the index as a black box. Play with M and ef_construction. Understand that you're trading memory for speed, and precision for time. The graph is the index—and how you build that graph determines whether your application feels like a futuristic AI or a sluggish legacy search bar.
Summary Checklist for HNSW
- Use HNSW when you need sub-10ms search over millions of vectors.
- Tune `M` (16–64) based on the complexity of your data.
- Tune `ef_search` at runtime to balance accuracy and speed.
- Use PQ (Product Quantization) if your RAM costs are spiraling out of control.
- Avoid HNSW if your data requires 100% perfect recall or if you are constantly deleting and re-inserting records.


