Vector databases
2024-08-12 — 2026-04-05
Wherein the approximate nearest neighbour problem is introduced, brute-force search is found wanting at scale, and several embedded and managed solutions are surveyed.
Databases for proximity search over vector embeddings. These are important in classic search, recommendation, and AI search systems.
For worked examples of vector search at blog scale (brute-force numpy) and with a proper retrieval tool (QMD), see AI search.
- Vector Database Primer
- What is a Vector Database? | Pinecone
- Vector Databases as Memory for your AI Agents | by Ivan Campos
1 When do we need one?
At small scale, we don’t. For example, the similarity search on this blog uses ~2000 vectors at 1024 dimensions stored in a flat numpy .npz file; brute-force Q @ E.T takes milliseconds and the whole matrix is about 7 MB.
Things start to change around 100k vectors. The core operation in any vector search — "given a query vector, find the \(k\) most similar vectors in the corpus" — is nearest neighbour search, and solving it exactly by brute force costs \(\mathcal{O}(Nd)\) per query (where \(N\) is the number of vectors and \(d\) is the dimensionality). At \(N = 2000\) this is trivial for typical vectors. At \(N = 100{,}000\) I might need to quit other apps before I can search. At \(N = 1{,}000{,}000\) a naive implementation will hard-crash my laptop, or be too slow to be usable.
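Concretely, the brute-force approach is just a matrix multiply plus a top-\(k\) selection. A minimal numpy sketch (the corpus matrix `E` here is synthetic; sizes mirror the blog's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 2000, 1024                                # corpus size, embedding dimension
E = rng.standard_normal((N, d)).astype(np.float32)
E /= np.linalg.norm(E, axis=1, keepdims=True)    # unit-normalise: dot product == cosine

def top_k(query, k=5):
    """Exact nearest neighbours by brute force: O(N*d) per query."""
    q = query / np.linalg.norm(query)
    scores = E @ q                               # one dot product per corpus vector
    idx = np.argpartition(scores, -k)[-k:]       # unordered top-k in O(N)
    return idx[np.argsort(scores[idx])[::-1]]    # sort only those k

neighbours = top_k(E[42])                        # a vector is its own nearest neighbour
```

`argpartition` avoids sorting all \(N\) scores; only the final \(k\) get sorted.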
2 The approximate nearest neighbour (ANN) problem
The fix is to accept approximate nearest neighbours — results that are very likely to include the true top-\(k\), but aren’t guaranteed to. In practice recall stays high (often above 95%) and the speed gain is dramatic: \(\mathcal{O}(\log N)\) or better per query instead of \(\mathcal{O}(N)\).
This is a well-studied problem with a surprisingly active benchmarking community. ANN Benchmarks maintains standardised comparisons of algorithms across datasets, measuring the recall-vs-queries-per-second tradeoff. It’s worth browsing to get a feel for how the algorithms compare in practice — the Pareto frontiers are instructive.
The main families of ANN algorithms:
- HNSW (Hierarchical Navigable Small Worlds) — a graph-based index where each vector is a node, and edges connect nearby vectors at multiple scales. Query traversal walks the graph greedily from coarse to fine layers. This is the most popular choice in practice, used by Qdrant, Weaviate, pgvector, and most managed vector databases. The tradeoff: excellent recall/speed, but memory-hungry since the graph structure lives in RAM alongside the vectors.
- IVF-PQ (Inverted File Index with Product Quantization) — partitions the vector space into Voronoi cells (IVF), then compresses vectors within each cell using product quantization (PQ). At query time, only a few cells are searched. Used by FAISS. Better memory efficiency through quantization, at the cost of slightly lower recall.
- DiskANN — Microsoft’s approach for datasets that don’t fit in RAM. Builds a graph index on disk with a small in-memory “entry point” structure. Good for billion-scale search on commodity hardware.
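To build intuition for the graph-based family, here is a deliberately tiny single-layer sketch in plain Python — not a real HNSW (no hierarchy, no heuristic edge pruning), just the core idea: link each vector to its nearest neighbours, then greedily walk edges until no neighbour improves on the current node.

```python
import math, random

random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]  # toy 2-D corpus

def dist(a, b):
    return math.dist(a, b)

# Index construction: connect each point to its M nearest neighbours.
# (O(N^2) here for simplicity; real HNSW builds the graph incrementally.)
M = 8
graph = {
    i: sorted(range(len(points)), key=lambda j: dist(points[i], points[j]))[1 : M + 1]
    for i in range(len(points))
}

def greedy_search(query, entry=0):
    """Always move to the closest neighbour; stop at a local minimum."""
    current = entry
    while True:
        best = min(graph[current], key=lambda j: dist(query, points[j]))
        if dist(query, points[best]) >= dist(query, points[current]):
            return current           # no neighbour is closer: approximate NN found
        current = best

q = (0.5, 0.5)
approx = greedy_search(q)
exact = min(range(len(points)), key=lambda i: dist(q, points[i]))
```

Each greedy step strictly decreases the distance to the query, so the walk terminates — but it can stall in a local minimum, which is exactly the "approximate" in ANN; HNSW's multiple layers and wider candidate lists exist to make that rare.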
At 10M+ vectors (~40 GB in float32 for 1024d), you also need memory-mapped storage or sharding — a flat array no longer fits comfortably in RAM. This is where a heavy-duty tool like a dedicated vector database earns its keep: it combines an ANN index with persistence, filtering, and (for the server options) replication.
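A memory-mapped flat file is the simplest step up from an in-RAM array: numpy’s `np.memmap` lets you scan a corpus larger than memory in fixed-size chunks, paging in only the slice being scored. A sketch (the file here is tiny and synthetic; at real scale only `chunk` rows would be resident at a time):

```python
import os
import tempfile
import numpy as np

N, d = 10_000, 64
path = os.path.join(tempfile.mkdtemp(), "embeddings.f32")

# Write a synthetic float32 corpus to disk.
rng = np.random.default_rng(0)
full = rng.standard_normal((N, d)).astype(np.float32)
full.tofile(path)

# Map the file instead of loading it; slices are paged in on demand.
E = np.memmap(path, dtype=np.float32, mode="r", shape=(N, d))

def top_k_chunked(query, k=5, chunk=1024):
    """Brute-force dot-product search, streamed over the memory-mapped file."""
    best_scores = np.full(k, -np.inf, dtype=np.float32)
    best_ids = np.zeros(k, dtype=np.int64)
    for start in range(0, N, chunk):
        block = E[start : start + chunk]          # only this slice touches RAM
        scores = block @ query
        merged_s = np.concatenate([best_scores, scores])
        merged_i = np.concatenate([best_ids, np.arange(start, start + len(block))])
        keep = np.argpartition(merged_s, -k)[-k:]  # running top-k across chunks
        best_scores, best_ids = merged_s[keep], merged_i[keep]
    return best_ids[np.argsort(best_scores)[::-1]]

ids = top_k_chunked(full[7])
```

This buys larger-than-RAM capacity but not sublinear query time — every query still scans all \(N\) rows, which is why DiskANN-style on-disk indexes exist.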
3 Embedded (serverless) options
For the middle ground — 10k to ~1M vectors — embedded databases that run as a library (no server process) are often the right fit.
3.1 ChromaDB
ChromaDB is a vector database with a focus on search and retrieval, backed by SQLite. I used it to store vector embeddings for “similar posts” on this site and I can report it was incredibly easy for my use case. No fancy extra DB servers required.
3.2 LanceDB
LanceDB is an embedded vector database built on the Lance columnar format. It’s designed for larger-than-memory datasets with memory-mapped I/O, and supports both full-text and vector search. Like ChromaDB, no server process — just files.
3.3 QMD
QMD uses SQLite for both its BM25 full-text index and its vector embeddings. It’s more of a search tool than a database, but it illustrates the embedded approach well. See the worked example on this blog.
3.4 Voyager
Voyager is Spotify’s HNSW library, based on hnswlib with bindings for Python and Java. It’s reportedly used hundreds of millions of times per day at Spotify. I think it replaces the earlier Annoy.
4 Managed / server options
For production systems at scale (millions to billions of vectors), we typically need a dedicated server.
4.1 Pinecone
Pinecone is a fully managed vector database — no infrastructure to operate. The tradeoff is vendor lock-in and cost at scale.
4.2 Qdrant
Qdrant is open source, HNSW-based, and can run self-hosted or managed. Supports filtering, payload storage, and hybrid search.
4.3 Weaviate
Weaviate is open source with a GraphQL API. Supports vectorization at ingest time (bring your own model or use built-in integrations).
4.4 Milvus
Milvus is built on top of popular vector search libraries including FAISS, Annoy, and HNSW, and is designed for similarity search over dense vector datasets from millions to billions of vectors. Beyond the index itself it adds data sharding, persistence, streaming ingestion, hybrid search across vector and scalar data, time travel, and other advanced features; the recommended production deployment is on Kubernetes.
Architecturally, Milvus disaggregates storage from compute (and data plane from control plane): it comprises four mutually independent layers — access, coordinator service, worker nodes, and storage — which can be scaled or recovered separately.
Milvus Lite is a simplified, embedded alternative to full Milvus:
- It imports as a plain Python library, with no extra weight in your application.
- It is self-contained, with no external dependencies (it uses embedded Etcd and local storage).
- It can also run as a CLI-based standalone server.
- It works smoothly in Google Colab and Jupyter notebooks.
- Code and data migrate cleanly to the other Milvus deployments (standalone, clustered, or fully managed) without risk of data loss.
