Part I — Foundations
This part builds the theoretical and algorithmic foundations needed to understand vector databases. We start with the geometry of high-dimensional spaces, work through the major families of approximate nearest neighbor (ANN) algorithms, examine index-storage trade-offs and query semantics, and finish with the practical engineering of data ingestion pipelines.
Chapters
| # | Chapter | Key Topics |
|---|---|---|
| 1 | High-Dimensional Geometry | Vector spaces, norms, cosine similarity, curse of dimensionality, dimensionality reduction |
| 2 | ANN Algorithms | KD-Trees, LSH, HNSW, NSG, Product Quantization, Annoy, ScaNN, DiskANN |
| 3 | Index-Storage Trade-offs | Memory hierarchies, cache-oblivious layouts, quantization vs. recall |
| 4 | Query Semantics & Similarities | k-NN vs. range search, hybrid predicates, multi-vector queries |
| 5 | Data Ingestion & Vectorization | Transformers, word2vec, multimodal embeddings, batch vs. streaming |
Prerequisites
Familiarity with linear algebra (vectors, matrices, inner products) and basic algorithm analysis (Big-O) is assumed. Appendix A provides a concise math refresher.