12. Observability & Operations
Running a vector database in production requires monitoring, alerting, and capacity planning tailored to approximate nearest neighbor (ANN) workloads.
12.1 Key Metrics
The Four Golden Signals for Vector Search
| Signal | Metric | Alert Threshold |
|---|---|---|
| Latency | p50, p95, p99 query latency | p99 > 100ms |
| Traffic | Queries/sec, inserts/sec | Spike > 3× baseline |
| Errors | Failed queries, timeouts | Error rate > 0.1% |
| Saturation | Memory usage, CPU, disk I/O | Memory > 85% |
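The thresholds in the table can be encoded as a small rule check. This is a minimal sketch rather than the API of any particular monitoring system: `MetricsSnapshot`, its field names, and `check_golden_signals` are hypothetical, and only the threshold values come from the table.

```python
from dataclasses import dataclass


@dataclass
class MetricsSnapshot:
    """Hypothetical one-interval metrics summary (field names are assumptions)."""
    p99_latency_ms: float
    qps: float
    baseline_qps: float
    error_rate: float        # failed or timed-out queries / total queries
    memory_used_frac: float  # fraction of available memory in use, 0.0-1.0


def check_golden_signals(m: MetricsSnapshot) -> list[str]:
    """Return the names of the signals whose table threshold is exceeded."""
    alerts = []
    if m.p99_latency_ms > 100:          # Latency: p99 > 100ms
        alerts.append("latency")
    if m.qps > 3 * m.baseline_qps:      # Traffic: spike > 3x baseline
        alerts.append("traffic")
    if m.error_rate > 0.001:            # Errors: error rate > 0.1%
        alerts.append("errors")
    if m.memory_used_frac > 0.85:       # Saturation: memory > 85%
        alerts.append("saturation")
    return alerts


snapshot = MetricsSnapshot(p99_latency_ms=140.0, qps=900.0, baseline_qps=800.0,
                           error_rate=0.0004, memory_used_frac=0.91)
print(check_golden_signals(snapshot))  # ['latency', 'saturation']
```

In practice these rules would usually live in the alerting system itself; the function form is only meant to make the table's conditions unambiguous.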
Vector-Specific Metrics
| Metric | Description | Why It Matters |
|---|---|---|
| Recall@k | Fraction of true k-NN found | Quality regression detection |
| Distance ratio | $d_{\text{approx}} / d_{\text{exact}}$ | ANN quality per-query |
| Candidates scanned | Vectors evaluated per query | Early termination indicator |
| Segment count | Number of active segments | Compaction needed? |
| Tombstone ratio | Deleted / total vectors | Index degradation |
| Write amplification | Bytes written / bytes inserted | Storage efficiency |
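Three of the metrics in the table are plain ratios and can be computed directly from counters. The sketch below assumes the underlying counts (distances, vector counts, byte counts) are already being collected; the function names are illustrative, not part of any library.

```python
def distance_ratio(approx_dist: float, exact_dist: float) -> float:
    """d_approx / d_exact for one query; 1.0 means the ANN distance matched the exact one."""
    if exact_dist == 0.0:
        return 1.0 if approx_dist == 0.0 else float("inf")
    return approx_dist / exact_dist


def tombstone_ratio(deleted: int, total: int) -> float:
    """Deleted vectors as a fraction of all vectors still stored in the index."""
    return deleted / total if total else 0.0


def write_amplification(bytes_written: int, bytes_inserted: int) -> float:
    """Bytes physically written per byte of inserted data."""
    return bytes_written / bytes_inserted if bytes_inserted else 0.0


print(round(distance_ratio(0.42, 0.40), 3))
print(tombstone_ratio(150_000, 1_000_000))
print(write_amplification(12 * 2**30, 4 * 2**30))
```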
12.2 Recall Monitoring
Synthetic Ground-Truth Testing
Periodically run exact brute-force search on a sample and compare with ANN results:
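A minimal sketch of such a check is shown here in pure Python. It assumes L2 distance and an ANN client exposed as a callable `ann_search(query, k)` that returns vector IDs; that callable, and the `half_scan_search` stand-in used in the demo, are hypothetical placeholders for a real index.

```python
import heapq
import random


def squared_l2(a, b):
    """Squared Euclidean distance (ranking by it matches ranking by L2)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: IDs of the k nearest vectors."""
    return heapq.nsmallest(k, range(len(vectors)),
                           key=lambda i: squared_l2(query, vectors[i]))


def recall_at_k(ann_ids, exact_ids):
    """Fraction of the true k nearest neighbors present in the ANN result."""
    truth = set(exact_ids)
    return len(truth & set(ann_ids)) / len(truth)


def mean_recall(ann_search, vectors, sample_queries, k=10):
    """Average recall@k of ann_search(query, k) over a sample of queries."""
    scores = [recall_at_k(ann_search(q, k), exact_top_k(q, vectors, k))
              for q in sample_queries]
    return sum(scores) / len(scores)


# Demo data: random vectors and a stand-in "ANN" that scans only half the data.
rng = random.Random(0)
vectors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(500)]
queries = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(20)]


def half_scan_search(query, k):
    return exact_top_k(query, vectors[:250], k)


print(f"recall@10 = {mean_recall(half_scan_search, vectors, queries):.2f}")
```

Brute force costs one distance computation per stored vector per query, so the sample of queries should stay small; a few hundred queries is typically enough to see a trend.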
Recall can silently degrade
As data distribution shifts (new embedding model, different content), index quality may drop without any error signals. Automated recall monitoring is essential.
12.3 Capacity Planning
Memory Estimation
where:

- $n$ = number of vectors
- $b$ = bytes per dimension (4 for FP32, 1 for INT8)
- $M_{\text{avg}}$ = average edges per node
- $\text{overhead}$ ≈ 10–20%
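The terms above suggest an estimate of the form $n \times (d \times b + M_{\text{avg}} \times s) \times (1 + \text{overhead})$. The sketch below implements that form under two assumptions that are not defined in this section: $d$ is the vector dimensionality and $s$ is the number of bytes used to store one edge (4 here, as for a 32-bit neighbor ID). The default values are illustrative only.

```python
def estimate_hnsw_memory_bytes(n, dim, bytes_per_dim=4, m_avg=32,
                               bytes_per_edge=4, overhead=0.15):
    """Rough graph-index memory estimate in bytes.

    dim and bytes_per_edge are assumptions (see the text above);
    overhead is a fraction, e.g. 0.15 for 15%.
    """
    vector_bytes = dim * bytes_per_dim      # raw vector payload per node
    graph_bytes = m_avg * bytes_per_edge    # neighbor links per node
    return n * (vector_bytes + graph_bytes) * (1 + overhead)


gib = estimate_hnsw_memory_bytes(n=1_000_000, dim=768) / 2**30
print(f"{gib:.2f} GiB")  # 3.43 GiB
```

With these inputs the vector payload dominates (3072 of 3200 bytes per node), which is why quantizing to INT8 ($b = 1$) cuts the estimate far more than reducing $M_{\text{avg}}$.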
QPS Estimation
Rule of thumb: 1 core ≈ 500–2000 QPS for 1M vectors at 768 dimensions (HNSW, ef_search=50).
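The rule of thumb can be turned into a first-pass core count. This sketch assumes throughput scales linearly with cores, which real systems only approximate, and the `headroom` parameter (spare capacity kept for spikes) is a hypothetical addition rather than part of the rule itself.

```python
import math


def cores_needed(target_qps, qps_per_core, headroom=0.3):
    """Cores required to serve target_qps while keeping `headroom` spare capacity."""
    usable_qps_per_core = qps_per_core * (1 - headroom)
    return math.ceil(target_qps / usable_qps_per_core)


# Bracket the estimate with both ends of the 500-2000 QPS per-core range.
for per_core in (500, 2000):
    print(per_core, cores_needed(10_000, per_core))
```

For a 10,000 QPS target this gives 29 cores at the pessimistic end and 8 at the optimistic end; the width of that bracket is a reason to benchmark on real data before sizing hardware.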
12.4 Auto-Tuning
Adaptive ef_search
Monitor recall and latency, automatically adjust $\texttt{ef\_search}$:
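One simple way to do this is a bounded multiplicative controller, sketched below. The target values, step size, and bounds are hypothetical defaults, and the policy of favoring recall when both targets are violated is a design assumption, not something this section prescribes.

```python
def adjust_ef_search(ef_search, recall, p99_ms, *, target_recall=0.95,
                     max_p99_ms=100.0, step=1.25, ef_min=16, ef_max=512):
    """Return the next ef_search given the latest recall and p99 latency."""
    if recall < target_recall:
        # Quality takes priority: widen the search beam.
        new_ef = int(ef_search * step) + 1
    elif p99_ms > max_p99_ms:
        # Recall is fine but queries are slow: narrow the beam.
        new_ef = int(ef_search / step)
    else:
        new_ef = ef_search
    # Clamp so a bad measurement cannot push the setting to an extreme.
    return max(ef_min, min(ef_max, new_ef))


print(adjust_ef_search(50, recall=0.91, p99_ms=40))    # 63
print(adjust_ef_search(50, recall=0.98, p99_ms=130))   # 40
print(adjust_ef_search(50, recall=0.97, p99_ms=60))    # 50
print(adjust_ef_search(500, recall=0.90, p99_ms=40))   # 512
```

Running the adjustment on smoothed measurements (for example, a moving average over several monitoring intervals) helps avoid oscillation between the two branches.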
Adaptive nprobe (IVF)
The same approach applies to IVF indexes: increase $n_{\text{probe}}$ when recall drops, and decrease it when latency is too high.
12.5 Production Operations
Common Runbook Items
| Scenario | Action |
|---|---|
| p99 latency spike | Check segment count, compaction status, memory pressure |
| Recall dropped | Check for data distribution shift, run synthetic recall test |
| Memory growing | Check tombstone ratio, trigger compaction |
| Cold start slow | Pre-warm critical segments, increase boot parallelism |
| Disk space full | Compact, archive old segments, promote segments to SQ8 (8-bit scalar quantization) |
References
- Google SRE Book. Monitoring Distributed Systems. https://sre.google/sre-book/monitoring-distributed-systems/