
19. Continual & Online Learning

Embedding models evolve — new versions offer better quality but produce vectors incompatible with existing indexes. This chapter covers strategies for managing embedding updates without rebuilding everything.


19.1 The Embedding Versioning Problem

When you upgrade from model $v_1$ to $v_2$:

$$ \cos\big(f_{v_1}(\text{doc}),\, f_{v_2}(\text{query})\big) \neq \cos\big(f_{v_1}(\text{doc}),\, f_{v_1}(\text{query})\big) $$

Old and new embeddings live in different vector spaces — distances between them are meaningless.
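
A practical corollary: every stored vector should carry the version of the model that produced it, and a query embedded with $v_2$ should only be searched against $v_2$ vectors. Below is a minimal sketch of such routing; `VersionedStore` and its `search` interface are hypothetical names, not any particular library's API:

```python
from dataclasses import dataclass, field

@dataclass
class VersionedStore:
    """Hypothetical registry: one ANN index per embedding-model version."""
    indexes: dict = field(default_factory=dict)  # model version -> index object

    def search(self, version: str, query_vec, k: int = 10):
        # Never compare vectors across versions: a v2 query against a
        # v1 index returns distances with no meaning (see above).
        if version not in self.indexes:
            raise KeyError(f"no index for embedding model {version}")
        return self.indexes[version].search(query_vec, k)
```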

Migration Strategies

| Strategy | Downtime | Cost | Quality |
|---|---|---|---|
| Full re-embed | None (dual-index) | High (re-compute all) | Best |
| Lazy re-embed | None | Gradual | Mixed quality during transition |
| Compatible training | None | Training cost | Model may be constrained |
| Adapter | None | One-time training | Good but imperfect alignment |
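
As an illustration of the lazy strategy, documents can be re-embedded the next time they are touched, spreading the migration cost over normal traffic. A sketch under assumed names (`store` with get/put records, an `embed_v2` encode function):

```python
def get_vector(doc_id, store, embed_v2, current="v2"):
    """Lazy re-embed: upgrade a document's vector on next access.

    `store` is an assumed key-value store whose records carry
    {"text", "vector", "model_version"}; `embed_v2` is the new
    model's encode function. Both names are hypothetical.
    """
    rec = store.get(doc_id)
    if rec["model_version"] != current:
        rec["vector"] = embed_v2(rec["text"])   # stale: re-embed with v2
        rec["model_version"] = current
        store.put(doc_id, rec)                  # persist the upgrade
    return rec["vector"]
```

Until every document has been touched, the corpus holds a mix of $v_1$ and $v_2$ vectors, which is the quality caveat noted in the table.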

19.2 Compatibility-Aware Training

Train $v_2$ to produce embeddings aligned with $v_1$:

$$ \mathcal{L}_{\text{compat}} = \mathcal{L}_{\text{task}} + \lambda \sum_{i} \|f_{v_2}(x_i) - f_{v_1}(x_i)\|^2 $$

This backward-compatible training lets new queries work against old embeddings.
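
A hedged PyTorch sketch of this objective; `task_loss_fn`, the two encoder handles, and $\lambda$ are placeholders rather than a specific training setup:

```python
import torch

def compat_loss(model_v2, model_v1, batch, task_loss_fn, lam=0.1):
    """L_task + lam * ||f_v2(x_i) - f_v1(x_i)||^2 (mean over the batch).

    model_v1 is frozen: v2 is pulled toward the old space while the
    old model stays fixed as the reference.
    """
    z_new = model_v2(batch)            # f_{v2}(x_i)
    with torch.no_grad():
        z_old = model_v1(batch)        # f_{v1}(x_i), no gradient
    align = (z_new - z_old).pow(2).sum(dim=-1).mean()
    return task_loss_fn(z_new, batch) + lam * align
```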

Matryoshka embeddings help

Models trained with Matryoshka Representation Learning produce embeddings where the first $k$ dimensions form a valid $k$-dimensional embedding. This enables dimension-adaptive search and gradual migration.
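
Given MRL's prefix property, migration can proceed by searching truncated prefixes while a full-dimension index is rebuilt. A minimal numpy sketch; the renormalization step assumes cosine similarity on unit vectors:

```python
import numpy as np

def truncate_mrl(emb: np.ndarray, k: int) -> np.ndarray:
    """Keep the first k dims of an MRL embedding and re-normalize.

    Re-normalizing matters because cosine search assumes unit length,
    and the truncated prefix of a unit vector is shorter than 1.
    """
    prefix = emb[..., :k]
    norm = np.linalg.norm(prefix, axis=-1, keepdims=True)
    return prefix / np.clip(norm, 1e-12, None)

# e.g. retrieve coarse candidates at k = 64, re-rank them at full dimension
```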


19.3 Concept Drift Detection

Monitoring Distribution Shift

$$ \text{MMD}^2(\mathcal{D}_{\text{old}}, \mathcal{D}_{\text{new}}) = \mathbb{E}_{x, x' \sim \mathcal{D}_{\text{old}}}[k(x, x')] + \mathbb{E}_{y, y' \sim \mathcal{D}_{\text{new}}}[k(y, y')] - 2\,\mathbb{E}_{x \sim \mathcal{D}_{\text{old}},\, y \sim \mathcal{D}_{\text{new}}}[k(x, y)] $$

When the Maximum Mean Discrepancy (MMD) between recent and historical embedding distributions exceeds a threshold, trigger an alert. With a linear kernel $k(x, y) = x^\top y$, MMD² reduces to the squared distance between the distribution means, $\|\mu_{\text{old}} - \mu_{\text{new}}\|^2$, which is cheap enough to monitor continuously.
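
A small numpy sketch of the estimator above with an RBF kernel; the bandwidth `gamma` and the alert threshold are assumptions (the median heuristic is a common bandwidth choice):

```python
import numpy as np

def mmd2_rbf(X_old: np.ndarray, X_new: np.ndarray, gamma: float = 1.0) -> float:
    """Biased MMD^2 estimate with k(a, b) = exp(-gamma * ||a - b||^2).

    X_old: (n, d) historical embeddings; X_new: (m, d) recent embeddings.
    """
    def gram(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    return float(gram(X_old, X_old).mean()
                 + gram(X_new, X_new).mean()
                 - 2.0 * gram(X_old, X_new).mean())

# alert when mmd2_rbf(historical_sample, recent_sample) exceeds a tuned threshold
```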

Practical Signals

| Signal | Indicates | Action |
|---|---|---|
| Centroid drift > 5% | Data distribution changed | Retrain IVF centroids |
| Recall drop > 3% | Index quality degraded | Rebuild graph |
| Query-result distance increase | Embedding space diverged | Re-embed or fine-tune |
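
The first row's signal is cheap to compute; a sketch that reads the 5% threshold as movement relative to the old centroid's norm (an assumption, since the table does not fix the denominator):

```python
import numpy as np

def centroid_drift(old_embs: np.ndarray, new_embs: np.ndarray) -> float:
    """Relative centroid movement: ||mu_new - mu_old|| / ||mu_old||."""
    mu_old = old_embs.mean(axis=0)
    mu_new = new_embs.mean(axis=0)
    return float(np.linalg.norm(mu_new - mu_old) / np.linalg.norm(mu_old))

# if centroid_drift(historical, recent) > 0.05: schedule IVF centroid retraining
```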

19.4 Online Index Updates

Incremental vs. Periodic Rebuild

```mermaid
flowchart TD
    A{Update frequency?}
    A -->|Minutes| B[Incremental: insert into live HNSW]
    A -->|Daily| C[Periodic: nightly rebuild from scratch]
    A -->|Weekly| D[Full: rebuild + parameter re-tune]
```
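
For the minutes-scale branch, here is a sketch of inserting into a live HNSW index using the hnswlib library; the dimension, capacity, and graph parameters are illustrative, not recommendations:

```python
import hnswlib
import numpy as np

dim, capacity = 384, 1_000_000
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=capacity, ef_construction=200, M=16)

def insert_batch(vectors: np.ndarray, ids: np.ndarray):
    """Incremental update: add new vectors to the live graph.

    hnswlib fixes max_elements at init time, so grow the index
    before an insert would overflow its capacity.
    """
    if index.get_current_count() + len(ids) > index.get_max_elements():
        index.resize_index(2 * index.get_max_elements())
    index.add_items(vectors, ids)
```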

Shadow Index Pattern

  1. Build a new index in the background
  2. Run a recall test against ground truth
  3. If quality ≥ threshold, swap atomically
  4. Roll back if quality regresses
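
A compact sketch of steps 2-4; the `index.search` interface, the evaluation set, and the 0.95 threshold are assumptions for illustration:

```python
def recall_at_k(index, queries, ground_truth, k=10):
    """Fraction of exact top-k neighbors that the index returns.

    `index.search(q, k)` is an assumed interface returning candidate ids;
    ground_truth[i] holds the exact top-k ids for queries[i].
    """
    hits = sum(len(set(index.search(q, k)) & set(truth[:k]))
               for q, truth in zip(queries, ground_truth))
    return hits / (len(queries) * k)

def maybe_swap(live, shadow, queries, ground_truth, threshold=0.95):
    """Swap in the shadow index only if it passes the recall gate."""
    if recall_at_k(shadow, queries, ground_truth) >= threshold:
        return shadow, live    # swap; keep the old index around for rollback
    return live, shadow        # quality regressed: keep serving the old index
```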

References

  1. Kusupati, A., et al. (2022). Matryoshka Representation Learning. NeurIPS.
  2. Shen, Y., et al. (2020). Towards Backward-Compatible Representation Learning. CVPR.