# 13. Building an HNSW Store in Rust
This chapter walks through the design of a production-quality HNSW index in Rust, focusing on memory layout, safety trade-offs, and performance engineering.
## 13.1 Why Rust for Vector Indexes?
| Feature | C++ | Rust | Go/Java |
|---|---|---|---|
| Memory safety | Manual | Compile-time guaranteed | GC |
| Performance | Fastest | ~Same as C++ | 2–5× slower |
| Concurrency | Threads + locks | Ownership + `Send`/`Sync` | Goroutines |
| GC pauses | None | None | Unpredictable latency spikes |
| Ecosystem | Mature | Growing rapidly | Large |
**GC is the enemy of p99 latency.** A 50 ms GC pause in a Go or Java vector database can single-handedly violate a latency SLA. Rust and C++ have no garbage collector, so this class of pause never occurs.
## 13.2 Memory Layout Design

### Flat Vector Storage
Store all vectors in one contiguous, memory-mapped block of `f32` values:
```rust
/// Contiguous vector storage — cache-friendly and mmap-able.
pub struct VectorStorage {
    data: memmap2::MmapMut, // memory-mapped file
    dim: usize,
    count: usize,
}

impl VectorStorage {
    /// Get a vector by ID — a zero-copy view into the mmap.
    #[inline]
    pub fn get(&self, id: usize) -> &[f32] {
        debug_assert!(id < self.count);
        let stride = self.dim * std::mem::size_of::<f32>();
        let bytes = &self.data[id * stride..(id + 1) * stride];
        // SAFETY: the mapping is page-aligned (hence f32-aligned) and fully
        // initialized during index build.
        unsafe { std::slice::from_raw_parts(bytes.as_ptr() as *const f32, self.dim) }
    }
}
```
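Construction is omitted above. A minimal sketch of opening the mapping, assuming the vectors live in a single pre-sized file (`open` is a hypothetical helper, not part of a fixed API):

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

impl VectorStorage {
    /// Open (or create) a storage file sized for `count` vectors of `dim` f32s.
    pub fn open(path: &Path, dim: usize, count: usize) -> io::Result<Self> {
        let file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .open(path)?;
        file.set_len((count * dim * std::mem::size_of::<f32>()) as u64)?;
        // SAFETY: we assume no other process mutates the file while it is mapped.
        let data = unsafe { memmap2::MmapMut::map_mut(&file)? };
        Ok(Self { data, dim, count })
    }
}
```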
### Graph Adjacency Storage

Store neighbor lists in a flat array with fixed-width entries, so there is no per-node heap allocation:
```rust
/// Fixed-width adjacency list for one layer.
/// Layout per node: [count, n0, n1, ..., n_{M-1}], each a u32.
pub struct GraphLayer {
    data: Vec<u32>,
    max_neighbors: usize,
    entry_width: usize, // = 1 + max_neighbors
}

impl GraphLayer {
    /// The current neighbors of `node` (only the first `count` slots are live).
    pub fn neighbors(&self, node: u32) -> &[u32] {
        let offset = node as usize * self.entry_width;
        let count = self.data[offset] as usize;
        &self.data[offset + 1..offset + 1 + count]
    }

    /// Append a neighbor if there is room. A full list is left unchanged here;
    /// a real HNSW insert would prune the list with a selection heuristic.
    pub fn add_neighbor(&mut self, node: u32, neighbor: u32) {
        let offset = node as usize * self.entry_width;
        let count = self.data[offset] as usize;
        if count < self.max_neighbors {
            self.data[offset + 1 + count] = neighbor;
            self.data[offset] += 1;
        }
    }
}
```
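A hypothetical constructor plus a usage example make the layout concrete; `new` simply zero-fills the flat array, which encodes "zero neighbors" for every node:

```rust
impl GraphLayer {
    /// Allocate a layer for `num_nodes` nodes, all initially without neighbors.
    /// Illustrative helper; real builders would size layers from the HNSW level.
    pub fn new(num_nodes: usize, max_neighbors: usize) -> Self {
        let entry_width = 1 + max_neighbors;
        Self {
            data: vec![0u32; num_nodes * entry_width],
            max_neighbors,
            entry_width,
        }
    }
}

fn demo() {
    // Link node 0 to nodes 7 and 42, then read them back.
    let mut layer = GraphLayer::new(1_000, 16);
    layer.add_neighbor(0, 7);
    layer.add_neighbor(0, 42);
    assert_eq!(layer.neighbors(0), &[7, 42]);
}
```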
This layout has zero heap fragmentation and is trivially serializable to disk.
## 13.3 SIMD Distance Kernels in Rust

Use the `std::arch` intrinsics for platform-specific SIMD:
```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// AVX2/FMA-accelerated squared L2 distance.
///
/// # Safety
/// The caller must ensure the CPU supports AVX2 and FMA.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2", enable = "fma")]
unsafe fn l2_squared_avx2(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len());
    let mut sum = _mm256_setzero_ps();
    let chunks = a.len() / 8;
    for i in 0..chunks {
        let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
        let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
        let diff = _mm256_sub_ps(va, vb);
        sum = _mm256_fmadd_ps(diff, diff, sum); // sum += diff * diff
    }
    // Horizontal sum of the 8 lanes.
    let hi = _mm256_extractf128_ps(sum, 1);
    let lo = _mm256_castps256_ps128(sum);
    let sum128 = _mm_add_ps(lo, hi);
    let sum128 = _mm_hadd_ps(sum128, sum128);
    let sum128 = _mm_hadd_ps(sum128, sum128);
    let mut result = _mm_cvtss_f32(sum128);
    // Scalar tail for lengths not divisible by 8.
    for i in (chunks * 8)..a.len() {
        let d = a[i] - b[i];
        result += d * d;
    }
    result
}
```
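The kernel is `unsafe` because `#[target_feature]` demands it, so callers need a safe entry point that verifies CPU support at runtime and falls back to scalar code otherwise. A minimal dispatch sketch:

```rust
/// Scalar fallback, used when AVX2/FMA are unavailable.
fn l2_squared_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Safe entry point: dispatch on runtime CPU feature detection.
pub fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            // SAFETY: the required CPU features were just verified.
            return unsafe { l2_squared_avx2(a, b) };
        }
    }
    l2_squared_scalar(a, b)
}
```

In a real index the detection result would be cached (e.g., in a function pointer) rather than re-checked on every distance call, but the macro itself already caches internally.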
## 13.4 Concurrent Search with RwLock
Multiple readers can search simultaneously; writers take an exclusive lock:
```rust
use std::sync::RwLock;

pub struct ConcurrentHNSW {
    graph: RwLock<Vec<GraphLayer>>,
    vectors: VectorStorage,
}

impl ConcurrentHNSW {
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(f32, u32)> {
        let graph = self.graph.read().unwrap(); // shared lock: many searchers at once
        // ... beam search using graph ...
        todo!()
    }

    pub fn insert(&self, vector: &[f32]) -> u32 {
        let mut graph = self.graph.write().unwrap(); // exclusive lock: blocks all readers
        // ... insert into graph ...
        todo!()
    }
}
```
**Write starvation.** Under heavy read load, writers on `std::sync::RwLock` may starve. `parking_lot::RwLock` uses a task-fair locking policy that avoids both reader and writer starvation (a drop-in sketch follows).
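A minimal sketch of the swap, assuming the `parking_lot` crate is added as a dependency; its guards are never poisoned, so the `unwrap()` calls disappear:

```rust
use parking_lot::RwLock; // drop-in replacement for std::sync::RwLock

pub struct ConcurrentHNSW {
    graph: RwLock<Vec<GraphLayer>>,
    vectors: VectorStorage,
}

impl ConcurrentHNSW {
    pub fn search(&self, query: &[f32], k: usize) -> Vec<(f32, u32)> {
        let graph = self.graph.read(); // no poisoning, so no unwrap()
        // ... beam search using graph ...
        todo!()
    }
}
```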
## 13.5 Persistence and Recovery

### On-Disk Format
```
index.hnsw
├── header (8 bytes: magic, version)
├── metadata (dim, M, ef, entry_point, max_layer)
├── vectors.bin (n × dim × 4 bytes, mmap'd)
├── graph_layer_0.bin (n × entry_width × 4 bytes)
├── graph_layer_1.bin (...)
└── ...
```
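The header check on startup might look like the sketch below; the `b"HNSW"` magic value and the little-endian `u32` version field are illustrative choices, not a fixed spec:

```rust
use std::io::{self, Read};

/// Illustrative 4-byte magic value identifying the file format.
const MAGIC: [u8; 4] = *b"HNSW";

/// Read and validate the 8-byte header, returning the format version.
fn read_header(r: &mut impl Read) -> io::Result<u32> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    if buf[..4] != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic"));
    }
    Ok(u32::from_le_bytes(buf[4..8].try_into().unwrap()))
}
```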
### WAL for Crash Recovery

- Append `(INSERT, id, vector)` to the WAL (see the append sketch below)
- Insert into the in-memory graph
- On flush, write the graph layers to disk
- On startup, replay WAL entries after the last checkpoint
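A minimal sketch of the append path, assuming a record layout of one tag byte, a little-endian `u32` id, and the vector's `f32` values in little-endian order (all illustrative choices):

```rust
use std::fs::File;
use std::io::{self, Write};

/// Minimal write-ahead log: one record appended per insert.
struct Wal {
    file: File,
}

impl Wal {
    fn append_insert(&mut self, id: u32, vector: &[f32]) -> io::Result<()> {
        let mut buf = Vec::with_capacity(5 + vector.len() * 4);
        buf.push(1u8); // record tag: INSERT
        buf.extend_from_slice(&id.to_le_bytes());
        for x in vector {
            buf.extend_from_slice(&x.to_le_bytes());
        }
        self.file.write_all(&buf)?;
        self.file.sync_data() // fsync before acknowledging the write
    }
}
```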
## 13.6 Optimization Checklist

| Optimization | Impact | Complexity |
|---|---|---|
| SIMD distance kernels | 5–8× throughput | Medium |
| Memory-mapped vectors | Zero-copy, OS-managed paging | Low |
| Flat adjacency arrays | Cache-friendly, no allocator overhead | Medium |
| `RwLock` for concurrency | Linear read scaling | Low |
| Prefetch neighbor vectors | 10–30% latency reduction | Medium |
| PQ re-ranking | 10–50× memory reduction | High |
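The prefetch row deserves a sketch: while the distance kernel scores the current candidate, the next neighbor's vector can be hinted into cache. `prefetch_vector` is a hypothetical helper called from inside the beam-search loop:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};

/// Hint the vector for `next_id` into L1 cache before it is needed.
#[cfg(target_arch = "x86_64")]
fn prefetch_vector(storage: &VectorStorage, next_id: usize) {
    let v = storage.get(next_id);
    // SAFETY: the pointer is valid; prefetch is a pure hint with no
    // architectural side effects.
    unsafe { _mm_prefetch::<_MM_HINT_T0>(v.as_ptr() as *const i8) };
}
```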
## References

- Malkov, Y. A., & Yashunin, D. A. (2020). Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 42(4), 824–836.
- Qdrant source code (Rust HNSW implementation). https://github.com/qdrant/qdrant