# Embedding index (file-backed) This backend builds an embedding index under a corpus and queries it using exact cosine similarity. It is intended for larger corpora where you want a local, pip-installable workflow that does not depend on an external vector database. ## Backend ID `embedding-index-file` ## What it builds This backend builds a retrieval snapshot that materializes snapshot artifacts under the corpus, for example: - an embedding matrix stored as a NumPy array on disk - an id mapping from chunk identifiers to embedding row offsets - chunk records (text + boundaries + provenance) Queries memory-map the embedding matrix and scan in batches so memory usage stays bounded, even when the index is larger than available RAM. ## Chunking Embeddings are computed over chunks. Chunking is configured per configuration by selecting a chunker and its configuration. Chunking is part of the index contract: evidence references chunk boundaries so you can trace retrieval outputs back to the original item text. ## Dependencies - Requires `numpy`. - Requires an embedding provider configuration.