# Adding a Retrieval Backend Backends are pluggable engines that implement a small, stable interface. The goal is to make new retrieval ideas easy to test without reshaping the corpus. For user documentation on available backends, see the [Backend Reference](backends/index.md). ## Backend contract Backends implement two operations: - **Build run**: create a `RetrievalRun` manifest (and optional artifacts). - **Query**: return structured `Evidence` objects under a `QueryBudget`. ## Run artifacts Backends store artifacts and manifests under: ``` retrieval/// manifest.json ``` The manifest is the reproducible contract. Artifacts are backend-specific and listed in `artifact_paths`. ## Implementation checklist 1. **Define a Pydantic configuration model** for your backend configuration. 2. **Implement `RetrievalBackend`**: - `build_run(corpus, configuration_name, config)` - `query(corpus, run, query_text, budget)` 3. **Emit `Evidence`** with required fields: - `item_id`, `source_uri`, `media_type`, `score`, `rank`, `stage`, `configuration_id`, `snapshot_id` - `text` **or** `content_ref` 4. **Register the backend** in `biblicus.backends.available_backends`. 5. **Add behavior-driven development specifications** before implementation and make them pass with 100% coverage. ## Design notes - Treat **runs** as immutable manifests with reproducible parameters. - If your backend needs artifacts, store them under `retrieval/` and record paths in `artifact_paths`. - Keep **text extraction** in explicit pipeline stages, not in backend ingestion. See `docs/extraction.md` for how extraction snapshots are built and referenced from backend configs. ## Reproducibility checklist - Record the extraction snapshot reference used to build the backend. - Keep the backend configuration configuration in source control. - Reuse the same `QueryBudget` when comparing backends. ## Common pitfalls - Returning evidence without `text` or `content_ref`. - Mutating artifacts after a run is created (breaks reproducibility). - Comparing runs built from different extraction outputs. ## Examples See: - `biblicus.retrievers.scan.ScanRetriever` (minimal baseline) - `biblicus.retrievers.sqlite_full_text_search.SqliteFullTextSearchRetriever` (practical local backend) - `biblicus.retrievers.tf_vector.TfVectorRetriever` (term-frequency vector baseline; `tf-vector`)