# Roadmap

This document describes what we plan to build next. If you are looking for runnable examples, see `docs/demos.md`. If you are looking for what already exists, start with:

- `docs/feature-index.md` for a map of features to behavior specifications and modules.
- `CHANGELOG.md` for released changes.

## Principles

- Behavior specifications are the authoritative definition of behavior.
- Every behavior that exists is specified.
- Validation and documentation are part of the product.
- Raw corpus items remain readable, portable files.
- Derived artifacts are stored under the corpus and can coexist for multiple implementations.

## Completed foundations

These are the capability slices that already exist and have end-to-end behavior specifications.

### Retrieval evaluation and datasets

- Dataset authoring workflow for small hand-labeled sets and larger synthetic sets.
- Evaluation reports with per-query diagnostics and summary metrics.
- Versioned dataset formats and deterministic reports for stable inputs.

### Retrieval quality upgrades

- Tuned lexical baseline with BM25, n-gram range controls, and stop word policies.
- Reranking stage for top-N candidates with explicit stage metadata.
- Hybrid retrieval with explicit fusion weights and stage-level scores.

### Context pack policy surfaces

- Policy variants for formatting, ordering, and metadata inclusion.
- Token and character budget strategies with explicit selectors.
- Documentation and examples that show how policy choices change outputs.

### Extraction evaluation harness

This evaluation harness compares extraction approaches in a way that is measurable, repeatable, and useful for practical engineering decisions.

### Corpus analysis tools

Lightweight analysis utilities summarize corpus themes and guide curation:

- Basic corpus profiling with deterministic metrics for raw items and extracted text.
- Topic modeling with BERTopic and optional LLM-assisted labeling.
- Side-by-side analysis outputs stored under the corpus for reproducible comparison.

### Sequence analysis (Markov analysis)

Goal: provide a sequence-oriented analysis backend for corpora where order matters (conversations, timelines, logs).

Deliverables:

- Markov analysis for sequence-driven corpora (including hidden Markov models where appropriate).
- A report format that explains state transitions and emissions with evidence.
- Evaluation guidance for comparing HMM outputs across corpora or snapshots.

Acceptance checks:

- HMM analysis is reproducible for the same corpus state and extraction snapshot.
- Reports are exportable and readable without custom tooling.

### Text utilities

Small, reusable building blocks for transforming text in ways that are hard to do reliably with one-shot generation.

Deliverables:

- Text extraction and slicing utilities that operate via a virtual file editing tool loop.
- Optional higher-level utilities built on the same pattern (annotation, linking, redaction).
- Documentation and runnable demos that show the mechanism and how to use each utility.

Acceptance checks:

- Utilities have end-to-end behavior specifications and are fully covered by tests.
- Integration tests can be run against real model APIs when configured.

## Next: Tactus integration

Goal: make Biblicus usable from durable agent workflows without baking assistant logic into Biblicus itself.

Deliverables:

- A Model Context Protocol (MCP) toolset surface for Biblicus (ingest, query, stats, and evidence retrieval).
- Clear dependency wiring for secrets and network access (in-sandbox vs brokered).
- One reference procedure demonstrating retrieval-augmented generation built on Biblicus evidence outputs.

Acceptance checks:

- Tools expose evidence-first outputs with stable schemas.
- Procedures remain in control of prompting and context budgeting policy.

## Later: alternate backends and hosting modes

Goal: broaden the backend surface while keeping the core predictable.
Deliverables:

- A second backend with different performance tradeoffs.
- A tool server that exposes a backend through a stable interface.
- Documentation that shows how to run a backend out of process.

Acceptance checks:

- Local tests remain fast and deterministic.
- Integration tests validate retrieval through the tool boundary.

## Deferred: corpus and extraction work

These are valuable, but intentionally not the near-term focus while retrieval becomes practical end to end.

### In-memory corpus for ephemeral workflows

Goal: allow programmatic, temporary corpora that live in memory for short-lived agents or tests.

Deliverables:

- A memory-backed corpus implementation that supports the same ingestion and catalog APIs.
- A serialization option for snapshots so ephemeral corpora can be persisted when needed.
- Documentation that explains tradeoffs versus file-based corpora.

Acceptance checks:

- Behavior specifications cover ingestion, listing, and reindexing in memory.
- Retrieval and extraction can operate on the in-memory corpus without special casing.
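The memory-backed corpus above can be sketched as a small class. Biblicus's actual ingestion and catalog APIs are not shown in this document, so every name here (`InMemoryCorpus`, `ingest`, `list_items`, `snapshot`) is hypothetical; this is a minimal sketch of the shape described, not the real interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    """A single corpus item held in memory."""
    item_id: str
    text: str
    metadata: dict = field(default_factory=dict)


class InMemoryCorpus:
    """Ephemeral corpus: items live in a dict, no disk I/O.

    Hypothetical sketch; method names mirror the roadmap's
    ingestion/catalog/snapshot deliverables, not a real API.
    """

    def __init__(self) -> None:
        self._items: Dict[str, Item] = {}

    def ingest(self, item_id: str, text: str, **metadata) -> Item:
        """Add or replace an item; metadata is stored alongside the text."""
        item = Item(item_id, text, dict(metadata))
        self._items[item_id] = item
        return item

    def list_items(self) -> List[str]:
        """Catalog listing: deterministic, sorted item ids."""
        return sorted(self._items)

    def snapshot(self) -> dict:
        """Plain-dict snapshot so an ephemeral corpus can be persisted
        (e.g. dumped to JSON) when needed."""
        return {
            item.item_id: {"text": item.text, "metadata": item.metadata}
            for item in self._items.values()
        }
```

Because the snapshot is plain data, persisting an ephemeral corpus reduces to serializing one dictionary, which keeps the tradeoff versus file-based corpora easy to document.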