# Roadmap

This document describes what we plan to build next. If you are looking for runnable examples, see `docs/demos.md`. If you are looking for what already exists, start with:

- `docs/feature-index.md` for a map of features to behavior specifications and modules.
- `CHANGELOG.md` for released changes.

## Principles

- Behavior specifications are the authoritative definition of behavior.
- Every behavior that exists is specified.
- Validation and documentation are part of the product.
- Raw corpus items remain readable, portable files.
- Derived artifacts are stored under the corpus and can coexist for multiple implementations.

## Completed foundations

These are the capability slices that already exist and have end-to-end behavior specifications.

### Retrieval evaluation and datasets

- Dataset authoring workflow for small hand-labeled sets and larger synthetic sets.
- Evaluation reports with per-query diagnostics and summary metrics.
- Versioned dataset formats and deterministic reports for stable inputs.

### Retrieval quality upgrades

- Tuned lexical baseline with BM25, n-gram range controls, and stop word policies.
- Reranking stage for top-N candidates with explicit stage metadata.
- Hybrid retrieval with explicit fusion weights and stage-level scores.

### Context pack policy surfaces

- Policy variants for formatting, ordering, and metadata inclusion.
- Token and character budget strategies with explicit selectors.
- Documentation and examples that show how policy choices change outputs.

### Extraction evaluation harness

This evaluation harness compares extraction approaches in a way that is measurable, repeatable, and useful for practical engineering decisions.

### Corpus analysis tools

Lightweight analysis utilities summarize corpus themes and guide curation:

- Basic corpus profiling with deterministic metrics for raw items and extracted text.
- Topic modeling with BERTopic and optional LLM-assisted labeling.
- Side-by-side analysis outputs stored under the corpus for reproducible comparison.

### Sequence analysis (Markov analysis)

Goal: provide a sequence-oriented analysis backend for corpora where order matters (conversations, timelines, logs).

Deliverables:

- Markov analysis for sequence-driven corpora (including hidden Markov models where appropriate).
- A report format that explains state transitions and emissions with evidence.
- Evaluation guidance for comparing HMM outputs across corpora or snapshots.

Acceptance checks:

- HMM analysis is reproducible for the same corpus state and extraction snapshot.
- Reports are exportable and readable without custom tooling.

### Text utilities

Small, reusable building blocks for transforming text in ways that are hard to do reliably with one-shot generation.

Deliverables:

- Text extraction and slicing utilities that operate via a virtual file editing tool loop.
- Optional higher-level utilities built on the same pattern (annotation, linking, redaction).
- Documentation and runnable demos that show the mechanism and how to use each utility.

Acceptance checks:

- Utilities have end-to-end behavior specifications and are fully covered by tests.
- Integration tests can be run against real model APIs when configured.

## Next: Tactus integration

Goal: make Biblicus usable from durable agent workflows without baking assistant logic into Biblicus itself.

Deliverables:

- A Model Context Protocol (MCP) toolset surface for Biblicus (ingest, query, stats, and evidence retrieval).
- Clear dependency wiring for secrets and network access (in-sandbox vs brokered).
- One reference procedure demonstrating retrieval-augmented generation built on Biblicus evidence outputs.

Acceptance checks:

- Tools expose evidence-first outputs with stable schemas.
- Procedures remain in control of prompting and context budgeting policy.

## Later: alternate backends and hosting modes

Goal: broaden the backend surface while keeping the core predictable.
Deliverables:

- A second backend with different performance tradeoffs.
- A tool server that exposes a backend through a stable interface.
- Documentation that shows how to run a backend out of process.

Acceptance checks:

- Local tests remain fast and deterministic.
- Integration tests validate retrieval through the tool boundary.

## Deferred: corpus and extraction work

These are valuable, but intentionally not the near-term focus while retrieval becomes practical end to end.

### In-memory corpus for ephemeral workflows

Goal: allow programmatic, temporary corpora that live in memory for short-lived agents or tests.

Deliverables:

- A memory-backed corpus implementation that supports the same ingestion and catalog APIs.
- A serialization option for snapshots so ephemeral corpora can be persisted when needed.
- Documentation that explains tradeoffs versus file-based corpora.

Acceptance checks:

- Behavior specifications cover ingestion, listing, and reindexing in memory.
- Retrieval and extraction can operate on the in-memory corpus without special casing.
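The memory-backed corpus above can be sketched as a small class. Biblicus's actual ingestion and catalog APIs are not shown in this document, so every name here (`InMemoryCorpus`, `ingest`, `list_items`, `snapshot`) is hypothetical; this is a minimal sketch of the shape described, not the real interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    """A single corpus item held in memory."""
    item_id: str
    text: str
    metadata: dict = field(default_factory=dict)


class InMemoryCorpus:
    """Ephemeral corpus: items live in a dict, no disk I/O.

    Hypothetical sketch; method names mirror the roadmap's
    ingestion/catalog/snapshot deliverables, not a real API.
    """

    def __init__(self) -> None:
        self._items: Dict[str, Item] = {}

    def ingest(self, item_id: str, text: str, **metadata) -> Item:
        """Add or replace an item; metadata is stored alongside the text."""
        item = Item(item_id, text, dict(metadata))
        self._items[item_id] = item
        return item

    def list_items(self) -> List[str]:
        """Catalog listing: deterministic, sorted item ids."""
        return sorted(self._items)

    def snapshot(self) -> dict:
        """Plain-dict snapshot so an ephemeral corpus can be persisted
        (e.g. dumped to JSON) when needed."""
        return {
            item.item_id: {"text": item.text, "metadata": item.metadata}
            for item in self._items.values()
        }
```

Because the snapshot is plain data, persisting an ephemeral corpus reduces to serializing one dictionary, which keeps the tradeoff versus file-based corpora easy to document.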