Roadmap

This document describes what we plan to build next.

If you are looking for runnable examples, see docs/demos.md.

If you are looking for what already exists, start with:

  • docs/feature-index.md for a map of features to behavior specifications and modules.

  • CHANGELOG.md for released changes.

Principles

  • Behavior specifications are the authoritative definition of behavior.

  • Every behavior that exists is specified.

  • Validation and documentation are part of the product.

  • Raw corpus items remain readable, portable files.

  • Derived artifacts are stored under the corpus, so artifacts produced by multiple implementations can coexist.

Completed foundations

These are the capability slices that already exist and have end-to-end behavior specifications.

Retrieval evaluation and datasets

  • Dataset authoring workflow for small hand-labeled sets and larger synthetic sets.

  • Evaluation reports with per-query diagnostics and summary metrics.

  • Versioned dataset formats and deterministic reports for stable inputs.
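To make "per-query diagnostics and summary metrics" concrete, here is a minimal Python sketch of one possible report shape. The `evaluate` function, the `QueryDiagnostic` record, and the choice of recall@k and mean reciprocal rank are illustrative assumptions, not Biblicus's actual dataset or report format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class QueryDiagnostic:
    """Per-query diagnostics for one evaluation query (hypothetical shape)."""

    query_id: str
    first_relevant_rank: Optional[int]  # 1-based rank of first relevant hit, or None
    recall_at_k: float


def evaluate(
    rankings: Dict[str, Sequence[str]],
    relevant: Dict[str, Set[str]],
    k: int = 5,
) -> Tuple[List[QueryDiagnostic], Dict[str, float]]:
    """Return per-query diagnostics plus mean recall@k and mean reciprocal rank."""
    diagnostics: List[QueryDiagnostic] = []
    # Iterate query ids in sorted order so report order is deterministic.
    for query_id in sorted(rankings):
        ranked = list(rankings[query_id])
        gold = relevant.get(query_id, set())
        first_rank = next(
            (rank for rank, item_id in enumerate(ranked, start=1) if item_id in gold),
            None,
        )
        hits = len(gold.intersection(ranked[:k]))
        recall = hits / len(gold) if gold else 0.0
        diagnostics.append(QueryDiagnostic(query_id, first_rank, recall))

    count = len(diagnostics) or 1  # avoid division by zero on an empty dataset
    summary = {
        f"recall@{k}": sum(d.recall_at_k for d in diagnostics) / count,
        "mrr": sum(
            1.0 / d.first_relevant_rank
            for d in diagnostics
            if d.first_relevant_rank is not None
        ) / count,
    }
    return diagnostics, summary
```

Keeping the per-query records alongside the summary lets a reader trace a low aggregate score back to the specific queries that missed.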

Retrieval quality upgrades

  • Tuned lexical baseline with BM25, n-gram range controls, and stop-word policies.

  • Reranking stage for top-N candidates with explicit stage metadata.

  • Hybrid retrieval with explicit fusion weights and stage-level scores.
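The hypothetical sketch below shows one way a hybrid stage could fuse scores with explicit weights while keeping each stage's raw score on every result. It assumes per-stage min-max normalization followed by a weighted sum; reciprocal rank fusion is a common alternative. The names `fuse`, `FusedHit`, and the stage labels are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FusedHit:
    item_id: str
    score: float
    stage_scores: Dict[str, float] = field(default_factory=dict)  # raw, un-normalized


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one stage's scores to [0, 1] so stages are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item_id: 1.0 for item_id in scores}
    return {item_id: (value - lo) / (hi - lo) for item_id, value in scores.items()}


def fuse(
    stage_results: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[FusedHit]:
    """Weighted sum of normalized stage scores; items missing from a stage get 0 there."""
    normalized = {stage: min_max(scores) for stage, scores in stage_results.items()}
    item_ids = sorted({item_id for scores in stage_results.values() for item_id in scores})
    hits = []
    for item_id in item_ids:
        fused = sum(
            weights.get(stage, 0.0) * normalized[stage].get(item_id, 0.0)
            for stage in stage_results
        )
        raw = {
            stage: scores[item_id]
            for stage, scores in stage_results.items()
            if item_id in scores
        }
        hits.append(FusedHit(item_id, fused, raw))
    # Highest fused score first; ties broken by item id for deterministic output.
    return sorted(hits, key=lambda hit: (-hit.score, hit.item_id))
```

Retaining `stage_scores` is what makes the fused ranking explainable: a reader can see whether a hit came from lexical overlap, vector similarity, or both.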

Context pack policy surfaces

  • Policy variants for formatting, ordering, and metadata inclusion.

  • Token and character budget strategies with explicit selectors.

  • Documentation and examples that show how policy choices change outputs.
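As a rough illustration of budget strategies with explicit selectors, here is a Python sketch that fills a context budget from ranked evidence blocks. The function names, the two strategy values, and the whitespace-based token count (a crude stand-in for a real tokenizer) are all assumptions made for this example.

```python
from typing import Callable, List, Sequence


def char_count(text: str) -> int:
    return len(text)


def whitespace_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer; an assumption for illustration only.
    return len(text.split())


def select_within_budget(
    blocks: Sequence[str],
    budget: int,
    measure: Callable[[str], int] = char_count,
    strategy: str = "stop",
) -> List[str]:
    """Select ranked blocks in order until the budget is spent.

    strategy="stop": stop at the first block that does not fit.
    strategy="skip": skip blocks that do not fit and keep trying later ones.
    """
    if strategy not in ("stop", "skip"):
        raise ValueError(f"unknown strategy: {strategy}")
    selected: List[str] = []
    used = 0
    for block in blocks:
        cost = measure(block)
        if used + cost <= budget:
            selected.append(block)
            used += cost
        elif strategy == "stop":
            break
    return selected
```

The two strategies trade rank fidelity for budget utilization: "stop" never lets a lower-ranked block displace a higher-ranked one, while "skip" packs more evidence. This sketch ignores separator overhead between blocks.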

Extraction evaluation harness

This evaluation harness compares extraction approaches in a way that is measurable, repeatable, and useful for practical engineering decisions.

Corpus analysis tools

Lightweight analysis utilities summarize corpus themes and guide curation:

  • Basic corpus profiling with deterministic metrics for raw items and extracted text.

  • Topic modeling with BERTopic and optional LLM-assisted labeling.

  • Side-by-side analysis outputs stored under the corpus for reproducible comparison.

Sequence analysis (Markov analysis)

Goal: provide a sequence-oriented analysis backend for corpora where order matters (conversations, timelines, logs).

Deliverables:

  • Markov analysis for sequence-driven corpora, including hidden Markov models (HMMs) where appropriate.

  • A report format that explains state transitions and emissions with evidence.

  • Evaluation guidance for comparing HMM outputs across corpora or snapshots.

Acceptance checks:

  • HMM analysis is reproducible for the same corpus state and extraction snapshot.

  • Reports are exportable and readable without custom tooling.
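To illustrate what "state transitions with evidence" and reproducibility could look like, here is a sketch of a first-order Markov chain over observed states. It is deliberately simpler than an HMM: states are taken directly from sequence labels rather than inferred as hidden states, so no Baum-Welch fitting is involved. `transition_report` and its report layout are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def transition_report(sequences: Sequence[Sequence[str]]) -> Dict[str, Dict[str, dict]]:
    """Estimate first-order transition probabilities and keep evidence positions."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    evidence: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for seq_index, seq in enumerate(sequences):
        # Pair each state with its successor; position is the index of the source state.
        for position, (current, nxt) in enumerate(zip(seq, seq[1:])):
            counts[current][nxt] += 1
            evidence[(current, nxt)].append((seq_index, position))

    report: Dict[str, Dict[str, dict]] = {}
    # Sorted keys make the report identical across runs on the same input.
    for state in sorted(counts):
        total = sum(counts[state].values())
        report[state] = {
            nxt: {
                "count": count,
                "probability": count / total,
                "evidence": evidence[(state, nxt)],
            }
            for nxt, count in sorted(counts[state].items())
        }
    return report
```

Because the output is plain nested dictionaries, it can be written with `json.dumps` and read without custom tooling.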

Text utilities

Small, reusable building blocks for transforming text in ways that are hard to do reliably with one-shot generation.

Deliverables:

  • Text extraction and slicing utilities that operate through a tool loop in which a model edits a virtual file.

  • Optional higher-level utilities built on the same pattern (annotation, linking, redaction).

  • Documentation and runnable demos that show the mechanism and how to use each utility.
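The mechanism behind these utilities can be sketched in a few lines of Python. This is a hypothetical illustration, not Biblicus code: `VirtualFile`, its `view` and `str_replace` tools, and `scripted_model` (which stands in for a real model API) are all invented for the example, and redaction is used as the sample task.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    name: str
    args: Dict[str, str] = field(default_factory=dict)


class VirtualFile:
    """In-memory text buffer the model edits through tools instead of regenerating it."""

    def __init__(self, text: str) -> None:
        self.text = text

    def view(self) -> str:
        # Number lines so the model can refer to exact positions.
        return "\n".join(f"{i}: {line}" for i, line in enumerate(self.text.splitlines(), 1))

    def str_replace(self, old: str, new: str) -> str:
        # Reject ambiguous or missing matches so edits are always exact.
        if self.text.count(old) != 1:
            return f"error: expected exactly one match for {old!r}"
        self.text = self.text.replace(old, new, 1)
        return "ok"


def run_tool_loop(
    vfile: VirtualFile,
    next_call: Callable[[str], Optional[ToolCall]],
    max_steps: int = 10,
) -> str:
    """Feed observations to the model and apply its tool calls until it returns None."""
    observation = vfile.view()
    for _ in range(max_steps):
        call = next_call(observation)
        if call is None:  # the model signals it is done
            break
        if call.name == "view":
            observation = vfile.view()
        elif call.name == "str_replace":
            observation = vfile.str_replace(call.args["old"], call.args["new"])
        else:
            observation = f"error: unknown tool {call.name}"
    return vfile.text


def scripted_model(calls: List[ToolCall]) -> Callable[[str], Optional[ToolCall]]:
    """Stand-in for a real model: replays a fixed list of tool calls."""
    pending = list(calls)
    return lambda observation: pending.pop(0) if pending else None
```

The point of the pattern is that unchanged text is never retyped by the model, so a one-shot generation cannot silently drop or alter it; only validated, exact edits touch the buffer.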

Acceptance checks:

  • Utilities have end-to-end behavior specifications and are fully covered by tests.

  • Integration tests can be run against real model APIs when configured.

Next: Tactus integration

Goal: make Biblicus usable from durable agent workflows without baking assistant logic into Biblicus itself.

Deliverables:

  • A Model Context Protocol (MCP) toolset surface for Biblicus (ingest, query, stats, and evidence retrieval).

  • Clear dependency wiring for secrets and network access (in-sandbox vs brokered).

  • One reference procedure demonstrating retrieval-augmented generation built on Biblicus evidence outputs.

Acceptance checks:

  • Tools expose evidence-first outputs with stable schemas.

  • Procedures remain in control of prompting and context budgeting policy.
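One way to read "evidence-first outputs with stable schemas" is sketched below: the tool returns versioned, ranked evidence as JSON and leaves prompting and budgeting to the calling procedure. `Evidence`, `SCHEMA_VERSION`, and `retrieve_evidence_tool` are hypothetical names, and the sketch does not use any MCP SDK; it only shows the payload shape a tool handler might return.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List

SCHEMA_VERSION = 1  # hypothetical; would change only on breaking schema changes


@dataclass(frozen=True)
class Evidence:
    item_id: str
    text: str
    score: float
    stage: str


def retrieve_evidence_tool(
    query: str,
    retrieve: Callable[[str], List[Evidence]],
    limit: int = 5,
) -> str:
    """Tool handler that returns ranked evidence as JSON and nothing else.

    Prompt assembly and context budgeting stay with the calling procedure.
    """
    hits = retrieve(query)[:limit]
    payload = {
        "schema_version": SCHEMA_VERSION,
        "query": query,
        "evidence": [asdict(hit) for hit in hits],
    }
    # sort_keys keeps the serialized form stable across runs.
    return json.dumps(payload, sort_keys=True)
```

Carrying an explicit `schema_version` lets procedures detect a breaking change instead of misreading fields.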

Later: alternate backends and hosting modes

Goal: broaden the backend surface while keeping the core predictable.

Deliverables:

  • A second backend with different performance tradeoffs.

  • A tool server that exposes a backend through a stable interface.

  • Documentation that shows how to run a backend out of process.

Acceptance checks:

  • Local tests remain fast and deterministic.

  • Integration tests validate retrieval through the tool boundary.

Deferred: corpus and extraction work

These are valuable but intentionally not near-term priorities while retrieval is made practical end to end.

In-memory corpus for ephemeral workflows

Goal: allow programmatic, temporary corpora that live in memory for short-lived agents or tests.

Deliverables:

  • A memory-backed corpus implementation that supports the same ingestion and catalog APIs.

  • A serialization option for snapshots so ephemeral corpora can be persisted when needed.

  • Documentation that explains tradeoffs versus file-based corpora.

Acceptance checks:

  • Behavior specifications cover ingestion, listing, and reindexing in memory.

  • Retrieval and extraction can operate on the in-memory corpus without special casing.
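A minimal sketch of a memory-backed corpus with snapshot serialization follows. `InMemoryCorpus`, `Item`, and the content-hash item ids are assumptions for illustration; they do not reflect the existing ingestion or catalog APIs that a real implementation would need to match.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Item:
    item_id: str
    text: str
    tags: List[str] = field(default_factory=list)


class InMemoryCorpus:
    """Hypothetical corpus that keeps items in a dict instead of on disk."""

    def __init__(self) -> None:
        self._items: Dict[str, Item] = {}

    def ingest(self, text: str, tags: Optional[List[str]] = None) -> str:
        # Content-derived id (an assumption): re-ingesting identical text
        # replaces the existing entry instead of creating a duplicate.
        item_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._items[item_id] = Item(item_id, text, list(tags or []))
        return item_id

    def list_items(self) -> List[Item]:
        # Sorted by id so listings and snapshots are deterministic.
        return [self._items[key] for key in sorted(self._items)]

    def to_snapshot(self) -> str:
        return json.dumps([asdict(item) for item in self.list_items()], sort_keys=True)

    @classmethod
    def from_snapshot(cls, snapshot: str) -> "InMemoryCorpus":
        corpus = cls()
        for record in json.loads(snapshot):
            item = Item(**record)
            corpus._items[item.item_id] = item
        return corpus
```

A JSON snapshot keeps the ephemeral corpus portable, in line with the principle that raw items remain readable files once persisted.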