# Feature index

This document lists the major capabilities that exist today and points you to:

- Behavior specifications in `features/*.feature`
- Documentation pages in `docs/`
- Primary implementation modules in `src/biblicus/`

The behavior specifications are the authoritative definition of behavior. The documentation is a narrative guide.

## Corpus

What it does:

- Creates a file-based corpus with raw items and a rebuildable catalog.
- Ingests local files, web addresses, and text notes.
- Stores metadata in Markdown front matter or sidecar files.
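
To make the front matter storage concrete, here is a minimal sketch that splits a Markdown item into metadata and body. It assumes a `---` delimited block of simple `key: value` lines; `split_front_matter` is a hypothetical helper for illustration, not the interface of `src/biblicus/frontmatter.py`.

```python
def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split '---' delimited front matter from a Markdown body (illustrative only)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter present
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        return {}, text  # unterminated block: treat everything as body
    metadata: dict[str, str] = {}
    for line in lines[1:end]:
        key, separator, value = line.partition(":")
        if separator:
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return metadata, body


note = "---\ntitle: Field notes\ntags: draft\n---\nFirst observation."
metadata, body = split_front_matter(note)
```

A real front matter parser would normally use a YAML library so that lists and nested values survive; the flat string mapping here only keeps the sketch dependency-free.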

Documentation:

- `docs/corpus.md`
- `docs/corpus-design.md`

Behavior specifications:

- `features/biblicus_corpus.feature`
- `features/corpus_identity.feature`
- `features/corpus_edge_cases.feature`
- `features/corpus_purge.feature`
- `features/ingest_sources.feature`
- `features/source_helper_internal_branches.feature`
- `features/corpus_internal_branches.feature`

Primary implementation:

- `src/biblicus/corpus.py`
- `src/biblicus/sources.py`
- `src/biblicus/frontmatter.py`
- `src/biblicus/uris.py`

## Import and ignore rules

What it does:

- Imports an existing folder tree while preserving relative paths.
- Applies ignore rules from a `.biblicusignore` file.
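
The sketch below shows one way ignore rules could filter relative paths during an import. The pattern syntax is an assumption (one glob per line, `#` for comments); the authoritative rules are in `features/import_tree.feature`, and `load_patterns` and `is_ignored` are hypothetical names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def load_patterns(ignore_text: str) -> list[str]:
    """Return non-empty, non-comment lines from an ignore file's text."""
    stripped = (line.strip() for line in ignore_text.splitlines())
    return [line for line in stripped if line and not line.startswith("#")]


def is_ignored(relative_path: str, patterns: list[str]) -> bool:
    """Match each pattern against the full relative path and the file name."""
    path = PurePosixPath(relative_path)
    return any(
        fnmatchcase(path.as_posix(), pattern) or fnmatchcase(path.name, pattern)
        for pattern in patterns
    )


patterns = load_patterns("# build output\n*.tmp\nbuild/*\n")
candidates = ["notes/a.md", "notes/b.tmp", "build/out.txt"]
kept = [path for path in candidates if not is_ignored(path, patterns)]
```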

Documentation:

- `docs/corpus.md`

Behavior specifications:

- `features/import_tree.feature`

Primary implementation:

- `src/biblicus/corpus.py`
- `src/biblicus/ignore.py`

## Streaming ingest

What it does:

- Supports ingestion of large binary items from a stream without loading all bytes into memory.
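
As a rough illustration of ingesting without loading all bytes into memory, the following sketch copies a stream in fixed-size chunks while computing a checksum. `copy_stream`, the chunk size, and the use of SHA-256 are assumptions for this example, not details taken from `src/biblicus/corpus.py`.

```python
import hashlib
import io
from typing import BinaryIO


def copy_stream(
    source: BinaryIO, destination: BinaryIO, chunk_size: int = 64 * 1024
) -> tuple[int, str]:
    """Copy source to destination chunk by chunk; return (byte count, SHA-256 hex)."""
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # end of stream
        destination.write(chunk)
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()


payload = b"x" * 200_000
sink = io.BytesIO()
size, checksum = copy_stream(io.BytesIO(payload), sink)
```

Only one chunk is held in memory at a time, so peak memory use depends on `chunk_size` rather than on the size of the item.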

Behavior specifications:

- `features/streaming_ingest.feature`

Primary implementation:

- `src/biblicus/corpus.py`

## Lifecycle hooks

What it does:

- Defines explicit hook points for ingestion and catalog rebuild.
- Validates hook input and output models and records hook execution.
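
The idea of explicit hook points with validation and an execution record can be sketched as follows. Everything here is hypothetical (the `HookRunner` class, the `before_ingest` hook point name, and dictionaries standing in for the real input and output models); see `features/lifecycle_hooks.feature` for the actual contract.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HookRecord:
    hook_point: str
    hook_name: str
    succeeded: bool


@dataclass
class HookRunner:
    hooks: dict[str, list[Callable[[dict], dict]]] = field(default_factory=dict)
    log: list[HookRecord] = field(default_factory=list)

    def register(self, hook_point: str, hook: Callable[[dict], dict]) -> None:
        self.hooks.setdefault(hook_point, []).append(hook)

    def run(self, hook_point: str, payload: dict) -> dict:
        """Run hooks in registration order, validating and recording each result."""
        for hook in self.hooks.get(hook_point, []):
            result = hook(dict(payload))  # pass a copy so hooks cannot mutate the input
            succeeded = isinstance(result, dict)
            self.log.append(HookRecord(hook_point, hook.__name__, succeeded))
            if not succeeded:
                raise TypeError(f"{hook.__name__} must return a dict")
            payload = result
        return payload


def add_tag(item: dict) -> dict:
    item["tags"] = [*item.get("tags", []), "reviewed"]
    return item


runner = HookRunner()
runner.register("before_ingest", add_tag)
result = runner.run("before_ingest", {"title": "Note"})
```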

Documentation:

- `docs/corpus-design.md`

Behavior specifications:

- `features/lifecycle_hooks.feature`
- `features/hook_config_validation.feature`
- `features/hook_error_handling.feature`
- `features/python_hook_logging.feature`
- `features/hook_logging_internal_branches.feature`

Primary implementation:

- `src/biblicus/hooks.py`
- `src/biblicus/hook_manager.py`
- `src/biblicus/hook_logging.py`

## User configuration files

What it does:

- Loads machine-specific configuration for optional integrations.
- Supports home and local configuration file locations.

Documentation:

- `docs/user-configuration.md`

Behavior specifications:

- `features/user_config.feature`

Primary implementation:

- `src/biblicus/user_config.py`

## Text extraction stage

What it does:

- Builds extraction snapshots as a separate pipeline stage.
- Stores extracted text artifacts under the corpus so multiple extractors can coexist.
- Supports an explicit extractor pipeline through the `pipeline` extractor.
- Includes a Portable Document Format text extractor plugin.
- Includes a speech-to-text extractor plugin for audio items.
- Includes a selection extractor stage for choosing extracted text within a pipeline.
- Includes a MarkItDown extractor plugin for document conversion.
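
A pipeline with a selection stage can be pictured roughly as below: each extractor stage produces candidate text, and a selector picks one. The function names and the "longest text wins" rule are assumptions for illustration (the latter loosely inspired by the name of `features/extraction_selection_longest.feature`), not the actual plugin interface.

```python
from typing import Callable, Optional

# An extractor takes raw bytes and returns text, or None when it cannot handle the item.
Extractor = Callable[[bytes], Optional[str]]


def run_pipeline(data: bytes, stages: list[Extractor]) -> list[str]:
    """Run every stage in order and collect the non-empty outputs."""
    outputs = []
    for stage in stages:
        text = stage(data)
        if text:
            outputs.append(text)
    return outputs


def select_longest(candidates: list[str]) -> Optional[str]:
    """Pick the longest candidate, or None if no stage produced text."""
    return max(candidates, key=len) if candidates else None


def utf8_text(data: bytes) -> Optional[str]:
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def ascii_only(data: bytes) -> Optional[str]:
    return data.decode("ascii", errors="ignore")


outputs = run_pipeline("café menu".encode("utf-8"), [utf8_text, ascii_only])
best = select_longest(outputs)
```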

Documentation:

- `docs/extraction.md`

Behavior specifications:

- `features/text_extraction_snapshots.feature`
- `features/extractor_pipeline.feature`
- `features/extractor_validation.feature`
- `features/extraction_selection.feature`
- `features/extraction_selection_longest.feature`
- `features/extraction_error_handling.feature`
- `features/ocr_extractor.feature`
- `features/stt_extractor.feature`
- `features/unstructured_extractor.feature`
- `features/markitdown_extractor.feature`
- `features/integration_unstructured_extraction.feature`

Primary implementation:

- `src/biblicus/extraction.py`
- `src/biblicus/extractors/`

## Extraction evaluation

What it does:

- Evaluates extraction snapshots against labeled datasets.
- Reports coverage, accuracy, and processable fraction metrics.

Documentation:

- `docs/extraction-evaluation.md`

Behavior specifications:

- `features/extraction_evaluation.feature`
- `features/extraction_evaluation_lab.feature`

Primary implementation:

- `src/biblicus/extraction_evaluation.py`

## Graph extraction stage

What it does:

- Builds graph snapshots from extracted text.
- Writes graph nodes and edges to a Neo4j backend.
- Supports deterministic graph identifiers for reproducible experiments.
- Includes deterministic NLP baselines (NER entities, dependency relations).
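
One common way to obtain deterministic identifiers is to hash a normalized key, so that the same input always yields the same node identifier across runs. The sketch below assumes that approach; `node_id`, its parameters, and the 16-character truncation are hypothetical and not taken from `src/biblicus/graph/`.

```python
import hashlib


def node_id(snapshot_id: str, node_type: str, label: str) -> str:
    """Derive a stable identifier from the snapshot, node type, and normalized label."""
    normalized = " ".join(label.lower().split())  # fold case and collapse whitespace
    key = "\x1f".join((snapshot_id, node_type, normalized))  # unit separator avoids collisions
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


first = node_id("snapshot-1", "PERSON", "Ada  Lovelace")
second = node_id("snapshot-1", "PERSON", "ada lovelace")
other = node_id("snapshot-1", "ORG", "ada lovelace")
```

Because no random or time-based component is involved, rebuilding a graph from the same extracted text produces the same identifiers, which is what makes experiment results comparable.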

Documentation:

- `docs/graph-extraction.md`

Behavior specifications:

- `features/graph_extraction.feature`
- `features/integration_graph_extraction.feature`
- `features/graph_extraction_baselines.feature`

Primary implementation:

- `src/biblicus/graph/`
- `src/biblicus/graph/neo4j.py`

## Retrieval backends

What it does:

- Builds and queries retrieval snapshots.
- Returns evidence as structured output.
- Supports a minimal scan backend and a practical SQLite full-text search backend.
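
To illustrate what a minimal scan backend returning structured evidence might look like, the sketch below scores items by term frequency. The `Evidence` fields, the scoring rule, and the `scan` function are assumptions for this example; the actual backends live in `src/biblicus/backends/`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    item_id: str
    score: float
    snippet: str


def scan(items: dict[str, str], query: str, max_results: int = 3) -> list[Evidence]:
    """Score every item by how often the query terms occur; no index is built."""
    terms = query.lower().split()
    results = []
    for item_id, text in items.items():
        lowered = text.lower()
        score = float(sum(lowered.count(term) for term in terms))
        if score > 0:
            results.append(Evidence(item_id, score, text[:80]))
    # Highest score first; ties broken by item identifier for stable output.
    results.sort(key=lambda evidence: (-evidence.score, evidence.item_id))
    return results[:max_results]


items = {
    "a": "Cats chase mice.",
    "b": "Dogs chase cats and cats hide.",
    "c": "Birds sing.",
}
hits = scan(items, "cats chase")
```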

Documentation:

- `docs/backends.md`

Behavior specifications:

- `features/retrieval_scan.feature`
- `features/retrieval_sqlite_full_text_search.feature`
- `features/retrieval_uses_extraction_snapshot.feature`
- `features/retrieval_budget.feature`
- `features/retrieval_utilities.feature`
- `features/backend_validation.feature`
- `features/embedding_index_internal_branches.feature`
- `features/90_embedding_index_evidence_fallback.feature`
- `features/91_tf_vector_internal_branches.feature`

Primary implementation:

- `src/biblicus/retrieval.py`
- `src/biblicus/backends/`

## Evaluation

What it does:

- Evaluates retrieval snapshots against datasets and budgets.

Documentation:

- `docs/retrieval-evaluation.md`

Behavior specifications:

- `features/evaluation.feature`
- `features/model_validation.feature`
- `features/retrieval_evaluation_lab.feature`

Primary implementation:

- `src/biblicus/evaluation.py`
- `src/biblicus/models.py`

## Context packs

What it does:

- Builds context pack text from retrieval evidence using an explicit policy.
- Fits a context pack to a token budget using an explicit tokenizer identifier.
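
Fitting a pack to a token budget can be sketched as a greedy loop that keeps evidence blocks in order until the budget is exhausted. Whitespace splitting stands in for a real tokenizer here, and `fit_to_budget` is a hypothetical name; the actual policy and tokenizer identifiers are described in `docs/context-pack.md`.

```python
def fit_to_budget(blocks: list[str], max_tokens: int) -> str:
    """Keep blocks in order until adding the next one would exceed the budget."""
    selected = []
    used = 0
    for block in blocks:
        cost = len(block.split())  # stand-in for a real tokenizer's token count
        if used + cost > max_tokens:
            break
        selected.append(block)
        used += cost
    return "\n\n".join(selected)


blocks = ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"]
pack_text = fit_to_budget(blocks, max_tokens=5)
```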

Documentation:

- `docs/context-pack.md`

Behavior specifications:

- `features/context_pack.feature`
- `features/context_pack_policies.feature`
- `features/token_budget.feature`

Primary implementation:

- `src/biblicus/context.py`

## Context engine

What it does:

- Assembles elastic, budget-aware prompt contexts from messages and packs.
- Compacts or expands retriever packs based on policy.
- Supports pagination via `offset` and `limit` for retriever expansion.
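
The `offset` and `limit` pagination can be pictured as repeatedly requesting the next page of evidence until a budget is met. In the sketch below, `page` and `expand` are hypothetical helpers and the budget is counted in items rather than tokens to keep the example short.

```python
def page(results: list[str], offset: int = 0, limit: int = 10) -> list[str]:
    """Return up to `limit` results starting at `offset`."""
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    return results[offset:offset + limit]


def expand(results: list[str], budget: int, limit: int = 2) -> list[str]:
    """Fetch page after page until the item budget is filled or results run out."""
    collected: list[str] = []
    offset = 0
    while len(collected) < budget:
        batch = page(results, offset=offset, limit=limit)
        if not batch:
            break  # no more evidence to expand into
        collected.extend(batch[: budget - len(collected)])
        offset += limit
    return collected


evidence = ["e1", "e2", "e3", "e4", "e5"]
expanded = expand(evidence, budget=3)
```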

Documentation:

- `docs/context-engine.md`

Behavior specifications:

- `features/context_engine_retrieve_context_pack.feature`
- `features/context_engine_retrieval_internal_branches.feature`
- `features/70_context_retriever.feature`
- `features/71_context_compaction.feature`
- `features/72_context_history_compaction.feature`
- `features/73_context_nested_compaction.feature`
- `features/74_context_regeneration.feature`
- `features/75_context_default_regeneration.feature`
- `features/76_context_pack_budget_weights.feature`
- `features/77_context_default_pack_priority.feature`
- `features/78_context_default_pack_weights.feature`
- `features/79_context_nested_context_packs.feature`
- `features/80_context_nested_pack_budget_cap.feature`
- `features/81_context_nested_regeneration.feature`
- `features/82_context_explicit_regeneration.feature`
- `features/83_context_explicit_pack_priority.feature`
- `features/84_context_explicit_pack_weights.feature`
- `features/85_context_expansion.feature`
- `features/86_context_engine_errors.feature`
- `features/87_context_compactor_strategies.feature`
- `features/88_context_engine_model_validation.feature`
- `features/89_context_engine_internal_branches.feature`

Primary implementation:

- `src/biblicus/context_engine/assembler.py`
- `src/biblicus/context_engine/models.py`
- `src/biblicus/context_engine/compaction.py`

## Knowledge base

What it does:

- Provides a turnkey interface that accepts a folder and returns a ready-to-query workflow.
- Applies sensible defaults for import, retrieval, and context pack shaping.

Behavior specifications:

- `features/knowledge_base.feature`

Primary implementation:

- `src/biblicus/knowledge_base.py`

## Text utilities

What it does:

- Provides reusable utilities that edit a virtual in-memory text file using tool calls.
- Supports extraction, slicing, annotation, redaction, and linking with consistent validation.
- Keeps prompts small and focused for reliability with small models.

Documentation:

- `docs/text-utilities.md`
- `docs/text-extract.md`
- `docs/text-slice.md`
- `docs/text-annotate.md`
- `docs/text-redact.md`
- `docs/text-link.md`

Behavior specifications:

- `features/text_extract.feature`
- `features/text_slice.feature`
- `features/text_annotate.feature`
- `features/text_redact.feature`
- `features/text_link.feature`
- `features/text_utilities.feature`
- `features/integration_text_extract.feature`
- `features/integration_text_slice.feature`
- `features/integration_text_annotate.feature`
- `features/integration_text_redact.feature`
- `features/integration_text_link.feature`

Primary implementation:

- `src/biblicus/text/`

## Text extract

What it does:

- Inserts XML span tags into long texts using a virtual file edit loop.
- Produces ordered spans without re-emitting the full document.
- Validates that only tags were inserted.
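
The "only tags were inserted" check can be approximated by stripping the tags and comparing the result with the original text. This sketch assumes plain `<span>` and `</span>` tags and source text that contains no such tags itself; the helper names are hypothetical.

```python
import re

SPAN_TAG = re.compile(r"</?span>")
SPAN_BODY = re.compile(r"<span>(.*?)</span>", flags=re.DOTALL)


def only_tags_inserted(original: str, marked_up: str) -> bool:
    """True when removing every span tag reproduces the original text exactly."""
    return SPAN_TAG.sub("", marked_up) == original


def extract_spans(marked_up: str) -> list[str]:
    """Return span contents in document order."""
    return SPAN_BODY.findall(marked_up)


original = "Ada met Bob."
marked_up = "<span>Ada</span> met <span>Bob</span>."
spans = extract_spans(marked_up)
```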

Documentation:

- `docs/text-extract.md`

Behavior specifications:

- `features/text_extract.feature`
- `features/integration_text_extract.feature`

Primary implementation:

- `src/biblicus/text/extract.py`
- `src/biblicus/text/models.py`

## Text slice

What it does:

- Inserts `<slice/>` markers into long texts using a virtual file edit loop.
- Produces ordered slices without re-emitting the full document.
- Validates that only slice markers were inserted.
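
A short sketch of how `<slice/>` markers could be turned into ordered slices and validated follows. The helper names are hypothetical, and the sketch assumes the source text never contains the marker itself.

```python
MARKER = "<slice/>"


def only_markers_inserted(original: str, marked_up: str) -> bool:
    """True when removing every marker reproduces the original text exactly."""
    return marked_up.replace(MARKER, "") == original


def split_slices(marked_up: str) -> list[str]:
    """Split at each marker, preserving order and dropping empty pieces."""
    return [piece for piece in marked_up.split(MARKER) if piece]


original = "Intro. Middle. End."
marked_up = "Intro. <slice/>Middle. <slice/>End."
slices = split_slices(marked_up)
```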

Documentation:

- `docs/text-slice.md`

Behavior specifications:

- `features/text_slice.feature`
- `features/integration_text_slice.feature`

Primary implementation:

- `src/biblicus/text/slice.py`
- `src/biblicus/text/models.py`

## Text annotate

What it does:

- Inserts XML span tags with attributes into long texts using a virtual file edit loop.
- Produces ordered spans with attributes without re-emitting the full document.
- Validates attribute allow lists and tag structure.
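
An attribute allow list check might look like the sketch below, which reports any attribute name that is not permitted. The tag grammar (lowercase names, double-quoted values) and the `disallowed_attributes` helper are assumptions made for illustration.

```python
import re

# Matches an opening span tag and captures its raw attribute string.
OPEN_TAG = re.compile(r'<span((?:\s+[a-z_]+="[^"]*")*)>')
ATTRIBUTE = re.compile(r'([a-z_]+)="([^"]*)"')


def disallowed_attributes(marked_up: str, allowed: set[str]) -> list[str]:
    """Return the sorted attribute names that are not in the allow list."""
    found = set()
    for tag in OPEN_TAG.finditer(marked_up):
        for name, _value in ATTRIBUTE.findall(tag.group(1)):
            if name not in allowed:
                found.add(name)
    return sorted(found)


marked_up = '<span label="person">Ada</span> met <span color="red">Bob</span>.'
violations = disallowed_attributes(marked_up, allowed={"label"})
```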

Documentation:

- `docs/text-annotate.md`

Behavior specifications:

- `features/text_annotate.feature`
- `features/integration_text_annotate.feature`

Primary implementation:

- `src/biblicus/text/annotate.py`
- `src/biblicus/text/models.py`

## Text redact

What it does:

- Inserts XML span tags around redacted text using a virtual file edit loop.
- Supports optional redaction types via a redact attribute.
- Validates that only tags were inserted.
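
Once spans are in place, a consumer could replace them with placeholders, as in this sketch. The placeholder format and the `apply_redactions` helper are hypothetical; only the optional `redact` attribute comes from the description above.

```python
import re

# Captures the optional redact attribute value and the span content.
REDACTED_SPAN = re.compile(r'<span(?: redact="([^"]*)")?>(.*?)</span>', flags=re.DOTALL)


def apply_redactions(marked_up: str) -> str:
    """Replace each span with a placeholder, including the type when one is given."""

    def placeholder(match: re.Match) -> str:
        kind = match.group(1)
        return f"[REDACTED:{kind}]" if kind else "[REDACTED]"

    return REDACTED_SPAN.sub(placeholder, marked_up)


marked_up = 'Call <span redact="phone">555-0100</span> or ask <span>Ada</span>.'
redacted = apply_redactions(marked_up)
```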

Documentation:

- `docs/text-redact.md`

Behavior specifications:

- `features/text_redact.feature`
- `features/integration_text_redact.feature`

Primary implementation:

- `src/biblicus/text/redact.py`
- `src/biblicus/text/models.py`

## Text link

What it does:

- Inserts id/ref span tags to connect repeated mentions.
- Produces ordered linked spans without re-emitting the full document.
- Validates id prefix and reference ordering.
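
The validation of id prefix and reference ordering can be sketched as a single pass over the tags. This example assumes that every `ref` must point at an `id` defined earlier in the text and that identifiers share a prefix; the `link_` prefix and the `validate_links` helper are hypothetical.

```python
import re

LINK_TAG = re.compile(r'<span (id|ref)="([^"]+)">')


def validate_links(marked_up: str, id_prefix: str = "link_") -> list[str]:
    """Return error messages for bad prefixes and for refs that precede their id."""
    errors = []
    seen_ids = set()
    for kind, value in LINK_TAG.findall(marked_up):
        if not value.startswith(id_prefix):
            errors.append(f"{kind} '{value}' lacks prefix '{id_prefix}'")
        elif kind == "id":
            seen_ids.add(value)
        elif value not in seen_ids:
            errors.append(f"ref '{value}' appears before its id")
    return errors


valid = '<span id="link_1">Ada Lovelace</span> wrote. <span ref="link_1">Lovelace</span> left.'
invalid = '<span ref="link_2">She</span> wrote. <span id="link_2">Ada</span> left.'
valid_errors = validate_links(valid)
invalid_errors = validate_links(invalid)
```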

Documentation:

- `docs/text-link.md`

Behavior specifications:

- `features/text_link.feature`
- `features/integration_text_link.feature`

Primary implementation:

- `src/biblicus/text/link.py`
- `src/biblicus/text/models.py`

## Testing, coverage, and documentation build

What it does:

- Runs behavior specifications under coverage and emits a Hypertext Markup Language coverage report.
- Builds Sphinx documentation from docstrings and documentation pages.

Documentation:

- `docs/testing.md`

Primary implementation:

- `scripts/test.py`
- `docs/conf.py`
- `.github/workflows/ci.yml`

## Integration corpora

What it does:

- Downloads small public datasets at runtime for integration scenarios.

Behavior specifications:

- `features/integration_wikipedia.feature`
- `features/integration_pdf_samples.feature`
- `features/integration_mixed_corpus.feature`
- `features/integration_mixed_extraction.feature`
- `features/integration_pdf_retrieval.feature`
- `features/integration_audio_samples.feature`

Integration scripts:

- `scripts/download_wikipedia.py`
- `scripts/download_pdf_samples.py`
- `scripts/download_mixed_samples.py`
- `scripts/download_audio_samples.py`