Feature index

This document lists the major capabilities that exist today and points you to:

  • Behavior specifications in features/*.feature

  • Documentation pages in docs/

  • Primary implementation modules in src/biblicus/

The behavior specifications are the authoritative definition of behavior. The documentation is a narrative guide.

Corpus

What it does:

  • Creates a file based corpus with raw items and a rebuildable catalog.

  • Ingests local files, web addresses, and text notes.

  • Stores metadata in Markdown front matter or sidecar files.

Documentation:

  • docs/corpus.md

  • docs/corpus-design.md

Behavior specifications:

  • features/biblicus_corpus.feature

  • features/corpus_identity.feature

  • features/corpus_edge_cases.feature

  • features/corpus_purge.feature

  • features/ingest_sources.feature

  • features/source_helper_internal_branches.feature

  • features/corpus_internal_branches.feature

Primary implementation:

  • src/biblicus/corpus.py

  • src/biblicus/sources.py

  • src/biblicus/frontmatter.py

  • src/biblicus/uris.py

Import and ignore rules

What it does:

  • Imports an existing folder tree while preserving relative paths.

  • Applies ignore rules from a .biblicusignore file.

Documentation:

  • docs/corpus.md

Behavior specifications:

  • features/import_tree.feature

Primary implementation:

  • src/biblicus/corpus.py

  • src/biblicus/ignore.py

Streaming ingest

What it does:

  • Supports ingestion of large binary items from a stream without loading all bytes into memory.

Behavior specifications:

  • features/streaming_ingest.feature

Primary implementation:

  • src/biblicus/corpus.py

Lifecycle hooks

What it does:

  • Defines explicit hook points for ingestion and catalog rebuild.

  • Validates hook input and output models and records hook execution.

Documentation:

  • docs/corpus-design.md

Behavior specifications:

  • features/lifecycle_hooks.feature

  • features/hook_config_validation.feature

  • features/hook_error_handling.feature

  • features/python_hook_logging.feature

  • features/hook_logging_internal_branches.feature

Primary implementation:

  • src/biblicus/hooks.py

  • src/biblicus/hook_manager.py

  • src/biblicus/hook_logging.py

User configuration files

What it does:

  • Loads machine-specific configuration for optional integrations.

  • Supports home and local configuration file locations.

Documentation:

  • docs/user-configuration.md

Behavior specifications:

  • features/user_config.feature

Primary implementation:

  • src/biblicus/user_config.py

Text extraction stage

What it does:

  • Builds extraction snapshots as a separate pipeline stage.

  • Stores extracted text artifacts under the corpus so multiple extractors can coexist.

  • Supports an explicit extractor pipeline through the pipeline extractor.

  • Includes a Portable Document Format text extractor plugin.

  • Includes a speech to text extractor plugin for audio items.

  • Includes a selection extractor stage for choosing extracted text within a pipeline.

  • Includes a MarkItDown extractor plugin for document conversion.

Documentation:

  • docs/extraction.md

Behavior specifications:

  • features/text_extraction_snapshots.feature

  • features/extractor_pipeline.feature

  • features/extractor_validation.feature

  • features/extraction_selection.feature

  • features/extraction_selection_longest.feature

  • features/extraction_error_handling.feature

  • features/ocr_extractor.feature

  • features/stt_extractor.feature

  • features/unstructured_extractor.feature

  • features/markitdown_extractor.feature

  • features/integration_unstructured_extraction.feature

Primary implementation:

  • src/biblicus/extraction.py

  • src/biblicus/extractors/

Extraction evaluation

What it does:

  • Evaluates extraction snapshots against labeled datasets.

  • Reports coverage, accuracy, and processable fraction metrics.

Documentation:

  • docs/extraction-evaluation.md

Behavior specifications:

  • features/extraction_evaluation.feature

  • features/extraction_evaluation_lab.feature

Primary implementation:

  • src/biblicus/extraction_evaluation.py

Graph extraction stage

What it does:

  • Builds graph snapshots from extracted text.

  • Writes graph nodes and edges to a Neo4j backend.

  • Supports deterministic graph identifiers for reproducible experiments.

  • Includes deterministic NLP baselines (NER entities, dependency relations).

Documentation:

  • docs/graph-extraction.md

Behavior specifications:

  • features/graph_extraction.feature

  • features/integration_graph_extraction.feature

  • features/graph_extraction_baselines.feature

Primary implementation:

  • src/biblicus/graph/

  • src/biblicus/graph/neo4j.py

Retrieval backends

What it does:

  • Builds and queries retrieval snapshots.

  • Returns evidence as structured output.

  • Supports a minimal scan backend and a practical Sqlite full text search backend.

Documentation:

  • docs/backends.md

Behavior specifications:

  • features/retrieval_scan.feature

  • features/retrieval_sqlite_full_text_search.feature

  • features/retrieval_uses_extraction_snapshot.feature

  • features/retrieval_budget.feature

  • features/retrieval_utilities.feature

  • features/backend_validation.feature

  • features/embedding_index_internal_branches.feature

  • features/90_embedding_index_evidence_fallback.feature

  • features/91_tf_vector_internal_branches.feature

Primary implementation:

  • src/biblicus/retrieval.py

  • src/biblicus/backends/

Evaluation

What it does:

  • Evaluates retrieval snapshots against datasets and budgets.

Documentation:

  • docs/retrieval-evaluation.md

Behavior specifications:

  • features/evaluation.feature

  • features/model_validation.feature

  • features/retrieval_evaluation_lab.feature

Primary implementation:

  • src/biblicus/evaluation.py

  • src/biblicus/models.py

Context packs

What it does:

  • Builds context pack text from retrieval evidence using an explicit policy.

  • Fits a context pack to a token budget using an explicit tokenizer identifier.

Documentation:

  • docs/context-pack.md

Behavior specifications:

  • features/context_pack.feature

  • features/context_pack_policies.feature

  • features/token_budget.feature

Primary implementation:

  • src/biblicus/context.py

Context engine

What it does:

  • Assembles elastic, budget-aware prompt contexts from messages and packs.

  • Compacts or expands retriever packs based on policy.

  • Supports pagination via offset and limit for retriever expansion.

Documentation:

  • docs/context-engine.md

Behavior specifications:

  • features/context_engine_retrieve_context_pack.feature

  • features/context_engine_retrieval_internal_branches.feature

  • features/70_context_retriever.feature

  • features/71_context_compaction.feature

  • features/72_context_history_compaction.feature

  • features/73_context_nested_compaction.feature

  • features/74_context_regeneration.feature

  • features/75_context_default_regeneration.feature

  • features/76_context_pack_budget_weights.feature

  • features/77_context_default_pack_priority.feature

  • features/78_context_default_pack_weights.feature

  • features/79_context_nested_context_packs.feature

  • features/80_context_nested_pack_budget_cap.feature

  • features/81_context_nested_regeneration.feature

  • features/82_context_explicit_regeneration.feature

  • features/83_context_explicit_pack_priority.feature

  • features/84_context_explicit_pack_weights.feature

  • features/85_context_expansion.feature

  • features/86_context_engine_errors.feature

  • features/87_context_compactor_strategies.feature

  • features/88_context_engine_model_validation.feature

  • features/89_context_engine_internal_branches.feature

Primary implementation:

  • src/biblicus/context_engine/assembler.py

  • src/biblicus/context_engine/models.py

  • src/biblicus/context_engine/compaction.py

Knowledge base

What it does:

  • Provides a turnkey interface that accepts a folder and returns a ready-to-query workflow.

  • Applies sensible defaults for import, retrieval, and context pack shaping.

Behavior specifications:

  • features/knowledge_base.feature

Primary implementation:

  • src/biblicus/knowledge_base.py

Text utilities

What it does:

  • Provides reusable utilities that edit a virtual in-memory text file using tool calls.

  • Supports extraction, slicing, annotation, redaction, and linking with consistent validation.

  • Keeps prompts small and focused for reliability with small models.

Documentation:

  • docs/text-utilities.md

  • docs/text-extract.md

  • docs/text-slice.md

  • docs/text-annotate.md

  • docs/text-redact.md

  • docs/text-link.md

Behavior specifications:

  • features/text_extract.feature

  • features/text_slice.feature

  • features/text_annotate.feature

  • features/text_redact.feature

  • features/text_link.feature

  • features/text_utilities.feature

  • features/integration_text_extract.feature

  • features/integration_text_slice.feature

  • features/integration_text_annotate.feature

  • features/integration_text_redact.feature

  • features/integration_text_link.feature

Primary implementation:

  • src/biblicus/text/

Text extract

What it does:

  • Inserts XML span tags into long texts using a virtual file edit loop.

  • Produces ordered spans without re-emitting the full document.

  • Validates that only tags were inserted.

Documentation:

  • docs/text-extract.md

Behavior specifications:

  • features/text_extract.feature

  • features/integration_text_extract.feature

Primary implementation:

  • src/biblicus/text/extract.py

  • src/biblicus/text/models.py

Text slice

What it does:

  • Inserts <slice/> markers into long texts using a virtual file edit loop.

  • Produces ordered slices without re-emitting the full document.

  • Validates that only slice markers were inserted.

Documentation:

  • docs/text-slice.md

Behavior specifications:

  • features/text_slice.feature

  • features/integration_text_slice.feature

Primary implementation:

  • src/biblicus/text/slice.py

  • src/biblicus/text/models.py

Text annotate

What it does:

  • Inserts XML span tags with attributes into long texts using a virtual file edit loop.

  • Produces ordered spans with attributes without re-emitting the full document.

  • Validates attribute allow lists and tag structure.

Documentation:

  • docs/text-annotate.md

Behavior specifications:

  • features/text_annotate.feature

  • features/integration_text_annotate.feature

Primary implementation:

  • src/biblicus/text/annotate.py

  • src/biblicus/text/models.py

Text redact

What it does:

  • Inserts XML span tags around redacted text using a virtual file edit loop.

  • Supports optional redaction types via a redact attribute.

  • Validates that only tags were inserted.

Documentation:

  • docs/text-redact.md

Behavior specifications:

  • features/text_redact.feature

  • features/integration_text_redact.feature

Primary implementation:

  • src/biblicus/text/redact.py

  • src/biblicus/text/models.py

Testing, coverage, and documentation build

What it does:

  • Runs behavior specifications under coverage and emits an Hypertext Markup Language coverage report.

  • Builds Sphinx documentation from docstrings and documentation pages.

Documentation:

  • docs/testing.md

Primary implementation:

  • scripts/test.py

  • docs/conf.py

  • .github/workflows/ci.yml

Integration corpora

What it does:

  • Downloads small public datasets at runtime for integration scenarios.

Behavior specifications:

  • features/integration_wikipedia.feature

  • features/integration_pdf_samples.feature

  • features/integration_mixed_corpus.feature

  • features/integration_mixed_extraction.feature

  • features/integration_pdf_retrieval.feature

  • features/integration_audio_samples.feature

Integration scripts:

  • scripts/download_wikipedia.py

  • scripts/download_pdf_samples.py

  • scripts/download_mixed_samples.py

  • scripts/download_audio_samples.py