Feature index
This document lists the major capabilities that exist today and points you to:
Behavior specifications in features/*.feature
Documentation pages in docs/
Primary implementation modules in src/biblicus/
The behavior specifications are the authoritative definition of behavior. The documentation is a narrative guide.
Corpus
What it does:
Creates a file-based corpus with raw items and a rebuildable catalog.
Ingests local files, web addresses, and text notes.
Stores metadata in Markdown front matter or sidecar files.
Documentation:
docs/corpus.md
docs/corpus-design.md
Behavior specifications:
features/biblicus_corpus.feature
features/corpus_identity.feature
features/corpus_edge_cases.feature
features/corpus_purge.feature
features/ingest_sources.feature
features/source_helper_internal_branches.feature
features/corpus_internal_branches.feature
Primary implementation:
src/biblicus/corpus.py
src/biblicus/sources.py
src/biblicus/frontmatter.py
src/biblicus/uris.py
Import and ignore rules
What it does:
Imports an existing folder tree while preserving relative paths.
Applies ignore rules from a .biblicusignore file.
Documentation:
docs/corpus.md
Behavior specifications:
features/import_tree.feature
Primary implementation:
src/biblicus/corpus.py
src/biblicus/ignore.py
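As a rough illustration of the import behavior described in this section, the sketch below walks a folder tree, skips paths that match ignore patterns, and yields paths relative to the root. It is not the Biblicus implementation. The function names `load_ignore_patterns` and `iter_importable_files` are hypothetical, and the pattern semantics (glob-style `fnmatch` matching against the relative path, with `#` comment lines) are an assumption; features/import_tree.feature remains the authoritative definition.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Iterator, List

IGNORE_FILE_NAME = ".biblicusignore"


def load_ignore_patterns(root: Path) -> List[str]:
    """Read non-empty, non-comment lines from the ignore file, if present."""
    ignore_file = root / IGNORE_FILE_NAME
    if not ignore_file.is_file():
        return []
    lines = ignore_file.read_text(encoding="utf-8").splitlines()
    stripped = [line.strip() for line in lines]
    return [line for line in stripped if line and not line.startswith("#")]


def iter_importable_files(root: Path, patterns: List[str]) -> Iterator[str]:
    """Yield relative POSIX paths under root that match no ignore pattern."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root).as_posix()
        # Assumption: the ignore file itself is never imported.
        if relative == IGNORE_FILE_NAME:
            continue
        if any(fnmatch(relative, pattern) for pattern in patterns):
            continue
        yield relative
```

Yielding relative paths rather than absolute ones is what lets an importer preserve the original folder layout inside the corpus.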
Streaming ingest
What it does:
Supports ingestion of large binary items from a stream without loading all bytes into memory.
Behavior specifications:
features/streaming_ingest.feature
Primary implementation:
src/biblicus/corpus.py
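The idea of ingesting from a stream without loading all bytes into memory can be sketched with a fixed-size chunked copy. This is a generic illustration, not the code in src/biblicus/corpus.py: the function name `write_stream`, the chunk size, and the decision to compute a SHA-256 digest along the way are all assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import BinaryIO, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary illustrative value


def write_stream(
    source: BinaryIO, destination: Path, chunk_size: int = CHUNK_SIZE
) -> Tuple[int, str]:
    """Copy source to destination in chunks.

    Returns the number of bytes written and the SHA-256 hex digest.
    Peak memory use is bounded by chunk_size, not by the item size.
    """
    digest = hashlib.sha256()
    total = 0
    with destination.open("wb") as handle:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # an empty read signals end of stream
                break
            handle.write(chunk)
            digest.update(chunk)
            total += len(chunk)
    return total, digest.hexdigest()
```

Any object with a binary `read(size)` method works as the source, for example an open file or a network response body.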
Lifecycle hooks
What it does:
Defines explicit hook points for ingestion and catalog rebuild.
Validates hook input and output models and records hook execution.
Documentation:
docs/corpus-design.md
Behavior specifications:
features/lifecycle_hooks.feature
features/hook_config_validation.feature
features/hook_error_handling.feature
features/python_hook_logging.feature
features/hook_logging_internal_branches.feature
Primary implementation:
src/biblicus/hooks.py
src/biblicus/hook_manager.py
src/biblicus/hook_logging.py
User configuration files
What it does:
Loads machine-specific configuration for optional integrations.
Supports home and local configuration file locations.
Documentation:
docs/user-configuration.md
Behavior specifications:
features/user_config.feature
Primary implementation:
src/biblicus/user_config.py
Text extraction stage
What it does:
Builds extraction snapshots as a separate pipeline stage.
Stores extracted text artifacts under the corpus so multiple extractors can coexist.
Supports an explicit extractor pipeline through the pipeline extractor.
Includes a Portable Document Format text extractor plugin.
Includes a speech-to-text extractor plugin for audio items.
Includes a selection extractor stage for choosing extracted text within a pipeline.
Includes a MarkItDown extractor plugin for document conversion.
Documentation:
docs/extraction.md
Behavior specifications:
features/text_extraction_snapshots.feature
features/extractor_pipeline.feature
features/extractor_validation.feature
features/extraction_selection.feature
features/extraction_selection_longest.feature
features/extraction_error_handling.feature
features/ocr_extractor.feature
features/stt_extractor.feature
features/unstructured_extractor.feature
features/markitdown_extractor.feature
features/integration_unstructured_extraction.feature
Primary implementation:
src/biblicus/extraction.py
src/biblicus/extractors/
Extraction evaluation
What it does:
Evaluates extraction snapshots against labeled datasets.
Reports coverage, accuracy, and processable fraction metrics.
Documentation:
docs/extraction-evaluation.md
Behavior specifications:
features/extraction_evaluation.feature
features/extraction_evaluation_lab.feature
Primary implementation:
src/biblicus/extraction_evaluation.py
Graph extraction stage
What it does:
Builds graph snapshots from extracted text.
Writes graph nodes and edges to a Neo4j backend.
Supports deterministic graph identifiers for reproducible experiments.
Includes deterministic NLP baselines (NER entities, dependency relations).
Documentation:
docs/graph-extraction.md
Behavior specifications:
features/graph_extraction.feature
features/integration_graph_extraction.feature
features/graph_extraction_baselines.feature
Primary implementation:
src/biblicus/graph/
src/biblicus/graph/neo4j.py
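One common way to obtain deterministic graph identifiers is to hash a normalized description of the node, so the same input always maps to the same identifier across runs. The sketch below shows that general technique only. The function name `deterministic_node_id`, the choice of fields, the normalization, and the 16-character truncation are assumptions for illustration and may differ from what src/biblicus/graph/ actually does.

```python
import hashlib

FIELD_SEPARATOR = "\x1f"  # ASCII unit separator, unlikely to occur in labels


def deterministic_node_id(snapshot_id: str, node_type: str, label: str) -> str:
    """Derive a stable identifier from the node's defining fields.

    The label is trimmed and lowercased so trivial spelling variants
    of the same entity collapse to one identifier (an assumed policy).
    """
    normalized_label = label.strip().lower()
    key = FIELD_SEPARATOR.join([snapshot_id, node_type, normalized_label])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
```

Because the identifier depends only on its inputs, rebuilding a graph snapshot from the same extracted text yields the same node identifiers, which is what makes experiments comparable.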
Retrieval backends
What it does:
Builds and queries retrieval snapshots.
Returns evidence as structured output.
Supports a minimal scan backend and a practical SQLite full-text search backend.
Documentation:
docs/backends.md
Behavior specifications:
features/retrieval_scan.feature
features/retrieval_sqlite_full_text_search.feature
features/retrieval_uses_extraction_snapshot.feature
features/retrieval_budget.feature
features/retrieval_utilities.feature
features/backend_validation.feature
features/embedding_index_internal_branches.feature
features/90_embedding_index_evidence_fallback.feature
features/91_tf_vector_internal_branches.feature
Primary implementation:
src/biblicus/retrieval.py
src/biblicus/backends/
Evaluation
What it does:
Evaluates retrieval snapshots against datasets and budgets.
Documentation:
docs/retrieval-evaluation.md
Behavior specifications:
features/evaluation.feature
features/model_validation.feature
features/retrieval_evaluation_lab.feature
Primary implementation:
src/biblicus/evaluation.py
src/biblicus/models.py
Context packs
What it does:
Builds context pack text from retrieval evidence using an explicit policy.
Fits a context pack to a token budget using an explicit tokenizer identifier.
Documentation:
docs/context-pack.md
Behavior specifications:
features/context_pack.feature
features/context_pack_policies.feature
features/token_budget.feature
Primary implementation:
src/biblicus/context.py
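Fitting a context pack to a token budget can be pictured as keeping evidence blocks, in order, until the next block would exceed the budget. The sketch below is a simplified stand-in, not the policy in src/biblicus/context.py. The names `fit_to_budget` and `whitespace_token_count` are hypothetical, and whitespace splitting is used in place of a real tokenizer purely to keep the example self-contained.

```python
from typing import Callable, List


def whitespace_token_count(text: str) -> int:
    """Crude token count: number of whitespace-separated words."""
    return len(text.split())


def fit_to_budget(
    blocks: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = whitespace_token_count,
    separator: str = "\n\n",
) -> str:
    """Join leading blocks whose combined token count fits max_tokens.

    Blocks are assumed to arrive already ranked; the first block that
    does not fit ends the pack, so ordering is preserved.
    """
    kept: List[str] = []
    used = 0
    for block in blocks:
        cost = count_tokens(block)
        if used + cost > max_tokens:
            break
        kept.append(block)
        used += cost
    return separator.join(kept)
```

Passing the counting function as a parameter mirrors the idea of an explicit tokenizer identifier: the budget is only meaningful relative to a named way of counting tokens.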
Context engine
What it does:
Assembles elastic, budget-aware prompt contexts from messages and packs.
Compacts or expands retriever packs based on policy.
Supports pagination via offset and limit for retriever expansion.
Documentation:
docs/context-engine.md
Behavior specifications:
features/context_engine_retrieve_context_pack.feature
features/context_engine_retrieval_internal_branches.feature
features/70_context_retriever.feature
features/71_context_compaction.feature
features/72_context_history_compaction.feature
features/73_context_nested_compaction.feature
features/74_context_regeneration.feature
features/75_context_default_regeneration.feature
features/76_context_pack_budget_weights.feature
features/77_context_default_pack_priority.feature
features/78_context_default_pack_weights.feature
features/79_context_nested_context_packs.feature
features/80_context_nested_pack_budget_cap.feature
features/81_context_nested_regeneration.feature
features/82_context_explicit_regeneration.feature
features/83_context_explicit_pack_priority.feature
features/84_context_explicit_pack_weights.feature
features/85_context_expansion.feature
features/86_context_engine_errors.feature
features/87_context_compactor_strategies.feature
features/88_context_engine_model_validation.feature
features/89_context_engine_internal_branches.feature
Primary implementation:
src/biblicus/context_engine/assembler.py
src/biblicus/context_engine/models.py
src/biblicus/context_engine/compaction.py
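Pagination with offset and limit, as mentioned for retriever expansion, generally means requesting a window of ranked results and learning whether more remain. The sketch below shows that pattern over an in-memory list. The function name `paginate` and the convention of returning the next offset (or `None` when exhausted) are assumptions for illustration, not the context engine's actual interface.

```python
from typing import List, Optional, Sequence, Tuple


def paginate(
    evidence: Sequence[str], offset: int, limit: int
) -> Tuple[List[str], Optional[int]]:
    """Return one page of evidence and the offset of the next page.

    The next offset is None when the current page reaches the end.
    """
    if offset < 0:
        raise ValueError("offset must be zero or greater")
    if limit < 1:
        raise ValueError("limit must be at least one")
    page = list(evidence[offset : offset + limit])
    has_more = offset + limit < len(evidence)
    next_offset = offset + limit if has_more else None
    return page, next_offset
```

An expansion step can then request the next page only when the budget still has room, instead of retrieving everything up front.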
Knowledge base
What it does:
Provides a turnkey interface that accepts a folder and returns a ready-to-query workflow.
Applies sensible defaults for import, retrieval, and context pack shaping.
Behavior specifications:
features/knowledge_base.feature
Primary implementation:
src/biblicus/knowledge_base.py
Text utilities
What it does:
Provides reusable utilities that edit a virtual in-memory text file using tool calls.
Supports extraction, slicing, annotation, redaction, and linking with consistent validation.
Keeps prompts small and focused for reliability with small models.
Documentation:
docs/text-utilities.md
docs/text-extract.md
docs/text-slice.md
docs/text-annotate.md
docs/text-redact.md
docs/text-link.md
Behavior specifications:
features/text_extract.feature
features/text_slice.feature
features/text_annotate.feature
features/text_redact.feature
features/text_link.feature
features/text_utilities.feature
features/integration_text_extract.feature
features/integration_text_slice.feature
features/integration_text_annotate.feature
features/integration_text_redact.feature
features/integration_text_link.feature
Primary implementation:
src/biblicus/text/
Text extract
What it does:
Inserts XML span tags into long texts using a virtual file edit loop.
Produces ordered spans without re-emitting the full document.
Validates that only tags were inserted.
Documentation:
docs/text-extract.md
Behavior specifications:
features/text_extract.feature
features/integration_text_extract.feature
Primary implementation:
src/biblicus/text/extract.py
src/biblicus/text/models.py
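The check that only tags were inserted can be understood as a round trip: removing the tags from the marked-up text must give back the original text exactly. The sketch below illustrates that idea under stated assumptions. It assumes bare `<span>` and `</span>` tags with no attributes and no nesting, and the function names `only_tags_inserted` and `ordered_spans` are hypothetical rather than taken from src/biblicus/text/extract.py.

```python
import re
from typing import List

# Assumed tag shape: bare <span> and </span>, no attributes, no nesting.
SPAN_TAG = re.compile(r"</?span>")
SPAN_BODY = re.compile(r"<span>(.*?)</span>", re.DOTALL)


def only_tags_inserted(original: str, marked_up: str) -> bool:
    """True when stripping span tags from marked_up restores original."""
    return SPAN_TAG.sub("", marked_up) == original


def ordered_spans(marked_up: str) -> List[str]:
    """Return span contents in document order."""
    return SPAN_BODY.findall(marked_up)
```

This simple check would misfire if the source text itself contained literal `<span>` strings, so a real validator would need to account for that case.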
Text slice
What it does:
Inserts <slice/> markers into long texts using a virtual file edit loop.
Produces ordered slices without re-emitting the full document.
Validates that only slice markers were inserted.
Documentation:
docs/text-slice.md
Behavior specifications:
features/text_slice.feature
features/integration_text_slice.feature
Primary implementation:
src/biblicus/text/slice.py
src/biblicus/text/models.py
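Once slice markers are in place, the ordered slices fall out of a plain split on the marker, and rejoining them gives a cheap way to confirm nothing but markers was added. This is a minimal sketch of that idea; the function names `split_slices` and `only_markers_inserted` are hypothetical, and the real module may treat whitespace or empty slices differently.

```python
from typing import List

SLICE_MARKER = "<slice/>"


def split_slices(marked_up: str) -> List[str]:
    """Return the text segments between slice markers, in order."""
    return marked_up.split(SLICE_MARKER)


def only_markers_inserted(original: str, marked_up: str) -> bool:
    """True when removing every marker restores the original text."""
    return "".join(split_slices(marked_up)) == original
```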
Text annotate
What it does:
Inserts XML span tags with attributes into long texts using a virtual file edit loop.
Produces ordered spans with attributes without re-emitting the full document.
Validates attribute allow lists and tag structure.
Documentation:
docs/text-annotate.md
Behavior specifications:
features/text_annotate.feature
features/integration_text_annotate.feature
Primary implementation:
src/biblicus/text/annotate.py
src/biblicus/text/models.py
Text redact
What it does:
Inserts XML span tags around redacted text using a virtual file edit loop.
Supports optional redaction types via a redact attribute.
Validates that only tags were inserted.
Documentation:
docs/text-redact.md
Behavior specifications:
features/text_redact.feature
features/integration_text_redact.feature
Primary implementation:
src/biblicus/text/redact.py
src/biblicus/text/models.py
Text link
What it does:
Inserts id/ref span tags to connect repeated mentions.
Produces ordered linked spans without re-emitting the full document.
Validates id prefix and reference ordering.
Documentation:
docs/text-link.md
Behavior specifications:
features/text_link.feature
features/integration_text_link.feature
Primary implementation:
src/biblicus/text/link.py
src/biblicus/text/models.py
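Validating id prefixes and reference ordering amounts to scanning the tags in document order and checking two things: each id carries the expected prefix, and each ref points at an id that has already appeared. The sketch below illustrates that under assumptions. The tag syntax (`<span id="...">` and `<span ref="...">`), the prefix `link_`, and the function name `link_errors` are all hypothetical; docs/text-link.md and the feature files define the actual rules.

```python
import re
from typing import List, Set

# Assumed tag syntax: <span id="..."> opens a first mention,
# <span ref="..."> opens a repeated mention pointing back to it.
LINK_TAG = re.compile(r'<span (id|ref)="([^"]+)">')


def link_errors(marked_up: str, id_prefix: str = "link_") -> List[str]:
    """Return human-readable problems found in document order."""
    errors: List[str] = []
    seen_ids: Set[str] = set()
    for kind, value in LINK_TAG.findall(marked_up):
        if kind == "id":
            if not value.startswith(id_prefix):
                errors.append(f"id {value!r} lacks prefix {id_prefix!r}")
            seen_ids.add(value)
        elif value not in seen_ids:
            errors.append(f"ref {value!r} appears before its id")
    return errors
```

An empty list means the markup passed both checks; returning messages rather than raising on the first problem lets a caller report every issue in one pass.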
Testing, coverage, and documentation build
What it does:
Runs behavior specifications under coverage and emits a Hypertext Markup Language (HTML) coverage report.
Builds Sphinx documentation from docstrings and documentation pages.
Documentation:
docs/testing.md
Primary implementation:
scripts/test.py
docs/conf.py
.github/workflows/ci.yml
Integration corpora
What it does:
Downloads small public datasets at runtime for integration scenarios.
Behavior specifications:
features/integration_wikipedia.feature
features/integration_pdf_samples.feature
features/integration_mixed_corpus.feature
features/integration_mixed_extraction.feature
features/integration_pdf_retrieval.feature
features/integration_audio_samples.feature
Integration scripts:
scripts/download_wikipedia.py
scripts/download_pdf_samples.py
scripts/download_mixed_samples.py
scripts/download_audio_samples.py