# TF Vector backend

The TF Vector backend implements a deterministic vector space model baseline using term-frequency vectors and cosine
similarity. It builds no persistent index; every item is scored at query time. This makes it useful as a lightweight
“vector-style” baseline that needs neither dense embeddings nor external services.

## When to use it

- You want a minimal baseline to compare against lexical search.
- You want deterministic, inspectable similarity scoring.
- You are teaching retrieval concepts and want a small, runnable backend.

## Backend ID

`tf-vector`

## How it works

1) Tokenize the query and each item into lowercase word tokens.
2) Build term-frequency vectors.
3) Compute cosine similarity between the query vector and each item vector.
4) Return evidence ranked by similarity score.
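
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the backend's actual code: the function names (`tokenize`, `term_frequencies`, `cosine_similarity`, `rank`), the token pattern, the choice to drop zero-score items, and the tie-break on item ID are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Step 1: lowercase word tokens. The exact token pattern is an assumption.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequencies(text: str) -> Counter:
    # Step 2: a sparse term-frequency vector, stored as term -> count.
    return Counter(tokenize(text))


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Step 3: dot product divided by the product of the Euclidean norms.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty vector is treated as matching nothing.
    return dot / (norm_a * norm_b)


def rank(query: str, items: dict[str, str]) -> list[tuple[str, float]]:
    # Step 4: score every item at query time and sort by descending score.
    query_vector = term_frequencies(query)
    scored = [
        (item_id, cosine_similarity(query_vector, term_frequencies(text)))
        for item_id, text in items.items()
    ]
    # Dropping zero-score items and breaking ties by item ID are assumptions
    # that keep the example's output deterministic.
    matches = [(item_id, score) for item_id, score in scored if score > 0.0]
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    corpus = {"a": "the cat sat", "b": "dogs bark", "c": "cat cat cat"}
    for item_id, score in rank("cat", corpus):
        print(item_id, round(score, 3))
```

Because cosine similarity normalizes by vector length, item `c` (only the query term) scores higher than item `a` (the query term plus two unrelated terms), and item `b` shares no terms with the query and is not returned.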

## Configuration

The backend accepts these configuration fields:

- `snippet_characters`: maximum characters to include in evidence snippets.
- `extraction_snapshot`: optional extraction snapshot reference (`extractor_id:snapshot_id`).

Example configuration:

```yaml
snippet_characters: 320
extraction_snapshot: pipeline:RUN_ID
```
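
As a rough illustration of how these two fields might be interpreted, the sketch below splits a snapshot reference into its two parts and caps a snippet at a character limit. Both helpers (`split_snapshot_reference`, `make_snippet`) are hypothetical names invented for this example, and the plain prefix truncation is an assumption; the backend may select or trim snippets differently.

```python
def split_snapshot_reference(reference: str) -> tuple[str, str]:
    # Hypothetical helper: split "extractor_id:snapshot_id" at the first colon.
    extractor_id, separator, snapshot_id = reference.partition(":")
    if not separator or not extractor_id or not snapshot_id:
        raise ValueError(f"expected 'extractor_id:snapshot_id', got {reference!r}")
    return extractor_id, snapshot_id


def make_snippet(text: str, snippet_characters: int) -> str:
    # Hypothetical helper: keep at most `snippet_characters` characters.
    # Simple prefix truncation is an assumption made for illustration.
    return text[:snippet_characters]


if __name__ == "__main__":
    print(split_snapshot_reference("pipeline:RUN_ID"))
    print(make_snippet("term-frequency vectors and cosine similarity", 14))
```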

## Build a run

```
python -m biblicus build --corpus corpora/example --backend tf-vector --config extraction_snapshot=pipeline:RUN_ID
```

This backend does not create artifacts beyond the snapshot manifest.

## Query a run

```
python -m biblicus query --corpus corpora/example --run tf-vector:RUN_ID --query "semantic match"
```

Each evidence result includes a `stage` value of `tf-vector` and the similarity score for that match.

## What it is not

- This backend does not compute dense embeddings.
- It does not use approximate nearest neighbor indexing.
- It does not depend on external services.