# Text link
Text link is a reusable utility for connecting repeated mentions (coreference resolution) without re-emitting the text.
If you ask a model to "return a list of all entity mentions and their canonical IDs," you face the same hallucination and cost issues as other extraction tasks.
Text link uses the **virtual file pattern** to handle this in-place. Biblicus asks the model to wrap mentions in XML tags with ID/REF attributes (e.g., `...` and `...`). The model returns a small edit script, and Biblicus parses it into a graph of connected spans. This lets you resolve entities and structure relationships without regenerating the content.
## How text link works
1) Biblicus loads the full text into memory.
2) The model receives the text and returns an **edit script** with str_replace operations.
3) Biblicus applies the operations and validates id/ref rules.
4) The marked-up string is parsed into ordered **linked spans**.
### Mechanism example
Biblicus supplies an internal protocol that defines the edit protocol and embeds the current text:
**Internal protocol (excerpt):**
```
You are a virtual file editor. Use the available tools to edit the text.
Interpret the word "return" in the user's request as: wrap the returned text with
... in-place in the current text.
Each span must include exactly one attribute: id for first mentions and ref for repeats.
Id values must start with "link_".
Current text:
---
Acme launched a product. Later, Acme reported results.
---
```
Then provide a short user prompt describing what to return:
**User prompt:**
```
Link repeated mentions of the same company to the first mention.
```
The input text is the same content embedded in the internal protocol:
**Input text:**
```
Acme launched a product. Later, Acme reported results.
```
The model edits the virtual file by inserting tags in-place:
**Marked-up text:**
```
Acme launched a product. Later, Acme reported results.
```
Biblicus returns structured data parsed from the markup:
**Structured data (result):**
```
{
"marked_up_text": "Acme launched a product. Later, Acme reported results.",
"spans": [
{
"index": 1,
"start_char": 0,
"end_char": 25,
"text": "Acme launched a product",
"attributes": {"id": "link_1"}
},
{
"index": 2,
"start_char": 33,
"end_char": 53,
"text": "Acme reported results",
"attributes": {"ref": "link_1"}
}
],
"warnings": []
}
```
## Data model
Text link uses Pydantic models for strict validation:
- `TextLinkRequest`: input text + LLM config + prompt template + id prefix.
- `TextLinkResult`: marked-up text and linked spans.
Internal protocol templates (advanced overrides) must include `{text}`. Prompt templates must not include `{text}` and
should only describe what to return. The internal protocol template can interpolate the id prefix via Jinja2.
Most callers only supply the user prompt and text. Override `system_prompt` only when you need to customize the edit
protocol.
## Output contract
Text link is tool-driven. The model must use tool calls instead of returning JSON in the assistant message.
Tool call arguments:
```
str_replace(old_str="Acme launched a product", new_str="Acme launched a product")
str_replace(old_str="Acme reported results", new_str="Acme reported results")
done()
```
Rules:
- Use the str_replace tool only.
- Each old_str must match exactly once.
- Each new_str must be the same text with span tags inserted.
- Use id for first mentions and ref for repeats.
- Id values must start with the configured prefix.
- Id/ref spans must wrap the same repeated text (avoid wrapping extra words).
Long-span handling: the system prompt instructs the model to insert `` and `` in separate `str_replace` calls for long passages (single-call insertion is allowed for short spans). This is covered by unit tests in `tests/test_text_utility_tool_calls.py`.
## Example: Python API
```python
from biblicus.ai.models import AiProvider, LlmClientConfig
from biblicus.text import TextLinkRequest, apply_text_link
request = TextLinkRequest(
text="Acme launched a product. Later, Acme reported results.",
client=LlmClientConfig(provider=AiProvider.OPENAI, model="gpt-4o-mini"),
prompt_template="Link repeated mentions of the same company to the first mention.",
id_prefix="link_",
)
result = apply_text_link(request)
```