10  The Agent Loop

Most useful agents aren’t single-shot. They plan, act, check results, and sometimes need more turns. This chapter teaches multi-turn patterns without turning your code into a brittle “agent script.”

In the last chapter, we made our workflow safe to integrate with the outside world: state, stages, and idempotency guards mean “retry” doesn’t imply “double-send.” Now we’ll address the other reason real workflows need multiple turns: quality.

The goal is not “make the model try again until it looks good.” The goal is a loop that is bounded, has an explicit success condition, and exits deterministically, whether with a result or with a clear failure.

10.1 When You Need a Loop (and When You Don’t)

You need a loop when:

  • The agent must iteratively improve an artifact (drafts, plans, code).
  • You have tool-using behavior and need multiple turns.
  • You have checks that can fail (validation, policy checks, completeness rules).

You don’t need a loop when:

  • A single, typed output is enough (especially with structured output + validation).
  • The work is deterministic and can be done with tools directly.

In Part II, we’ll use a loop for a very common reason: the first draft is often not the final draft.

10.2 The Safe Loop Shape

The safe loop has three properties:

  1. A hard cap (max_turns, max_attempts)
  2. An explicit success condition (“we’re done when X is true”)
  3. A deterministic exit (either success or a clear failure)

In Lua, that usually looks like:

local max_attempts = 3
local attempt = 0
local success = false

while attempt < max_attempts do
  attempt = attempt + 1

  -- take a turn (agent call, tool call, etc.)
  -- then set `success` from an explicit check of the result

  if success then
    break
  end
end

assert(success, "Failed to produce an acceptable result after " .. max_attempts .. " attempts")

10.3 Agent-level retry (built-in)

There’s another kind of retry that’s worth separating from your procedure loop: retrying a single agent turn.

Use agent-level retry for:

  • transient provider / transport failures
  • structured output validation failures (wrong shape, missing required fields)

Configure it directly on the agent:

worker = Agent {
  name = "worker",
  model = "openai/gpt-4o-mini",
  system_prompt = "...",
  retry = {
    enabled = true,
    attempts = 3,
    delay_seconds = 0.5,
    backoff = "exponential",
    on = "infra_plus_validation",
  },
}

This retry policy is bounded (attempts) and local to that call site: it retries the agent’s turn and rolls back the agent’s message history to the start of the attempted turn on each retry.

10.4 Three Retry Layers

By this point, Tactus gives you three different places where “retry” can happen:

  1. Standard-library retries: helpers like classification/extraction may expose max_retries because they own a tight prompt + validation loop internally.
  2. Agent-level retries: retry = { ... } on Agent { ... } retries one agent turn when the provider fails or the structured output is invalid.
  3. Procedure-level loops: your Lua code decides the workflow needs another step because a check failed, a human rejected the draft, or more information is needed.

The distinction matters:

  • Use stdlib retry when a helper already owns the micro-loop.
  • Use Agent retry when the same call should simply be tried again.
  • Use a procedure loop when the workflow needs new information, new feedback, or a different next action.
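The procedure-level layer is the one you write yourself. A minimal sketch in plain Lua, where `draft_recap` and `check` are hypothetical stand-ins (in a real procedure, `draft_recap` would be an agent turn); the key property is that the failed check becomes new input for the next turn:

```lua
-- Deterministic check: returns ok plus a reason usable as feedback.
local function check(draft)
  if #draft.subject == 0 then return false, "Subject must be non-empty" end
  if #draft.subject > 80 then return false, "Subject must be under 80 chars" end
  return true
end

-- Stub agent: the first attempt produces a bad draft, then honors feedback.
local function draft_recap(feedback)
  if feedback then
    return { subject = "Q1 launch recap" }
  end
  return { subject = "" }
end

local max_attempts = 3
local draft, ok, reason, feedback
for attempt = 1, max_attempts do
  draft = draft_recap(feedback)
  ok, reason = check(draft)
  if ok then break end
  -- New information flows into the next turn: this is what makes it a
  -- procedure-level loop rather than a blind retry of the same call.
  feedback = reason
end
assert(ok, "No acceptable draft after " .. max_attempts .. " attempts")
```

The agent-level and stdlib layers, by contrast, re-run the same call with the same input; only this layer changes what the model is asked to do.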

10.5 Make Replays Legible With Logs

Durable execution is only useful if you can understand what happened. Log the shape of your loop:

  • which attempt you’re on
  • what failed the check
  • what you’re asking the model to fix

Use Log.info, Log.warn, and Log.error with structured metadata so traces stay readable.
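A sketch of what that looks like in practice; `Log` here is a plain-Lua stub so the example is self-contained, assuming (per the text above) that the real Log.info / Log.warn accept a message plus a metadata table:

```lua
-- Stand-in for the Tactus Log API so this sketch runs as plain Lua.
local lines = {}
local Log = {}
function Log.info(msg, meta)
  local parts = {}
  for k, v in pairs(meta or {}) do
    parts[#parts + 1] = k .. "=" .. tostring(v)
  end
  table.sort(parts) -- deterministic key order for the sketch
  local line = "[info] " .. msg .. " {" .. table.concat(parts, " ") .. "}"
  lines[#lines + 1] = line
  print(line)
end
Log.warn = Log.info

local max_attempts = 3
for attempt = 1, max_attempts do
  Log.info("attempt started", { attempt = attempt, max_attempts = max_attempts })
  local ok, reason = false, "Subject must be under 80 chars" -- pretend check result
  if ok then break end
  Log.warn("check failed, feeding reason back to the model",
           { attempt = attempt, reason = reason })
end
```

Each log line answers one of the three questions above, so a replayed trace reads as a story rather than a wall of retries.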

10.6 Per-Turn Tool Control (Preview)

When you call an agent, you can override which tools are available for that call:

worker({tools = {}})              -- no tools this turn
worker({tools = {search, done}})  -- only these tools

This is one of the simplest ways to prevent “tool drift”: you only enable tools when the workflow is ready for them.

We’ll use this in the next chapter when we add human approval gates.
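As a plain-Lua sketch of the idea; `worker`, `search`, and `done` are local stand-ins here, not the real agent and tools, but the call-time shape mirrors the override shown above:

```lua
-- Stand-ins: tools are tables with a name; the "agent" just reports
-- which tools it was allowed to use this turn.
local search, done = { name = "search" }, { name = "done" }

local function worker(opts)
  local names = {}
  for _, tool in ipairs((opts and opts.tools) or {}) do
    names[#names + 1] = tool.name
  end
  return names
end

-- Drafting turn: no tools, so the agent can only write.
local drafting = worker({ tools = {} })
-- Research turn: only the tools this step actually needs.
local research = worker({ tools = { search, done } })
```

The point is that the restriction is per call, not per agent: the same worker can be tool-less on one turn and tool-equipped on the next.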

10.7 Example: Iterate Until the Draft Passes Checks

The example for this chapter is code/chapter-09/40-meeting-recap-quality-loop.tac.

It runs the recap agent, then applies a few deterministic checks like:

  • subject is non-empty and not too long
  • body is long enough to be useful
  • action items are present when the notes include obvious actions

If a check fails, the procedure gives the agent targeted feedback and tries again—up to a max attempt count.
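The checks themselves are ordinary deterministic Lua. A hypothetical sketch of rules like the ones listed, where thresholds such as 80 and 100 are illustrative rather than the example file's actual values:

```lua
-- Hypothetical checks mirroring the rules described above. Returns ok
-- plus a list of problems to feed back to the agent as targeted feedback.
local function check_recap(recap, raw_notes)
  local problems = {}
  if recap.subject == "" then
    problems[#problems + 1] = "Subject must be non-empty"
  elseif #recap.subject > 80 then
    problems[#problems + 1] = "Subject must be under 80 chars"
  end
  if #recap.body < 100 then
    problems[#problems + 1] = "Body is too short to be useful"
  end
  if raw_notes:find("Action:") and #(recap.action_items or {}) == 0 then
    problems[#problems + 1] = "Notes mention actions but the recap lists none"
  end
  return #problems == 0, problems
end

local ok, problems = check_recap(
  { subject = "", body = "short", action_items = {} },
  "Action: Sam to confirm dates by Friday."
)
-- Failed checks become the next turn's feedback, one concrete fix each.
for _, p in ipairs(problems) do print("- " .. p) end
```

Because the checks are deterministic, a replay of the procedure makes the same pass/fail decisions every time; only the agent turns vary.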

Run it:

tactus run code/chapter-09/40-meeting-recap-quality-loop.tac \
  --param recipient_name="Sam" \
  --param raw_notes="Discussed Q1 launch timeline. Risks: vendor delays. Action: Sam to confirm dates by Friday."

And test it in mock mode:

tactus test code/chapter-09/40-meeting-recap-quality-loop.tac --mock

10.8 Common Failure Modes (and Fixes)

  • Infinite loops: always cap attempts and assert on failure.
  • Vague retry prompts: tell the model exactly what failed (“Subject must be under 80 chars”).
  • Mixing responsibilities: keep “draft” and “send” separate; loop on draft quality, not on tool calls.
  • Silent degradation: if you’re accepting “good enough,” encode that in the checks so it’s explicit.

10.9 Looking Ahead

We can now iterate safely and deterministically. The last missing piece for a trustworthy workflow is human-in-the-loop: approvals, reviews, and input requests as durable checkpoints.