16  Behavior Specifications

At this point in the book you can build a workflow that is both useful and well-contained.

The remaining question is the one every engineering team eventually asks:

How do we know it keeps doing the right thing next week?

Tactus procedures mix deterministic code (Lua) and non-deterministic steps (agents). Behavior specifications are how you stop that mix from turning into “it worked on my laptop” folklore.

If Part III was about putting the monkey with a razor blade in a cage, Part IV is about making sure it reliably shaves the right thing.

16.1 Specs Are About Invariants

For agent workflows, a “test” is usually not “the exact output string matches”.

A spec is a set of invariants that must hold:

  • a dangerous tool was never called
  • a stage transition happened
  • state contains an idempotency marker after a side effect
  • outputs match a schema and satisfy basic constraints
  • the workflow fails fast when preconditions aren’t met

Those are the properties that keep you out of incident postmortems.

16.2 Gherkin in Tactus

Tactus uses Gherkin-style BDD (behavior-driven development) directly inside .tac files:

  • Feature / Scenario describe intent in plain language
  • Given / When / Then steps use built-in step definitions to run the procedure and assert on the result

In code, you embed a spec like this:

Specifications([[
Feature: Idempotent send step
  Use state to guard side effects so retries don't double-send

  Scenario: Send is only performed once
    Given the procedure has started
    When the procedure runs
    Then the procedure should complete successfully
    And the send_email tool should be called exactly 1 time
    And the state message_id should exist
    And the state send_attempts should be 1
    And the stage should be complete
]])

That example is runnable in this repo: code/chapter-08/35-meeting-recap-idempotent-send.tac.

16.3 Running Specs

Run a spec once:

tactus test code/chapter-08/35-meeting-recap-idempotent-send.tac

Run in mock mode (fast, deterministic, no API keys):

tactus test code/chapter-08/35-meeting-recap-idempotent-send.tac --mock

Agent steps are non-deterministic, so a single passing run proves little. Measure consistency by running each scenario multiple times (10 here):

tactus test code/chapter-08/35-meeting-recap-idempotent-send.tac --runs 10

16.4 Mocking: Make Tests Cheap

The fastest path to maintainable specs is to keep them cheap to run.

In this book, most examples include a Mocks { ... } block so that:

  • agent calls don’t hit real model APIs
  • tool calls are deterministic
  • you can run tests in CI without secrets

Mock mode is not only about cost; it’s what makes agent workflow iteration feel like normal software development.

16.5 How to Write Good Specs for Agent Workflows

Three patterns show up again and again:

  1. Assert capability boundaries.
    • “the send tool should not be called”
    • “the filesystem tool should not be called”
  2. Assert state/progress markers.
    • “the state message_id should exist”
    • “the stage should be complete”
  3. Assert shape and constraints of outputs.
    • “output subject should match pattern …”
    • “output action_items should exist”

These are the invariants that make “bounded autonomy” real.
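
As a rough sketch, the three patterns can sit side by side in one scenario. The step phrasings below are copied from the examples in this chapter. The feature and scenario names are made up for illustration, and whether Tactus's built-in step definitions accept each phrasing exactly as written is an assumption to check against the step reference. The "match pattern" step is left out because it needs a concrete pattern.

```lua
-- Sketch only: step wording is taken from this chapter's examples and has not
-- been checked against Tactus's built-in step definitions.
Specifications([[
Feature: Bounded recap workflow
  The run stays inside its capability boundary and leaves evidence of progress

  Scenario: Recap is sent without touching the filesystem
    Given the procedure has started
    When the procedure runs
    Then the procedure should complete successfully
    And the filesystem tool should not be called
    And the state message_id should exist
    And the stage should be complete
    And output action_items should exist
]])
```

Each assertion maps to one pattern: the filesystem line is a capability boundary, the state and stage lines are progress markers, and the output line is a shape check.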

16.6 How It Connects to the Running Example

Part II gave us a recap workflow that drafts, iterates, asks for human approval, and sends.

In Part IV, we’ll use specs to lock in policies like:

  • “send_email must never be called unless approved”
  • “send must be idempotent across retries”
  • “draft output must have subject/body/action_items”
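
As a preview, the first and third policies might look like the sketch below. It reuses step phrasings that appear earlier in this chapter. Two things are assumptions: that the file's Mocks block (not shown) makes the human approval step return a rejection, and that the output steps accept subject and body the same way they accept action_items.

```lua
-- Sketch only. Assumes the Mocks block in the same file (not shown) makes the
-- approval step return a rejection; step wording is unverified.
Specifications([[
Feature: Recap send policy
  A rejected draft is never sent, but it is still well-formed

  Scenario: Rejected draft is not sent
    Given the procedure has started
    When the procedure runs
    Then the send_email tool should not be called
    And output subject should exist
    And output body should exist
    And output action_items should exist
]])
```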

Then we’ll graduate from “can it work?” to “how often does it work?” with evaluations.