plexus.cli.procedure.test_sop_agent_coin_flip_scenario module

BDD Test Suite: SOP Agent Coin Flip Scenario

This module provides comprehensive testing of the StandardOperatingProcedureAgent using a simple, controllable scenario: conducting a coin flip procedure.

The scenario tests all core SOP agent features:

1. Worker agent tool access management
2. Tool explanation enforcement
3. Stop tool functionality
4. Manager agent coaching
5. Procedure completion detection
6. Chat recording integration

Test Scenario:

1. Worker calls coin_flip tool 3 times
2. Worker calls data_logging tool to record each result
3. Worker calls accuracy_calculator tool to compute results
4. Worker calls stop_procedure tool to finish
5. Manager provides coaching guidance throughout
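
The tool-call sequence in the scenario can be sketched without any agent or LLM. This is a minimal illustration and not the module's code: run_coin_flip_scenario is a hypothetical helper, the ordering (all flips first, then logging) is an assumption, and because the source does not say what accuracy_calculator computes, the fraction of heads stands in for it.

```python
import random
from typing import Any, Dict, List


def run_coin_flip_scenario(seed: int = 0) -> Dict[str, Any]:
    """Simulate the worker's tool-call sequence (hypothetical helper)."""
    rng = random.Random(seed)   # seeded so a run is repeatable
    calls: List[str] = []       # tool names, in the order they are "called"
    logged: List[str] = []      # results recorded by data_logging

    # Step 1: the worker flips the coin three times.
    flips: List[str] = []
    for _ in range(3):
        calls.append("coin_flip")
        flips.append(rng.choice(["heads", "tails"]))

    # Step 2: each result is recorded with data_logging.
    for result in flips:
        calls.append("data_logging")
        logged.append(result)

    # Step 3: assumed metric, since the real calculation is not documented.
    calls.append("accuracy_calculator")
    heads_fraction = logged.count("heads") / len(logged)

    # Step 4: the worker ends the procedure explicitly.
    calls.append("stop_procedure")

    return {"calls": calls, "logged": logged, "heads_fraction": heads_fraction}


if __name__ == "__main__":
    print(run_coin_flip_scenario(seed=1)["calls"])
```

The manager's coaching (step 5) is left out here because it depends on the agent loop rather than on the tool sequence.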

class plexus.cli.procedure.test_sop_agent_coin_flip_scenario.CoinFlipProcedureDefinition

Bases: ProcedureDefinition

Simple procedure definition for coin flip procedure.

This demonstrates how to create a custom procedure with:

- Specific tool subset for worker agent
- Custom prompts for the task
- Simple completion criteria

__init__(): Initialize with coin flip procedure tools.

get_allowed_tools() → List[str]: Get allowed tools for worker agent.

get_completion_summary(state_data: Dict[str, Any]) → str: Get completion summary.

get_sop_guidance_prompt(context: Dict[str, Any], state_data: Dict[str, Any]) → str: Get SOP manager guidance prompt.

get_system_prompt(context: Dict[str, Any]) → str: Get worker agent system prompt for coin flip procedure.

get_user_prompt(context: Dict[str, Any]) → str: Get initial user prompt for coin flip procedure.

should_continue(state_data: Dict[str, Any]) → bool: Determine if procedure should continue.
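
As a rough sketch of how a definition with this interface might fit together, the class below mirrors three of the method names listed above. It is a stand-in that does not subclass the real ProcedureDefinition, and the state keys ("stop_requested", "round", "flips"), the MAX_ROUNDS limit, and the summary wording are all assumptions made for illustration.

```python
from typing import Any, Dict, List


class SketchCoinFlipProcedure:
    """Illustrative stand-in; not the real CoinFlipProcedureDefinition."""

    MAX_ROUNDS = 20  # assumed safety limit on conversation rounds

    def __init__(self) -> None:
        # The tool subset named in the test scenario.
        self.allowed_tools: List[str] = [
            "coin_flip",
            "data_logging",
            "accuracy_calculator",
            "stop_procedure",
        ]

    def get_allowed_tools(self) -> List[str]:
        # Return a copy so callers cannot mutate the definition.
        return list(self.allowed_tools)

    def should_continue(self, state_data: Dict[str, Any]) -> bool:
        # An explicit stop request always wins.
        if state_data.get("stop_requested", False):
            return False
        # Otherwise continue until the assumed safety limit is reached.
        return state_data.get("round", 0) < self.MAX_ROUNDS

    def get_completion_summary(self, state_data: Dict[str, Any]) -> str:
        flips = state_data.get("flips", 0)
        return f"Coin flip procedure finished after {flips} flips."
```

Checking the stop request before the round limit keeps the two exit conditions independent, which is what a continuation test would need in order to exercise each one separately.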

class plexus.cli.procedure.test_sop_agent_coin_flip_scenario.MockCoinFlipChatRecorder

Bases: ChatRecorder

Mock chat recorder for testing.

__init__()

async end_session(status: str, name: str = None) → bool: End recording session.

async record_message(role: str, content: str, message_type: str) → str | None: Record a message.

async record_system_message(content: str) → str | None: Record a system message.

async start_session(context: Dict[str, Any]) → str | None: Start recording session.
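
A mock recorder of this shape can keep everything in memory. The sketch below follows the four async signatures listed above but is not the module's implementation: InMemoryChatRecorder, the "mock-session-1" identifier, the "MESSAGE" message type, and the "COMPLETED" status are hypothetical, and the class does not subclass the real ChatRecorder.

```python
import asyncio
from typing import Any, Dict, List, Optional


class InMemoryChatRecorder:
    """Illustrative stand-in; stores messages in a list instead of a backend."""

    def __init__(self) -> None:
        self.session_id: Optional[str] = None
        self.messages: List[Dict[str, str]] = []
        self.final_status: Optional[str] = None

    async def start_session(self, context: Dict[str, Any]) -> Optional[str]:
        self.session_id = "mock-session-1"  # hypothetical fixed id
        return self.session_id

    async def record_message(
        self, role: str, content: str, message_type: str
    ) -> Optional[str]:
        if self.session_id is None:
            return None  # nothing is recorded outside a session
        self.messages.append(
            {"role": role, "content": content, "message_type": message_type}
        )
        return f"msg-{len(self.messages)}"

    async def record_system_message(self, content: str) -> Optional[str]:
        # "MESSAGE" is an assumed message type, used only for this sketch.
        return await self.record_message("system", content, "MESSAGE")

    async def end_session(self, status: str, name: Optional[str] = None) -> bool:
        if self.session_id is None:
            return False
        self.final_status = status
        self.session_id = None
        return True


async def demo() -> InMemoryChatRecorder:
    recorder = InMemoryChatRecorder()
    await recorder.start_session({"procedure": "coin_flip"})
    await recorder.record_system_message("Procedure started")
    await recorder.record_message("assistant", "Flipping the coin", "MESSAGE")
    await recorder.end_session("COMPLETED")
    return recorder


if __name__ == "__main__":
    print(len(asyncio.run(demo()).messages))
```

Keeping the recorded messages on the instance lets a test assert on exactly what the agent tried to record, without any network or database access.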

class plexus.cli.procedure.test_sop_agent_coin_flip_scenario.MockCoinFlipFlowManager

Bases: FlowManager

Mock flow manager for coin flip procedure.

__init__()

get_completion_summary() → str: Get completion summary.

get_next_guidance() → str | None: Get guidance for next step.

should_continue() → bool: Check if flow should continue.

update_state(new_data: Dict[str, Any]) → Dict[str, Any]: Update and return current state.
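
One way to picture a flow manager with these four methods is as a small state machine over the scenario's steps. The sketch below is an assumption-heavy illustration: SketchFlowManager, its state keys ("flips", "logged", "calculated"), and the guidance strings are hypothetical, and the class does not subclass the real FlowManager.

```python
from typing import Any, Dict, Optional


class SketchFlowManager:
    """Illustrative stand-in that tracks progress through the coin flip steps."""

    REQUIRED_FLIPS = 3  # taken from the test scenario

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {"flips": 0, "logged": 0, "calculated": False}

    def update_state(self, new_data: Dict[str, Any]) -> Dict[str, Any]:
        # Merge the new values and return a copy of the current state.
        self.state.update(new_data)
        return dict(self.state)

    def should_continue(self) -> bool:
        # The flow is done once the results have been calculated.
        return not self.state["calculated"]

    def get_next_guidance(self) -> Optional[str]:
        if self.state["flips"] < self.REQUIRED_FLIPS:
            return "Flip the coin again."
        if self.state["logged"] < self.state["flips"]:
            return "Log the remaining results."
        if not self.state["calculated"]:
            return "Calculate the results."
        return None  # no guidance left once every step is complete

    def get_completion_summary(self) -> str:
        return (
            f"{self.state['flips']} flips, "
            f"{self.state['logged']} logged, "
            f"calculated={self.state['calculated']}"
        )
```

Returning None from get_next_guidance when nothing remains matches the optional return type in the listed signature.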

class plexus.cli.procedure.test_sop_agent_coin_flip_scenario.TestSOPAgentCoinFlipScenario

Bases: object

Comprehensive BDD test suite for SOP agent using coin flip scenario.

This tests all core SOP agent functionality:

- Tool access management
- Tool explanation enforcement
- Manager coaching
- Stop functionality
- Chat recording
- Procedure completion

coin_flip_procedure_definition(): Create coin flip procedure definition.

test_coin_flip_scenario_story_complete_workflow()

Story Test: Complete coin flip procedure workflow

This test tells the complete story of using an SOP agent to accomplish the coin flip task, demonstrating all the key features working together.

test_given_coin_flip_procedure_when_checking_continuation_then_respects_stop_and_safety_limits(coin_flip_procedure_definition)

Given a coin flip procedure definition
When checking if procedure should continue
Then it should respect stop requests and safety limits

test_given_coin_flip_procedure_when_generating_sop_guidance_then_provides_contextual_coaching(coin_flip_procedure_definition)

Given a coin flip procedure definition
When generating SOP guidance at different stages
Then it should provide contextual coaching questions

test_given_coin_flip_procedure_when_getting_prompts_then_provides_task_specific_guidance(coin_flip_procedure_definition)

Given a coin flip procedure definition
When getting system and user prompts
Then it should provide task-specific guidance for the coin flip experiment

test_given_coin_flip_procedure_when_initialized_then_has_correct_tools(coin_flip_procedure_definition)

Given a coin flip procedure definition
When the procedure is initialized
Then it should have the correct subset of tools available

test_given_coin_flip_scenario_when_checking_stop_tool_functionality_then_stops_correctly(coin_flip_procedure_definition)

Given a coin flip scenario
When the stop tool is used
Then the procedure should stop correctly with proper reason tracking

test_given_multiple_coin_flip_procedures_when_comparing_configurations_then_demonstrates_customization()

Given multiple coin flip procedure configurations
When comparing their setup
Then it should demonstrate how the base SOP agent can be customized for different tasks