16 Evaluations Reference

A complete reference for evaluations, covering:

Dataset formats and loading (JSON, JSONL, CSV)
Evaluator types (contains, regex, llm_judge, tool_called, state_check, json_schema, range)
Trace inspection
Thresholds for CI/CD
Running evaluations with options
Interpreting results
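
To make the topics above concrete, here is a minimal, framework-independent sketch in plain Python of how a JSONL dataset, a few simple evaluators, and a CI threshold could fit together. All function names, the dataset fields (`output`, `expected`), and the threshold value are assumptions made for illustration; they are not this tool's actual API or file format.

```python
import json
import re

# Hypothetical evaluators. The names echo the evaluator types listed above,
# but these are illustrative stand-ins, not the real implementations.
def contains(output: str, expected: str) -> bool:
    """Pass if the expected substring appears in the output."""
    return expected in output

def regex(output: str, pattern: str) -> bool:
    """Pass if the pattern matches anywhere in the output."""
    return re.search(pattern, output) is not None

def in_range(value: float, low: float, high: float) -> bool:
    """Pass if the numeric value lies in the closed interval [low, high]."""
    return low <= value <= high

def load_jsonl(text: str) -> list[dict]:
    """Parse JSONL: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def pass_rate(results: list[bool]) -> float:
    """Fraction of passing cases; 0.0 for an empty result list."""
    return sum(results) / len(results) if results else 0.0

# Assumed dataset shape: each case carries a model output and an expected value.
DATASET = (
    '{"output": "The capital is Paris", "expected": "Paris"}\n'
    '{"output": "I am not sure", "expected": "Berlin"}\n'
)

cases = load_jsonl(DATASET)
results = [contains(c["output"], c["expected"]) for c in cases]
rate = pass_rate(results)

# Assumed CI gate: the run fails if the pass rate drops below the threshold.
THRESHOLD = 0.5
passed = rate >= THRESHOLD
print(f"pass rate: {rate:.2f}, threshold met: {passed}")
```

In a CI/CD job, a gate like this would typically exit with a non-zero status when `passed` is false, so that a regression in the pass rate blocks the pipeline.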