All runs7.2s · 3 tools
69
score
Initech billing outage escalation
failedrun_cust_initech_mt_0_770 · escalation-agent@1.0.0 · 22d ago · source mock
“Investigate Initech's billing outage and draft a customer-safe status update.”
Latency
7.2s
Cost
$0.10
Tokens
5,250
Tool calls
3
Execution signal path
3 steps · 2 missing
- Never called — Check deployment historyRequired before a customer-facing root-cause claim. This gap is the failure.Never called — Check runtime telemetryRequired before a customer-facing root-cause claim. This gap is the failure.
Ask Autopilot
Autonomous, evidence-backed diagnosis
This run failed its evals. Ask Autopilot to diagnose the root cause from the trace and evidence.
Final customer-facing response
We are investigating the issue and will follow up; deployment verification is still pending.
Eval scores
Groundedness70%
Required tool usage50%
Approval policy100%
Usefulness72%
Cost & Latency Doctor
Efficient — no waste detected.