All runs
69
score

Initech billing outage escalation

failed

run_cust_initech_mt_1_3300 · escalation-agent@1.0.0 · 24d ago · source mock

Investigate Initech's billing outage and draft a customer-safe status update.

Latency
7.2s
Cost
$0.10
Tokens
5,250
Tool calls
3
Execution signal path
3 steps · 2 missing
7.2s · 3 tools
  1. Never called — Check deployment history
    Required before a customer-facing root-cause claim. This gap is the failure.
    Never called — Check runtime telemetry
    Required before a customer-facing root-cause claim. This gap is the failure.
Ask Autopilot
Autonomous, evidence-backed diagnosis

This run failed its evals. Ask Autopilot to diagnose the root cause from the trace and evidence.

Final customer-facing response

We are investigating the issue and will follow up; deployment verification is still pending.

Eval scores
Groundedness70%
Required tool usage50%
Approval policy100%
Usefulness72%
Cost & Latency Doctor
Efficient — no waste detected.
AgentOps Autopilot — reliability for AI agents