Failure Patterns
Recurring failure modes mined from failed evals across runs. Turn failed traces into regression tests.
Unsupported root-cause claims
sig_unsupported_causal
Root cause
The agent makes customer-facing causal claims after checking tickets/docs but before verifying deployment history or runtime telemetry.
Missing deployment & telemetry verification
sig_missing_telemetry
Root cause
Required verification tools are skipped for customer-facing root-cause missions.
Inefficient loops / elevated cost
sig_inefficient_loop
Root cause
Repeated tool calls and retries inflate latency and cost without improving the answer.
Regression memory · quality audit
Failed runs are stored as long-term regression lessons. The Memory Quality Auditor flags lessons that are stale, low-confidence, or contradicted by an approved fix.
Active, high-confidence lesson backing a regression case.
8 runs
Active, high-confidence lesson backing a regression case.
4 runs
Confidence 64% — gather more evidence before acting on it.
2 runs