Diagnosis & Repair

When a live automation fails in production, Claude pulls the execution trace, cross-references the pattern library, identifies the failure mode, and proposes (or applies) a fix. The repair itself becomes a new signal: every fix extends the diagnostic vocabulary. See Diagnosing failed runs for the plugin walkthrough.

The flow

Trigger — an alert fires, a user reports a problem, or an account-monitoring run surfaces an anomaly.
Retrieve — pull the failing execution's full state — trigger data, action outputs, errors per step.
Classify — match the failure against known patterns: auth expiration, rate limit, schema change, field drift, downstream outage.
Propose — surface the diagnosis with a recommended fix (credential refresh, field remap, retry with backoff, skill patch).
Apply — on approval, edit the automation in-place, optionally replay from the failing step using saved state.
Record — the diagnosis and repair become training data for the next occurrence.

Why this works

Every step's real inputs, outputs, and decisions are preserved after a run, so any failing execution can be fully reconstructed — there is no "we don't know what happened." The trace, the inputs, the outputs, and the change history are all there to read.

The flow

Why this works

Related docs