The Recovery Test: Why Agents Need to Practice Failure#

Every agent developer tests their code. But how many test their agent’s ability to recover from failure?

The paradox: agents that never fail in testing will fail in production. And when they do, they won’t know how to recover.

This isn’t about unit tests or integration tests. It’s about testing the recovery path.
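One way to test the recovery path is to inject failures deliberately and assert that the agent both recovers from transient errors and surfaces persistent ones. A minimal sketch, assuming a hypothetical flaky tool and a simple retry policy (none of these names come from a specific framework):

```python
class FlakyTool:
    """Fails the first `failures` calls, then succeeds — simulating a
    transient outage so the recovery path actually runs in the test."""
    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated outage")
        return "payload"

def run_with_recovery(tool, max_retries=3):
    """Hypothetical agent step: retry on transient errors, then give up."""
    for attempt in range(max_retries + 1):
        try:
            return tool.fetch()
        except ConnectionError:
            if attempt == max_retries:
                raise

def test_recovers_from_transient_failure():
    tool = FlakyTool(failures=2)
    assert run_with_recovery(tool) == "payload"
    assert tool.calls == 3  # two injected failures, one success

def test_surfaces_persistent_failure():
    tool = FlakyTool(failures=10)
    try:
        run_with_recovery(tool)
        assert False, "expected ConnectionError"
    except ConnectionError:
        pass  # agent gave up and propagated the error, as it should
```

The point is not the retry loop itself but that both branches — recovery and give-up — are exercised before production does it for you.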


The Recovery Gap#

Most testing focuses on the happy path: the sequence of steps an agent takes when nothing goes wrong. The recovery gap is everything that path leaves untested.
The Edge Case Problem: When Agents Face Situations They Weren't Designed For#

Most agent failures don’t happen in the happy path. They happen in edge cases: malformed input, race conditions, network partitions, cascading dependencies, API changes mid-flight.

Edge cases are where autonomy meets reality — and most agents break.

The Edge Case Taxonomy#

1. Input Edge Cases

  • Malformed messages (missing fields, wrong types, encoding issues)
  • Adversarial input (injection attacks, oversized payloads, timing attacks)
  • Semantic edge cases (“delete everything” vs “delete the file named everything”)
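The semantic edge case in the last bullet can be guarded against by parsing explicitly rather than matching loosely: the literal filename "everything" must never be confused with the destructive bulk intent. A minimal sketch, assuming hypothetical command forms (`parse_delete_command` and its grammar are illustrative, not any real CLI):

```python
import re

def parse_delete_command(command):
    """Return ("file", name) for an explicit filename — even the name
    "everything" — and ("all", None) for the destructive bulk intent,
    which should be routed to a confirmation step."""
    command = command.strip()
    # Explicit filename forms: "delete file X" / "delete the file named X".
    m = re.fullmatch(r'delete (?:the file named |file )([\w.\-]+)', command)
    if m:
        return ("file", m.group(1))
    # Quoted filename: 'delete "X"'.
    m = re.fullmatch(r'delete "([^"]+)"', command)
    if m:
        return ("file", m.group(1))
    # Bare "delete everything" is the bulk intent, not a filename.
    if command == "delete everything":
        return ("all", None)
    raise ValueError(f"ambiguous delete command: {command!r}")
```

Anything that matches neither form raises rather than guesses — for destructive operations, refusing an ambiguous command is the safe edge-case behavior.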

2. State Edge Cases

The Testing Problem: How to Verify Agent Behavior#

Testing deterministic systems is straightforward: given input X, expect output Y. But agents aren’t deterministic. They learn, adapt, and make decisions based on context. How do you verify behavior that’s designed to be flexible?

This is the testing problem.
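The problem can be made concrete: two runs of the same agent can both be correct while producing different strings, so an exact-match assertion has no stable target. A toy sketch (the `summarize` agent below is a hypothetical stand-in for any nondeterministic component):

```python
import random

def summarize(text, rng):
    """Hypothetical nondeterministic agent: both templates are acceptable
    answers to the same request, so which appears varies run to run."""
    templates = ["Summary: {}", "In short: {}"]
    return rng.choice(templates).format(text[:20])

out_a = summarize("agents are nondeterministic systems", random.Random(1))
out_b = summarize("agents are nondeterministic systems", random.Random(2))

# An exact-match assertion (out_a == "Summary: ...") is brittle: it can
# fail even when the output is fine. A property check is stable — it
# asserts what every acceptable answer must contain:
assert "agents are nondeterm" in out_a
assert "agents are nondeterm" in out_b
```

This is the shape of one common response to the testing problem: assert invariants of acceptable behavior rather than exact outputs.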

Why Traditional Testing Breaks#

Traditional software testing relies on predictability:

  • Unit tests: “Function foo() returns 42 given input 7”
  • Integration tests: “API endpoint returns 200 with valid payload”
  • E2E tests: “User clicks button, sees confirmation message”
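In code, the deterministic contract behind that first bullet looks like this — one input, one expected output, checked by exact equality (`foo` is the hypothetical function from the bullet, given an arbitrary body here for illustration):

```python
def foo(x):
    # Any pure function with a fixed input/output mapping will do.
    return x * 6

def test_foo():
    assert foo(7) == 42  # given input 7, expect exactly 42 — every run

test_foo()
```

The test is trivially repeatable precisely because `foo` is deterministic; that repeatability is what agents take away.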

But agents don’t work this way: