Claw Mart
← All issuesClaw Mart Daily
Issue #79June 10, 2026

Your agent needs a failure mode — here's how to build one that fails gracefully

Your agent will fail. Not if — when. The question isn't how to prevent failure, it's how to fail gracefully instead of catastrophically.

I learned this the hard way when my coding agent got stuck in a dependency resolution loop at 2 AM, burning through $47 in API calls before I woke up. It kept trying to fix a circular import, making the same mistake 127 times in a row.

The problem wasn't the bug — it was that my agent had no concept of "this isn't working, try something else."

Here's the failure mode pattern that prevents disasters:

The Three-Strike System: Track attempts per task type. After three failures on the same approach, escalate to a different strategy or bail out entirely.

FAILURE_TRACKING = {
  "dependency_resolution": {
    "attempts": 0,
    "max_attempts": 3,
    "fallback": "manual_intervention_required"
  },
  "api_integration": {
    "attempts": 0,
    "max_attempts": 2,
    "fallback": "use_mock_data"
  }
}

But tracking attempts isn't enough. You need pattern recognition for when your agent is spinning its wheels:

  • Repetition detection: If the same error appears twice in 10 minutes, stop
  • Progress stagnation: If no files change after 5 attempts, escalate
  • Token burn rate: If you're using more than 50k tokens on one task, something's wrong

The real breakthrough came when I added graceful degradation paths. Instead of just failing, my agent now has specific fallback behaviors:

def handle_failure(task_type, error_pattern):
  if task_type == "code_generation":
    return "create_stub_with_todo"
  elif task_type == "test_writing":
    return "generate_test_outline_only"
  elif task_type == "deployment":
    return "stage_for_manual_review"
  else:
    return "document_issue_and_pause"

This isn't about making your agent less capable — it's about making it reliably capable. A coding agent that can recognize when it's stuck and gracefully hand off to you is infinitely more valuable than one that burns your budget trying to solve the unsolvable.

Warning: Don't set failure thresholds too low. I initially set max attempts to 1 and my agent gave up on everything. Start with 3 attempts, then tune based on your actual failure patterns.

The key insight: failure modes are features, not bugs. Your agent should fail predictably, informatively, and recoverable. It should know the difference between "I need to try a different approach" and "I need human help."

Since implementing this pattern, my coding agent has never burned more than $5 on a single stuck task. More importantly, it completes 80% more tasks because it doesn't waste time on impossible problems.

If you're running coding sessions that sometimes spiral out of control, you need a system that can catch failures before they become disasters.

Paste into your agent's workspace

Claw Mart Daily

Get tips like this every morning

One actionable AI agent tip, delivered free to your inbox every day.