Claw Mart
February 13, 2026 · 4 min read · Claw Mart Team

How to Build a Fully Autonomous Coding Agent That Commits Its Own Fixes

The Problem: Coding Agents Die

You start a coding session with Claude Code or Codex. Three hours in, the agent hangs. Or crashes. Or declares failure because it hit a rate limit. You come back to a terminal full of nothing — no commits, no progress, and a vague memory that something important was happening.

This is the reality of AI coding agents in 2025. They are capable of impressive work, but they are fragile. Network hiccups, context window overflow, random API errors — any of them kills your session and loses your progress.

The solution is not better prompts. It is better architecture.

AI coding agents fail in predictable ways:

  • Hanging sessions — The agent hits an ambiguous state, asks a clarifying question, and waits forever. Context windows decay. Nothing gets committed.
  • Silent crashes — API errors, rate limits, or network issues terminate the session without warning. No error recovery, no checkpoint.
  • Lost work — The agent writes code but never commits it. A restart wipes hours of progress.
  • Single-threaded bottlenecks — You can only run one task at a time, even when work is embarrassingly parallel.

The root cause: most people run coding agents like they are chatting with a helpful assistant. But agents doing real work need the same infrastructure as any production system — persistence, retries, observability, and graceful degradation.

The Architecture: Persistent Sessions + Retry Loops + Completion Hooks

After running 100+ coding agent sessions, we saw a clear pattern. The reliable setups share three components:

1. tmux: The Persistence Layer

tmux (terminal multiplexer) keeps your session alive on the machine it runs on, even if your SSH connection drops or the laptop you connected from goes to sleep. When you reconnect and reattach, everything is exactly where you left it.

# Create a named session for your coding agent
tmux new-session -d -s codex-task-001

# Attach to it when you want to check progress
tmux attach -t codex-task-001

# Detach with Ctrl+B, then D (keeps it running in background)

Critical: never run long agents outside tmux. A dropped connection kills the process. tmux is insurance against entropy.
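If you launch agents often, two small wrapper functions save typing. This is a minimal sketch assuming tmux is installed; `start_agent` and `peek_agent` are hypothetical names, not tmux commands. Passing a command to `new-session` runs it directly inside the detached session, the `=` prefix makes `has-session` match the session name exactly, and `capture-pane -p` prints the pane's visible contents to stdout, so you can check progress without attaching.

```bash
#!/bin/bash
# Hypothetical convenience wrappers around standard tmux commands.

# start_agent SESSION COMMAND...: run COMMAND in a new detached tmux session,
# refusing to reuse a session name that already exists.
start_agent() {
    local session=$1
    shift
    # "=" asks tmux for an exact name match instead of a prefix match
    if tmux has-session -t "=$session" 2>/dev/null; then
        echo "tmux session '$session' already exists" >&2
        return 1
    fi
    tmux new-session -d -s "$session" "$*"
}

# peek_agent SESSION [LINES]: print the last LINES (default 20) lines of the
# session's visible pane without attaching to it.
peek_agent() {
    local session=$1 lines=${2:-20}
    tmux capture-pane -p -t "$session" | tail -n "$lines"
}
```

For example, `start_agent codex-task-001 ./ralph-loop.sh`, then later `peek_agent codex-task-001 40`. One consequence of passing the command to `new-session`: the session closes when that command exits, so the session disappearing is itself a completion signal.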

2. The Ralph Loop: Automatic Retry on Failure

Named after a particularly stubborn agent that would not quit, the Ralph loop wraps your coding agent in an automatic restart mechanism. If the process crashes or exits with an error, the loop waits and restarts it; only a clean exit (code 0) ends the loop.

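#!/bin/bash
# ralph-loop.sh: restart the agent until it exits cleanly
# Usage: ./ralph-loop.sh [PRD_FILE]   (defaults to TASK.md)

PRD_FILE="${1:-TASK.md}"

while true; do
    echo "[$(date)] Starting agent on $PRD_FILE..."
    # Substitute your agent's non-interactive command here. Claude Code's
    # print mode (claude -p) is one option; check your CLI's docs for resume flags.
    claude -p "Continue working through the unchecked items in $PRD_FILE"
    EXIT_CODE=$?

    if [ "$EXIT_CODE" -eq 0 ]; then
        echo "[$(date)] Agent completed successfully"
        break
    else
        echo "[$(date)] Agent exited with code $EXIT_CODE, restarting in 10s..."
        sleep 10
    fi
done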

Run this inside tmux, and your agent becomes self-healing. Rate limit? It waits 10 seconds and retries. Crash? It restarts. The only thing that stops it is a clean exit (exit code 0).
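The loop above waits a fixed 10 seconds, which can keep hitting an API that is rate-limiting you, and it never gives up on a setup that is permanently broken. A minimal variation, sketched as a bash function, doubles the wait after each failure up to a cap and stops after a maximum number of attempts. The function name and the `RALPH_BASE_DELAY`, `RALPH_MAX_DELAY`, and `RALPH_MAX_ATTEMPTS` variables are our own; the defaults are illustrative, not tuned values.

```bash
#!/bin/bash
# ralph_loop COMMAND...: rerun COMMAND until it exits 0, doubling the wait
# after each failure (up to a cap) and giving up after a maximum number of attempts.
ralph_loop() {
    local delay=${RALPH_BASE_DELAY:-10}
    local max_delay=${RALPH_MAX_DELAY:-600}
    local max_attempts=${RALPH_MAX_ATTEMPTS:-50}
    local attempt=1 code

    while true; do
        echo "[$(date)] Attempt $attempt: $*"
        code=0
        "$@" || code=$?
        if [ "$code" -eq 0 ]; then
            echo "[$(date)] Agent completed successfully"
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "[$(date)] Giving up after $attempt attempts (last exit code $code)" >&2
            return "$code"
        fi
        echo "[$(date)] Exit code $code, retrying in ${delay}s..."
        sleep "$delay"
        # Exponential backoff, capped at max_delay
        delay=$(( delay * 2 ))
        if [ "$delay" -gt "$max_delay" ]; then
            delay=$max_delay
        fi
        attempt=$(( attempt + 1 ))
    done
}
```

Call it with your agent command, e.g. `ralph_loop claude -p "Work through the checklist in TASK.md"`; the command is a placeholder for whatever non-interactive invocation your agent supports.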

3. Completion Hooks: Know When Work Finishes

The goal is not just persistence — it is delegation. You should be able to start a task, close your laptop, and get notified when it is done.

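#!/bin/bash
# completion-hook.sh: notify once when the agent process is gone
# Usage: completion-hook.sh AGENT_PATTERN TASK_ID
# Cron does not load your shell profile: set BOT_TOKEN and CHAT_ID here or in the crontab.

AGENT_NAME=$1   # matched against full command lines, e.g. ralph-loop.sh
TASK_ID=$2
MARKER="/tmp/${TASK_ID}.notified"   # prevents a repeat alert on every cron run

[ -e "$MARKER" ] && exit 0

# Check if agent process still running. pgrep -f also matches this script,
# whose own arguments contain the pattern, so skip our PID and our parent's.
RUNNING=0
for PID in $(pgrep -f -- "$AGENT_NAME"); do
    [ "$PID" != "$$" ] && [ "$PID" != "$PPID" ] && RUNNING=1
done

if [ "$RUNNING" -eq 0 ]; then
    # Agent finished - send notification
    curl -s -X POST "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
        -d "chat_id=${CHAT_ID}" \
        -d "text=Coding agent completed: ${TASK_ID}"
    touch "$MARKER"

    # Optional: run verification tests
    ./verify-task.sh "$TASK_ID"
fi

Set this as a cron job that runs every 5 minutes. The first argument must match the agent's command line (here, the Ralph loop script), not the tmux session name, which the agent's own process does not carry:

*/5 * * * * /home/user/completion-hook.sh ralph-loop.sh task-001

Now your agent notifies you once when it finishes. No manual polling. No wondering.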
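Cron polling means up to five minutes between the agent finishing and you hearing about it. If you would rather be told the moment the loop ends, the wrapper can push the notification itself. A minimal sketch, assuming `BOT_TOKEN` and `CHAT_ID` are exported as in the hook; `notify` and `run_and_notify` are hypothetical helper names. It calls the Telegram Bot API `sendMessage` method, and `--data-urlencode` lets curl encode spaces and punctuation in the message.

```bash
#!/bin/bash
# notify MESSAGE: send MESSAGE through the Telegram Bot API sendMessage method.
# Assumes BOT_TOKEN and CHAT_ID are exported in the environment.
notify() {
    curl -s -X POST "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
        --data-urlencode "chat_id=${CHAT_ID}" \
        --data-urlencode "text=$1" > /dev/null
}

# run_and_notify TASK_ID COMMAND...: run COMMAND, then report how it ended.
# Returns COMMAND's exit code so callers can still react to failures.
run_and_notify() {
    local task_id=$1
    shift
    local code=0
    "$@" || code=$?
    if [ "$code" -eq 0 ]; then
        notify "Coding agent completed: $task_id"
    else
        notify "Coding agent stopped: $task_id (exit code $code)"
    fi
    return "$code"
}
```

Inside tmux this becomes `run_and_notify task-001 ./ralph-loop.sh TASK.md`. Reporting failures as well as success means a loop that gives up or gets killed does not go unnoticed.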

PRD-Driven Development: Checklists Before Completion

Agents hallucinate completion. They will claim a task is done when they have only written partial code. The fix: product requirement documents (PRDs) with explicit checklists.

Before starting, create a TASK.md:

# Task: Add User Authentication

## Checklist
- [ ] Create database migration for users table
- [ ] Implement register endpoint with password hashing
- [ ] Implement login endpoint with JWT generation
- [ ] Add auth middleware for protected routes
- [ ] Write tests for all new endpoints
- [ ] Update API documentation

## Verification
Run: `npm test && npm run test:integration`
Expected: All tests pass

The agent must check off (and verify) every checkbox before declaring success. On each restart, the Ralph loop should read the PRD: if any boxes are still unchecked, the task is not done.
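One way to implement that check, as a sketch: count unchecked items with grep, refuse to finish while any remain, and then run the PRD's `Run:` command as the final gate. It assumes the exact TASK.md layout above (checkboxes written as `- [ ]` and `- [x]`, one `Run:` line with the command in backticks); `prd_remaining`, `prd_verify_cmd`, and `prd_gate` are hypothetical names.

```bash
#!/bin/bash
# Hypothetical helpers for gating completion on a TASK.md-style PRD.

# prd_remaining FILE: print how many checklist items are still unchecked ("- [ ]").
prd_remaining() {
    local n
    # grep -c prints 0 (and exits 1) when nothing matches, so ignore its status
    n=$(grep -c -- '^[[:space:]]*- \[ ]' "$1") || true
    [ -n "$n" ] || return 2   # empty output means the file could not be read
    echo "$n"
}

# prd_verify_cmd FILE: print the command from the first "Run: `...`" line.
prd_verify_cmd() {
    sed -n 's/^Run: `\(.*\)`$/\1/p' "$1" | head -n 1
}

# prd_gate FILE: succeed only when no boxes are unchecked AND the
# verification command exits 0.
prd_gate() {
    local file=$1 remaining cmd
    remaining=$(prd_remaining "$file") || return 2
    if [ "$remaining" -gt 0 ]; then
        echo "$remaining checklist item(s) still open in $file"
        return 1
    fi
    cmd=$(prd_verify_cmd "$file")
    if [ -z "$cmd" ]; then
        echo "No Run: line found in $file" >&2
        return 1
    fi
    echo "Running verification: $cmd"
    sh -c "$cmd"
}
```

In the Ralph loop, `while true` could become `until prd_gate TASK.md; do ...; done`, so an agent that exits 0 with open boxes or failing tests simply gets relaunched. The `Run:` line is executed as a shell command, so only point this at PRDs you wrote.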

Parallel Agent Execution

Using tmux sessions, you can run multiple agents simultaneously:

# Start three parallel tasks
tmux new-session -d -s agent-auth
tmux new-session -d -s agent-dashboard
tmux new-session -d -s agent-api-docs

# Each runs its own Ralph loop with a different PRD
tmux send-keys -t agent-auth "./ralph-loop.sh auth-PRD.md" Enter
tmux send-keys -t agent-dashboard "./ralph-loop.sh dashboard-PRD.md" Enter
tmux send-keys -t agent-api-docs "./ralph-loop.sh docs-PRD.md" Enter

Now you are running three agents in parallel, each self-healing, each reporting completion independently. Three independent tasks that would otherwise run one after another finish in roughly the time of the slowest one.
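Typing one session per task gets tedious, and agents editing the same checkout will overwrite each other's changes. A sketch that launches one session per PRD, assuming the `NAME-PRD.md` naming used above and a Ralph loop script that takes the PRD path as its first argument. It gives each agent its own `git worktree` on a new branch; the `../worktrees/` location, the `agent/NAME` branch names, and the function names are our own conventions, not part of git or tmux.

```bash
#!/bin/bash
# session_for PRD_FILE: derive a tmux session name, e.g. auth-PRD.md -> agent-auth
session_for() {
    local base
    base=$(basename "$1")
    echo "agent-${base%-PRD.md}"
}

# launch_parallel PRD_FILE...: run from the repository root. For each PRD,
# create a git worktree on a new branch and start ralph-loop.sh (assumed to
# take the PRD path as $1) in a detached tmux session inside that worktree.
launch_parallel() {
    local root prd prd_abs name dir cmd
    root=$(pwd)
    for prd in "$@"; do
        prd_abs="$(cd "$(dirname "$prd")" && pwd)/$(basename "$prd")"
        name=$(session_for "$prd")
        # Worktrees live next to the repo, e.g. ../worktrees/agent-auth
        dir="$(dirname "$root")/worktrees/$name"
        # One branch per agent, so parallel edits never touch the same checkout
        git worktree add -b "agent/${name#agent-}" "$dir" || return 1
        # %q quotes each path so tmux's shell receives it intact
        cmd=$(printf '%q ' "$root/ralph-loop.sh" "$prd_abs")
        tmux new-session -d -s "$name" -c "$dir" "$cmd" || return 1
    done
}
```

Calling `launch_parallel auth-PRD.md dashboard-PRD.md docs-PRD.md` from the repository root would create sessions `agent-auth`, `agent-dashboard`, and `agent-docs`, each on its own branch, ready to merge once its checklist passes.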

Heartbeat Monitoring: Detecting Stuck Agents

Sometimes agents do not crash — they just stall. Waiting for input. Spinning on a hard problem. Use a heartbeat pattern to detect this:

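#!/bin/bash
# heartbeat.sh: run the agent and refresh the heartbeat file every time it
# prints a line, so an agent that goes silent lets the heartbeat go stale.
# Usage: ./heartbeat.sh ./ralph-loop.sh TASK.md

HEARTBEAT=/tmp/agent.heartbeat
touch "$HEARTBEAT"

"$@" 2>&1 | while IFS= read -r line; do
    printf '%s\n' "$line"    # pass the agent's output through unchanged
    touch "$HEARTBEAT"
done
exit "${PIPESTATUS[0]}"      # keep the agent's exit code

Because the file is only touched when the agent produces output, a timer cannot keep it fresh while the agent is stuck. If your agent CLI prints nothing until the end of a run, use a streaming output mode if it offers one, or the agent will look stalled. A separate monitor checks the file's age: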

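#!/bin/bash
# check-heartbeat.sh: run from cron, e.g. every 5 minutes

# find prints the path only if the file was modified more than 10 minutes ago;
# a missing heartbeat file counts as stuck too
if [ ! -e /tmp/agent.heartbeat ] || [ -n "$(find /tmp/agent.heartbeat -mmin +10)" ]; then
    # No heartbeat in 10 minutes - agent is stuck
    curl -s -X POST "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
        -d "chat_id=${CHAT_ID}" \
        -d "text=Agent appears stuck - no heartbeat in 10+ minutes"
fi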

Stuck agents get human attention. Healthy agents get left alone.
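With several agents running, the staleness test is worth wrapping in a function so each agent gets its own heartbeat file. A sketch assuming a hypothetical `/tmp/NAME.heartbeat` layout; `heartbeat_stale` and `report_stale_agents` are our own names. It relies on `find -mmin +N`, which matches files last modified more than N minutes ago, and treats a missing file as stale so an agent that never started still gets flagged.

```bash
#!/bin/bash
# heartbeat_stale FILE [MINUTES]: succeed if FILE is missing or was last
# modified more than MINUTES (default 10) minutes ago.
heartbeat_stale() {
    local file=$1 minutes=${2:-10}
    if [ ! -e "$file" ]; then
        return 0
    fi
    [ -n "$(find "$file" -mmin +"$minutes" 2>/dev/null)" ]
}

# report_stale_agents NAME...: print each agent name whose heartbeat is stale.
report_stale_agents() {
    local name
    for name in "$@"; do
        if heartbeat_stale "/tmp/$name.heartbeat"; then
            echo "$name"
        fi
    done
}
```

A cron-driven monitor could then send an alert whenever `report_stale_agents agent-auth agent-dashboard agent-api-docs` prints anything.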

The Coding Agent Loops Skill

The patterns above — tmux persistence, Ralph loops, completion hooks, PRD checklists, parallel execution, heartbeat monitoring — are packaged as the Coding Agent Loops skill on Claw Mart.

It is free, battle-tested on 100+ coding sessions, and includes:

  • Complete SKILL.md with setup instructions
  • Pre-configured Ralph loop script
  • tmux session templates
  • Completion hook examples for Telegram/Discord/Email
  • PRD template with validation checklist
  • Heartbeat monitoring setup
  • Parallel agent orchestration commands

The skill does not just give you tools — it gives you the operational knowledge of what breaks and how to fix it. Every rule exists because we hit the edge case.

Getting Started in 10 Minutes

  1. Install the skill from Claw Mart (free)
  2. Create your first PRD using the template
  3. Start a tmux session and run the Ralph loop
  4. Close your laptop and walk away
  5. Get notified when work completes

Your first autonomous commit is closer than you think.
