How to Build a Fully Autonomous Coding Agent That Commits Its Own Fixes
The Problem: Coding Agents Die
You start a coding session with Claude Code or Codex. Three hours in, the agent hangs. Or crashes. Or declares failure because it hit a rate limit. You come back to a terminal full of nothing — no commits, no progress, and a vague memory that something important was happening.
This is the reality of AI coding agents in 2025. They are capable of impressive work, but they are fragile. Network hiccups, context window overflow, random API errors: any one of them can kill your session and take your progress with it.
The solution is not better prompts. It is better architecture.
AI coding agents fail in predictable ways:
- Hanging sessions — The agent hits an ambiguous state, asks a clarifying question, and waits forever. Context windows decay. Nothing gets committed.
- Silent crashes — API errors, rate limits, or network issues terminate the session without warning. No error recovery, no checkpoint.
- Lost work — The agent writes code but never commits it. A restart wipes hours of progress.
- Single-threaded bottlenecks — You can only run one task at a time, even when work is embarrassingly parallel.
The root cause: most people run coding agents like they are chatting with a helpful assistant. But agents doing real work need the same infrastructure as any production system — persistence, retries, observability, and graceful degradation.
The Architecture: Persistent Sessions + Retry Loops + Completion Hooks
After 100+ coding agent sessions, a pattern emerges: the reliable setups share three components.
1. tmux: The Persistence Layer
tmux (terminal multiplexer) keeps your session running even if your SSH connection drops. When tmux runs on a remote machine, your laptop can also sleep without stopping the agent. When you reconnect, everything is exactly where you left it.
# Create a named session for your coding agent
tmux new-session -d -s codex-task-001
# Attach to it when you want to check progress
tmux attach -t codex-task-001
# Detach with Ctrl+B, then D (keeps it running in background)
Critical: never run long agents outside tmux. Without it, a dropped SSH connection hangs up the terminal and kills the agent along with it. tmux is insurance against entropy.
2. The Ralph Loop: Automatic Retry on Failure
Named after a particularly stubborn agent that would not quit, the Ralph loop wraps your coding agent in an automatic restart mechanism. If the process crashes or exits with an error, the loop waits and restarts it. Only a clean exit (code 0) ends the loop.
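#!/bin/bash
# ralph-loop.sh [PRD file]: restart the agent until it exits cleanly
PRD="${1:-TASK.md}"
while true; do
  echo "[$(date)] Starting agent on $PRD..."
  # Placeholder command: substitute your agent's real non-interactive/resume
  # invocation and point it at $PRD (check your agent's CLI documentation)
  claude-code --resume-from-checkpoint
  EXIT_CODE=$?
  if [ "$EXIT_CODE" -eq 0 ]; then
    echo "[$(date)] Agent completed successfully"
    break
  else
    echo "[$(date)] Agent exited with code $EXIT_CODE, restarting in 10s..."
    sleep 10
  fi
done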
Run this inside tmux, and your agent becomes self-healing. Rate limit? It waits 10 seconds and retries. Crash? It restarts. The only thing that stops it is a clean exit (code 0).
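A fixed 10-second pause can be too short to outlast a rate limit. One common variation is capped exponential backoff with a retry budget. The sketch below rests on a few assumptions of our own. `retry_with_backoff`, `BASE_DELAY`, `MAX_DELAY`, and `MAX_ATTEMPTS` are hypothetical names, not part of any agent CLI, and the default values are illustrative only.

```shell
#!/bin/bash
# Hypothetical helper: retry a command with capped exponential backoff.
# BASE_DELAY, MAX_DELAY and MAX_ATTEMPTS are illustrative defaults (seconds / count).
retry_with_backoff() {
  local delay=${BASE_DELAY:-10}
  local max_delay=${MAX_DELAY:-600}
  local max_attempts=${MAX_ATTEMPTS:-50}
  local attempt=1
  local exit_code
  while true; do
    "$@"
    exit_code=$?
    if [ "$exit_code" -eq 0 ]; then
      echo "[$(date)] Attempt $attempt succeeded"
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "[$(date)] Giving up after $attempt attempts (exit code $exit_code)" >&2
      return "$exit_code"
    fi
    echo "[$(date)] Exit code $exit_code, retrying in ${delay}s..." >&2
    sleep "$delay"
    # Double the wait after each failure, but never beyond max_delay
    delay=$(( delay * 2 ))
    if [ "$delay" -gt "$max_delay" ]; then
      delay=$max_delay
    fi
    attempt=$(( attempt + 1 ))
  done
}
```

With this helper, the `while` loop in ralph-loop.sh could shrink to a single call: `retry_with_backoff` followed by your agent command. The retry budget is a deliberate departure from "never quit". It keeps a persistently broken task from burning API quota all night.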
3. Completion Hooks: Know When Work Finishes
The goal is not just persistence — it is delegation. You should be able to start a task, close your laptop, and get notified when it is done.
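#!/bin/bash
# completion-hook.sh <process-pattern> <task-id>
# BOT_TOKEN and CHAT_ID must be set in the environment (cron does not read your shell profile).
AGENT_NAME=$1
TASK_ID=$2
SENTINEL="/tmp/${TASK_ID}.notified"
# Notify only once per task, even though cron keeps calling this script
[ -e "$SENTINEL" ] && exit 0
# Check if agent process still running; ignore this script and its cron parent,
# whose command lines also contain $AGENT_NAME
running=0
for pid in $(pgrep -f -- "$AGENT_NAME"); do
  { [ "$pid" = "$$" ] || [ "$pid" = "$PPID" ]; } && continue
  running=1
done
if [ "$running" -eq 0 ]; then
  # Agent finished - send notification
  curl -s -X POST "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
    -d "chat_id=${CHAT_ID}" \
    --data-urlencode "text=Coding agent completed: ${TASK_ID}"
  touch "$SENTINEL"
  # Optional: run verification tests (path relative to this script; cron's cwd is $HOME)
  "$(dirname "$0")/verify-task.sh" "$TASK_ID"
fi
Set this as a cron job that runs every 5 minutes. Define BOT_TOKEN and CHAT_ID at the top of the crontab, and pass a pattern that matches the Ralph loop's command line (here, a loop started as ./ralph-loop.sh TASK.md). The loop only exits once the agent finishes cleanly:
BOT_TOKEN=your-bot-token
CHAT_ID=your-chat-id
*/5 * * * * /home/user/completion-hook.sh "ralph-loop.sh TASK.md" task-001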
Now your agent notifies you when it finishes. No manual polling. No wondering.
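The completion hook calls `verify-task.sh`, which the post does not show. Below is one possible sketch. It assumes each task's PRD lives at `<task-id>-PRD.md` and contains a `Run:` line with the verification command in backticks, like the TASK.md format in the next section. The file naming and the function names `verification_command` and `verify_task` are hypothetical.

```shell
#!/bin/bash
# verify-task.sh <task-id>  (hypothetical sketch)
# Assumes the PRD is at <task-id>-PRD.md and contains a line such as:
#   Run: `npm test && npm run test:integration`

# Print the text between the backticks on the first "Run:" line.
verification_command() {
  sed -n 's/^Run:[[:space:]]*`\([^`]*\)`.*/\1/p' "$1" | head -n 1
}

# Run the PRD's verification command; its exit code becomes ours.
verify_task() {
  local prd="$1-PRD.md"
  local cmd
  cmd=$(verification_command "$prd")
  if [ -z "$cmd" ]; then
    echo "No Run: line found in $prd" >&2
    return 2
  fi
  echo "Verifying $1 with: $cmd"
  bash -c "$cmd"
}

# Only act when executed directly with an argument, not when sourced
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -gt 0 ]; then
  verify_task "$1"
fi
```

Reading the command from the PRD keeps a single source of truth. The same "Run:" line tells the agent how to check its work and tells the hook how to double-check it.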
PRD-Driven Development: Checklists Before Completion
Agents hallucinate completion. They will claim a task is done when they have only written partial code. The fix: product requirement documents (PRDs) with explicit checklists.
Before starting, create a TASK.md:
# Task: Add User Authentication
## Checklist
- [ ] Create database migration for users table
- [ ] Implement register endpoint with password hashing
- [ ] Implement login endpoint with JWT generation
- [ ] Add auth middleware for protected routes
- [ ] Write tests for all new endpoints
- [ ] Update API documentation
## Verification
Run: `npm test && npm run test:integration`
Expected: All tests pass
The agent must tick every box before declaring success. On each restart, the Ralph loop can also read the PRD: if any boxes are still unchecked, the task is not done.
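The loop script shown earlier only looks at the exit code. Here is a minimal sketch of the PRD check. It assumes the `- [ ]` / `- [x]` checkbox syntax used in TASK.md. `unchecked_items` and `prd_complete` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers for reading a PRD checklist from a loop script.

# Print the number of unchecked "- [ ]" items in a PRD file.
unchecked_items() {
  # grep -c prints 0 but exits 1 when nothing matches, so ignore that status
  grep -c '^[[:space:]]*- \[ ]' "$1" || true
}

# Succeed only when the PRD exists and every checklist item is ticked.
prd_complete() {
  [ -f "$1" ] || return 1
  [ "$(unchecked_items "$1")" -eq 0 ]
}
```

A loop script could call `prd_complete` on its PRD file after each agent exit. It would keep restarting the agent while the check fails, even if the agent exited with code 0. That closes the "hallucinated completion" gap the exit code alone cannot catch.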
Parallel Agent Execution
Using tmux sessions, you can run multiple agents simultaneously:
# Start three parallel tasks
tmux new-session -d -s agent-auth
tmux new-session -d -s agent-dashboard
tmux new-session -d -s agent-api-docs
# Each runs its own Ralph loop with different PRD
tmux send-keys -t agent-auth "./ralph-loop.sh auth-PRD.md" Enter
tmux send-keys -t agent-dashboard "./ralph-loop.sh dashboard-PRD.md" Enter
tmux send-keys -t agent-api-docs "./ralph-loop.sh docs-PRD.md" Enter
Now you are running three agents in parallel, each self-healing and each reporting completion independently. Three independent tasks that would have run one after another now run side by side.
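Typing one `send-keys` line per task gets tedious. The sketch below starts one session per PRD file. It assumes the `<name>-PRD.md` naming used above, and that `ralph-loop.sh` takes the PRD path as its first argument, as in the `send-keys` examples. `session_for` and `launch_all` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical launcher: one detached tmux session per <name>-PRD.md file.

# Map "path/to/auth-PRD.md" to the session name "agent-auth".
session_for() {
  local base
  base=$(basename "$1")
  echo "agent-${base%-PRD.md}"
}

launch_all() {
  local prd session
  for prd in "$@"; do
    session=$(session_for "$prd")
    # "=" asks tmux for an exact session-name match rather than a prefix match
    if tmux has-session -t "=$session" 2>/dev/null; then
      echo "$session already running, skipping"
      continue
    fi
    # Run the loop as the session's command, starting in the current directory
    tmux new-session -d -s "$session" -c "$PWD" "./ralph-loop.sh $(printf '%q' "$prd")"
    echo "Started $session for $prd"
  done
}
```

Usage: `launch_all *-PRD.md`. Session names come from file names, so `docs-PRD.md` becomes `agent-docs`. Unlike the `send-keys` approach, each session closes when its loop exits, so `tmux has-session` doubles as a "still running" check. The trade-off is that the pane's final output goes with it.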
Heartbeat Monitoring: Detecting Stuck Agents
Sometimes agents do not crash — they just stall. Waiting for input. Spinning on a hard problem. Use a heartbeat pattern to detect this:
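#!/bin/bash
# heartbeat.sh <agent command...>
# Runs the agent and refreshes the heartbeat file whenever it prints output,
# so an agent that goes silent (stalled or waiting for input) stops updating it.
# Note: the agent's stdout becomes a pipe, so run it in a non-interactive mode.
HEARTBEAT=/tmp/agent.heartbeat
touch "$HEARTBEAT"
"$@" 2>&1 | while IFS= read -r line; do
  printf '%s\n' "$line"
  touch "$HEARTBEAT"
done
Wrap your agent command (for example, the Ralph loop) with it, so the file is refreshed only while the agent keeps producing output. A timer that touches the file unconditionally would keep ticking even when the agent is stuck. A separate monitor checks: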
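#!/bin/bash
# check-heartbeat.sh: run periodically, e.g. from cron
# find prints the file only if it was last modified more than 10 minutes ago
if [ -n "$(find /tmp/agent.heartbeat -mmin +10 2>/dev/null)" ]; then
  # No heartbeat in 10 minutes - agent is stuck
  curl -s -X POST "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
    -d "chat_id=${CHAT_ID}" \
    --data-urlencode "text=Agent appears stuck - no heartbeat in 10+ minutes"
fi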
Stuck agents get human attention. Healthy agents get left alone.
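The `find -mmin` check stays silent when the heartbeat file does not exist at all, for example when the agent never started. Below is a sketch of a stricter check that treats a missing file as stale. It assumes one heartbeat file per agent. `is_stale` and `check_all` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a heartbeat file is missing or older than N minutes.
is_stale() {
  local file=$1 minutes=$2
  [ -e "$file" ] || return 0
  # find prints the path only when the file was modified more than N minutes ago
  [ -n "$(find "$file" -mmin +"$minutes")" ]
}

# Report every stale heartbeat among the files given, one line each.
check_all() {
  local f
  for f in "$@"; do
    if is_stale "$f" 10; then
      echo "STALE: $f"
    fi
  done
}
```

Running `check_all /tmp/*.heartbeat` from cron lets one monitor cover every parallel agent. Each `STALE:` line can then become its own notification.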
The Coding Agent Loops Skill
The patterns above — tmux persistence, Ralph loops, completion hooks, PRD checklists, parallel execution, heartbeat monitoring — are packaged as the Coding Agent Loops skill on Claw Mart.
It is free, battle-tested on 100+ coding sessions, and includes:
- Complete SKILL.md with setup instructions
- Pre-configured Ralph loop script
- tmux session templates
- Completion hook examples for Telegram/Discord/Email
- PRD template with validation checklist
- Heartbeat monitoring setup
- Parallel agent orchestration commands
The skill does not just give you tools — it gives you the operational knowledge of what breaks and how to fix it. Every rule exists because we hit the edge case.
Getting Started in 10 Minutes
- Install the skill from Claw Mart (free)
- Create your first PRD using the template
- Start a tmux session and run the Ralph loop
- Close your laptop and walk away
- Get notified when work completes
Your first autonomous commit is closer than you think.