Your agent needs a timeout policy (or it'll burn your API budget overnight)

I woke up to a $340 OpenAI bill last month. My agent had gotten stuck in a loop trying to "optimize" a function that was already perfect, making 2,847 API calls over six hours while I slept.

The problem wasn't the agent being dumb — it was me being naive about runaway processes. Every agent needs a timeout policy, and most people build them wrong.

The naive approach: hard timeouts

Your first instinct is probably to add a simple timer:

import time

task_complete = False
timeout = 30 * 60  # 30 minutes, in seconds
start_time = time.time()
while not task_complete:
    if time.time() - start_time > timeout:
        break  # exits immediately: no state saved, no summary sent
    # do work

This works until it doesn't. Your agent stops mid-sentence during important work. Worse, it stops right before finishing a task that took 29 minutes to set up, and the `break` throws all of that progress away.

The better approach: progressive timeouts

Instead of one hard cutoff, build a ladder of interventions:

timeout_config = {
    "warn_at": 10 * 60,      # 10 min: ask if should continue
    "escalate_at": 20 * 60,  # 20 min: require explicit approval
    "hard_stop": 45 * 60     # 45 min: save state and exit
}

At 10 minutes, the agent asks: "I've been working on this for 10 minutes. Should I continue or try a different approach?"

At 20 minutes: "This is taking longer than expected. Please confirm you want me to keep going."

At 45 minutes: Hard stop, save progress, send summary.
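If you want the ladder as a pure function, a minimal sketch might look like this. It assumes the thresholds are in seconds and must be strictly increasing. `validate_config`, `current_rung`, and `message_for` are hypothetical helper names, and the two messages are the ones quoted above.

```python
# Hypothetical helpers: map elapsed seconds to a rung of the timeout ladder.
TIMEOUT_CONFIG = {
    "warn_at": 10 * 60,      # seconds
    "escalate_at": 20 * 60,
    "hard_stop": 45 * 60,
}

MESSAGES = {
    "WARN": "I've been working on this for {minutes} minutes. "
            "Should I continue or try a different approach?",
    "ESCALATE": "This is taking longer than expected. "
                "Please confirm you want me to keep going.",
}


def validate_config(config: dict) -> None:
    """Reject ladders whose rungs are out of order."""
    if not (0 < config["warn_at"] < config["escalate_at"] < config["hard_stop"]):
        raise ValueError("expected 0 < warn_at < escalate_at < hard_stop")


def current_rung(elapsed: float, config: dict) -> str:
    """Return the highest threshold that `elapsed` seconds has crossed."""
    if elapsed > config["hard_stop"]:
        return "HARD_STOP"
    if elapsed > config["escalate_at"]:
        return "ESCALATE"
    if elapsed > config["warn_at"]:
        return "WARN"
    return "OK"


def message_for(elapsed: float, config: dict):
    """Question to put to the user at this rung, or None if there is nothing to ask."""
    template = MESSAGES.get(current_rung(elapsed, config))
    if template is None:
        return None  # OK needs no question; HARD_STOP is an action, not a question
    return template.format(minutes=int(elapsed // 60))
```

Because `current_rung` is stateless, it reports WARN on every call between 10 and 20 minutes. The `TimeoutManager` later in the post adds the bookkeeping that makes each warning fire only once.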

The smart part: loop detection

Time limits catch runaway processes, but loop detection catches runaway logic:

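from collections import deque

recent_actions = deque(maxlen=10)  # sliding window of the last 10 actions

def check_for_loops(action):
    # action must be hashable (a string or tuple, say) so it can go in a set
    recent_actions.append(action)
    if len(recent_actions) >= 5:
        # 5+ recent actions but only 1 or 2 distinct ones: probably stuck
        if len(set(recent_actions)) <= 2:
            return "LOOP_DETECTED"
    return "OK"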

If your last 5 to 10 actions contain only one or two distinct actions (edit the file, run the tests, edit the file again), your agent is probably stuck. Stop it before it burns more tokens.
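`check_for_loops` needs hashable actions, but real tool calls are often dicts like `{"tool": ..., "args": {...}}`. One way to handle that, sketched here under the assumption that actions are JSON-serializable, is to compare canonical JSON signatures instead. `LoopDetector` and the tool names in the test are hypothetical.

```python
import json
from collections import deque


class LoopDetector:
    """Same rule as check_for_loops: once at least `min_actions` entries are in
    the window, `max_distinct` or fewer distinct actions means we're probably stuck."""

    def __init__(self, window: int = 10, min_actions: int = 5, max_distinct: int = 2):
        self.recent = deque(maxlen=window)
        self.min_actions = min_actions
        self.max_distinct = max_distinct

    @staticmethod
    def signature(action) -> str:
        # Dicts and lists aren't hashable; a canonical JSON string is.
        # sort_keys=True makes key order irrelevant.
        return json.dumps(action, sort_keys=True, default=str)

    def check(self, action) -> str:
        self.recent.append(self.signature(action))
        if (len(self.recent) >= self.min_actions
                and len(set(self.recent)) <= self.max_distinct):
            return "LOOP_DETECTED"
        return "OK"
```

`default=str` is a fallback for values JSON can't encode natively. It can make two different objects collide if their `str()` output matches, so prefer plain data in your action records.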

Warning: Don't just kill the process. Always save state first. Your agent should be able to resume from where it left off, not start over.
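A minimal sketch of "save state first", assuming the agent's progress fits in a JSON-serializable dict. The function names and the checkpoint path are hypothetical. Writing to a temp file and then calling `os.replace` means a crash mid-write leaves the previous checkpoint intact instead of a half-written one (the rename is atomic on POSIX systems).

```python
import json
import os
import time


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then swap it into place."""
    payload = {"saved_at": time.time(), "state": state}
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    os.replace(tmp_path, path)  # replaces any older checkpoint in one step


def load_checkpoint(path: str):
    """Return the saved state, or None if there is nothing to resume."""
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)["state"]
```

On startup, the agent calls `load_checkpoint` and, if it gets a state back, resumes from it rather than starting over. What goes in `state` (current step, finished subtasks, conversation history) depends on your agent.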

The implementation that actually works

Build this into your agent's main loop:

import time

class TimeoutManager:
    def __init__(self, config):
        self.config = config
        # monotonic clock: unaffected by system clock adjustments
        self.start_time = time.monotonic()
        self.warnings_sent = set()

    def check_status(self):
        elapsed = time.monotonic() - self.start_time

        if elapsed > self.config["hard_stop"]:
            return "HARD_STOP"
        elif elapsed > self.config["escalate_at"] and "escalate" not in self.warnings_sent:
            # Mark "warn" too, so a late check can't send a WARN after the ESCALATE
            self.warnings_sent.update({"warn", "escalate"})
            return "ESCALATE"
        elif elapsed > self.config["warn_at"] and "warn" not in self.warnings_sent:
            self.warnings_sent.add("warn")
            return "WARN"

        return "OK"

Your agent checks this every iteration. When it hits a timeout threshold, it knows exactly what to do.
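Here is one way the main loop might consume those statuses. It's a sketch: `step`, `ask_user`, and `save_state` are hypothetical hooks you'd wire to your own agent. The manager is re-declared so the block runs on its own, with an added `clock` parameter (not in the class above) so the loop can be tested without waiting 45 minutes. It also marks "warn" as sent on escalation, so a WARN never follows an ESCALATE.

```python
import time
from typing import Callable


class TimeoutManager:
    def __init__(self, config: dict, clock: Callable[[], float] = time.monotonic):
        self.config = config
        self.clock = clock
        self.start_time = clock()
        self.warnings_sent = set()

    def check_status(self) -> str:
        elapsed = self.clock() - self.start_time
        if elapsed > self.config["hard_stop"]:
            return "HARD_STOP"
        if elapsed > self.config["escalate_at"] and "escalate" not in self.warnings_sent:
            self.warnings_sent.update({"warn", "escalate"})
            return "ESCALATE"
        if elapsed > self.config["warn_at"] and "warn" not in self.warnings_sent:
            self.warnings_sent.add("warn")
            return "WARN"
        return "OK"


def run_agent(step, ask_user, save_state, manager) -> str:
    """Run until the task is done, the user says stop, or the hard limit hits.

    step() -> bool        does one unit of work; True once the task is complete
    ask_user(kind) -> bool True means "keep going"
    save_state()          persists progress; called before every early exit
    """
    while True:
        status = manager.check_status()
        if status == "HARD_STOP":
            save_state()
            return "stopped: hard limit"
        if status in ("WARN", "ESCALATE") and not ask_user(status):
            save_state()
            return "stopped: user declined"
        if step():
            return "done"
```

The status is checked between steps, so a single step that hangs can still overrun the hard stop. If your steps make long network calls, give those calls their own timeouts as well.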

Different tasks need different timeouts

Don't use the same timeout for everything. A quick search should time out in 2 minutes. A code refactor might need 30 minutes. A research task could run for hours.

Quick tasks: 2-5 minutes
Analysis work: 15-30 minutes
Code generation: 10-45 minutes
Research/learning: 1-3 hours

Set the timeout when you assign the task, not globally.
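One way to set the budget at assignment time, sketched under two assumptions: the hard stop is the upper end of each range in the list above, and the warn and escalate rungs keep the same proportions as the 10/20/45-minute ladder. `TASK_HARD_STOPS` and `config_for_task` are hypothetical names.

```python
from typing import Optional

# Upper end of each range from the list above, in seconds.
TASK_HARD_STOPS = {
    "quick": 5 * 60,
    "analysis": 30 * 60,
    "code_generation": 45 * 60,
    "research": 3 * 60 * 60,
}


def config_for_task(task_type: str, hard_stop: Optional[int] = None) -> dict:
    """Build a per-task ladder; pass hard_stop to override the default."""
    if hard_stop is None:
        if task_type not in TASK_HARD_STOPS:
            raise ValueError(f"unknown task type: {task_type!r}")
        hard_stop = TASK_HARD_STOPS[task_type]
    return {
        "warn_at": hard_stop * 10 // 45,      # same ratio as 10 of 45 minutes
        "escalate_at": hard_stop * 20 // 45,  # same ratio as 20 of 45 minutes
        "hard_stop": hard_stop,
    }
```

You'd then build one manager per task, for example `TimeoutManager(config_for_task("analysis"))`, instead of sharing one global config. For code generation this reproduces the original 10/20/45-minute ladder exactly.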

The key insight: timeouts aren't about limiting your agent — they're about making it predictable. You should never wonder if your agent is still working or just burning money in a loop.
