March 21, 2026 · 8 min read · Claw Mart Team

Fixing Rate Limit Errors in OpenClaw (Claude & OpenAI)

If you've been building with OpenClaw for more than a few days, you've almost certainly seen this:

RateLimitError: 429 - You have exceeded your rate limit. Please try again in 2s.

Or its equally annoying cousin from the Anthropic side:

Error 529: API is temporarily overloaded. Please retry after backoff.

The first time it happens, you shrug it off. The second time, you add a time.sleep(2) and move on. The third time — when your five-skill agent chain crashes 40 minutes into a research task and you lose everything — you want to throw your laptop into the ocean.

I've been there. I spent the better part of three weeks battling rate limit errors across multiple OpenClaw builds before I figured out a system that actually works. This post is everything I learned, distilled into something you can implement today.

Why Rate Limits Hit So Hard in OpenClaw

First, let's understand why this problem is worse in OpenClaw than it is when you're just making one-off API calls.

OpenClaw is an agentic platform. Your skills chain together. One skill's output feeds into the next. A single "task" might involve 15, 30, even 50+ underlying API calls to Claude or OpenAI depending on how your skills are configured. And here's the thing most people miss: OpenClaw doesn't just call the API once per skill execution. Depending on your skill's complexity, context window usage, and tool configurations, a single skill might make multiple calls under the hood — tool use loops, retries on malformed outputs, chain-of-thought expansions.

So when you think you're running "5 skills," you might actually be making 60+ API calls in a few minutes. That blows right through rate limits, especially if you're on a lower API tier.

Here's a quick breakdown of the typical limits you're hitting:

OpenAI (GPT-4o, GPT-4-turbo):

  • Tier 1: 500 RPM (requests per minute), 30,000 TPM (tokens per minute)
  • Tier 2: 5,000 RPM, 450,000 TPM
  • Tier 3: 5,000 RPM, 800,000 TPM

Anthropic (Claude 3.5 Sonnet, Claude 3 Opus):

  • Tier 1: 50 RPM, 40,000 TPM
  • Tier 2: 1,000 RPM, 80,000 TPM
  • Tier 3: 2,000 RPM, 160,000 TPM

Notice how Claude's limits on Tier 1 are dramatically lower than OpenAI's. If you're using Claude as your backbone model in OpenClaw and you're on Tier 1, you have 50 requests per minute. That's almost nothing for an agentic workflow. This is the single biggest reason people hit rate limits in OpenClaw and don't understand why — they're using Claude without realizing how tight the RPM cap is.
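
To make those caps concrete, here's a quick back-of-envelope calculation (plain Python, nothing OpenClaw-specific): the minimum wall-clock time a run needs just to stay under the RPM cap, ignoring TPM and latency entirely.

```python
def min_minutes_for_run(total_calls: int, rpm_limit: int) -> float:
    """Minimum wall-clock minutes to issue `total_calls` requests
    without ever exceeding `rpm_limit` requests per minute."""
    return total_calls / rpm_limit

# A "5-skill" run that actually makes 60 calls:
calls = 60
print(f"Claude Tier 1 (50 RPM):  {min_minutes_for_run(calls, 50):.2f} min")
print(f"OpenAI Tier 1 (500 RPM): {min_minutes_for_run(calls, 500):.2f} min")
```

On Claude Tier 1, that 60-call run needs at least 1.2 minutes of pure pacing even when nothing goes wrong; the same run on OpenAI Tier 1 fits in about 7 seconds of budget.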

The Actual Fix: A Three-Layer Approach

There's no single setting you flip. You need three layers working together: request throttling, retry logic, and architectural changes to your skill chains. Let me walk through each one.

Layer 1: Built-In Request Throttling

OpenClaw supports configuring rate limit parameters in your agent's configuration. Most people either don't know this or leave the defaults, which are optimistically aggressive.

In your OpenClaw agent config (typically agent_config.yaml or whatever you've named your configuration file), you want to add explicit rate limiting:

model_config:
  provider: anthropic  # or openai
  model: claude-3-5-sonnet-20241022
  rate_limit:
    max_rpm: 40          # Stay under your tier's RPM cap
    max_tpm: 35000       # Leave a buffer below your tier's TPM cap
    concurrent_requests: 3  # Max parallel API calls
    cooldown_on_429: 5   # Seconds to wait after hitting a 429

A few things to notice here:

  1. I set max_rpm to 40, not 50. You always want a buffer. API rate limits aren't perfectly metered — there's jitter, there's lag in the counter reset, and if you're right at the edge, you'll still get 429s. Leave 10–20% headroom.

  2. concurrent_requests is critical. If you have skills running in parallel (which OpenClaw supports for independent subtasks), each concurrent skill is burning through your RPM independently. Setting this to 3 means at most 3 API calls are in flight at once. This prevents burst traffic from slamming your limit all at once.

  3. cooldown_on_429 tells OpenClaw how long to pause before retrying when it does hit a rate limit. The default is often too short. Five seconds is a solid starting point.

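For context on what those keys are doing for you, here's a minimal client-side sketch of the same mechanism: a rolling 60-second request window plus a concurrency cap. This is illustrative plain Python, not OpenClaw internals, but it mirrors what max_rpm and concurrent_requests imply.

```python
import threading
import time
from collections import deque

class RequestThrottle:
    """Client-side throttle: caps requests per minute and concurrent
    in-flight calls, mirroring max_rpm / concurrent_requests."""

    def __init__(self, max_rpm: int, max_concurrent: int):
        self.max_rpm = max_rpm
        self.timestamps = deque()               # send times in the last 60s
        self.slots = threading.Semaphore(max_concurrent)
        self.lock = threading.Lock()

    def __enter__(self):
        self.slots.acquire()                    # block if too many in flight
        while True:
            with self.lock:
                now = time.monotonic()
                # Drop sends that have aged out of the rolling 60s window
                while self.timestamps and now - self.timestamps[0] > 60:
                    self.timestamps.popleft()
                if len(self.timestamps) < self.max_rpm:
                    self.timestamps.append(now)
                    return self
                wait = 60 - (now - self.timestamps[0])
            time.sleep(wait)                    # window full: wait it out

    def __exit__(self, *exc):
        self.slots.release()

throttle = RequestThrottle(max_rpm=40, max_concurrent=3)
# with throttle:
#     response = client.messages.create(...)   # your actual API call here
```

Wrapping every API call site in `with throttle:` gives you the 40 RPM / 3-concurrent behavior from the YAML above even in code paths OpenClaw's own throttling doesn't cover.
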
For OpenAI, the equivalent config looks like:

model_config:
  provider: openai
  model: gpt-4o
  rate_limit:
    max_rpm: 400         # Tier 1 is 500, leave buffer
    max_tpm: 25000
    concurrent_requests: 5
    cooldown_on_429: 3

OpenAI's limits are more generous, so you can be a bit more aggressive. But the principle is the same: stay under the cap, limit concurrency, and give yourself room for retry cooldowns.

Layer 2: Exponential Backoff with Jitter

The built-in throttling catches most problems, but it won't catch everything. Network delays, API-side throttling that doesn't match your tier's published limits (this happens more than you'd think), or sudden bursts from tool-use loops can all trigger 429s even with throttling in place.

You need a retry wrapper. Here's what I use in my OpenClaw skills:

import time
import random
from functools import wraps

def retry_with_backoff(max_retries=5, base_delay=2, max_delay=60):
    """
    Decorator for OpenClaw skill functions that call LLM APIs.
    Implements exponential backoff with jitter.
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            retries = 0
            while retries < max_retries:
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    error_str = str(e).lower()
                    if "429" in error_str or "rate" in error_str or "overloaded" in error_str:
                        retries += 1
                        if retries >= max_retries:
                            raise Exception(
                                f"Max retries ({max_retries}) exceeded. Last error: {e}"
                            )
                        # Exponential backoff: 2s, 4s, 8s, 16s, 32s
                        delay = min(base_delay * (2 ** (retries - 1)), max_delay)
                        # Add jitter: random 0-50% extra time
                        jitter = delay * random.uniform(0, 0.5)
                        total_delay = delay + jitter
                        print(f"Rate limited. Retry {retries}/{max_retries} "
                              f"in {total_delay:.1f}s...")
                        time.sleep(total_delay)
                    else:
                        # Not a rate limit error — don't retry, just fail
                        raise
        return wrapper
    return decorator

Then apply it to your skill's core function:

@retry_with_backoff(max_retries=5, base_delay=2, max_delay=60)
def execute_research_skill(query, context):
    # Your OpenClaw skill logic here
    result = openclaw.run_skill(
        skill_name="deep_research",
        input=query,
        context=context
    )
    return result

Why jitter matters: if you have multiple skills retrying simultaneously and they all wait exactly 4 seconds, they'll all retry at the same moment and immediately hit the rate limit again. Jitter staggers them randomly. This is a standard distributed systems technique, and it makes a massive difference in practice.
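
If you want to sanity-check the schedule, the delay math from the decorator runs fine in isolation (same formula: exponential, capped, plus 0 to 50% jitter):

```python
import random

def backoff_delay(retry: int, base_delay: float = 2, max_delay: float = 60) -> float:
    """Delay before retry N (1-indexed): exponential, capped, plus jitter."""
    delay = min(base_delay * (2 ** (retry - 1)), max_delay)
    return delay + delay * random.uniform(0, 0.5)

for retry in range(1, 6):
    print(f"retry {retry}: {backoff_delay(retry):.1f}s")
```

Each run prints a slightly different schedule, which is exactly the point: two skills that got rate limited at the same moment will not retry at the same moment.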

Layer 3: Restructure Your Skill Chains

This is where the real gains are, and it's the layer most people skip because it requires rethinking your architecture instead of just slapping on retry logic.

Problem pattern #1: Overly chatty skills

If your skill is making the LLM call itself repeatedly in a loop (e.g., "analyze each of these 20 items one at a time"), you're generating 20 API calls where you could batch them. Restructure the skill to pass all 20 items in a single prompt with clear instructions for handling each one. Yes, the prompt will be longer. Yes, the single call will cost more tokens. But one call that uses 8,000 tokens is vastly better than twenty calls that use 400 tokens each, from a rate limit perspective.

# BAD: 20 separate API calls
results = []
for item in items:
    result = openclaw.run_skill("analyze_item", input=item)
    results.append(result)

# GOOD: 1 API call with batched input
batched_input = "\n---\n".join(
    [f"Item {i+1}: {item}" for i, item in enumerate(items)]
)
results = openclaw.run_skill(
    "analyze_items_batch",
    input=batched_input,
    instructions="Analyze each item separated by --- and return results in order."
)
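
One detail the batched pattern needs: splitting the model's ordered output back into per-item results. A simple convention is to instruct the model to echo the same --- separator, then split on it. This helper is a sketch built on that assumed convention, not an OpenClaw API:

```python
def split_batched_results(raw_output: str, expected_count: int) -> list[str]:
    """Split a batched LLM response back into per-item results.
    Assumes the model was told to separate answers with --- lines."""
    parts = [p.strip() for p in raw_output.split("\n---\n") if p.strip()]
    if len(parts) != expected_count:
        # The model dropped or merged an item: fail loudly rather than
        # silently misaligning results with inputs.
        raise ValueError(f"Expected {expected_count} results, got {len(parts)}")
    return parts

raw = "Result A\n---\nResult B\n---\nResult C"
print(split_batched_results(raw, 3))  # → ['Result A', 'Result B', 'Result C']
```

The count check matters: batching trades one failure mode (rate limits) for another (the model skipping an item), so validate before you trust the alignment.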

Problem pattern #2: Sequential chains that should be parallel (or vice versa)

If you have three independent skills that don't depend on each other's output, running them sequentially wastes time but stays within rate limits. Running them in parallel is faster but triples your burst RPM. The right answer depends on your tier.

On Claude Tier 1 (50 RPM)? Run them sequentially. The time cost is worth it.

On OpenAI Tier 2+ (5,000 RPM)? Parallelize aggressively.

# Sequential (safer for low tiers)
execution_mode: sequential
skills:
  - summarize_docs
  - extract_entities  
  - generate_report

# Parallel (faster for high tiers)
execution_mode: parallel
skills:
  - summarize_docs
  - extract_entities
# Then sequential for dependent step:
post_parallel:
  - generate_report  # This needs outputs from both above
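
If your OpenClaw version doesn't expose execution_mode, you can get the same shape in plain Python: run the independent skills through a capped thread pool, then run the dependent step afterwards. Here `run_skill` is a stand-in for your real skill invocation (e.g. openclaw.run_skill):

```python
from concurrent.futures import ThreadPoolExecutor

def run_skill(name: str, **kwargs) -> str:
    """Stand-in for your real skill call (e.g. openclaw.run_skill)."""
    return f"{name} done"

# Independent skills in parallel, capped to limit burst RPM
with ThreadPoolExecutor(max_workers=2) as pool:
    summary_future = pool.submit(run_skill, "summarize_docs")
    entities_future = pool.submit(run_skill, "extract_entities")
    summary = summary_future.result()
    entities = entities_future.result()

# Dependent step runs sequentially, once both inputs exist
report = run_skill("generate_report", summary=summary, entities=entities)
```

The max_workers value plays the same role as concurrent_requests in the config: it is your burst-RPM ceiling, so size it against your tier.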

Problem pattern #3: No checkpointing

This is the one that causes the most pain. You're running a 10-skill chain, skill #8 hits a rate limit, retries are exhausted, and the whole thing crashes. You've lost the output from skills 1–7. You're starting over from scratch. You're burning tokens (and money) re-running work that already succeeded.

OpenClaw supports intermediate state saving. Use it:

# Enable checkpointing in your agent config
checkpoint_config:
  enabled: true
  storage: local           # or 'redis', 's3', etc.
  checkpoint_dir: ./checkpoints
  save_after_each_skill: true
  resume_on_failure: true

With this enabled, if skill #8 fails, you can resume from skill #8 instead of starting over. The outputs from skills 1–7 are cached locally (or wherever you configure). This alone will save you more money and frustration than any other single change.
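
If your OpenClaw build lacks checkpoint_config, the same idea is easy to hand-roll: persist each skill's output to disk keyed by its position in the chain, and skip completed steps on resume. A sketch, with the file-naming scheme my own assumption:

```python
import json
from pathlib import Path

def run_chain_with_checkpoints(skills, run_skill, checkpoint_dir="./checkpoints"):
    """Run skills in order; persist each output so a crash at step N
    resumes from step N instead of step 1."""
    ckpt = Path(checkpoint_dir)
    ckpt.mkdir(parents=True, exist_ok=True)
    outputs = []
    for i, skill in enumerate(skills):
        path = ckpt / f"{i:02d}_{skill}.json"
        if path.exists():                       # already done in a prior run
            outputs.append(json.loads(path.read_text()))
            continue
        result = run_skill(skill, outputs)      # may raise after retries
        path.write_text(json.dumps(result))     # save before moving on
        outputs.append(result)
    return outputs
```

Re-running the same chain after a crash replays steps 1 through N-1 from disk for free, with zero API calls and zero token spend.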

The Quick Diagnostic Checklist

When you hit a rate limit error in OpenClaw, run through this in order:

  1. Check your API tier. Log into your OpenAI or Anthropic dashboard. What are your actual limits? Most rate limit problems come from people on Tier 1 trying to run Tier 3 workloads.

  2. Count your actual requests. Add logging to see how many API calls a single agent run actually makes. You'll probably be surprised. I had a "simple" 3-skill chain that was making 47 API calls because of tool-use loops.

  3. Check for runaway tool loops. If a skill has tools enabled and the model keeps calling the same tool repeatedly, that's eating your RPM alive. Add a max_tool_calls parameter to your skill config to cap it (I use 5–10 depending on the skill).

  4. Verify your rate limit config is actually being applied. A common gotcha — configuration files with typos or wrong indentation get silently ignored. Add a log statement at startup that prints the active rate limit config.

  5. Look at token usage, not just request count. You might be under RPM but over TPM. Long prompts with big context windows eat through token-per-minute limits fast. Consider trimming context or using summarization skills to compress intermediate outputs.
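
For step 2, you don't need anything fancy: a thin counting wrapper around whatever function actually issues the API call makes the real request count visible. A sketch, with `call_model` standing in for your own call site:

```python
import functools

def count_calls(func):
    """Wrap an API-calling function and keep a running tally, so you
    can see how many requests one agent run really makes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.call_count += 1
        return func(*args, **kwargs)
    wrapper.call_count = 0
    return wrapper

@count_calls
def call_model(prompt: str) -> str:
    return f"response to {prompt!r}"            # stand-in for the real API call

for p in ["a", "b", "c"]:
    call_model(p)
print(f"API calls this run: {call_model.call_count}")  # → API calls this run: 3
```

Log that counter at the end of every run for a week and you'll know your real requests-per-task number instead of guessing.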

Model Fallback: Your Secret Weapon

One more technique that's dramatically underused: configure model fallbacks in OpenClaw so that if your primary model hits a rate limit, the system automatically falls back to a secondary model.

model_config:
  primary:
    provider: anthropic
    model: claude-3-5-sonnet-20241022
  fallback:
    provider: openai
    model: gpt-4o
  fallback_triggers:
    - rate_limit
    - timeout
    - overloaded

This is incredibly powerful. You're essentially doubling your available rate limit headroom by spreading the load across two providers. Claude gets rate limited? The next call goes to GPT-4o automatically. The quality difference between Claude 3.5 Sonnet and GPT-4o is marginal for most tasks, so your output quality stays consistent while your reliability goes way up.

You can even chain multiple fallbacks:

fallback_chain:
  - provider: anthropic
    model: claude-3-5-sonnet-20241022
  - provider: openai
    model: gpt-4o
  - provider: openai
    model: gpt-4o-mini  # Cheaper, faster, rarely rate limited

The third fallback to GPT-4o-mini is a great safety net. It's almost never rate limited because the limits are so high, and for many skill types (summarization, extraction, formatting), the quality is perfectly fine.
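
If your OpenClaw version predates fallback_chain, the same behavior is a short loop: try each model in order and move down the chain only on rate-limit-shaped errors. A sketch; the error matching mirrors the retry decorator from Layer 2, and `call_model` stands in for your provider clients:

```python
FALLBACK_CHAIN = [
    ("anthropic", "claude-3-5-sonnet-20241022"),
    ("openai", "gpt-4o"),
    ("openai", "gpt-4o-mini"),
]

def call_with_fallback(prompt, call_model, chain=FALLBACK_CHAIN):
    """Try each (provider, model) in order; fall through only on
    rate-limit / overload errors, re-raise anything else."""
    last_error = None
    for provider, model in chain:
        try:
            return call_model(provider, model, prompt)
        except Exception as e:
            msg = str(e).lower()
            if "429" in msg or "rate" in msg or "overloaded" in msg:
                last_error = e                  # rate limited: try next model
                continue
            raise                               # real error: don't mask it
    raise RuntimeError(f"All models in chain rate limited: {last_error}")
```

Note the asymmetry: only rate-limit and overload errors trigger the fallback. A malformed request or auth failure would fail identically on every provider, so falling through would just burn time.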

Skip the Setup: The Starter Pack Approach

Everything I've described above works. I've been running it in production for months. But I'll be honest — getting all of this configured correctly took me a solid week of trial and error. The YAML configs, the retry logic, the checkpointing, the fallback chains, the batching patterns — there's a lot of surface area for things to go wrong, and the debugging cycle of "change config, run agent, wait for it to fail 20 minutes in, check logs, repeat" is genuinely painful.

If you don't want to set all of this up manually, Felix's OpenClaw Starter Pack on Claw Mart includes pre-configured skills with rate limit handling, retry logic, and checkpointing already baked in. It's $29 and it includes the exact patterns I've described here — exponential backoff with jitter, model fallback chains, concurrent request limiting, and checkpointed skill chains — all pre-wired so you can focus on your actual use case instead of infrastructure. I wish it had existed when I started. Would have saved me a week and a lot of wasted API credits from failed runs.

The Bottom Line

Rate limit errors in OpenClaw aren't a bug — they're an architectural challenge. The API providers have hard limits. Agentic workflows are inherently request-heavy. The solution isn't to pray that you stay under the limit; it's to build your system with the assumption that you'll hit it and handle it gracefully.

Three layers: throttle proactively, retry intelligently, and restructure your chains to minimize requests. Add checkpointing so failures don't cost you completed work. Add model fallbacks so a single provider's rate limit doesn't stop your entire pipeline.

Get those in place and rate limits go from a show-stopping frustration to a minor background event your system handles without you ever noticing. That's when OpenClaw starts feeling like the production-grade tool it's supposed to be.

Now go fix your configs. Your agents are waiting.
