March 20, 2026 · 9 min read · Claw Mart Team

Beginner’s Guide to Debugging Errors in OpenClaw

Let's get right into it: if you're building with OpenClaw and you've hit a wall of cryptic errors, silent failures, or agents that loop endlessly doing nothing useful, you're not alone. This is the single most common frustration I hear from people getting started with the platform. The good news is that most of these errors follow predictable patterns, and once you learn to read them, you'll go from "what the hell is happening" to "oh, that's an easy fix" in about a week.

This guide covers the most frequent debugging scenarios in OpenClaw, why they happen, and exactly how to fix them. No hand-waving. No "just read the docs." Actual solutions.

Why Debugging AI Agents Is Uniquely Painful

Before we get into specifics, it's worth understanding why agent debugging is different from traditional software debugging. When your Python script throws a KeyError, you look at the traceback, find the line, fix the dict. Done. When your OpenClaw agent fails, the problem might be:

  • The LLM chose the wrong tool entirely
  • The LLM formatted its output in a way the parser didn't expect
  • The tool worked fine but returned data the agent misinterpreted
  • The agent entered a reasoning loop that burned through tokens without progressing
  • Some combination of all of the above, non-deterministically

The core issue is opacity. The LLM is making decisions inside a black box, and standard error messages rarely tell you why it made a particular choice. OpenClaw gives you more leverage over this than most setups, but you need to know where to look.

Error Category 1: Output Parsing Failures

What you'll see:

Error: Could not parse LLM output

or

Error: Failed to decode JSON from agent response

or sometimes the agent just returns None with no explanation.

Why it happens:

The LLM generated text that doesn't match the expected structured format. Maybe it added a preamble before the JSON. Maybe it used single quotes instead of double quotes. Maybe it decided to "explain its reasoning" in free text instead of using the tool-call format.

How to debug it in OpenClaw:

First, turn on raw output logging. You want to see exactly what the LLM produced before the parser tried to handle it.

# In your OpenClaw skill config
debug:
  log_raw_responses: true
  log_level: verbose
  output_dir: ./debug_logs

Now run your agent again. Check the logs. Nine times out of ten, you'll see something like:

Raw LLM output: "Sure! I'd be happy to help. Here's the tool call:
{"tool": "search_orders", "input": {"order_id": "12345"}}"

See it? The LLM added conversational fluff before the JSON. The parser expected pure JSON and choked.

The fix:

Option A — Add explicit instructions to your system prompt:

You must respond ONLY with valid JSON. No preamble, no explanation, no markdown formatting. Just the JSON object.

Option B — Use OpenClaw's built-in output normalization:

parsing:
  mode: flexible
  strip_markdown: true
  extract_json: true
  fallback: retry_with_correction

The flexible parsing mode tells OpenClaw to attempt JSON extraction from anywhere in the response, not just expect the entire response to be valid JSON. The fallback: retry_with_correction option will automatically re-prompt the LLM with a correction message if parsing fails on the first attempt.

Option B is almost always what you want. Option A helps too, but LLMs ignore system prompt instructions often enough that you can't rely on it alone.

Pro tip: If you're seeing parsing failures intermittently (works 3 out of 4 times), it's almost always a prompt issue, not a config issue. The LLM is inconsistently formatting its output. Tightening your prompt helps, but flexible parsing mode is your real safety net.
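If you're curious what "attempt JSON extraction from anywhere in the response" actually amounts to, here's a minimal sketch. To be clear: this is not OpenClaw's implementation, just an illustration of the technique — strip markdown fences, then scan for a balanced brace-delimited span that parses as JSON:

```python
import json
import re

def extract_json(raw: str):
    """Pull the first valid JSON object out of a noisy LLM response.

    Rough approximation of a 'flexible' parser: strip markdown fences,
    then try each balanced {...} span until one parses.
    """
    # Remove code fences the model may have wrapped around the JSON
    cleaned = re.sub(r"```(?:json)?", "", raw)
    start = cleaned.find("{")
    while start != -1:
        depth = 0
        for i, ch in enumerate(cleaned[start:], start):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    # Balanced span found; see if it's valid JSON
                    try:
                        return json.loads(cleaned[start:i + 1])
                    except json.JSONDecodeError:
                        break
        start = cleaned.find("{", start + 1)
    return None

messy = ('Sure! I\'d be happy to help. Here\'s the tool call:\n'
         '{"tool": "search_orders", "input": {"order_id": "12345"}}')
print(extract_json(messy))  # the JSON survives despite the preamble
```

Run against the raw output from the example above, this recovers the tool call even with the conversational fluff in front of it — which is exactly why flexible mode is so forgiving.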

Error Category 2: Tool Call Failures

What you'll see:

Error: Tool 'search_orders' raised exception: TypeError - expected string, got int

or

Error: Tool not found: 'search_order'

(Note the missing 's' — the LLM hallucinated a slightly wrong tool name.)

Why it happens:

Two sub-problems here. Either the LLM is calling a tool with the wrong argument types/format, or it's calling a tool that doesn't exist (usually a misspelling or hallucination of a tool name).

How to debug it:

Enable tool call tracing:

debug:
  trace_tool_calls: true
  log_tool_inputs: true
  log_tool_outputs: true

This will give you a full trace like:

[Step 3] Agent selected tool: search_order
[Step 3] Tool input: {"order_id": 12345}
[Step 3] ERROR: Tool 'search_order' not found. Available tools: ['search_orders', 'get_customer', 'process_refund']

Now you can see exactly what happened: the agent dropped the 's' from search_orders.

The fix for wrong tool names:

tools:
  fuzzy_matching: true
  fuzzy_threshold: 0.85

This tells OpenClaw to attempt fuzzy matching on tool names. If the agent says search_order and you have search_orders, it'll match. Set the threshold based on how permissive you want to be — 0.85 is a solid default that catches typos without matching completely wrong tools.
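If you want intuition for what a 0.85 threshold means in practice, Python's standard-library `difflib` implements the same idea. This is a sketch of the concept, not OpenClaw's internals:

```python
import difflib

def resolve_tool_name(requested: str, available: list[str],
                      threshold: float = 0.85):
    """Fuzzy-match a (possibly misspelled) tool name against the registry.

    Illustrates the fuzzy_matching / fuzzy_threshold idea using
    difflib's similarity ratio; returns None below the threshold.
    """
    matches = difflib.get_close_matches(requested, available,
                                        n=1, cutoff=threshold)
    return matches[0] if matches else None

tools = ["search_orders", "get_customer", "process_refund"]
print(resolve_tool_name("search_order", tools))  # recovers 'search_orders'
print(resolve_tool_name("delete_user", tools))   # no match above threshold
```

Note how `search_order` scores well above 0.85 against `search_orders` (one dropped character), while a genuinely wrong name like `delete_user` falls far below it — that's the property that makes 0.85 a sane default.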

The fix for wrong argument types:

Add explicit type validation in your tool definitions:

tools:
  - name: search_orders
    description: "Search for customer orders by order ID"
    parameters:
      order_id:
        type: string
        required: true
        description: "The order ID as a string, e.g., 'ORD-12345'"
    validation:
      coerce_types: true

The coerce_types: true setting is critical. It tells OpenClaw to attempt reasonable type coercion (int → string, string → int, etc.) before raising an error. Since LLMs are notoriously inconsistent about whether they pass "12345" or 12345, this saves you a ton of headaches.
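The coercion idea is simple enough to sketch in a few lines. This is a hypothetical illustration of what `coerce_types: true` implies, not OpenClaw's actual coercion code (the real rules are presumably broader):

```python
def coerce_args(args: dict, schema: dict) -> dict:
    """Nudge LLM-supplied arguments toward their declared types.

    Sketch of coerce_types-style behavior: cast int <-> string etc.
    before the tool ever sees the arguments.
    """
    casts = {"string": str, "integer": int, "number": float}
    coerced = {}
    for name, value in args.items():
        target = casts.get(schema.get(name, {}).get("type"))
        if target and not isinstance(value, target):
            coerced[name] = target(value)  # e.g. 12345 -> "12345"
        else:
            coerced[name] = value
    return coerced

schema = {"order_id": {"type": "string", "required": True}}
print(coerce_args({"order_id": 12345}, schema))  # {'order_id': '12345'}
```

Whether the LLM sends `12345` or `"12345"`, the tool receives a string either way — that's the whole headache this setting removes.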

Error Category 3: Infinite Loops and Runaway Agents

What you'll see:

Your agent keeps calling the same tool over and over, or it alternates between two tools without making progress. Meanwhile, your token count climbs and nothing useful happens.

Why it happens:

The agent's reasoning gets stuck. Common patterns:

  • "Search → get results → decide results aren't good enough → search again with the same query"
  • "Call tool → tool returns error → agent retries with same input → same error → retry..."
  • "Agent A passes task to Agent B → Agent B passes it back → loop"

How to debug it:

execution:
  max_iterations: 10
  max_tool_retries: 2
  loop_detection: true
  
debug:
  trace_reasoning: true

The trace_reasoning flag is the most valuable thing here. It logs the agent's "thought" at each step, so you can see exactly where the reasoning breaks down:

[Step 1] Thought: I need to find the customer's order. Let me search.
[Step 1] Action: search_orders(query="blue widget")
[Step 1] Result: No orders found.
[Step 2] Thought: The search didn't find anything. Let me try again.
[Step 2] Action: search_orders(query="blue widget")
[Step 2] Result: No orders found.
[Step 3] Thought: Still nothing. Let me search one more time.
...

Classic loop. The agent doesn't know what to do when the tool returns no results, so it just tries the same thing again.

The fix:

Set hard limits to prevent cost explosions:

execution:
  max_iterations: 10
  max_tool_retries: 2
  duplicate_call_policy: block_and_redirect
  token_budget: 5000

The duplicate_call_policy: block_and_redirect is particularly useful. When OpenClaw detects the agent calling the same tool with the same (or very similar) inputs, it blocks the call and injects a message like: "You already tried this search and got no results. Try a different approach or respond to the user."

For the token_budget, set this to whatever makes sense for your use case. It's a hard ceiling — when you hit it, the agent wraps up with whatever it has. Better than a $47 bill for nothing.

Also, improve your prompt to handle dead ends:

If a tool returns no results or an error, do NOT retry with the same input. Instead, either:
1. Try a different search query or approach
2. Ask the user for clarification
3. Respond with what you know and explain what you couldn't find

Error Category 4: State and Memory Bugs

What you'll see:

The agent "forgets" information from earlier in the conversation, contradicts itself, or behaves as if a previous step didn't happen.

Why it happens:

The conversation buffer doesn't contain what you think it does. Maybe old messages got truncated to stay within context limits. Maybe the memory summarization step lost critical details. Maybe tool outputs aren't being included in the conversation history correctly.

How to debug it:

debug:
  log_memory_state: true
  log_context_window: true
  snapshot_interval: every_step

With snapshot_interval: every_step, OpenClaw dumps the full state of the agent's memory and context window at every reasoning step. Yes, this generates a lot of logs. That's the point. You need to see exactly what the agent had in its context when it made the bad decision.

Check the snapshots for:

  • Truncated tool outputs: If a tool returned a massive JSON blob, it might have been truncated before being added to context. The agent then works with incomplete data.
  • Missing conversation turns: If the context window is tight, older messages get dropped. The agent literally doesn't "remember" them.
  • Garbled memory summaries: If you're using summarized memory, the summary might have lost key details.

The fix:

memory:
  strategy: sliding_window
  max_tokens: 4000
  tool_output_max_tokens: 500
  preserve_system_prompt: true
  preserve_last_n_turns: 4

The key settings: tool_output_max_tokens prevents massive tool outputs from dominating your context window. preserve_last_n_turns ensures the most recent conversation turns are never dropped. preserve_system_prompt keeps your system instructions intact even when older content gets truncated.

If you're working with long conversations, consider chunking your tool outputs and only passing the most relevant portion to the agent, rather than dumping everything into context.
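Here's what tool-output clipping looks like in miniature. This sketch assumes a rough ~4-characters-per-token heuristic; the real `tool_output_max_tokens` setting presumably counts with the model's actual tokenizer:

```python
def truncate_tool_output(output: str, max_tokens: int = 500) -> str:
    """Clip a tool's output before it enters the context window.

    Uses the common ~4 chars/token approximation and appends an
    explicit marker so the agent knows data was cut, not absent.
    """
    max_chars = max_tokens * 4
    if len(output) <= max_chars:
        return output
    return output[:max_chars] + "\n[... output truncated ...]"

big_blob = "x" * 10_000          # stand-in for a massive JSON response
clipped = truncate_tool_output(big_blob, max_tokens=500)
print(clipped.endswith("[... output truncated ...]"))
```

The explicit truncation marker matters: an agent that sees "[... output truncated ...]" can reason about missing data, while one handed a silently clipped blob just works with incomplete information.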

Error Category 5: Multi-Skill Coordination Failures

What you'll see:

One skill passes bad data to another. Or a skill fires when it shouldn't. Or the orchestration between skills produces results that don't make sense even though each skill works fine in isolation.

Why it happens:

This is the "works in unit testing, fails in integration" problem of AI agents. Each skill does its job correctly, but the handoffs between them lose information, introduce type mismatches, or trigger unintended flows.

How to debug it:

debug:
  trace_skill_handoffs: true
  log_inter_skill_data: true
  visualize_flow: true

The visualize_flow option (if your OpenClaw version supports it) generates a visual graph of which skills fired, in what order, and what data passed between them. This is the fastest way to spot coordination issues.

Without visualization, the handoff logs will show you:

[Orchestrator] Routing to skill: order_lookup
[order_lookup] Input: {"customer_id": "CUST-789"}
[order_lookup] Output: {"orders": [...], "status": "success"}
[Orchestrator] Routing to skill: refund_processor
[refund_processor] Input: {"order_data": null}

See the problem? The orchestrator didn't correctly map the output of order_lookup to the input of refund_processor. The order data was there but didn't make it through.

The fix:

This usually comes down to your skill wiring. Make sure output schemas match input schemas at every handoff point. And add explicit validation:

orchestration:
  validate_handoffs: true
  strict_schema_matching: true
  on_mismatch: log_and_retry

The Fastest Way to Get Set Up Right

Here's where I'll be honest with you: configuring all of this from scratch is doable, but it's tedious. You have to get the debug settings right, set up the parsing modes, configure tool validation, wire up loop detection, and tune the memory management. Each piece is straightforward, but getting all of them working together in a coherent setup takes time.

If you don't want to set this all up manually, Felix's OpenClaw Starter Pack on Claw Mart includes pre-configured skills that already have sensible debug settings, output parsing, loop detection, and tool validation baked in. It's $29 and it'll save you a solid afternoon of config wrangling. I wish I'd had something like it when I was first getting started — I spent way too many hours getting the debug logging piped correctly before I could even start fixing the actual errors.

The starter pack is especially useful if you're hitting the multi-skill coordination issues I described above, because the skills come pre-wired with compatible schemas and validated handoffs. You can always customize from there, but starting with a working baseline beats staring at config files.

The Debugging Mindset for AI Agents

Here's the mental model that actually works for OpenClaw debugging:

  1. Always look at the raw output first. Before you change prompts, tweak configs, or blame the LLM, look at what it actually produced. Most errors become obvious when you see the raw text.

  2. Add logging before you add fixes. The instinct is to immediately change things when something breaks. Resist it. First, add enough logging to understand exactly what happened. Then fix it. Otherwise you're guessing, and you'll introduce new bugs.

  3. Test skills in isolation before testing orchestration. Make sure each skill works correctly on its own with various inputs, including edge cases and bad inputs. Only then wire them together.

  4. Embrace non-determinism. AI agents are probabilistic. Something that works 95% of the time will fail 5% of the time. Your job isn't to eliminate all failures — it's to handle failures gracefully and make them easy to diagnose.

  5. Keep a bug journal. Seriously. When you hit a weird error and figure out the fix, write it down. Agent bugs are repetitive. You'll see the same patterns over and over, and having your own reference saves enormous time.

Next Steps

If you're just starting out with OpenClaw debugging:

  1. Turn on log_raw_responses and trace_tool_calls immediately. These two settings alone will help you diagnose 80% of errors.
  2. Set max_iterations and token_budget to reasonable limits. Don't let runaway agents drain your wallet while you're developing.
  3. Use flexible parsing mode unless you have a specific reason not to. It handles the most common LLM output quirks automatically.
  4. Get a working baseline first, whether that's through Felix's Starter Pack or your own config. Then iterate from there.
  5. Join the community. Other OpenClaw builders have hit the same errors you're hitting. Don't debug in isolation when you don't have to.

Debugging AI agents is a skill. It feels chaotic at first, but the error patterns are finite and learnable. Once you internalize the categories above, most issues will take you minutes to diagnose instead of hours. And that's when building with OpenClaw actually starts to get fun.
