March 20, 2026 · 8 min read · Claw Mart Team

OpenClaw Agent Too Verbose? Make It Concise

Look, I'll save you the frustration I went through. You built an OpenClaw agent, you tested it, and instead of doing the thing you asked, it delivered a 400-token monologue about why it was going to do the thing, how it planned to do the thing, a philosophical reflection on the nature of the thing, and then — maybe — it actually did the thing.

Your agent sounds like a philosophy major on Adderall. You wanted a terse operator. Let's fix it.

This is hands down the most common complaint I hear from people building on OpenClaw, and the good news is that it's extremely solvable once you understand why it happens and which levers to pull. The bad news is that slapping "be concise" onto your system prompt and calling it a day does basically nothing. The model ignores that instruction after about two reasoning steps. Every single time.

So let's get into what actually works.

Why Your OpenClaw Agent Won't Shut Up

Before we fix it, you need to understand the root cause, because the solution changes depending on which flavor of verbosity you're dealing with.

Flavor 1: Chain-of-Thought Leakage

OpenClaw agents use a reasoning loop — think, act, observe, repeat. That's good. That's how they solve complex problems. The issue is that by default, the entire reasoning chain gets surfaced to the user. Your end user doesn't need to see "I should search for the company's pricing page because understanding their pricing strategy will help me assess competitive positioning..." They need the answer.

Flavor 2: Model Personality Priors

The underlying language models — especially the more capable ones — have deeply ingrained "be helpful and explain yourself" tendencies baked in during training. These models want to teach you. They want to show their work. They're like that kid in school who wrote three pages when the teacher asked for a paragraph. This prior fights against every conciseness instruction you give it.

Flavor 3: Redundant Meta-Reasoning

This is the sneaky one. Your agent starts reasoning about its own reasoning. "Now I need to think about what tool to use. The best tool would be the search tool because..." followed by "Having decided to use the search tool, I will now formulate my query by considering..." It's recursion without a base case, and it eats tokens for breakfast.

Flavor 4: Safety Padding

Models hedge. They qualify. They add disclaimers. "Based on the available information, and assuming the data is current, it appears that..." instead of just stating the fact. This adds 30-50% bloat to every single output.

Each of these has a different fix. Let's go through them.

The Nuclear Option: Structured Output Enforcement

This is the single most effective thing you can do, and it should be your first move. Instead of letting your OpenClaw agent free-write its responses, force it into a structured output format that physically limits where verbosity can live.

In your OpenClaw skill configuration, set the output schema explicitly:

output_schema:
  type: object
  properties:
    action:
      type: string
      description: "The tool to call, or 'final_answer'"
    action_input:
      type: object
      description: "Parameters for the tool"
    thought:
      type: string
      maxLength: 100
      description: "One sentence max. Why this action."
    final_output:
      type: string
      description: "Only populated when action is final_answer"
  required: ["action", "action_input"]

The key here is maxLength: 100 on the thought field. You're giving the agent a tiny box to reason in. It can still think — you're not lobotomizing it — but it has to be efficient about it.

On the output side, add a post-processor that strips everything except final_output before it reaches the user:

def clean_agent_output(raw_response):
    """Strip reasoning artifacts, return only what the user needs."""
    if raw_response.get("action") == "final_answer":
        return raw_response.get("final_output", "")
    # For intermediate steps, return nothing to the user
    return None

This alone will cut your visible output by 60-80%. The agent still reasons internally — that trace is still available in your logs for debugging — but the user only sees the clean result.
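One caveat: a `maxLength` in an output schema is a constraint the model usually respects, but not every runtime hard-enforces it. A small clamp in the same post-processing layer makes the budget a guarantee rather than a suggestion. The sketch below is illustrative (the helper name `enforce_thought_budget` is mine, not an OpenClaw API):

```python
def enforce_thought_budget(response: dict, max_chars: int = 100) -> dict:
    """Clamp the 'thought' field in case the model overran the schema's
    maxLength -- schema limits are hints, not hard guarantees."""
    thought = response.get("thought", "")
    if len(thought) > max_chars:
        # Cut at a word boundary and mark the truncation
        response["thought"] = thought[: max_chars - 1].rsplit(" ", 1)[0] + "…"
    return response
```

Run it on every intermediate step before logging, so even your debug traces stay tight.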

The System Prompt That Actually Works

I said "be concise" doesn't work, and I meant it. But that doesn't mean the system prompt is useless. It means you need a specific kind of prompt that the model actually respects across multiple reasoning steps.

Here's the system prompt pattern I've found most effective in OpenClaw:

You are a silent executor. You do not explain your reasoning unless the user explicitly asks "why" or "explain."

Rules:
1. Never narrate what you are about to do. Just do it.
2. Never summarize what you just did. Just present the result.
3. Tool calls get ONE sentence of rationale maximum, in the "thought" field only.
4. Final answers are DIRECT. No preamble. No "Based on my research..." openers. No disclaimers unless factually necessary.
5. If the answer is a list, return the list. Not a paragraph about the list.
6. Maximum final response: 3 sentences unless the task explicitly requires more.

You are a terse senior engineer. You respect the user's time.

A few things to notice about this prompt. First, it's specific. It doesn't say "be concise" — it says "never narrate what you are about to do." That's a concrete behavioral instruction the model can follow. Second, it provides an escape hatch ("unless the user explicitly asks why") which, counterintuitively, makes the model more willing to stay quiet by default. Third, it uses identity framing ("terse senior engineer") which models respond to more reliably than pure instruction.

Drop this into your OpenClaw agent's system prompt configuration:

agent_config:
  system_prompt: |
    You are a silent executor. You do not explain your reasoning 
    unless the user explicitly asks "why" or "explain."
    ... [rest of prompt above]
  model: your-preferred-model
  temperature: 0.3

Note the low temperature. Higher temperatures increase verbosity because the model explores more token paths, many of which are filler words and hedging language. For task-execution agents, 0.2-0.4 is the sweet spot. You're not writing poetry here. You want deterministic, tight output.

The Two-Phase Architecture (For When You Need Both)

Here's the thing — sometimes the verbose reasoning is useful. When your agent is doing complex multi-step research, planning a sequence of API calls, or synthesizing information from multiple sources, you actually want it thinking deeply. You just don't want the user to see it.

This is where OpenClaw's skill chaining really shines. Set up a two-phase pipeline:

Phase 1: The Thinker (Hidden)

thinker_skill:
  name: "deep_reasoning"
  visibility: internal  # Never shown to user
  system_prompt: |
    Think step by step. Be thorough. Consider edge cases.
    Output your full analysis.
  output_to: summarizer_skill

Phase 2: The Summarizer (User-Facing)

summarizer_skill:
  name: "concise_output"
  visibility: external  # This is what the user sees
  system_prompt: |
    You receive a detailed analysis. Distill it to the minimum 
    the user needs. No fluff. No qualifiers. Direct answer only.
    If it's a single fact, give the fact.
    If it's a recommendation, give the recommendation and ONE 
    reason. Not three. One.
  max_tokens: 200

The thinker can ramble all it wants. The summarizer compresses. The user only sees the compressed version. Your logs retain the full reasoning chain for debugging.

This pattern is borrowed from production systems that learned the hard way: you don't optimize for conciseness during reasoning; you optimize for it during presentation. These are fundamentally different problems.

A real-world example: I helped someone set up an OpenClaw agent for competitive intelligence. Their original setup produced an average of 620 tokens per response, with most of that being the agent explaining its search strategy and summarizing the significance of each source. After switching to the two-phase architecture, the user-facing output averaged 85 tokens. Latency dropped by roughly 65% because the summarizer step runs on a smaller, faster model. Total cost per query went down too, because even though you're making two model calls, the summarizer call is tiny.
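If you'd rather wire the two phases up in code than in skill config, the whole pipeline is about a dozen lines. This is a sketch under one assumption: `call_model(system, user, max_tokens)` is a stand-in for whatever client your deployment actually exposes, not a real OpenClaw function.

```python
def run_two_phase(question: str, call_model) -> dict:
    """Two-phase pipeline: verbose hidden reasoning, terse visible answer.

    call_model(system, user, max_tokens) is a stand-in for your real
    model client -- swap in the actual call.
    """
    # Phase 1: the thinker can ramble; nothing here reaches the user
    analysis = call_model(
        system="Think step by step. Be thorough. Consider edge cases.",
        user=question,
        max_tokens=1500,
    )
    # Phase 2: the summarizer compresses to the user-facing answer
    answer = call_model(
        system=("You receive a detailed analysis. Distill it to the "
                "minimum the user needs. Direct answer only."),
        user=analysis,
        max_tokens=200,
    )
    return {"visible": answer, "trace": analysis}  # trace goes to logs only
```

Point the summarizer call at a smaller, faster model and you get the latency and cost wins described above essentially for free.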

Controlling Verbosity at the Tool Level

Another underrated approach: control what your tools return to the agent. If your tool dumps a massive JSON payload back into the agent's context, the agent will dutifully summarize every field. It can't help itself.

Trim your tool outputs before they hit the agent:

def search_tool(query: str) -> str:
    raw_results = perform_search(query)
    # Don't return 20 results with full snippets
    # Return top 3, title + one-line summary only
    trimmed = []
    for result in raw_results[:3]:
        trimmed.append(f"- {result['title']}: {result['snippet'][:100]}")
    return "\n".join(trimmed)

Less input context = less output verbosity. This is almost mechanical — the model's response length roughly correlates with the input length it's processing. Give it less to chew on, and it produces less output. It's simple, but most people overlook it entirely because they're focused on the system prompt while their tools are firehosing data into the context window.
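If you have more than a couple of tools, hand-trimming each one gets tedious. A generic cap applied as a decorator covers every tool at once. This is a sketch (the `trim_output` decorator and `fetch_report` tool are hypothetical examples, not part of OpenClaw):

```python
import functools

def trim_output(max_chars: int = 400):
    """Decorator that caps any tool's return value before it reaches
    the agent's context window."""
    def wrap(tool_fn):
        @functools.wraps(tool_fn)
        def inner(*args, **kwargs):
            out = str(tool_fn(*args, **kwargs))
            if len(out) > max_chars:
                return out[:max_chars] + " …[truncated]"
            return out
        return inner
    return wrap

@trim_output(max_chars=120)
def fetch_report(ticker: str) -> str:
    # Stand-in for a real data fetch that returns a large payload
    return f"{ticker}: " + "very long analyst commentary. " * 40
```

A character cap is crude next to a per-field trim like the search example above, but it's a one-line safety net you can apply to every tool today.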

The Max Tokens Hard Cap

This is the blunt instrument, but sometimes blunt instruments are exactly what you need. Set a hard max_tokens limit on your agent's generation:

agent_config:
  max_tokens: 150  # For the final response
  reasoning_max_tokens: 300  # For each intermediate step

The agent will learn to be concise because it literally runs out of room. It's like giving someone a Post-it note instead of a legal pad — they automatically prioritize. This isn't elegant, and occasionally the agent will get cut off mid-sentence on complex queries, so pair it with the structured output approach to make sure the important content comes first.
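To soften the mid-sentence cutoff problem, you can detect a length-capped generation and drop the dangling fragment before it reaches the user. A minimal sketch, assuming `call_model(prompt, max_tokens)` returns a `(text, finish_reason)` pair in the common completion-API shape (adapt to your actual client):

```python
def generate_with_cap(call_model, prompt: str, cap: int = 150):
    """Hard-cap generation and repair mid-sentence cutoffs.

    call_model(prompt, max_tokens) is a stand-in returning
    (text, finish_reason), mirroring the common completion-API shape.
    """
    text, finish_reason = call_model(prompt, max_tokens=cap)
    truncated = finish_reason == "length"
    if truncated and not text.rstrip().endswith((".", "!", "?")):
        # Drop the dangling fragment after the last complete sentence
        text = text.rsplit(".", 1)[0] + "."
    return text, truncated
```

Log the `truncated` flag; if it fires often, your cap is too tight for the tasks you're running.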

What Definitely Does NOT Work

Save yourself the time:

  1. Adding "be concise" or "be brief" to the system prompt alone. The model respects this for maybe two turns, then reverts.

  2. Relying on ReAct-style prompting without customization. The default Thought → Action → Observation format is an academic tool, not a production pattern, and in my experience it produces the most verbose output of any common agent architecture.

  3. Asking the model to "skip the explanation." It interprets this as "give a shorter explanation."

  4. Using high temperature with conciseness instructions. These work against each other. Temperature increases randomness, randomness increases filler.

  5. Prompt-engineering your way out of an architecture problem. If your pipeline surfaces intermediate reasoning to the user, no amount of prompting will make that feel clean. Fix the architecture first.

The Quick Start Path

If you're reading this and thinking "I just want this to work without manually configuring all of these pieces" — I get it. Honestly, the fastest path I've found is Felix's OpenClaw Starter Pack. It's a $29 bundle on Claw Mart that includes pre-configured skills with the structured output schemas, the two-phase thinker/summarizer pipeline, and tuned system prompts that solve exactly this verbosity problem out of the box. I spent a good two weeks dialing in the configuration I described above through trial and error; Felix's pack had basically the same setup ready to import. If you don't want to build all of this from scratch, it's a genuine time-saver and well worth the thirty bucks.

Whether you use that or set it up manually, the configuration patterns are the same. The pack just saves you the iteration cycles.

Debugging Verbosity After You've "Fixed" It

Even after implementing everything above, you'll occasionally get verbose outputs. Here's a quick diagnostic checklist:

  • Check your context window. If previous conversation turns are verbose, the model mirrors that style. Trim conversation history aggressively.
  • Check tool outputs. A single tool returning a massive payload can trigger essay-mode responses.
  • Check temperature. If someone bumped it to 0.7+ for "creativity," that's your verbosity source.
  • Check model selection. Some models are structurally more verbose than others. If you're using a model known for politeness and thoroughness, the two-phase architecture becomes essential rather than optional.
  • Check for prompt injection in user inputs. Users occasionally (accidentally or deliberately) include instructions like "explain your reasoning" which override your system prompt.
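The first item on that checklist, trimming conversation history, is easy to automate. A minimal sketch, assuming the common `[{"role": ..., "content": ...}]` message format (adjust the roles and shape to your actual history store):

```python
def trim_history(messages: list, keep_turns: int = 6,
                 max_chars: int = 500) -> list:
    """Keep the system message plus the last few turns, clamping each
    message body so verbose earlier turns stop teaching the model to ramble."""
    system = [m for m in messages if m["role"] == "system"]
    recent = [m for m in messages if m["role"] != "system"][-keep_turns:]
    return system + [
        {**m, "content": m["content"][:max_chars]} for m in recent
    ]
```

Run this before every agent turn and the style-mirroring problem mostly disappears on its own.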

Where to Go From Here

Start with the structured output schema. That alone solves 60% of verbosity problems and takes five minutes to implement. Then add the system prompt. Then, if you're building anything user-facing, implement the two-phase pipeline.

The progression is:

  1. Structured output enforcement (immediate win)
  2. Concise system prompt with specific behavioral rules (15-minute setup)
  3. Tool output trimming (depends on how many tools you have)
  4. Two-phase thinker/summarizer pipeline (30-60 minutes, or instant if you use Felix's Starter Pack)
  5. Max token hard caps as a safety net

Each step compounds. By the time you've done all five, your agent will feel like a completely different product — fast, direct, and respectful of your users' time and your token budget.

The verbose agent isn't broken. It's just misconfigured. Fix the configuration, and you get the terse, effective tool you originally wanted to build.
