Sub-Agent Orchestration: Building Agent Teams in OpenClaw

Most people who try to build multi-agent systems hit the same wall: everything works great in the demo, then collapses spectacularly the moment you try to do something real. Agents loop endlessly, context windows explode, costs spiral, and you spend more time debugging agent communication than actually solving the problem you set out to solve.
I've been there. And after burning through a frankly embarrassing number of API credits trying to get agents to collaborate reliably, I can tell you that the problem isn't the concept — agent teams are genuinely powerful. The problem is that most people either over-engineer the orchestration or under-engineer the boundaries between agents.
OpenClaw fixes this by giving you a framework that's opinionated where it matters (state management, context passing, error recovery) and flexible where it doesn't (how you structure your teams, what models you use, what tools you give each agent). This post is going to walk you through how sub-agent orchestration actually works in OpenClaw, with real patterns you can steal and code you can adapt.
Let's get into it.
What Sub-Agent Orchestration Actually Means (and Why You'd Want It)
Before we touch any code, let's kill the jargon. Sub-agent orchestration just means: one agent manages other agents to complete a task.
Think of it like a project manager who doesn't write code, design interfaces, or run QA — but knows exactly who to hand each piece to and how to combine the results. That's your supervisor agent. The specialists it delegates to? Those are your sub-agents.
Why would you want this instead of one big, omniscient agent? Three reasons:
- Specialization works. A single agent trying to research, analyze, write, and fact-check will do all of those things poorly. An agent that only fact-checks, with a focused system prompt and limited tools, will do that one thing well.
- Token economics. Instead of stuffing 50,000 tokens of context into every single call, each sub-agent only gets what it needs. Your research agent gets the source material. Your writing agent gets the research summary. Your editor gets the draft. Costs drop dramatically.
- Debugging becomes possible. When something goes wrong in a monolithic agent, good luck figuring out where. When your fact-checking sub-agent returns garbage, you know exactly where to look.
The catch? Orchestration is hard. Which is exactly why you want a framework that handles the plumbing so you can focus on the architecture.
The OpenClaw Approach to Agent Teams
OpenClaw's orchestration model is built around three core concepts: Supervisors, Workers, and Channels.
- Supervisors are agents whose primary job is routing, delegating, and synthesizing. They decide which worker to call, what context to send, and how to combine results.
- Workers are specialized agents with focused system prompts, constrained tool access, and a single area of responsibility.
- Channels are the communication layer between them — structured data passing with built-in compression and state management.
This isn't theoretical. Here's what a basic team definition looks like in OpenClaw:
from openclaw import Supervisor, Worker, Channel, Team

# Define your specialist workers
researcher = Worker(
    name="researcher",
    system_prompt="""You are a research specialist. Given a topic,
    find relevant facts, statistics, and sources. Return structured
    findings only. Do not editorialize.""",
    tools=["web_search", "document_reader"],
    output_schema={
        "findings": "list[str]",
        "sources": "list[str]",
        "confidence": "float"
    }
)

writer = Worker(
    name="writer",
    system_prompt="""You are a content writer. Given research findings
    and a brief, produce clear, engaging prose. Follow the tone and
    format specified in the brief.""",
    tools=["text_editor"],
    output_schema={
        "draft": "str",
        "word_count": "int"
    }
)

editor = Worker(
    name="editor",
    system_prompt="""You are an editor. Review drafts for accuracy
    against provided sources, clarity, and tone. Return specific
    suggested changes.""",
    tools=["diff_tool"],
    output_schema={
        "approved": "bool",
        "changes": "list[dict]",
        "notes": "str"
    }
)

# Define the supervisor
manager = Supervisor(
    name="content_manager",
    system_prompt="""You manage a content production team. Break down
    incoming requests into research, writing, and editing tasks.
    Delegate appropriately and synthesize final output.""",
    workers=[researcher, writer, editor],
    strategy="sequential"  # or "parallel", "adaptive"
)

# Create and run the team
team = Team(supervisor=manager)
result = team.run("Write a 1000-word article about sustainable packaging trends in 2026")
A few things to notice here. Every worker has an output_schema. This is huge. Instead of hoping your agent returns something parseable, OpenClaw enforces structured output at each handoff. The supervisor knows exactly what shape of data it's getting back from each worker.
The strategy parameter on the supervisor controls execution flow. "sequential" runs workers one after another (research → write → edit). "parallel" fans out to multiple workers simultaneously. "adaptive" lets the supervisor's LLM decide the execution order dynamically based on the task.
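As a mental model for what those strategies do under the hood, here's a generic Python sketch with plain functions standing in for LLM-backed workers. This is illustrative only, not the OpenClaw API:

```python
# Illustrative sketch, not OpenClaw internals: the control flow behind
# "sequential" and "parallel" strategies in plain Python.

def run_sequential(workers, task):
    """Each worker receives the previous worker's output (a fold)."""
    payload = task
    for worker in workers:
        payload = worker(payload)
    return payload

def run_parallel(workers, task):
    """Every worker receives the same task; results are collected for merging."""
    return [worker(task) for worker in workers]

# Toy stand-ins for LLM-backed agents
research = lambda t: f"{t} -> researched"
write = lambda t: f"{t} -> drafted"
edit = lambda t: f"{t} -> edited"

chained = run_sequential([research, write, edit], "topic")
fanned = run_parallel([research, write], "topic")
```

The "adaptive" strategy replaces the fixed loop with an LLM call that picks the next worker at each step, which is why it needs the loop guards discussed later.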
Solving the Context Explosion Problem
The number one complaint I see about multi-agent systems — and this comes up constantly in every developer community I follow — is context blowup. Agent A generates 3,000 tokens. Agent B generates 4,000. By the time you're three levels deep, your supervisor is drowning in 20,000+ tokens of context it doesn't need.
OpenClaw handles this with Channel compression. When data passes between agents, you can define compression rules:
research_to_writer = Channel(
    source="researcher",
    target="writer",
    compression="summarize",  # options: "none", "summarize", "extract", "custom"
    max_tokens=1500,
    preserve_fields=["sources"]  # always pass these through uncompressed
)

writer_to_editor = Channel(
    source="writer",
    target="editor",
    compression="none",  # editor needs the full draft
    include_upstream=["researcher.sources"]  # also pass original sources for fact-checking
)

team = Team(
    supervisor=manager,
    channels=[research_to_writer, writer_to_editor]
)
The "summarize" compression mode uses a lightweight summarization pass (this is a fast, cheap call — not your primary model) to condense the output before passing it to the next agent. The preserve_fields option lets you exempt specific data from compression. And include_upstream lets you pull data from earlier in the chain without passing it through every intermediate step.
This alone saves a fortune in token costs and dramatically reduces the "confused agent" problem where a worker gets overwhelmed by irrelevant context from upstream.
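To make the mechanics concrete, here's a rough standalone sketch of what a compression pass does, using word-count truncation in place of the real summarization call. The function name and structure are mine, not OpenClaw's:

```python
# Illustrative sketch: channel compression approximated by word-count
# truncation instead of an LLM summarization pass.

def compress_payload(payload, max_tokens, preserve_fields=()):
    """Truncate string fields to max_tokens words; preserved fields pass through."""
    compressed = {}
    for key, value in payload.items():
        if key in preserve_fields or not isinstance(value, str):
            compressed[key] = value  # exempt from compression
        else:
            words = value.split()
            compressed[key] = " ".join(words[:max_tokens])
    return compressed

research_output = {
    "findings": "word " * 5000,          # oversized free text
    "sources": ["https://example.com"],  # must survive intact for fact-checking
}
slim = compress_payload(research_output, max_tokens=1500,
                        preserve_fields=["sources"])
```

The real summarize mode condenses meaning rather than truncating, but the shape is the same: bounded string fields, untouched preserved fields.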
Error Handling That Actually Works
Here's what happens in most multi-agent setups when a sub-agent fails: everything fails. The supervisor gets an error or gibberish response, doesn't know what to do with it, hallucinates a recovery, and the whole thing derails.
OpenClaw's approach is explicit retry and fallback logic at the orchestration level:
researcher = Worker(
    name="researcher",
    system_prompt="...",
    tools=["web_search", "document_reader"],
    output_schema={...},
    retry_policy={
        "max_retries": 3,
        "backoff": "exponential",
        "fallback": "return_partial",  # or "skip", "escalate", "use_cache"
        "validation": lambda output: output["confidence"] > 0.5
    }
)
That validation function is the key. After every sub-agent call, OpenClaw runs the output through your validation logic. If the researcher returns findings with a confidence score below 0.5, it automatically retries. After three failed attempts, the fallback strategy kicks in — in this case, returning whatever partial results were gathered so the pipeline can continue with degraded but non-zero input.
You can also set "fallback": "escalate" which sends the failure back to the supervisor with context about what went wrong, letting the supervisor LLM decide how to adapt. This is surprisingly effective — supervisors can often rephrase the request or try a different approach.
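Stripped of the framework, the retry-validate-fallback loop looks roughly like this. It's a hedged sketch in plain Python; names like call_with_retry are my own, not OpenClaw's:

```python
# Illustrative sketch of a retry policy: validate each attempt, retry on
# failure, apply a fallback once the budget is exhausted.

def call_with_retry(worker, task, max_retries=3,
                    validate=lambda out: True, fallback="return_partial"):
    """Retry until validation passes; on exhaustion apply the fallback policy."""
    last = None
    for attempt in range(max_retries):
        last = worker(task)
        if validate(last):
            return last
        # real code would sleep 2 ** attempt seconds here (exponential backoff)
    if fallback == "return_partial":
        return last  # degraded but non-zero input for the rest of the pipeline
    raise RuntimeError("escalated: hand failure context back to the supervisor")

# A flaky worker that only clears the confidence bar on its third attempt
flaky = iter([{"confidence": 0.2}, {"confidence": 0.3}, {"confidence": 0.9}])
result = call_with_retry(lambda t: next(flaky), "task",
                         validate=lambda out: out["confidence"] > 0.5)
```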
The Supervisor Pattern That Actually Scales
After building dozens of agent teams, here's the pattern I keep coming back to. I call it "constrained adaptive" — the supervisor has flexibility in how it executes, but the workers are tightly locked down:
manager = Supervisor(
    name="project_lead",
    system_prompt="""You coordinate a team of specialists. For each
    incoming task:
    1. Analyze what's needed
    2. Create a plan (which workers, what order, what inputs)
    3. Execute the plan
    4. Validate results meet the original request
    5. If not, revise the plan and re-execute specific steps
    You may call workers multiple times. You may skip workers if
    they're not needed. Always validate before returning.""",
    workers=[researcher, writer, editor, fact_checker],
    strategy="adaptive",
    max_iterations=5,  # prevent infinite loops
    planning_output_schema={
        "steps": "list[dict]",
        "reasoning": "str"
    }
)
The max_iterations guard is critical. Without it, adaptive supervisors can get caught in loops — "the editor rejected the draft, send it back to the writer, the writer produces something similar, the editor rejects again, forever." Five iterations is usually plenty. If your team can't produce acceptable output in five rounds, the problem is your prompts or your task decomposition, not the iteration count.
Notice also the planning_output_schema. This forces the supervisor to externalize its plan as structured data before executing. This is huge for debugging — you can see exactly what the supervisor intended, not just what happened.
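The loop guard is easy to picture in isolation. Here's a toy sketch of a write/edit cycle with an iteration budget, in plain Python rather than the OpenClaw implementation:

```python
# Illustrative sketch of max_iterations: revise until the reviewer approves
# or the iteration budget runs out, then return best effort with a flag.

def adaptive_loop(write, review, task, max_iterations=5):
    """Run a write/review cycle bounded by max_iterations."""
    draft, feedback = None, None
    for _ in range(max_iterations):
        draft = write(task, feedback)
        approved, feedback = review(draft)
        if approved:
            return draft, True
    return draft, False  # budget exhausted: caller decides what to do

# Toy agents: the editor approves once the draft mentions sources
write = lambda task, fb: task + (" [with sources]" if fb else "")
review = lambda d: ("sources" in d, "cite your sources")
draft, ok = adaptive_loop(write, review, "article on packaging")
```

Returning a flag instead of raising keeps the "degraded but usable output" option open, which matches the fallback philosophy from the retry section.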
Running Agents in Parallel
Some tasks naturally parallelize. If you need to research three different subtopics, there's no reason to run them sequentially:
manager = Supervisor(
    name="parallel_researcher",
    system_prompt="""You coordinate parallel research tasks. Break the
    topic into independent subtopics and research them simultaneously.
    Then synthesize findings.""",
    workers=[researcher_1, researcher_2, researcher_3],
    strategy="parallel",
    merge_strategy="concatenate_and_summarize"
)
OpenClaw handles the fan-out and fan-in automatically. The merge_strategy determines how parallel results get combined before the supervisor processes them. Options include "concatenate" (just smash them together), "concatenate_and_summarize" (combine then compress), and "supervisor_merge" (let the supervisor LLM decide how to synthesize).
Parallel execution typically cuts wall-clock time by 60-70% for tasks with three or more independent sub-tasks. The cost is the same (same number of API calls) but the latency improvement is massive.
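The fan-out/fan-in itself is ordinary concurrent programming. Here's a minimal sketch with Python's standard library (illustrative only; OpenClaw's internals may differ):

```python
# Illustrative sketch of parallel fan-out/fan-in using the standard library.
from concurrent.futures import ThreadPoolExecutor

def fan_out(workers, task, merge="concatenate"):
    """Run all workers concurrently on the same task, then merge results.
    pool.map preserves worker order, so merging is deterministic."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        results = list(pool.map(lambda w: w(task), workers))
    if merge == "concatenate":
        return "\n".join(results)
    return results

# Toy sub-topic researchers standing in for LLM-backed agents
researchers = [
    lambda t: f"materials: findings on {t}",
    lambda t: f"regulation: findings on {t}",
    lambda t: f"market: findings on {t}",
]
merged = fan_out(researchers, "sustainable packaging")
```

Threads are the right fit here because agent calls are I/O-bound API requests: the wall-clock win comes from overlapping network waits, not CPU parallelism.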
Observability: Seeing What Your Agents Are Doing
You can't debug what you can't see. OpenClaw includes built-in tracing for every agent interaction:
team = Team(
    supervisor=manager,
    channels=[...],
    tracing={
        "enabled": True,
        "level": "detailed",  # "summary", "detailed", "debug"
        "output": "console"  # or "file", "webhook", "dashboard"
    }
)

result = team.run("Analyze competitor pricing strategies")

# After execution, inspect the full trace
for step in result.trace:
    print(f"[{step.agent}] {step.action}: {step.summary}")
    print(f"  Tokens: {step.tokens_used} | Latency: {step.latency_ms}ms")
    print(f"  Input compressed: {step.input_tokens_original} → {step.input_tokens_compressed}")
This gives you a complete picture: which agent ran, what it received, what it produced, how many tokens it burned, and how long it took. When something goes wrong, you can trace the exact point of failure instead of staring at a final bad output and guessing.
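If you want to picture what a trace step captures, here's a minimal standalone sketch. The TraceStep and Tracer names are mine, not OpenClaw's API:

```python
# Illustrative sketch: per-call tracing as a wrapper that records timing
# and a crude token estimate for each agent invocation.
import time
from dataclasses import dataclass, field

@dataclass
class TraceStep:
    agent: str
    action: str
    tokens_used: int
    latency_ms: float

@dataclass
class Tracer:
    steps: list = field(default_factory=list)

    def record(self, agent, action, fn, *args):
        """Wrap one agent call, capturing latency and a word-count token estimate."""
        start = time.perf_counter()
        output = fn(*args)
        latency = (time.perf_counter() - start) * 1000
        self.steps.append(TraceStep(agent, action, len(str(output).split()), latency))
        return output

tracer = Tracer()
out = tracer.record("researcher", "search", lambda q: f"findings for {q}", "pricing")
```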
Where to Start
If this all makes sense but you're wondering about the practical "how do I actually start building" part, the best move is grabbing Felix's OpenClaw Starter Pack. It includes pre-configured team templates for the most common orchestration patterns — supervisor-worker, parallel fan-out, hierarchical multi-level — along with working examples that you can modify rather than building from scratch.
I'm a big believer in starting from something that works and modifying it, rather than staring at a blank file. The starter pack gives you that foundation plus some patterns I haven't covered here (like human-in-the-loop handoffs and conditional branching based on worker output).
Patterns I'd Avoid
A few anti-patterns I've seen people fall into with sub-agent orchestration:
Don't let agents spawn agents dynamically. It sounds cool. It's a nightmare. Every agent in your system should be predefined. The supervisor can choose which ones to call, but the roster should be fixed.
Don't share tools across agents without good reason. If your researcher and your writer both have access to web search, you're going to get confused behaviors. Give each agent the minimum toolset it needs and nothing more.
Don't skip output schemas. "I'll just let the agent return freeform text" is how you get cascading failures. Structured output at every boundary is the single highest-ROI practice in multi-agent systems.
Don't build a multi-agent system when a single agent with good tools would work. Seriously. If your task doesn't naturally decompose into distinct specialist roles, you're adding complexity for no benefit. Two or three focused agents can be transformative. Seven agents in a complex hierarchy for a task that doesn't need it is just expensive and slow.
What to Build Next
If you've followed along this far, here's what I'd do:
- Pick one workflow you currently handle manually or with a single overloaded agent.
- Identify the natural specialist roles — usually there are 2-4 distinct functions.
- Build the workers first with tight prompts and strict output schemas. Test them individually.
- Add the supervisor last once you know each worker is reliable on its own.
- Add channels and compression to optimize token usage.
- Turn on tracing and actually look at what's happening.
Start with strategy="sequential" until you're confident in the flow, then experiment with "adaptive" once the basics are solid.
The gap between "agents that demo well" and "agents that work in production" is almost entirely about the orchestration layer. Get that right, and agent teams go from a party trick to something that genuinely saves you hours every week.
Go build something.