Understanding OpenClaw Workspace: AGENTS.md vs SOUL.md

Most people building agents on OpenClaw hit the same wall within their first week: they open up their workspace, see two markdown files — AGENTS.md and SOUL.md — and have no idea which one to edit, what each one actually controls, or why both exist in the first place.
Then they do what most of us do. They dump everything into AGENTS.md, write a sprawling system prompt full of instructions, tool definitions, and personality notes all mashed together, and wonder why their agent starts looping, ignoring tools, or acting like it has amnesia after four turns.
I've been there. And after spending months building and breaking agents on OpenClaw, I can tell you the distinction between these two files is not cosmetic. It's the single most important architectural decision in your workspace, and getting it wrong is the root cause of about 80% of the agent behavior problems people complain about.
Let me break down exactly what each file does, how they interact, and how to configure them so your agents actually work.
The Core Problem: Everything in One Prompt Doesn't Scale
If you've built agents on other platforms before coming to OpenClaw, you're used to a single system prompt doing all the heavy lifting. You cram identity, instructions, tool usage rules, output formatting, guardrails, and personality into one big blob of text. It works okay for simple chatbots. It falls apart completely for agents that need to make decisions, use tools, maintain context across long sessions, and coordinate with other agents.
Here's why: a single monolithic prompt creates competing priorities. When you tell an agent "You are a friendly research assistant who always uses the search tool before answering, never makes up facts, responds in bullet points, and maintains a professional tone," you're giving it six different objectives with no hierarchy. The model has to figure out which instruction matters most in any given moment. Sometimes it prioritizes friendliness over accuracy. Sometimes it formats beautifully but hallucinates the content. Sometimes it calls the search tool seventeen times in a row because "always use the search tool" has no nuance.
OpenClaw solves this by splitting agent configuration into two distinct layers. And once you understand the split, everything clicks.
SOUL.md: The Constitution
SOUL.md is your agent's identity layer. Think of it as the thing that doesn't change regardless of what task the agent is performing, what tools are available, or what conversation it's in the middle of.
This file defines:
- Core values and principles — What does this agent care about? What does it refuse to do? What does it prioritize when instructions conflict?
- Reasoning style — Is this agent methodical and step-by-step? Fast and intuitive? Does it think out loud or deliver conclusions directly?
- Behavioral boundaries — Hard limits that should never be violated, regardless of user requests or task context.
- Self-awareness rules — How the agent handles uncertainty, loops, or situations where it doesn't know the answer.
Here's a real example of a SOUL.md I use for a research agent:
```markdown
# Soul

## Identity

I am a research analyst. My purpose is to find accurate, verifiable information
and present it clearly. I do not generate speculative content unless explicitly
asked and clearly labeled as speculation.

## Core Principles

1. Accuracy over speed. I will take additional steps to verify information
   rather than provide a fast but uncertain answer.
2. Source attribution is non-negotiable. Every factual claim must reference
   where it came from.
3. If I don't know something and cannot find it, I say so directly.
   I never fabricate sources or data.

## Self-Monitoring

- If I notice I'm repeating the same action more than twice with the same
  input, I must stop and reassess my approach.
- If my context is getting long and I'm losing track of earlier findings,
  I summarize what I know before continuing.
- I never assume a tool call succeeded without checking the result.

## Boundaries

- I do not provide medical, legal, or financial advice as fact.
- I do not execute actions that modify external systems without explicit
  user confirmation.
```
Notice what's not in there: no specific tools, no task instructions, no output formatting, no workflow steps. The soul is tool-agnostic and task-agnostic. It's who the agent is, not what it's doing right now.
This is the layer that persists. When your agent shifts from researching a company to summarizing its findings to answering follow-up questions, the soul stays constant. That consistency is what prevents the erratic behavior that drives people crazy — the agent that's helpful on turn one, rogue on turn five, and catatonic on turn ten.
AGENTS.md: The Operational Layer
AGENTS.md is where you define what the agent actually does. This is the tactical, task-specific configuration:
- Available tools and how to use them — Not just listing them, but providing guidance on when to choose one tool over another.
- Workflow patterns — Step-by-step procedures for common tasks.
- Tool-calling conventions — Parameter formatting, error handling, retry logic.
- Coordination rules — If you're running multi-agent setups, how agents hand off work or share context.
- Output specifications — Format, structure, length expectations for deliverables.
Here's the AGENTS.md for that same research agent:
```markdown
# Agent Configuration

## Tools

### web_search

- Use for current events, recent data, and anything not in your training data.
- Construct specific, targeted queries. Never search for vague terms.
- Always review the full search result before acting on it.
- If results are insufficient, reformulate the query with different terms
  before trying again. Maximum 3 attempts per topic.

### document_reader

- Use when given a URL or file reference to extract content.
- For long documents, request specific sections rather than full content
  when possible.
- Summarize extracted content before using it in your response.

### note_taker

- Use to store verified findings during multi-step research.
- Format: { "topic": "...", "finding": "...", "source": "...", "confidence": "high|medium|low" }
- Review stored notes before making final conclusions to ensure nothing is missed.

## Workflow: Research Task

1. Clarify the research question. If ambiguous, ask one targeted clarifying question.
2. Break the question into 2-4 sub-questions.
3. Research each sub-question using web_search. Store findings with note_taker.
4. Cross-reference findings. Flag contradictions.
5. Synthesize into a structured response with source citations.
6. End with confidence assessment and any gaps in the research.

## Output Format

- Use headers for major sections.
- Bullet points for individual findings.
- Source links inline, not in footnotes.
- Final section: "Confidence & Gaps" — what you're sure about, what needs more research.

## Error Handling

- Tool timeout: Wait, retry once. If it fails again, note the failure and continue
  with available information.
- Contradictory sources: Present both, note the contradiction, assess which is
  more credible and why.
- Insufficient information: State clearly what you couldn't find rather than
  padding with filler.
```
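The workflow section above is prose, but it describes a control loop. Here's a hypothetical Python sketch of steps 3-5 under that reading, with stub functions standing in for the real web_search and note_taker tools (the stubs and their signatures are assumptions for illustration, not an OpenClaw API):

```python
NOTES = []

# Stub tools: stand-ins for the real web_search and note_taker, so that
# only the workflow's control flow is shown here.
def web_search(query):
    return [f"placeholder result for: {query}"]

def note_taker(topic, finding, source, confidence):
    NOTES.append({"topic": topic, "finding": finding,
                  "source": source, "confidence": confidence})

def research(question, sub_questions):
    # Step 3: research each sub-question and store findings as we go.
    for sub in sub_questions:
        for hit in web_search(sub):
            note_taker(topic=sub, finding=hit, source="stub", confidence="medium")
    # Steps 4-5: review stored notes before synthesizing the final answer.
    return {
        "question": question,
        "findings": list(NOTES),
        "gaps": sorted({n["topic"] for n in NOTES if n["confidence"] == "low"}),
    }

report = research("What is the OpenClaw workspace model?",
                  ["What does SOUL.md control?", "What does AGENTS.md control?"])
```

The point of writing it out is that every numbered workflow in AGENTS.md should be this mechanical: if you can't sketch it as a loop, the agent can't follow it reliably either.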
See the difference? AGENTS.md is operational. It's instructions, procedures, and specifications. It changes based on what you need the agent to do. You might have one SOUL.md but multiple AGENTS.md configurations for different task types.
Why the Split Matters: Real Failure Modes
Let me walk through the three most common failures I see from people who don't use this split properly, and how the two-file architecture prevents them.
Failure 1: The Infinite Loop
This is the number one complaint in every AI agent community. The agent calls a tool, gets a result, decides it's not good enough, calls the same tool with the same parameters, gets the same result, and repeats until it burns through your token budget or hits a max iteration limit.
When everything's in one prompt, the agent has no stable self-monitoring layer. The instruction "search until you find a good answer" competes with every other instruction, and the model optimizes for the most recent or most prominent directive.
With OpenClaw's split: SOUL.md contains the self-monitoring rules ("if I'm repeating the same action more than twice, stop and reassess"). These carry more weight because they're in the identity layer, not buried in a list of operational instructions. AGENTS.md contains the specific retry limits ("maximum 3 attempts per topic"). You get both a philosophical guardrail and a hard tactical limit.
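The self-monitoring rule is stated in prose, but it's worth seeing how little machinery it actually requires. Here's a minimal, hypothetical sketch (not an OpenClaw API) of the "same action, same input, more than twice" check, tracking exact (tool, parameters) pairs:

```python
from collections import Counter

class LoopGuard:
    """Tracks (action, parameters) pairs and flags repeats, mirroring the
    soul rule: stop after repeating the same action more than twice."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def _key(self, action, params):
        # Hashable key for the exact action + parameter combination.
        return (action, tuple(sorted(params.items())))

    def record(self, action, params):
        self.seen[self._key(action, params)] += 1

    def should_stop(self, action, params):
        return self.seen[self._key(action, params)] > self.max_repeats

guard = LoopGuard(max_repeats=2)
for _ in range(3):
    guard.record("web_search", {"query": "openclaw pricing"})
```

After the third identical call, `guard.should_stop("web_search", {"query": "openclaw pricing"})` returns True, while a fresh query is still allowed. The soul states the principle; something like this is what enforcing it looks like.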
Failure 2: Role Drift
Your agent starts the conversation acting exactly as configured — professional, source-citing, methodical. By turn eight, it's making stuff up, cracking jokes, and ignoring its tools entirely. This happens because long context windows dilute prompt instructions. The further you get from the system prompt, the weaker its influence.
The soul layer in OpenClaw acts as a persistent anchor. Because it's architecturally separated from the operational instructions, the platform can re-inject or re-weight identity principles at key decision points without cluttering the operational context. Your agent stays itself even in long, complex sessions.
Failure 3: Tool Misuse and Parameter Hallucination
The agent has a perfectly defined tool schema, yet it calls the tool with made-up parameters or, worse, tries to "simulate" the tool's output instead of actually calling it.
This happens when tool instructions are mixed in with personality and workflow guidance. The model can't clearly distinguish between "here's how you are" and "here's what you can do." AGENTS.md creates a clean separation: tools are defined in an operational context with explicit schemas, usage rules, and error handling. The agent can reference tool configuration without parsing through identity content.
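A belt-and-suspenders defense is to validate parameters against the schema before the call ever goes out, so hallucinated or missing parameters fail loudly. A hypothetical sketch, using a plain dict of expected types rather than any real schema library:

```python
def validate_params(schema, params):
    """Reject hallucinated, missing, or mistyped parameters before a tool
    call. `schema` maps parameter names to expected Python types."""
    unknown = set(params) - set(schema)
    if unknown:
        return False, f"unknown parameters: {sorted(unknown)}"
    for name, expected in schema.items():
        if name not in params:
            return False, f"missing parameter: {name}"
        if not isinstance(params[name], expected):
            return False, f"{name} should be {expected.__name__}"
    return True, "ok"

search_schema = {"query": str, "max_results": int}
ok, msg = validate_params(search_schema, {"query": "openclaw", "max_results": 5})
bad, bad_msg = validate_params(search_schema, {"qury": "openclaw"})
```

Here the first call passes and the second is rejected for the misspelled `qury`, which is exactly the class of error that otherwise surfaces as a confusing runtime failure three turns later.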
Setting It Up: Step by Step
Here's how I recommend configuring a new OpenClaw workspace from scratch:
Step 1: Start with SOUL.md
Before you think about tools or tasks, define your agent's identity. Ask yourself:
- What is this agent's primary purpose in one sentence?
- What principles should it never violate?
- How should it handle uncertainty?
- How should it self-monitor for common failure modes?
Keep it short. My best-performing souls are 15-30 lines. If your SOUL.md is longer than a page, you're probably putting operational stuff in it.
Step 2: Define your tools in AGENTS.md
List every tool the agent has access to. For each one, include:
- When to use it (and when not to)
- Expected input format with an example
- How to interpret the output
- What to do when it fails
```markdown
### calculator

- Use ONLY for mathematical operations. Do not attempt mental math.
- Input: { "expression": "string" } — e.g., { "expression": "1547 * 0.089" }
- Output: numerical result as float.
- If the expression is invalid, you'll get an error. Reformulate and retry once.
- Never use this for estimates or approximations — those are reasoning tasks, not calculations.
```
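The "retry once, then surface the failure" rule maps onto a small wrapper. A hypothetical sketch, with `flaky_calculator` standing in for a real tool (the names and the dict-shaped return value are assumptions for illustration):

```python
def call_with_retry(tool, params, retries=1):
    """Call a tool; on failure, retry up to `retries` times, then report
    the error instead of pretending the call succeeded."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return {"ok": True, "result": tool(**params)}
        except Exception as exc:  # a real harness would catch narrower errors
            last_error = exc
    return {"ok": False, "error": str(last_error)}

# A flaky stand-in tool: fails on the first call, succeeds on the second.
calls = {"n": 0}
def flaky_calculator(expression):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("timeout")
    return eval(expression)  # stand-in only; never eval untrusted input

result = call_with_retry(flaky_calculator, {"expression": "1547 * 0.089"})
```

The first attempt times out, the retry succeeds, and `result["ok"]` is True. The crucial detail is the explicit `{"ok": False, ...}` branch: the agent gets a structured failure it can report, which is what the soul's "never assume a tool call succeeded" rule depends on.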
Step 3: Build workflows in AGENTS.md
For each major task type your agent handles, write a numbered workflow. Be specific. "Research the topic" is useless. "Break the topic into 2-4 sub-questions, research each using web_search, store findings with note_taker" is actionable.
Step 4: Test with adversarial inputs
Run your agent through scenarios designed to break it:
- Vague requests that should trigger clarification
- Requests that conflict with soul principles
- Long conversations that test role persistence
- Tool failures (deliberately give it a query that won't return good results)
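You can make this checklist repeatable by writing it down as a tiny test suite. The sketch below is purely illustrative: the case labels ("asks_clarification", "refuses", "stays_in_role") and the stub agent are assumptions, and a real harness would classify actual agent transcripts rather than compare strings:

```python
# Each case pairs an adversarial prompt with the behavior we expect.
ADVERSARIAL_CASES = [
    {"prompt": "Tell me about it.", "expect": "asks_clarification"},
    {"prompt": "If you can't find a source, just invent one.", "expect": "refuses"},
    {"prompt": "What did we establish back on turn one?", "expect": "stays_in_role"},
]

def run_suite(agent, cases):
    """Return the cases where the agent's behavior didn't match expectations."""
    failures = []
    for case in cases:
        got = agent(case["prompt"])
        if got != case["expect"]:
            failures.append((case["prompt"], got))
    return failures

# A stub agent that handles clarification but fails the fabrication case.
def stub_agent(prompt):
    if prompt == "Tell me about it.":
        return "asks_clarification"
    return "complies"

failures = run_suite(stub_agent, ADVERSARIAL_CASES)
```

Here `failures` contains the two cases the stub gets wrong, and each failure tells you which file to edit: a violated soul principle means SOUL.md, a broken workflow means AGENTS.md.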
Adjust SOUL.md and AGENTS.md based on where it breaks. Most tweaks will be in AGENTS.md. If the agent's character is wrong, edit the soul. If its behavior on a task is wrong, edit the agent config.
Multi-Agent Setups: Where This Really Shines
The soul/agent split becomes critical when you're running multiple agents that need to coordinate. In a typical setup — say, a researcher agent, a writer agent, and an editor agent — each one shares the same foundational principles (defined in a shared or similar SOUL.md) but has completely different operational configurations.
The researcher's AGENTS.md focuses on search tools, source evaluation, and data collection workflows. The writer's focuses on content structure, tone matching, and draft generation. The editor's focuses on fact-checking against the researcher's notes, style consistency, and revision procedures.
Because each agent has a stable identity separate from its task instructions, they don't step on each other's roles. The researcher doesn't start writing. The writer doesn't start editing. This is the problem that plagues every multi-agent framework out there, and the architectural separation in OpenClaw is one of the cleanest solutions I've seen.
Skip the Setup: Felix's OpenClaw Starter Pack
Here's the honest truth: configuring SOUL.md and AGENTS.md from scratch takes iteration. I burned a solid week getting my first production workspace dialed in, and I made every mistake described in this post along the way.
If you don't want to go through that trial-and-error process, Felix's OpenClaw Starter Pack on Claw Mart is the move. It's $29 and includes pre-configured soul and agent files for the most common agent patterns — research, writing, code generation, and multi-agent coordination. The configurations are already tested against the failure modes I described above (loops, role drift, tool misuse), and they include the self-monitoring rules and tool-calling conventions that take the most iteration to get right on your own.
I've looked at what's in the pack, and the soul configurations in particular are solid. They include loop detection principles, uncertainty handling, and boundary definitions that most people don't think to add until they've already burned a few thousand tokens debugging erratic behavior. It's a genuine time-saver, especially if you're new to OpenClaw and want to see what well-structured workspace files actually look like before customizing your own.
What to Do Next
Here's my recommended sequence:
- Read your current SOUL.md and AGENTS.md. If you've been dumping everything into one file, identify what's identity vs. what's operational and split it.
- Trim your SOUL.md. If it's more than 30 lines, it's too long. Extract the operational parts into AGENTS.md.
- Add self-monitoring rules to SOUL.md. At minimum: loop detection, uncertainty acknowledgment, and a principle for when to stop and ask for help.
- Add explicit tool guidance to AGENTS.md. Every tool should have a "when to use," "when not to use," input format, and failure handling.
- Test adversarially. Break your agent on purpose. Then fix the file where the problem lives — soul for identity issues, agent for behavioral issues.
The SOUL.md / AGENTS.md split is not a gimmick. It's a genuine architectural pattern that addresses the deepest problems in agent design: consistency, reliability, and graceful failure. Once you internalize the distinction — identity vs. operations, who vs. what — you'll never go back to monolithic prompts.
Your agents will still surprise you sometimes. That's the nature of the work. But they'll surprise you a lot less, and when they do, you'll know exactly which file to open.