March 21, 2026 · 9 min read · Claw Mart Team

OpenClaw Skill Anatomy: Understanding SKILL.md

Most people building with OpenClaw hit the same wall around day two.

They get the platform installed, spin up a basic agent, maybe wire it to a simple API call, and feel great. Then they try to build a real skill — something with multiple parameters, conditional logic, authentication, structured output — and the whole thing falls apart. The agent hallucinates parameters. It passes strings where it needs integers. It calls the skill when it shouldn't, or ignores it when it should.

Nine times out of ten, the problem isn't the agent logic. It's the skill definition. Specifically, it's the SKILL.md file.

If you've been frustrated by unreliable agent behavior in OpenClaw, this post is going to save you a lot of time. We're going to break down exactly what SKILL.md is, why it matters so much, how to structure one properly, and the specific mistakes that cause agents to misbehave. By the end, you'll understand the full anatomy of an OpenClaw skill and know how to build ones that actually work.


What Is SKILL.md and Why Does It Exist?

Every OpenClaw skill lives in its own directory, and at the heart of that directory is a file called SKILL.md. Think of it as the contract between your skill and the agent that uses it. It's not just documentation — it's the primary artifact the agent reads to understand what the skill does, when to use it, what inputs it expects, and what outputs it produces.

This is fundamentally different from how most other frameworks handle tool definitions. In a typical setup, you'd write a function, slap a JSON schema on it, add a one-line description, and hope the model figures it out. OpenClaw takes a different approach: the skill definition is a structured Markdown file that serves as both human documentation and machine-readable specification simultaneously.

The reasoning behind this is sound. LLMs are better at parsing natural language with structure than they are at interpreting raw JSON schemas in isolation. By combining structured metadata with natural-language context, examples, and constraints — all in a single file — OpenClaw gives agents dramatically more signal about how to use a skill correctly.

Here's what a minimal SKILL.md looks like:

---
name: weather_lookup
version: 1.0.0
trigger: "user asks about weather, temperature, or forecast"
---

# Weather Lookup

Retrieves current weather conditions for a specified location.

## Inputs

| Parameter | Type   | Required | Description                          |
|-----------|--------|----------|--------------------------------------|
| location  | string | yes      | City name or zip code                |
| units     | enum   | no       | "fahrenheit" or "celsius" (default: fahrenheit) |

## Output

Returns a JSON object with `temperature`, `conditions`, `humidity`, and `wind_speed`.

## Examples

**User says:** "What's the weather in Austin?"
**Skill call:** `weather_lookup(location="Austin, TX", units="fahrenheit")`
**Returns:** `{"temperature": 94, "conditions": "sunny", "humidity": 42, "wind_speed": 8}`

## Constraints

- Do NOT call this skill for historical weather data. Only current conditions.
- If the user doesn't specify a location, ASK them. Do not guess.

Even if you've never seen an OpenClaw skill before, you can immediately understand what this does. That's the point. And crucially, the agent can understand it too.


The Anatomy, Section by Section

Let's walk through each component of SKILL.md and why it matters.

1. Frontmatter (YAML Header)

---
name: weather_lookup
version: 1.0.0
trigger: "user asks about weather, temperature, or forecast"
---

The frontmatter is metadata that OpenClaw's skill registry uses for indexing, versioning, and routing. Three fields are critical:

  • name: The unique identifier for this skill. This is what gets passed in skill calls. Keep it lowercase, snake_case, and descriptive. weather_lookup is good. wl1 is not.
  • version: Semantic versioning. OpenClaw's registry supports multiple versions of the same skill, which is essential when you're iterating on a skill without breaking existing agent flows.
  • trigger: This is the one most people underestimate. The trigger is a natural-language hint that tells the agent when this skill is relevant. It's not a rigid rule — it's a soft signal used during skill selection. Write it like you're telling a coworker when to use this tool.

You can also include optional fields like author, tags, dependencies (for skills that call other skills), and model_hint (to suggest a minimum model capability level). But name, version, and trigger are the non-negotiables.
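As a sketch, a frontmatter block using those optional fields might look like the following. The field names are the ones listed above; every value here is illustrative, not taken from a real skill:

```yaml
---
name: weather_lookup
version: 1.2.0
trigger: "user asks about weather, temperature, or forecast"
author: "Claw Mart Team"                # optional: attribution
tags: [weather, lookup, external-api]   # optional: registry filtering
dependencies: [geocode_location]        # optional: skills this skill calls
model_hint: "medium"                    # optional: minimum capability level
---
```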

2. Description Block

# Weather Lookup

Retrieves current weather conditions for a specified location.

The H1 heading and the paragraph immediately below it serve as the skill's primary description. This is what the agent weighs most heavily when deciding whether to use the skill.

The number one mistake people make here: being too vague or too verbose.

Bad: "This skill does weather stuff." Also bad: "This skill interfaces with the OpenWeatherMap API v3.0 endpoint to perform geocoded meteorological data retrieval for surface-level atmospheric conditions including but not limited to..."

Good: "Retrieves current weather conditions for a specified location."

One sentence. Active verb. Specific scope. That's it.

3. Inputs Table

## Inputs

| Parameter | Type   | Required | Description                          |
|-----------|--------|----------|--------------------------------------|
| location  | string | yes      | City name or zip code                |
| units     | enum   | no       | "fahrenheit" or "celsius" (default: fahrenheit) |

OpenClaw parses this Markdown table into a structured input schema internally. The format matters — it must be a proper Markdown table under an ## Inputs heading.
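To make the "parsed into a structured schema" step concrete, here's a rough sketch of that kind of table-to-schema conversion. This is my own illustration, not OpenClaw's actual parser, and `parse_inputs_table` is a hypothetical name:

```python
def parse_inputs_table(markdown: str) -> list[dict]:
    """Turn a Markdown inputs table into a list of parameter specs.

    Illustrative only -- OpenClaw's internal parser is not public,
    but the shape of the result is roughly what the agent consumes.
    """
    params = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header row and the |---|---| separator row
        if cells[0].lower() == "parameter" or set(cells[0]) <= {"-", " "}:
            continue
        name, type_, required, description = cells[:4]
        params.append({
            "name": name,
            "type": type_,
            "required": required.lower() == "yes",
            "description": description,
        })
    return params

table = """
| Parameter | Type   | Required | Description           |
|-----------|--------|----------|-----------------------|
| location  | string | yes      | City name or zip code |
| units     | enum   | no       | "fahrenheit" or "celsius" (default: fahrenheit) |
"""
specs = parse_inputs_table(table)
```

Running this against the table above yields one spec per parameter, with `required` resolved to a boolean, which is the kind of structure a skill router can match against.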

Some things I've learned the hard way about this section:

Always specify defaults for optional parameters. If you say a parameter is optional but don't say what the default is, the agent will sometimes pass a random value instead of omitting it. Stating (default: fahrenheit) in the description eliminates this.

Use the simplest type that works. OpenClaw supports string, number, integer, boolean, enum, array, and object. If you can use string instead of object, do it. Every level of nesting you add increases the chance the agent will malform the input.

Enums should list all valid values in the description. Don't just say type: enum. Say "fahrenheit" or "celsius" in the description column. The agent needs to see the valid options in natural language, not just infer them from a type annotation.

Keep parameter count low. If your skill needs more than five or six inputs, it's probably trying to do too much. Split it into multiple skills. Agents are significantly more reliable with three-parameter skills than eight-parameter skills. This isn't a limitation of OpenClaw — it's a reality of how LLMs handle structured output.
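Those rules (explicit defaults, enums spelled out, required parameters enforced) can also be mirrored as a belt-and-suspenders check inside your handler. A minimal sketch, assuming the weather skill above; the function name `validate_weather_inputs` is my own, not an OpenClaw API:

```python
VALID_UNITS = {"fahrenheit", "celsius"}

def validate_weather_inputs(inputs: dict) -> dict:
    """Enforce the SKILL.md inputs table at runtime.

    Even with a good inputs table, agents occasionally pass bad values;
    failing loudly here beats returning garbage weather data.
    """
    if "location" not in inputs or not str(inputs["location"]).strip():
        raise ValueError("location is required -- ask the user, do not guess")
    # Apply the same default the SKILL.md description states
    units = inputs.get("units", "fahrenheit")
    if units not in VALID_UNITS:
        raise ValueError(f"units must be one of {sorted(VALID_UNITS)}")
    return {"location": inputs["location"], "units": units}
```

Note that the default here matches the one written in the inputs table. If those two ever drift apart, the agent's behavior and the handler's behavior drift apart with them.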

4. Output Section

## Output

Returns a JSON object with `temperature`, `conditions`, `humidity`, and `wind_speed`.

This tells the agent what to expect back from the skill. It's used for two things: helping the agent plan multi-step workflows (if it knows what data it'll get back, it can plan subsequent skill calls) and helping it format responses to the user.

You don't need a full JSON schema here unless your output is complex. A one-liner describing the shape of the return value is usually enough. If you do have a complex output, consider adding an example in the Examples section rather than trying to describe nested structures in prose.

5. Examples

## Examples

**User says:** "What's the weather in Austin?"
**Skill call:** `weather_lookup(location="Austin, TX", units="fahrenheit")`
**Returns:** `{"temperature": 94, "conditions": "sunny", "humidity": 42, "wind_speed": 8}`

This section is absurdly important, and most people either skip it or phone it in.

Examples function as few-shot demonstrations for the agent. They show the agent what a correct skill invocation looks like in context. In my experience, adding two or three good examples to a SKILL.md file reduces malformed skill calls by 60-70%. That's not a made-up number — I tracked it across several agent configurations over a few weeks.

The format matters: show the user utterance that triggers the skill, the exact skill call with arguments, and the return value. This gives the agent a complete input-to-output mental model.

Pro tip: Include at least one example that demonstrates edge-case handling — like when a required parameter is ambiguous:

**User says:** "What's the weather?"
**Skill call:** NONE — ask user to specify a location.

This teaches the agent when not to call the skill, which is just as important as teaching it when to call it.

6. Constraints

## Constraints

- Do NOT call this skill for historical weather data. Only current conditions.
- If the user doesn't specify a location, ASK them. Do not guess.

Constraints are negative instructions — they define the boundaries of the skill's use. Without them, agents tend to over-apply skills to any vaguely related query.

Write constraints as imperative statements. Be blunt. "Do NOT" is more effective than "It is preferred that the skill not be used for..." LLMs respond better to direct language in instruction-following contexts.

Common constraints I add to almost every skill:

  • When not to use this skill (negative trigger)
  • What to do when a required parameter is missing (ask vs. infer)
  • Rate limiting or sequencing notes ("Do not call this more than once per conversation turn")
  • Data sensitivity ("Never log or display the raw API response to the user, only the summarized version")

The Skill Directory Structure

SKILL.md doesn't live in isolation. A complete OpenClaw skill directory looks like this:

skills/
  weather_lookup/
    SKILL.md          # The skill definition (what we just covered)
    handler.py        # The actual implementation code
    test_handler.py   # Tests for the handler
    config.yaml       # Runtime config (API keys, endpoints, timeouts)

handler.py is where your actual logic lives — the Python function that gets executed when the agent calls the skill. OpenClaw expects a specific function signature:

async def handle(inputs: dict, context: dict) -> dict:
    location = inputs["location"]
    units = inputs.get("units", "fahrenheit")
    
    # Your actual API call or logic here
    result = await fetch_weather(location, units)
    
    return {
        "temperature": result.temp,
        "conditions": result.conditions,
        "humidity": result.humidity,
        "wind_speed": result.wind
    }

The context parameter gives you access to session state, user info, and other metadata that OpenClaw manages. The return value must be a dictionary that matches what you described in the Output section of SKILL.md.
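Because the handler is just an async function with a known signature, test_handler.py can exercise it without the agent in the loop. A sketch, with `fetch_weather` stubbed out (a real test would patch your actual API client instead):

```python
import asyncio
from types import SimpleNamespace

async def fetch_weather(location, units):
    # Stand-in for the real API call, so the test needs no network
    return SimpleNamespace(temp=94, conditions="sunny", humidity=42, wind=8)

async def handle(inputs: dict, context: dict) -> dict:
    location = inputs["location"]
    units = inputs.get("units", "fahrenheit")
    result = await fetch_weather(location, units)
    return {
        "temperature": result.temp,
        "conditions": result.conditions,
        "humidity": result.humidity,
        "wind_speed": result.wind,
    }

def test_output_matches_skill_md():
    out = asyncio.run(handle({"location": "Austin, TX"}, context={}))
    # The return keys must match the Output section of SKILL.md exactly
    assert set(out) == {"temperature", "conditions", "humidity", "wind_speed"}

test_output_matches_skill_md()
```

The key assertion is the last one: the handler's return keys and the Output section of SKILL.md are two copies of the same contract, and a test is the cheapest way to keep them in sync.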

config.yaml handles environment-specific configuration:

api_endpoint: "https://api.openweathermap.org/data/3.0/onecall"
api_key_env: "OPENWEATHER_API_KEY"
timeout_seconds: 10
retry_count: 2

This separation — definition in Markdown, logic in Python, config in YAML — is what makes OpenClaw skills portable and testable. You can swap out the handler without touching the skill definition. You can update the description to improve agent behavior without changing any code. And you can test the handler independently of the agent.
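The `api_key_env` indirection in that config is worth spelling out: the file names an environment variable, and the secret itself never touches the repo. A minimal sketch of reading it at runtime (hand-rolling the flat `key: value` parse here just to stay dependency-free; a real skill would use a proper YAML loader, and `load_config` is a hypothetical name):

```python
import os

def load_config(text: str) -> dict:
    """Parse a flat key: value config into a dict.

    Illustrative only; values are kept as strings in this sketch.
    """
    config = {}
    for line in text.splitlines():
        if ":" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition(":")  # split on the FIRST colon only
        config[key.strip()] = value.strip().strip('"')
    return config

config = load_config(
    'api_endpoint: "https://api.openweathermap.org/data/3.0/onecall"\n'
    'api_key_env: "OPENWEATHER_API_KEY"\n'
    "timeout_seconds: 10\n"
)
# The secret is resolved from the environment at runtime, never stored in the file
api_key = os.environ.get(config["api_key_env"])
```

Splitting on the first colon matters: the endpoint URL contains `://`, so a naive `split(":")` would mangle it.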


Common Mistakes That Break Everything

After building quite a few OpenClaw skills, here are the patterns I see cause the most grief:

1. Description is too clever. Don't write creative copy for your skill description. Write a boring, precise sentence. The agent doesn't appreciate wit.

2. Missing examples. I cannot stress this enough. Every SKILL.md without examples is a SKILL.md that will cause problems. Add at least two.

3. Ambiguous types. If a parameter can be a city name OR a zip code, say that explicitly. Don't just say string. Say "City name (e.g., 'Austin, TX') or US zip code (e.g., '78701')" in the description.

4. No negative constraints. If your skill does weather, explicitly say it doesn't do air quality, pollen count, or UV index (unless it does). Agents will try to shove adjacent queries into your skill if you don't fence it off.

5. Overloaded skills. If your skill takes 8 parameters and handles three different use cases, break it into three skills. Smaller, focused skills compose better than large multi-purpose ones.

6. Forgetting the trigger field. Without a trigger, OpenClaw's skill router has to rely entirely on the description for relevance matching, which works but is less efficient and less precise.


Getting Started Without the Pain

If you're reading this and thinking "okay, I understand the anatomy now but I don't want to set up a dozen skill directories from scratch," that's a reasonable response.

The fastest way I've found to get productive with OpenClaw skill development is Felix's OpenClaw Starter Pack. It's $29 on Claw Mart and includes a set of pre-configured skills — each with properly structured SKILL.md files, working handlers, tests, and config. More importantly, the skills in it are well-written examples of everything I described above: clean triggers, typed inputs with defaults, multiple examples per skill, thorough constraints.

I'd recommend it specifically because reading well-structured SKILL.md files is the fastest way to internalize the patterns. You can reverse-engineer the starter pack skills into your own in a fraction of the time it would take to build the structure from scratch by trial and error.


Where to Go From Here

Once you've got the anatomy of SKILL.md down, the next areas to explore are:

  • Skill composition: How to build skills that call other skills, using the dependencies frontmatter field and OpenClaw's chaining system.
  • Dynamic skill loading: How to register and deregister skills at runtime based on user context or conversation state.
  • Skill versioning in production: How to run A/B tests between skill versions to measure which descriptions and examples produce better agent behavior.

The skill definition is the foundation of everything else in OpenClaw. Get this right, and the agent behavior follows. Get it wrong, and no amount of prompt engineering or model upgrading will save you.

Start with one skill. Write the SKILL.md carefully. Add three examples. Add explicit constraints. Test it. Then build the next one.

That's the whole game.
