Building Custom Skills: Scripts, Assets & Tools

Let's cut straight to it: the hardest part of building anything useful with OpenClaw isn't getting the agent to run. It's getting your custom skills to actually work the way you imagined them in your head. You fire up a new project, write a skill, hook it up, and then watch in mild horror as the agent passes garbage arguments, ignores your carefully worded descriptions, or calls the wrong skill entirely. Sound familiar? Good. That means you're actually building things.
I've spent more time than I'd like to admit wrestling with custom skills in OpenClaw, and I've landed on a set of patterns, scripts, and asset configurations that reliably work. This post is everything I wish someone had told me on day one. We're going to cover how skills actually function under the hood, how to write ones that don't break, how to manage your assets and tooling, and how to avoid the most common traps that eat hours of your life.
Why Custom Skills Break (And Why It's Mostly Your Fault)
Here's the uncomfortable truth: when your OpenClaw agent calls the wrong skill or passes mangled arguments, the problem is almost never the platform. It's how you defined the skill.
OpenClaw uses a schema-driven approach to expose skills to the agent. The agent reads your skill's name, description, and parameter schema, then decides when and how to invoke it. That's it. There's no magic. The agent is making a decision based on the text you gave it. If that text is vague, ambiguous, or contradictory, you get vague, ambiguous, contradictory behavior.
The three most common failure modes I see:
1. Descriptions that sound good to humans but confuse the agent. A skill named process_data with the description "Processes the data" tells the agent nothing. Process what data? In what way? What does it return? The agent has to guess, and it will guess wrong.
2. Parameter schemas that are too loose. If you define a parameter as type: string when it should be an enum with three possible values, the agent will hallucinate values. Every time.
3. Too many skills loaded at once. This is the silent killer. Past about 10-12 skills in a single agent context, selection accuracy starts dropping noticeably. Past 20, it's a coin flip. OpenClaw gives you the tools to manage this, but most people just dump everything in and wonder why things get weird.
Let's fix all three.
Writing Skills That Actually Work
Here's the skeleton of a well-structured OpenClaw custom skill:
skill:
  name: fetch_product_reviews
  description: >
    Retrieves customer reviews for a specific product from the database.
    Use this when the user asks about reviews, ratings, feedback, or
    customer opinions for a particular product. Returns a list of review
    objects with author, rating (1-5), and text. Returns an empty list if
    no reviews exist for the given product_id.
  parameters:
    product_id:
      type: string
      description: "The unique product identifier (format: PRD-XXXXX)"
      required: true
      pattern: "^PRD-\\d{5}$"
    sort_by:
      type: string
      description: "How to sort results"
      enum: ["newest", "highest_rated", "lowest_rated"]
      default: "newest"
    limit:
      type: integer
      description: "Maximum number of reviews to return (1-50)"
      minimum: 1
      maximum: 50
      default: 10
  returns:
    type: array
    description: "List of review objects, each containing author (string), rating (integer 1-5), text (string), and date (ISO 8601)"
Let's break down what's happening and why every piece matters.
The name is specific and verb-noun. fetch_product_reviews, not get_reviews, reviews, or product_tool. The agent uses the name as a primary signal. Make it unambiguous.
The description does four things: (1) says what the skill does, (2) says when to use it with example trigger phrases, (3) says what it returns, and (4) says what happens in edge cases. This isn't a docstring for human developers. It's an instruction set for an AI. Write it like one.
Parameters are tightly constrained. Notice the pattern on product_id: this gives the agent a format to follow. The enum on sort_by eliminates hallucination entirely for that field. The minimum/maximum on limit prevents absurd values. Every constraint you add is a guardrail that saves you from debugging later.
The returns block is declared explicitly. Many people skip this. Don't. The agent uses the return description to decide whether this skill's output will be useful for the current task. If you don't describe what comes back, the agent is flying blind on whether to use this skill at all.
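To make the payoff of those constraints concrete, here is a rough sketch of the kind of check a strict validator performs before a call ever reaches your script. The constraint values are mirrored from the schema above; the validator itself is illustrative, not OpenClaw's actual implementation:

```python
import re

# Constraints mirrored from the fetch_product_reviews schema above.
CONSTRAINTS = {
    "product_id": {"type": str, "required": True, "pattern": r"^PRD-\d{5}$"},
    "sort_by": {"type": str, "enum": ["newest", "highest_rated", "lowest_rated"]},
    "limit": {"type": int, "minimum": 1, "maximum": 50},
}

def validate(params):
    """Return a list of schema violations; an empty list means the call is well-formed."""
    errors = []
    for name, rule in CONSTRAINTS.items():
        if name not in params:
            if rule.get("required"):
                errors.append(f"missing required parameter: {name}")
            continue
        value = params[name]
        if not isinstance(value, rule["type"]):
            errors.append(f"{name}: expected {rule['type'].__name__}")
            continue
        if "pattern" in rule and not re.match(rule["pattern"], value):
            errors.append(f"{name}: does not match {rule['pattern']}")
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{name}: must be one of {rule['enum']}")
        if "minimum" in rule and value < rule["minimum"]:
            errors.append(f"{name}: below minimum {rule['minimum']}")
        if "maximum" in rule and value > rule["maximum"]:
            errors.append(f"{name}: above maximum {rule['maximum']}")
    return errors

print(validate({"product_id": "PRD-00123", "limit": 10}))  # []
print(validate({"product_id": "BAD", "limit": 999}))       # two violations
```

Every violation this catches is a malformed call your script never has to defend against.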
The Script Layer: Where Logic Lives
The YAML definition is the skill's interface. The script is its implementation. Here's the corresponding script for the skill above:
import re


def execute(params, context):
    """
    Fetch product reviews from the database.
    """
    product_id = params.get("product_id", "")
    sort_by = params.get("sort_by", "newest")
    limit = params.get("limit", 10)

    # Validate product_id format as a safeguard
    if not re.match(r"^PRD-\d{5}$", product_id):
        return {
            "error": True,
            "message": f"Invalid product_id format: {product_id}. Expected PRD-XXXXX."
        }

    # Map sort options to whitelisted ORDER BY clauses. Identifiers can't be
    # bound as query parameters, so never interpolate sort_by into SQL directly.
    order_by = {
        "newest": "created_at DESC",
        "highest_rated": "rating DESC",
        "lowest_rated": "rating ASC",
    }[sort_by]

    # Query your data source
    reviews = context.db.query(
        f"SELECT author, rating, review_text, created_at FROM reviews "
        f"WHERE product_id = %s ORDER BY {order_by} LIMIT %s",
        [product_id, limit]
    )

    # Return structured data, not raw query results
    return {
        "error": False,
        "count": len(reviews),
        "reviews": [
            {
                "author": r["author"],
                "rating": r["rating"],
                "text": r["review_text"],
                "date": r["created_at"].isoformat()
            }
            for r in reviews
        ]
    }
Key things to notice:
Double-validate in the script. Yes, the schema says product_id should match a pattern. Yes, you should check it again in the script. The schema catches most malformed calls, but defense in depth saves you when edge cases slip through.
Return structured error objects, not exceptions. If your script throws an unhandled exception, the agent gets a generic error message and has no idea what went wrong. If you return a clear {"error": True, "message": "..."} object, the agent can actually reason about the failure and try to fix it, maybe by asking the user for a corrected product ID.
Use the context object. OpenClaw's context gives your skills access to shared state: database connections, authentication tokens, previous skill outputs, user session data. This is how you pass complex objects between skills without the serialization nightmare that plagues other frameworks. Don't reinvent this. Use it.
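As a rough illustration of the pattern, here is how one skill can leave a result in shared state for another skill to pick up. The Context class and its get/set methods are my own stand-ins for this sketch, not a documented OpenClaw API (the article only shows context.db):

```python
class Context:
    """Minimal stand-in for OpenClaw's context object, for illustration only."""
    def __init__(self):
        self._state = {}

    def set(self, key, value):
        self._state[key] = value

    def get(self, key, default=None):
        return self._state.get(key, default)


def fetch_reviews_skill(params, context):
    # In a real skill this would come from context.db; stubbed here.
    reviews = [{"author": "Jane", "rating": 5, "text": "Great!"}]
    # Stash the raw objects so a downstream skill can reuse them without
    # re-serializing them through the agent's message channel.
    context.set("last_reviews", reviews)
    return {"error": False, "count": len(reviews)}


def summarize_feedback_skill(params, context):
    reviews = context.get("last_reviews", [])
    if not reviews:
        return {"error": True, "message": "No reviews in context; fetch them first."}
    avg = sum(r["rating"] for r in reviews) / len(reviews)
    return {"error": False, "summary": f"{len(reviews)} reviews, average rating {avg:.1f}"}


ctx = Context()
fetch_reviews_skill({}, ctx)
print(summarize_feedback_skill({}, ctx)["summary"])  # 1 reviews, average rating 5.0
```

The second skill never sees the raw database rows in its prompt; it reads them straight out of shared state.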
Managing Assets: The Part Everyone Forgets
Skills don't exist in a vacuum. They need assets: API keys, configuration files, prompt templates, reference data, model endpoints. OpenClaw's asset system lets you bundle these with your skills, but most people either ignore it (hardcoding everything) or over-engineer it (building custom config management on top of OpenClaw's).
Here's what a clean asset structure looks like:
project/
├── skills/
│   ├── fetch_product_reviews.yaml
│   ├── fetch_product_reviews.py
│   ├── create_order.yaml
│   ├── create_order.py
│   └── summarize_feedback.yaml
├── assets/
│   ├── config.yaml
│   ├── prompts/
│   │   ├── review_summary_template.txt
│   │   └── escalation_template.txt
│   └── data/
│       └── product_categories.json
└── openclaw.config.yaml
Your openclaw.config.yaml ties it all together:
agent:
  name: product-support-agent
  model: default
  max_skills: 8

skills:
  - skills/fetch_product_reviews
  - skills/create_order
  - skills/summarize_feedback

assets:
  config: assets/config.yaml
  prompts_dir: assets/prompts/
  data_dir: assets/data/

settings:
  retry_on_error: true
  max_retries: 2
  validation: strict
The max_skills: 8 setting is intentional. I keep it low and use skill groups (more on that in a second) to manage larger toolsets. The validation: strict setting means OpenClaw will reject any skill call that doesn't match the schema before it hits your script. Turn this on. Always.
Skill Groups: Solving the "Too Many Tools" Problem
If your agent needs access to 25 different skills, don't load them all at once. Use OpenClaw's skill groups to organize them by domain and load them contextually:
skill_groups:
  product_info:
    description: "Skills for looking up product information, reviews, and availability"
    skills:
      - fetch_product_reviews
      - check_inventory
      - get_product_details
  order_management:
    description: "Skills for creating, modifying, and canceling orders"
    skills:
      - create_order
      - modify_order
      - cancel_order
      - check_order_status
  analytics:
    description: "Skills for generating reports and analyzing data"
    skills:
      - summarize_feedback
      - generate_sales_report
      - trend_analysis
The agent first decides which group is relevant, then selects from the skills within that group. Instead of choosing from 25 skills, it's choosing from 3 groups, then from 3-4 skills. The accuracy improvement is dramatic: I've seen it go from ~60% correct skill selection to north of 90% just by adding this layer.
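The two-stage shape is easy to mimic outside the agent. This toy router is entirely illustrative (the real selection is done by the model's reasoning, not keyword matching), but it shows how grouping shrinks the decision space at each step:

```python
SKILL_GROUPS = {
    "product_info": {
        "keywords": ["review", "stock", "product"],
        "skills": ["fetch_product_reviews", "check_inventory", "get_product_details"],
    },
    "order_management": {
        "keywords": ["order", "cancel", "modify"],
        "skills": ["create_order", "modify_order", "cancel_order", "check_order_status"],
    },
    "analytics": {
        "keywords": ["report", "trend", "feedback"],
        "skills": ["summarize_feedback", "generate_sales_report", "trend_analysis"],
    },
}


def candidate_skills(query):
    """Stage 1: pick a group. Stage 2: only that group's skills are exposed."""
    for group, spec in SKILL_GROUPS.items():
        if any(kw in query.lower() for kw in spec["keywords"]):
            return group, spec["skills"]
    return None, []


group, skills = candidate_skills("What are the reviews for PRD-00123?")
print(group)        # product_info
print(len(skills))  # 3 candidates instead of 10
```

At no point does the selector weigh all ten skills at once, which is exactly the property that keeps the agent's choice accurate.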
Testing Skills Without Losing Your Mind
You cannot meaningfully test agent + skill behavior with traditional unit tests alone. The agent's decision to call a skill is non-deterministic. But you can test the deterministic parts, and you can build replay tests for the non-deterministic parts.
Test your scripts directly:
# test_fetch_reviews.py
from datetime import datetime
from unittest.mock import MagicMock

from skills.fetch_product_reviews import execute


def test_valid_product_id():
    context = MagicMock()
    context.db.query.return_value = [
        {"author": "Jane", "rating": 5, "review_text": "Great!", "created_at": datetime(2026, 1, 1)}
    ]
    result = execute({"product_id": "PRD-00123", "sort_by": "newest", "limit": 5}, context)
    assert result["error"] is False
    assert result["count"] == 1
    assert result["reviews"][0]["rating"] == 5


def test_invalid_product_id():
    context = MagicMock()
    result = execute({"product_id": "INVALID"}, context)
    assert result["error"] is True
    assert "Invalid product_id format" in result["message"]
Build replay scenarios for agent decisions:
# tests/scenarios/review_lookup.yaml
scenario: "User asks for product reviews"
input: "What are people saying about PRD-00123?"
expected:
  skill_called: fetch_product_reviews
  parameters:
    product_id: "PRD-00123"
  response_contains: "reviews"
Run these against your agent configuration regularly. They won't catch everything, but they'll catch regressions fast.
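A scenario file only pays off if something runs it. The checker below is my own minimal sketch of the idea: given a parsed scenario and a recorded agent trace, verify each expectation. A real harness would load the YAML file and capture the trace from OpenClaw's logs; both the scenario and trace dicts here are inlined so the sketch runs on its own:

```python
def check_scenario(scenario, trace):
    """Compare one recorded agent run against a replay scenario's expectations."""
    expected = scenario["expected"]
    failures = []
    if trace["skill_called"] != expected["skill_called"]:
        failures.append(f"called {trace['skill_called']}, expected {expected['skill_called']}")
    for key, value in expected.get("parameters", {}).items():
        if trace["parameters"].get(key) != value:
            failures.append(f"parameter {key}={trace['parameters'].get(key)!r}, expected {value!r}")
    if expected.get("response_contains", "") not in trace["response"]:
        failures.append(f"response missing {expected['response_contains']!r}")
    return failures


scenario = {
    "scenario": "User asks for product reviews",
    "input": "What are people saying about PRD-00123?",
    "expected": {
        "skill_called": "fetch_product_reviews",
        "parameters": {"product_id": "PRD-00123"},
        "response_contains": "reviews",
    },
}
trace = {
    "skill_called": "fetch_product_reviews",
    "parameters": {"product_id": "PRD-00123"},
    "response": "Here are the reviews for PRD-00123...",
}
print(check_scenario(scenario, trace))  # [] means every expectation matched
```

Returning a list of failures instead of raising on the first one gives you the whole picture of a regression in a single run.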
The Debugging Workflow That Actually Works
When a skill isn't working right, here's my exact debugging order:
1. Check the schema. Read your description as if you've never seen the skill before. Is it obvious when to use it? Are the parameters unambiguous?
2. Check the agent's reasoning. OpenClaw's trace logs show you the agent's internal decision process. Look at why it chose (or didn't choose) your skill.
3. Check the parameters received. Add logging to the top of your script. Are the params what you expected? If not, tighten your schema constraints.
4. Check the return value. Is your script returning something the agent can actually use? A wall of raw JSON is harder for the agent to process than a clean, summarized structure.
5. Check the skill count. If nothing else explains it, you probably have too many skills loaded. Remove unrelated ones or implement skill groups.
This order matters. Most people jump straight to step 3 or 4, but the problem is almost always in step 1.
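For step 3, I wrap skill entry points in a small logging decorator rather than sprinkling prints. A sketch; adapt it to whatever logging setup your project already uses:

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("skills")


def traced(fn):
    """Log the exact parameters a skill receives and the shape of what it returns."""
    @functools.wraps(fn)
    def wrapper(params, context):
        log.info("%s called with params=%r", fn.__name__, params)
        result = fn(params, context)
        log.info("%s returned error=%s keys=%s", fn.__name__,
                 result.get("error"), sorted(result))
        return result
    return wrapper


@traced
def fetch_product_reviews(params, context):
    # Stubbed body so the example runs on its own.
    return {"error": False, "count": 0, "reviews": []}


result = fetch_product_reviews({"product_id": "PRD-00123"}, context=None)
print(result["count"])  # 0
```

One decorator per skill and you can see, at a glance, whether the agent's call matched what your schema promised.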
Skip the Setup: Felix's OpenClaw Starter Pack
I spent weeks building out my first set of properly structured skills, asset configs, testing harnesses, and skill groups. If you don't want to set all of this up manually, Felix's OpenClaw Starter Pack on Claw Mart includes pre-built versions of everything I described here: properly structured skill templates, a clean project scaffold, pre-configured skill groups, and testing scenarios you can adapt to your use case. It's $29 and it'll save you the trial-and-error phase entirely. I genuinely wish this existed when I started. It's not magic (you'll still need to customize things), but starting from a well-architected foundation versus starting from a blank file is a completely different experience.
Where to Go From Here
If you've read this far, here's what to do next:
If you're starting from scratch: Pick one skill. Just one. Define it using the YAML structure above with an obsessively detailed description. Write the script with validation and structured error returns. Test it in isolation. Get it working perfectly before adding a second skill.
If you have skills that aren't working: Run through the debugging workflow above, starting from step 1. I guarantee the description or schema is the problem in at least 80% of cases.
If you're scaling up: Implement skill groups before you hit 10 skills. Don't wait until things break. The architecture cost is minimal upfront and painful to retrofit.
If you want to move fast: Grab Felix's OpenClaw Starter Pack and start modifying instead of creating from nothing.
The entire game with custom skills is reducing ambiguity. Every vague description, every loosely typed parameter, every missing return value annotation is an opportunity for the agent to misinterpret your intent. Treat your skill definitions like you're writing instructions for someone who is extremely smart, extremely literal, and has zero context about your project. Because that's exactly what you're doing.
Build tight. Build clear. The agent will follow.