Bundling Multiple Skills: How to Create Skill Packs

Let's talk about the moment every OpenClaw builder hits a wall.
You've got five or six skills running individually. Maybe a web scraper, a summarizer, a data formatter, something that pings an API, and a little skill that writes clean output to a file. Each one works beautifully in isolation. You test them, they pass, you feel like a genius.
Then you try to make them work together — as a single, coherent agent workflow — and everything falls apart.
The scraper passes raw HTML to the summarizer, which chokes. The formatter expects clean JSON but gets a string with escaped quotes jammed inside it. The API skill fires before the data it depends on is even ready. And somewhere in the chaos, your agent enters a retry loop that burns through tokens like a bonfire while accomplishing exactly nothing.
This is the bundling problem. And it's not a you problem — it's arguably the single most common frustration in the entire AI agent space right now. If you've been lurking in any community where people build autonomous workflows, you've seen the posts: "My agent keeps calling the wrong tool," "Stuck in infinite retry loops," "Works with 3 skills, breaks with 8."
OpenClaw gives you the primitives to solve this properly. But you need to understand how to bundle skills the right way, because just throwing them into a single agent config and hoping the LLM figures it out is a recipe for wasted time and wasted money.
Here's how to actually do it.
Why Individual Skills Work but Bundles Don't
First, let's understand the failure mode, because once you see the pattern you can't unsee it.
When you give an LLM a single skill — one tool, one clear description, one set of parameters — it performs extremely well. The context is tight. There's no ambiguity about which tool to use (there's only one). The parameter schema is easy to follow because there's nothing competing for attention.
The moment you start bundling, you introduce three problems simultaneously:
Selection ambiguity. With multiple skills available, the model has to decide which one to call at each step. In practice, across major agent frameworks, the same pattern shows up again and again: once you cross roughly 8–12 bundled tools, models start making poor selections. They'll default to whichever skill has the most generic-sounding description (usually web search) and ignore your carefully built specialized tools.
Parameter bleed. When multiple skills have overlapping parameter names — query, input, url, text — the model starts mixing them up. It'll pass the scraper's URL into the summarizer's text field, or stuff a search query where a file path should go.
State loss between steps. Chained skills need to pass outputs to inputs cleanly. But without explicit management, intermediate results get mangled, truncated, or simply forgotten. The agent "loses track" of what it already did and either repeats work or passes garbage downstream.
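All three failure modes come down to implicit hand-offs. Here is a minimal, framework-agnostic Python sketch (the skill functions and `run_pipeline` helper are invented for illustration, not OpenClaw APIs) of the fix for state loss: each skill reads the named key the previous skill wrote and writes its own output under an explicit key, so nothing depends on the model "remembering" intermediate results.

```python
def search(state):
    # Writes under an explicit key instead of relying on the model's memory.
    state["search_results"] = [{"url": "https://example.com", "snippet": "..."}]

def scrape(state):
    # Reads exactly the key the previous skill wrote -- no parameter bleed.
    urls = [r["url"] for r in state["search_results"]]
    state["pages"] = [{"url": u, "content": f"text from {u}"} for u in urls]

def write_report(state):
    state["report"] = "\n".join(p["content"] for p in state["pages"])

def run_pipeline(steps, state):
    for step in steps:
        step(state)  # each skill mutates shared state under its own key
    return state

result = run_pipeline([search, scrape, write_report], {})
```

The point of the sketch: the data flow is pinned down by key names, not left to the model's working memory.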
OpenClaw's skill pack system is designed to address all three of these. But you have to use it deliberately.
The Anatomy of an OpenClaw Skill Pack
A skill pack in OpenClaw isn't just "a folder with multiple skills in it." It's a structured bundle that defines three critical things:
- The skills themselves (what each one does)
- The routing logic (when each skill should be called)
- The data contracts (what gets passed between skills, and in what format)
Here's a basic skill pack configuration:
skill_pack:
  name: "research-and-report"
  version: "1.0"
  description: "Researches a topic via web search, extracts key data, and produces a formatted report."
  skills:
    - id: web_search
      description: "Searches the web for current information on a given topic. Use this FIRST for any research query."
      input_schema:
        query:
          type: string
          required: true
          description: "The search query. Be specific and include key terms."
      output_schema:
        results:
          type: array
          items:
            type: object
            properties:
              title: { type: string }
              url: { type: string }
              snippet: { type: string }
    - id: page_scraper
      description: "Fetches and extracts clean text from a URL. Only use this AFTER web_search has returned URLs."
      input_schema:
        url:
          type: string
          required: true
          description: "A valid URL returned from web_search results."
      output_schema:
        content:
          type: string
          description: "Clean extracted text from the page."
    - id: report_writer
      description: "Writes a structured report from extracted content. Use this LAST, after all research is gathered."
      input_schema:
        sources:
          type: array
          items:
            type: object
            properties:
              content: { type: string }
              source_url: { type: string }
        report_format:
          type: string
          enum: ["summary", "detailed", "bullet_points"]
          default: "summary"
      output_schema:
        report:
          type: string
        citations:
          type: array
          items: { type: string }
  routing:
    strategy: "sequential_with_branching"
    default_flow:
      - web_search
      - page_scraper
      - report_writer
    allow_skip: false
    max_retries_per_skill: 2
  error_handling:
    on_skill_failure: "skip_and_note"
    on_parse_failure: "reformat_and_retry"
    max_total_retries: 5
    fallback: "return_partial_results"
Let's break down what makes this different from just listing three tools in a prompt.
The Routing Strategy Is Everything
Notice the routing section. This is where most people's bundles go from "kind of works" to "actually reliable."
The sequential_with_branching strategy tells OpenClaw that these skills should generally be called in order — search first, then scrape, then write — but allows the agent to branch if needed (for example, scraping multiple URLs from a single search).
Without explicit routing, here's what actually happens: the model sees three tools, decides the report writer sounds easiest, and tries to write a report with no data. Or it calls the scraper first with a made-up URL. Or it searches, gets results, then searches again instead of scraping. I've seen all of these in the wild. Repeatedly.
OpenClaw supports several routing strategies:
- sequential — Skills must be called in the defined order. Period. No skipping, no reordering.
- sequential_with_branching — Skills follow a default order but can be called multiple times or in parallel within a step (e.g., scraping 5 URLs simultaneously).
- semantic — The agent chooses skills based on relevance to the current sub-task, but constrained by the descriptions and input requirements you've defined.
- graph — Full explicit control flow where you define exactly which skills can follow which, with conditional transitions.
For most bundles of 3–6 skills, sequential_with_branching is the sweet spot. It gives the model enough freedom to handle variation without letting it go off the rails.
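"Branching within a step" is easier to picture in plain Python. This is a sketch of the idea, not OpenClaw's internals (`scrape_page` is a stand-in for a real page_scraper call): one logical scrape step fans out into several concurrent skill calls, and the results are gathered before the next step runs.

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_page(url):
    # Stand-in for a real page_scraper skill call.
    return {"url": url, "content": f"extracted text from {url}"}

urls = ["https://a.example", "https://b.example", "https://c.example"]

# Fan out: one logical "scrape" step, several concurrent skill calls.
# pool.map preserves input order, so downstream steps see a stable list.
with ThreadPoolExecutor(max_workers=5) as pool:
    pages = list(pool.map(scrape_page, urls))
```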
For complex bundles (7+ skills, multiple possible paths), use graph. It's more work to set up but eliminates the selection ambiguity problem almost entirely.
routing:
  strategy: "graph"
  transitions:
    - from: "start"
      to: ["web_search", "database_lookup"]
      condition: "Choose based on whether the query needs current web data or historical records."
    - from: "web_search"
      to: ["page_scraper"]
    - from: "database_lookup"
      to: ["data_formatter"]
    - from: "page_scraper"
      to: ["report_writer", "page_scraper"]
      condition: "Continue scraping if more URLs remain, otherwise move to report writing."
    - from: "data_formatter"
      to: ["report_writer"]
    - from: "report_writer"
      to: ["end"]
This is essentially a state machine, and it's far more reliable than hoping the LLM will figure out the optimal path through your skill set.
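To make the state-machine framing concrete, here is a small Python sketch (the names are illustrative, not OpenClaw internals) of what graph routing enforces: the model proposes the next skill, and the router only accepts transitions that are legal edges in the graph above.

```python
# Allowed edges, mirroring the graph routing config above.
TRANSITIONS = {
    "start": {"web_search", "database_lookup"},
    "web_search": {"page_scraper"},
    "database_lookup": {"data_formatter"},
    "page_scraper": {"report_writer", "page_scraper"},  # self-loop: keep scraping
    "data_formatter": {"report_writer"},
    "report_writer": {"end"},
}

def next_skill(current, proposed):
    # The model proposes a skill; the router only accepts legal edges.
    if proposed not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {proposed}")
    return proposed

state = "start"
for proposal in ["web_search", "page_scraper", "page_scraper", "report_writer", "end"]:
    state = next_skill(state, proposal)
```

A proposal like "start -> report_writer" raises immediately instead of producing a report with no data, which is exactly the failure described earlier.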
Data Contracts Prevent the Garbage-In Problem
The input_schema and output_schema definitions aren't just documentation — they're enforcement mechanisms.
When you define that page_scraper expects a url of type string that is required, OpenClaw validates the input before the skill executes. If the model tries to pass an empty string, or an array, or the wrong field entirely, the validation catches it before you waste a tool call.
This is huge. One of the most common complaints across every agent framework is skills receiving malformed inputs and either failing silently (returning empty results that poison downstream steps) or crashing loudly (killing the entire run).
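The validation idea itself fits in a few lines of Python. This is a framework-agnostic sketch of the pattern (not OpenClaw's actual validator): check required fields and types against the schema before the skill runs, so a malformed call fails fast instead of burning a tool call and poisoning downstream steps.

```python
# Schema for page_scraper's input, mirroring the YAML definition above.
SCHEMA = {
    "url": {"type": str, "required": True},
}

def validate(schema, args):
    """Return a list of validation errors; empty list means the call is OK."""
    errors = []
    for name, rule in schema.items():
        if name not in args:
            if rule.get("required"):
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(args[name], rule["type"]):
            errors.append(f"{name}: expected {rule['type'].__name__}, "
                          f"got {type(args[name]).__name__}")
    return errors

# A malformed call (array instead of string) is caught before execution.
bad = validate(SCHEMA, {"url": ["https://example.com"]})
good = validate(SCHEMA, {"url": "https://example.com"})
```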
Here's a pro tip that took me longer than I'd like to admit to figure out: make your descriptions absurdly specific about what the input should look like.
Bad:
description: "The URL to scrape."
Good:
description: "A valid URL returned from web_search results. Must start with http:// or https://. Do not pass search queries or partial URLs here."
That extra specificity in the description dramatically reduces parameter bleed. The model reads those descriptions as part of its context, and explicit instructions about what not to pass are surprisingly effective at preventing the most common errors.
Error Handling That Actually Handles Errors
The error_handling config in the skill pack is what prevents the infinite retry death spiral.
error_handling:
  on_skill_failure: "skip_and_note"
  on_parse_failure: "reformat_and_retry"
  max_total_retries: 5
  fallback: "return_partial_results"
Let's talk about each option:
skip_and_note means if a skill fails (API timeout, rate limit, unexpected response), the agent skips it, notes what happened, and continues to the next step. The report writer will get a note saying "page_scraper failed for URL X — content unavailable" and can write around the gap.
reformat_and_retry is specifically for parse failures — when the model generated bad JSON or wrong parameter types. Instead of retrying the exact same call, OpenClaw feeds the error back to the model with instructions to fix the formatting. This works remarkably well and usually resolves the issue in one retry.
max_total_retries: 5 puts a hard ceiling on total retries across the entire run. This is your financial circuit breaker. No more $50 retry spirals.
return_partial_results as the fallback means if the whole bundle can't complete, you still get whatever it managed to produce. A partial report with three sources instead of five is infinitely more useful than a crash with nothing.
Compare this to the default behavior in most frameworks: retry the same failing call forever, or crash and lose everything. Neither is acceptable for anything you'd actually put in production.
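The retry-budget-plus-partial-results pattern is worth internalizing even outside OpenClaw. A hedged Python sketch (the skill list and the flaky failure simulation are invented for illustration):

```python
def run_with_budget(skills, max_total_retries=5):
    """Run skills in order; skip persistent failures, never exceed the budget."""
    retries_left = max_total_retries
    results, notes = {}, []
    for name, skill in skills:
        while True:
            try:
                results[name] = skill()
                break
            except Exception as exc:
                if retries_left == 0:
                    # skip_and_note: record the failure and move on.
                    notes.append(f"{name} failed, retry budget exhausted: {exc}")
                    break
                retries_left -= 1  # shared ceiling across the whole run
    # return_partial_results: whatever completed, plus the failure notes.
    return results, notes

calls = {"n": 0}
def flaky_search():
    # Fails twice (e.g., rate limits), then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

results, notes = run_with_budget([("search", flaky_search),
                                  ("write", lambda: "report")])
```

The shared counter is the financial circuit breaker: retries are spent from one budget for the whole run, not reset per skill.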
Building Your First Skill Pack Step by Step
Enough theory. Here's the practical workflow:
Step 1: Build and test each skill individually. Don't even think about bundling until each skill works perfectly on its own with hardcoded inputs. Pass it a known-good input, verify the output format is exactly what you expect.
Step 2: Define your data flow on paper. Literally sketch out: Skill A produces X, which feeds into Skill B as Y, which produces Z for Skill C. If you can't draw this clearly, your bundle won't work.
Step 3: Write strict schemas. Every input and output should have a type, a description, and required/optional designation. Don't leave anything implicit.
Step 4: Choose the simplest routing strategy that works. Start with sequential. Only upgrade to sequential_with_branching or graph if you have a concrete reason.
Step 5: Add error handling before you test. Not after. You will hit errors on the first run. Having handling in place from the start means you get useful debugging information instead of cryptic crashes.
Step 6: Test with verbose logging on. OpenClaw's trace output shows you every skill call, every parameter passed, every routing decision. Read the traces. They'll tell you exactly where things go wrong.
openclaw run --skill-pack ./research-and-report --input "Analyze the current state of AI regulation in the EU" --verbose
Keeping Your Skill Count Manageable
Here's an opinionated take based on real-world usage: keep your skill packs to 5 skills or fewer.
If you need more than 5, you almost certainly need two or more skill packs that can be composed, not one mega-bundle. A "research pack" that feeds into a "report writing pack" is more reliable than a single 10-skill monolith.
OpenClaw lets you compose packs:
workflow:
  name: "full-research-pipeline"
  steps:
    - skill_pack: "research-pack"
      output_key: "research_data"
    - skill_pack: "report-pack"
      input_from: "research_data"
This gives you modularity. The research pack is tested and stable. The report pack is tested and stable. The interface between them is a single, well-defined data contract. If something breaks, you know immediately which pack to investigate.
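Composition really is just one well-defined handoff. A minimal Python sketch of the shape (the pack functions are stand-ins, not OpenClaw calls): each pack is a function over structured data, and the interface between them is the single key the first pack writes.

```python
def research_pack(topic):
    # Stand-in for the tested, stable research pack.
    return {"research_data": [{"source_url": "https://example.com",
                               "content": f"findings on {topic}"}]}

def report_pack(research_data):
    # Stand-in for the report pack; it consumes exactly one contract key.
    body = "\n".join(s["content"] for s in research_data)
    return {"report": body,
            "citations": [s["source_url"] for s in research_data]}

# The entire interface between packs is the "research_data" key.
handoff = research_pack("EU AI regulation")
output = report_pack(handoff["research_data"])
```

If `output` is wrong, you test `report_pack` with a known-good `research_data` fixture; if the fixture passes, the bug is in `research_pack`. That's the debugging payoff of the narrow contract.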
The Shortcut Worth Taking
If all of this sounds like a lot of configuration to get right — the schemas, the routing, the error handling, the testing — you're not wrong. It is. And getting it wrong in subtle ways leads to agents that work 70% of the time, which is somehow worse than agents that work 0% of the time (because you start trusting them).
This is where I'd genuinely point you toward Felix's OpenClaw Starter Pack. It's $29 on Claw Mart and includes pre-configured skill packs with the routing, schemas, and error handling already set up and tested. If you don't want to build all of this from scratch — especially the tricky parts like data contracts between skills and the graph routing configs — it's a real time-saver.
I spent a few weeks figuring out the patterns I described in this post by trial and error. The starter pack essentially encapsulates those patterns into ready-to-use bundles you can deploy immediately or modify to fit your specific use case. For thirty bucks, the time savings alone make it worth it, not to mention avoiding the token costs of debugging broken bundles.
Where to Go From Here
Once you've got a working skill pack — whether you built it from scratch or started with a pre-built template — the next steps are:
- Add observability. Track which skills get called most, which fail most, and where latency accumulates. Optimize from data, not hunches.
- Version your packs. When you update a skill's schema, bump the version. This is basic hygiene that prevents "it was working yesterday" mysteries.
- Write integration tests. Feed known inputs through the full pack and assert on the output structure. Automate this. Run it before every deployment.
- Gradually increase complexity. Add one skill at a time to a working pack. Test after each addition. The moment reliability drops, you've found your complexity ceiling.
Bundling skills in OpenClaw isn't inherently hard. What's hard is doing it without structure — throwing tools at an LLM and expecting it to orchestrate them perfectly. The skill pack system gives you the structure. Use it deliberately, keep your bundles focused, enforce your data contracts, and you'll build agents that actually work consistently instead of working "most of the time, except when they don't, which is always in production."
That's the whole game. Go build something.