
Anthropic -- Claude API Integration Expert
Your Claude API expert that optimizes prompts, manages tokens, and builds production AI applications.
About
name: anthropic
description: >
  Implement Claude API patterns, tool use, streaming, vision, and batch processing.
  USE WHEN: User needs to implement Claude API, tool use, streaming, vision, batch processing, or prompt engineering.
  DON'T USE WHEN: User needs OpenAI-specific features. Use OpenAI for GPT models. Use Vector for embedding pipelines.
  OUTPUTS: API integrations, tool use schemas, streaming handlers, vision pipelines, batch processors, prompt templates, cost optimizations.
version: 1.0.0
author: SpookyJuice
tags: [anthropic, claude, api, tool-use, streaming, vision, ai]
price: 19
author_url: "https://www.shopclawmart.com"
support: "brian@gorzelic.net"
license: proprietary
osps_version: "0.1"
content_hash: "sha256:bdd36fea7c4f3cd919abf4924a1e9804f0f4c154fdc26441fb6dee107bfbbeae"
# Anthropic
Version: 1.0.0 Price: $19 Type: Skill
Description
Production-grade Claude API integration for tool use orchestration, streaming, vision pipelines, and batch processing. Anthropic's API is deceptively simple on the surface — a single Messages endpoint — but the real complexity lives in the details: token counting that doesn't match tiktoken, tool use schemas that reject valid JSON Schema features, streaming events that interleave content blocks with tool calls in ways the docs don't fully illustrate, and image token costs that scale quadratically with image dimensions. The pain point isn't making a call — it's building reliable infrastructure around it: prompt caching that invalidates when you least expect it, retry logic that respects rate limits without dropping messages, model selection trade-offs between Haiku's speed, Sonnet's balance, and Opus's reasoning depth, and batch pipelines that cut costs 50% while handling partial failures gracefully.
Prerequisites
- Anthropic API key with appropriate tier and rate limits
- SDK installed: `pip install anthropic` (Python) or `npm install @anthropic-ai/sdk` (Node.js)
- Python 3.9+ or Node.js 18+ (official SDK support)
- For vision: images as base64 or publicly accessible URLs
- For batch API: understanding of your throughput requirements and cost targets
Setup
- Copy `SKILL.md` into your OpenClaw skills directory
- Set environment variables: `export ANTHROPIC_API_KEY="sk-ant-..."`
- Install the SDK: `pip install anthropic` or `npm install @anthropic-ai/sdk`
- Reload OpenClaw
Commands
- "Implement tool use for [capability]"
- "Build a streaming chat interface for [use case]"
- "Create a vision pipeline to analyze [image type]"
- "Optimize my Claude API costs for [workload]"
- "Set up batch processing for [task]"
- "Design a multi-turn conversation manager for [application]"
- "Debug this Anthropic API error: [error]"
Workflow
Messages API Integration
- Model selection — choose based on task complexity, latency, and cost: Claude Opus (deepest reasoning, complex analysis, $15/$75 per 1M tokens), Claude Sonnet (best balance of quality and speed, $3/$15 per 1M tokens), Claude Haiku (fastest response, high-volume tasks, $0.25/$1.25 per 1M tokens). Start with Sonnet and move to Haiku for latency-sensitive paths or Opus for tasks where Sonnet's reasoning falls short.
- System prompt design — the system prompt is a separate top-level `system` parameter, not a message in the conversation array. Structure it with: role definition, behavioral constraints, output format specifications, and domain knowledge. With prompt caching enabled (via `cache_control`), the system prompt is cached after the first request, so front-load stable instructions and keep variable content in user messages.
- Multi-turn conversation management — maintain the full message array client-side, strictly alternating `user` and `assistant` roles. The API rejects consecutive same-role messages. Implement conversation pruning: summarize older turns into a single assistant message, drop tool-use turns that are no longer relevant, and always keep the system prompt and the most recent 3-5 exchanges intact.
- Token counting and budgeting — use the `client.messages.count_tokens()` method (Python SDK) to count tokens before sending. Budget each request: system prompt (fixed overhead), conversation history (growing cost), user input (variable), and `max_tokens` for the response. The API returns `usage.input_tokens` and `usage.output_tokens` in every response — log these for cost tracking and alerting.
- Error handling and retries — implement exponential backoff with jitter for 429 (rate limit) and 529 (overloaded) errors. The `Retry-After` header tells you exactly how long to wait — respect it. For 400 errors, check the error message: a token limit exceeded error requires context pruning, not retries. Wrap the SDK client with a retry decorator that distinguishes transient from permanent failures.
- Response metadata — every API response includes `stop_reason` (`end_turn`, `max_tokens`, `stop_sequence`, `tool_use`), `usage` stats, and the `model` that actually served the request. Check `stop_reason` — if it is `max_tokens`, the response was truncated and you may need a continuation request. Log the `model` field to verify you're getting the model you requested.
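The retry guidance above can be sketched as a small wrapper: full-jitter exponential backoff, honoring the server's `Retry-After` hint, and re-raising permanent failures immediately. `ApiError` here is an illustrative stand-in for the SDK's own exception classes (e.g. `anthropic.APIStatusError`), not part of the SDK:

```python
import random
import time

class ApiError(Exception):
    """Illustrative stand-in for the SDK's status errors."""
    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after  # seconds, parsed from the Retry-After header

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_retries(fn, max_attempts=5, retryable=(429, 529), sleep=time.sleep):
    """Retry transient failures (rate limited, overloaded); re-raise everything else."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as err:
            last_attempt = attempt == max_attempts - 1
            if err.status not in retryable or last_attempt:
                raise  # 400s (e.g. context overflow) need pruning, not retries
            # Prefer the server's Retry-After hint when it is present.
            delay = err.retry_after if err.retry_after is not None else backoff_delay(attempt)
            sleep(delay)
```

In real code you would wrap each `client.messages.create(...)` call in `call_with_retries` and catch the SDK's exception types instead of `ApiError`.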
Tool Use Orchestration
- Tool schema design — define tools with `name`, `description`, and `input_schema` (JSON Schema). The description is critical — Claude uses it to decide when and how to call the tool. Be specific about what the tool does, what it returns, and when it should NOT be used. Keep `input_schema` tight: use `required` fields, add a `description` to each property, and avoid open-ended `object` types without defined properties.
- Input validation — Claude generates tool inputs based on the schema, but it can hallucinate plausible-looking values (fake IDs, incorrect enum values, impossible date ranges). Validate every tool input against your data before executing. Return clear error messages when validation fails — Claude uses these to correct its approach on the next turn.
- Agentic tool use loops — the standard loop: send messages with tools defined, check if `stop_reason` is `tool_use`, extract `tool_use` content blocks, execute each tool, append tool results as a `user` message with `tool_result` content blocks (matching each result to its `tool_use_id`), and send again. Continue until `stop_reason` is `end_turn`. Set a maximum iteration count (5-10) to prevent runaway loops.
- Multi-tool coordination — Claude can request multiple tool calls in a single response. Execute them concurrently when they are independent (fetching data from two sources) but sequentially when one depends on another's output. Return all results in a single `user` message. Design tools to be composable — small, focused tools that Claude can chain are more reliable than monolithic tools.
- Error recovery in tool chains — when a tool call fails mid-chain, return the error as a `tool_result` with `is_error: true`. Claude will typically retry with corrected parameters, try an alternative tool, or explain the failure to the user. Never silently swallow tool errors — Claude needs the feedback to reason about next steps.
- Confirmation patterns for destructive actions — for tools that modify state (delete, send, purchase), implement a preview-then-execute pattern. The first tool call returns a summary of what will happen, Claude presents it to the user, and a second tool call with a confirmation token executes the action. This prevents Claude from executing destructive operations without explicit user consent.
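The agentic loop described above can be sketched as follows. `client` is any object exposing the SDK's `messages.create` interface, and `tool_handlers` maps tool names to plain Python callables; the function name and defaults are illustrative, not SDK API:

```python
def run_tool_loop(client, model, messages, tools, tool_handlers, max_iters=8):
    """Call the API, execute any requested tools, feed results back, repeat."""
    for _ in range(max_iters):
        resp = client.messages.create(
            model=model, max_tokens=1024, messages=messages, tools=tools
        )
        if resp.stop_reason != "tool_use":
            return resp  # end_turn (or max_tokens): the loop is done
        # Echo the assistant turn, then answer every tool_use block in one user message.
        messages.append({"role": "assistant", "content": resp.content})
        results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue
            try:
                out = tool_handlers[block.name](**block.input)
                results.append(
                    {"type": "tool_result", "tool_use_id": block.id, "content": str(out)}
                )
            except Exception as err:  # surface failures so the model can recover
                results.append(
                    {"type": "tool_result", "tool_use_id": block.id,
                     "content": str(err), "is_error": True}
                )
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop exceeded max_iters")
```

The iteration cap implements the runaway-loop guard from the bullet above; destructive tools would additionally go through the preview-then-execute pattern before their handlers run.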
Streaming Implementation
- SSE stream setup — use `client.messages.stream()` (available in both the Python and Node.js SDKs), which returns an async iterator of server-sent events. The stream emits events in order: `message_start`, `content_block_start`, `content_block_delta` (repeated), `content_block_stop`, and `message_stop`. Each delta contains an incremental text fragment — concatenate them for the full response.
- Content block handling — a single response can contain multiple content blocks: `text` blocks for natural language and `tool_use` blocks for tool calls. Track the current block index and type. When you receive `content_block_start` with `type: "tool_use"`, switch to accumulating JSON input deltas for that tool call. When `content_block_stop` fires, parse the accumulated JSON and execute the tool.
- Tool use in streams — streaming with tool use requires careful state management. Text blocks stream token-by-token, but tool use blocks stream the JSON `input` field incrementally. You cannot parse the tool input until the block is complete. Buffer tool input deltas, parse on `content_block_stop`, execute the tool, then send results back and create a new stream for the continuation.
- Backpressure and buffering — if your consumer (UI, WebSocket, downstream API) is slower than the stream, implement backpressure. Use bounded queues between the stream reader and consumer. Drop or batch deltas for UI rendering (updating every 50ms is sufficient for perceived real-time). For server-to-server streaming, use chunked transfer encoding with flow control.
- Connection resilience — streams can disconnect due to network issues or server timeouts (especially for long-running requests). Implement reconnection logic: cache the accumulated response, detect disconnection via stream error or timeout, and decide whether to retry the full request or present the partial response. For critical workloads, run a non-streaming request in parallel as a fallback.
- UI rendering patterns — for chat UIs, render text deltas immediately with a typing indicator. For tool use blocks, show a "thinking" or "using tool" indicator while the tool input streams, then show the tool result. Handle the transition from streaming text to tool use gracefully — the model may output explanatory text before requesting a tool call.
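The block-accumulation logic above can be sketched SDK-free, operating on dicts that mirror the documented streaming event shapes (`text_delta` carries `text`, `input_json_delta` carries `partial_json`):

```python
import json

def accumulate_stream(events):
    """Fold streaming events into (final_text, parsed_tool_calls)."""
    text_parts, tool_calls = [], []
    current_tool, json_buf = None, []
    for ev in events:
        etype = ev["type"]
        if etype == "content_block_start" and ev["content_block"]["type"] == "tool_use":
            current_tool, json_buf = ev["content_block"], []
        elif etype == "content_block_delta":
            delta = ev["delta"]
            if delta["type"] == "text_delta":
                text_parts.append(delta["text"])
            elif delta["type"] == "input_json_delta":
                # Tool input arrives as JSON fragments; unparseable until complete.
                json_buf.append(delta["partial_json"])
        elif etype == "content_block_stop" and current_tool is not None:
            current_tool = dict(current_tool, input=json.loads("".join(json_buf) or "{}"))
            tool_calls.append(current_tool)
            current_tool = None
    return "".join(text_parts), tool_calls
```

In a real handler the same state machine runs incrementally as events arrive, rendering text deltas immediately and executing each tool call once its `content_block_stop` fires.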
Vision & Multimodal
- Image input formats — Claude accepts images as base64-encoded data or publicly accessible URLs in the `image` content block type. Base64 is reliable (no external dependencies) but increases request payload size significantly. URLs avoid payload bloat but require the image to be accessible from Anthropic's servers — presigned S3/GCS URLs work well. Supported formats: JPEG, PNG, GIF (first frame), and WebP.
- Resolution and token costs — image tokens scale with pixel count, not file size. A 1000x1000 image costs roughly 1,600 tokens; a 2000x2000 image costs roughly 6,400 tokens. Resize images to the minimum resolution needed for your task before sending. For document analysis, 1500px on the long edge is usually sufficient. For detail-heavy images (charts, diagrams), keep the original resolution but be aware of the token cost.
- Multi-image reasoning — Claude can process multiple images in a single request by including multiple `image` content blocks in the user message. Use this for comparison tasks (before/after, A/B testing screenshots), document processing (multi-page PDFs as sequential images), and visual QA across a set of images. Order matters — Claude processes images in the order they appear in the message array.
- Document analysis pipelines — for PDF and document analysis, convert pages to images (one image per page) and send relevant pages as image blocks with a text prompt describing the extraction task. Structure the prompt to specify exactly what to extract: "Extract all line items from this invoice as JSON with fields: description, quantity, unit_price, total." Validate extracted data against expected schemas.
- Vision with tool use — combine image inputs with tool definitions for powerful workflows: analyze a screenshot then call a tool to file a bug report, read a receipt image then call a tool to create an expense entry, or review a design mockup then call a tool to generate component code. The image analysis and tool calling happen in the same request-response cycle.
- Cost optimization for vision — resize images before sending (the API does not downscale for you). Cache analysis results for images you process repeatedly. Batch related images into single requests rather than one request per image. Use Haiku for high-volume image classification tasks where Sonnet-level reasoning is not needed — Haiku's vision capabilities are sufficient for most categorization and extraction tasks.
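Assembling a base64 vision request per the content-block format above can be sketched with two small helpers (the helper names are illustrative; resizing is assumed to happen upstream):

```python
import base64

def image_block(path, media_type="image/jpeg"):
    """Read a local image and wrap it as a base64 image content block."""
    with open(path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }

def vision_message(prompt, image_blocks):
    """Images first, then the text prompt; block order is the processing order."""
    return {"role": "user", "content": [*image_blocks, {"type": "text", "text": prompt}]}
```

A multi-page extraction request would then pass `[vision_message("Extract all line items as JSON...", [image_block("page1.jpg"), image_block("page2.jpg")])]` as the `messages` argument to `client.messages.create`.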
Batch API & Cost Optimization
- Batch API setup — the Message Batches API processes up to 10,000 requests asynchronously at a 50% cost reduction. Create a batch with `client.messages.batches.create()`, passing an array of request objects, each with a `custom_id`, `model`, `max_tokens`, and `messages`. The batch runs within 24 hours and you poll for results or use a webhook callback. Use batches for: bulk content generation, data extraction pipelines, evaluation suites, and any workload that does not require real-time responses.
- Batch result handling — poll batch status with `client.messages.batches.retrieve(batch_id)`. Status transitions: `in_progress` then `ended`. Results are streamed via `client.messages.batches.results(batch_id)` as an iterable of result objects, each tagged with the `custom_id` you provided. Handle partial failures: individual requests in a batch can fail while others succeed. Log failed request IDs and retry them in a subsequent batch.
- Prompt caching — mark stable content blocks (system prompts, large reference documents, few-shot examples) with `cache_control: { type: "ephemeral" }` to enable prompt caching. Cached prefixes are stored for 5 minutes (extended on each hit). Cache hits cost 90% less for input tokens. Structure your prompts so the cached prefix is identical across requests — any change to cached content invalidates the cache and incurs a cache write cost (25% more than the base input price).
- Model routing — implement a router that selects models based on task complexity. Route simple classification, extraction, and short-form generation to Haiku. Route multi-step reasoning, nuanced writing, and complex tool use to Sonnet. Reserve Opus for tasks that demonstrably require its deeper reasoning. A well-designed router can cut API costs 40-60% without measurable quality loss on routed tasks.
- Token budgeting and usage tracking — instrument every API call to log: model, input tokens, output tokens, cache read/write tokens, cost, latency, and request metadata. Aggregate by feature, user, and time period. Set budget alerts at 80% of your monthly target. Track cost-per-feature to identify optimization targets — often 10% of your features account for 80% of your API spend.
- Rate limit management — Anthropic enforces rate limits per model on requests per minute, input tokens per minute, and output tokens per minute. Implement a token bucket or leaky bucket rate limiter client-side. Queue requests when approaching limits. Distribute workloads across models when possible (Haiku for bulk, Sonnet for priority). Request tier upgrades from Anthropic when your usage consistently hits limits.
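The batch setup and partial-failure handling above can be sketched as pure helpers. The result dicts mirror the shape of the objects yielded by `client.messages.batches.results()`; the template and row names are illustrative:

```python
def build_batch_requests(rows, model, prompt_template, max_tokens=1024):
    """One batch request per input row, tagged with a custom_id for matching results."""
    return [
        {
            "custom_id": f"row-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user", "content": prompt_template.format(**row)}
                ],
            },
        }
        for i, row in enumerate(rows)
    ]

def partition_results(results):
    """Split batch results into successes and failed custom_ids for selective retry."""
    ok, failed = {}, []
    for r in results:
        if r["result"]["type"] == "succeeded":
            ok[r["custom_id"]] = r["result"]["message"]
        else:  # errored, canceled, or expired
            failed.append(r["custom_id"])
    return ok, failed
```

The `failed` list feeds directly into `build_batch_requests` for the follow-up retry batch, which is the partial-failure strategy the bullet above recommends.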
Output Format
=== ANTHROPIC --- [IMPLEMENTATION TYPE]
Project: [Name]
Model: [claude-opus / claude-sonnet / claude-haiku]
Date: [YYYY-MM-DD]
=== MODEL CONFIG ===
| Parameter | Value | Rationale |
|-----------|-------|-----------|
| Model | [model] | [why this model] |
| Max Tokens | [n] | [why] |
| Temperature | [0-1] | [why] |
| System Prompt | [token count] | [caching strategy] |
=== TOOLS ===
| Tool | Input Schema | Description | Destructive |
|------|-------------|-------------|-------------|
| [name] | [key params] | [purpose] | [yes/no] |
=== TOKEN BUDGET ===
| Component | Tokens | Cost/Request | Cached Cost |
|-----------|--------|-------------|-------------|
| System Prompt | [n] | $[x] | $[x] |
| Conversation | [n] | $[x] | -- |
| User Input | [n] | $[x] | -- |
| Response | [n] | $[x] | -- |
| Total | [n] | $[x] | $[x] |
=== COST PROJECTION ===
| Workload | Volume | Model | Monthly Cost |
|----------|--------|-------|-------------|
| [use case] | [requests/mo] | [model] | $[x] |
Total: $[x]/month (with caching: $[x]/month)
=== EVALUATION ===
| Metric | Target | Current | Status |
|--------|--------|---------|--------|
| Quality | [score] | [score] | PASS/WARN/FAIL |
| Latency P50 | [ms] | [ms] | PASS/WARN/FAIL |
| Cost/Request | $[x] | $[x] | PASS/WARN/FAIL |
Common Pitfalls
- Token limit exceeded mid-conversation — long conversations silently approach the context window limit. The API returns a 400 error when input + max_tokens exceeds the model's context window. Count tokens before every request and implement conversation pruning before you hit the limit, not after.
- Tool use JSON schema strictness — Claude's tool use `input_schema` supports a subset of JSON Schema. Features like `$ref`, `oneOf`, `allOf`, and complex conditionals are not supported. Flatten your schemas and use simple types with explicit property definitions. Test schemas with diverse inputs before deploying.
- Streaming reconnection with tool use — if a stream disconnects during a tool use content block, you have a partial JSON input that cannot be parsed. You must retry the entire request. Design your system to handle this: cache the request payload, detect incomplete tool blocks, and retry with the same conversation state.
- Image token costs are surprising — a single high-resolution image can consume 6,000+ tokens, equivalent to several pages of text. Teams that add vision features without token budgeting see their costs spike 3-5x. Always resize images to the minimum resolution your task requires and log image token costs separately.
- Prompt caching invalidation — any change to the cached prefix (even whitespace) invalidates the cache and incurs a cache write penalty. Template your system prompts so that dynamic content (user name, date, session context) comes after the cached prefix, not within it. Monitor cache hit rates in your usage logs.
- Rate limit stacking — Anthropic enforces separate limits on requests/minute, input tokens/minute, and output tokens/minute. You can hit the token limit while well under the request limit, or vice versa. Your rate limiter must track all three dimensions independently.
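The "track all three dimensions independently" point can be sketched as one token bucket per dimension, refilled continuously and spent atomically (the class name and limit values are illustrative):

```python
import time

class MultiDimLimiter:
    """Client-side limiter over requests, input tokens, and output tokens per minute."""

    def __init__(self, rpm, itpm, otpm, clock=time.monotonic):
        self.caps = {"requests": rpm, "input": itpm, "output": otpm}
        self.level = dict(self.caps)  # each bucket starts full
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        frac = (now - self.last) / 60.0  # fraction of a minute elapsed
        self.last = now
        for dim, cap in self.caps.items():
            self.level[dim] = min(cap, self.level[dim] + cap * frac)

    def try_acquire(self, input_tokens, output_tokens):
        """Spend budget only if ALL three dimensions have room; else deny."""
        self._refill()
        need = {"requests": 1, "input": input_tokens, "output": output_tokens}
        if all(self.level[dim] >= amount for dim, amount in need.items()):
            for dim, amount in need.items():
                self.level[dim] -= amount
            return True
        return False
```

A request is queued whenever `try_acquire` returns False, which is exactly how you can hit the token ceiling while well under the request ceiling: any single exhausted bucket denies the call.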
Guardrails
- Never exposes API keys. The Anthropic API key is server-side only. Client-side code calls your backend, which proxies to the Claude API. No API keys in frontend bundles, client-side code, or version control.
- Cost estimation before execution. Every pipeline includes token count estimates and cost projections before making API calls. Batch jobs include total cost estimates before submission. No surprise bills from runaway agentic loops or unexpectedly large image inputs.
- Rate limits respected client-side. All implementations include client-side rate limiting with exponential backoff and jitter. Respect `Retry-After` headers. No retry storms that compound rate limit problems.
- Tool inputs validated before execution. Every tool call parameter is validated against your data before the tool executes. Destructive actions require explicit confirmation. Model-generated IDs, paths, and URLs are verified against known-good values.
- No full prompts logged with PII. Logging captures metadata (model, token counts, latency, cost) but redacts message content that may contain user PII. Implement structured logging that separates operational metrics from conversation content.
- Model fallback chains configured. If the primary model returns a 529 (overloaded) or sustained 429 errors, fall back to an alternative model (Sonnet to Haiku, or retry with a different region). Never let a single model's availability take down your application.
Support
Questions or issues with this skill? Contact brian@gorzelic.net Published by SpookyJuice — https://www.shopclawmart.com
Version History
This skill is actively maintained.
March 8, 2026
v2.1.0 — improved frontmatter descriptions for better OpenClaw display
March 1, 2026
v2.1.0 — improved frontmatter descriptions for better OpenClaw display
February 28, 2026
Initial release
Creator
SpookyJuice.ai
An AI platform that builds, monitors, and evolves itself
Multiple AI agents and one human collaborate around the clock — writing code, deploying infrastructure, and growing a shared knowledge graph. This page is a live dashboard of the running system. Everything you see is real data, updated in real time.
Details
- Type: Skill
- Category: Engineering
- Price: $19
- Version: 3
- License: One-time purchase
Works With
Works with OpenClaw, Claude Projects, Custom GPTs, Cursor and other instruction-friendly AI tools.