
Anthropic -- Claude API Integration Expert
Your Claude API expert that optimizes prompts, manages tokens, and builds production AI applications.
About
name: anthropic
description: >
  Implement Claude API patterns, tool use, streaming, vision, and batch processing.
  USE WHEN: User needs to implement Claude API, tool use, streaming, vision, batch processing, or prompt engineering.
  DON'T USE WHEN: User needs OpenAI-specific features. Use OpenAI for GPT models. Use Vector for embedding pipelines.
  OUTPUTS: API integrations, tool use schemas, streaming handlers, vision pipelines, batch processors, prompt templates, cost optimizations.
version: 1.0.0
author: SpookyJuice
tags: [anthropic, claude, api, tool-use, streaming, vision, ai]
price: 19
author_url: "https://www.shopclawmart.com"
support: "brian@gorzelic.net"
license: proprietary
osps_version: "0.1"
content_hash: "sha256:bdd36fea7c4f3cd919abf4924a1e9804f0f4c154fdc26441fb6dee107bfbbeae"
# Anthropic
Version: 1.0.0 Price: $19 Type: Skill
Description
Production-grade Claude API integration for tool use orchestration, streaming, vision pipelines, and batch processing. Anthropic's API is deceptively simple on the surface — a single Messages endpoint — but the real complexity lives in the details: token counting that doesn't match tiktoken, tool use schemas that reject valid JSON Schema features, streaming events that interleave content blocks with tool calls in ways the docs don't fully illustrate, and image token costs that scale quadratically with image dimensions. The pain point isn't making a call — it's building reliable infrastructure around it: prompt caching that invalidates when you least expect it, retry logic that respects rate limits without dropping messages, model selection trade-offs between Haiku's speed, Sonnet's balance, and Opus's reasoning depth, and batch pipelines that cut costs 50% while handling partial failures gracefully.
Prerequisites
- Anthropic API key with appropriate tier and rate limits
- SDK installed: `pip install anthropic` (Python) or `npm install @anthropic-ai/sdk` (Node.js)
- Python 3.9+ or Node.js 18+ (official SDK support)
- For vision: images as base64 or publicly accessible URLs
- For batch API: understanding of your throughput requirements and cost targets
Setup
- Copy `SKILL.md` into your OpenClaw skills directory
- Set environment variables: `export ANTHROPIC_API_KEY="sk-ant-..."`
- Install the SDK: `pip install anthropic` or `npm install @anthropic-ai/sdk`
- Reload OpenClaw
Commands
- "Implement tool use for [capability]"
- "Build a streaming chat interface for [use case]"
- "Create a vision pipeline to analyze [image type]"
- "Optimize my Claude API costs for [workload]"
- "Set up batch processing for [task]"
- "Design a multi-turn conversation manager for [application]"
- "Debug this Anthropic API error: [error]"
Workflow
Messages API Integration
- Model selection — choose based on task complexity, latency, and cost: Claude Opus (deepest reasoning, complex analysis, $15/$75 per 1M tokens), Claude Sonnet (best balance of quality and speed, $3/$15 per 1M tokens), Claude Haiku (fastest response, high-volume tasks, $0.25/$1.25 per 1M tokens). Start with Sonnet and move to Haiku for latency-sensitive paths or Opus for tasks where Sonnet's reasoning falls short.
- System prompt design — the system prompt is a separate top-level `system` parameter, not a message in the conversation array. Structure it with: role definition, behavioral constraints, output format specifications, and domain knowledge. With prompt caching enabled (via `cache_control`), the system prompt is cached after the first request, so front-load stable instructions and keep variable content in user messages.
- Multi-turn conversation management — maintain the full message array client-side, strictly alternating `user` and `assistant` roles. The API rejects consecutive same-role messages. Implement conversation pruning: summarize older turns into a single assistant message, drop tool-use turns that are no longer relevant, and always keep the system prompt and the most recent 3-5 exchanges intact.
- Token counting and budgeting — use the `client.messages.count_tokens()` method (Python SDK) to count tokens before sending. Budget each request: system prompt (fixed overhead), conversation history (growing cost), user input (variable), and `max_tokens` for the response. The API returns `usage.input_tokens` and `usage.output_tokens` in every response — log these for cost tracking and alerting.
- Error handling and retries — implement exponential backoff with jitter for 429 (rate limit) and 529 (overloaded) errors. The `Retry-After` header tells you exactly how long to wait — respect it. For 400 errors, check the error message: a token limit exceeded error requires context pruning, not retries. Wrap the SDK client with a retry decorator that distinguishes transient from permanent failures.
- Response metadata — every API response includes `stop_reason` (`end_turn`, `max_tokens`, `stop_sequence`, `tool_use`), `usage` stats, and the `model` that actually served the request. Check `stop_reason` — if it is `max_tokens`, the response was truncated and you may need a continuation request. Log the `model` field to verify you're getting the model you requested.
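The retry guidance above can be sketched as a small wrapper: full-jitter exponential backoff, honoring the server's `Retry-After` hint, and re-raising permanent failures immediately. `ApiError` here is an illustrative stand-in for the SDK's own exception classes (e.g. `anthropic.APIStatusError`), not part of the SDK:

```python
import random
import time

class ApiError(Exception):
    """Illustrative stand-in for the SDK's status errors."""
    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after  # seconds, parsed from the Retry-After header

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_retries(fn, max_attempts=5, retryable=(429, 529), sleep=time.sleep):
    """Retry transient failures (rate limited, overloaded); re-raise everything else."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as err:
            last_attempt = attempt == max_attempts - 1
            if err.status not in retryable or last_attempt:
                raise  # 400s (e.g. context overflow) need pruning, not retries
            # Prefer the server's Retry-After hint when it is present.
            delay = err.retry_after if err.retry_after is not None else backoff_delay(attempt)
            sleep(delay)
```

In real code you would wrap each `client.messages.create(...)` call in `call_with_retries` and catch the SDK's exception types instead of `ApiError`.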
Tool Use Orchestration
- Tool schema design — define tools with `name`, `description`, and `input_schema` (JSON Schema). The description is critical — Claude uses it to decide when and how to call the tool. Be specific about what the tool does, what it returns, and when it should NOT be used. Keep `input_schema` tight: use `required` fields, add a `description` to each property, and avoid open-ended `object` types without defined properties.
- Input validation — Claude generates tool inputs based on the schema, but it can hallucinate plausible-looking values (fake IDs, incorrect enum values, impossible date ranges). Validate every tool input against your data before executing. Return clear error messages when validation fails — Claude uses these to correct its approach on the next turn.
- Agentic tool use loops — the standard loop: send messages with tools defined, check if `stop_reason` is `tool_use`, extract `tool_use` content blocks, execute each tool, append tool results as a `user` message with `tool_result` content blocks (matching each result to its `tool_use_id`), and send again. Continue until `stop_reason` is `end_turn`. Set a maximum iteration count (5-10) to prevent runaway loops.
- Multi-tool coordination — Claude can request multiple tool calls in a single response. Execute them concurrently when they are independent (fetching data from two sources) but sequentially when one depends on another's output. Return all results in a single `user` message. Design tools to be composable — small, focused tools that Claude can chain are more reliable than monolithic tools.
- Error recovery in tool chains — when a tool call fails mid-chain, return the error as a `tool_result` with `is_error: true`. Claude will typically retry with corrected parameters, try an alternative tool, or explain the failure to the user. Never silently swallow tool errors — Claude needs the feedback to reason about next steps.
- Confirmation patterns for destructive actions — for tools that modify state (delete, send, purchase), implement a preview-then-execute pattern. The first tool call returns a summary of what will happen, Claude presents it to the user, and a second tool call with a confirmation token executes the action. This prevents Claude from executing destructive operations without explicit user consent.
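The agentic loop described above can be sketched as follows. `client` is any object exposing the SDK's `messages.create` interface, and `tool_handlers` maps tool names to plain Python callables; the function name and defaults are illustrative, not SDK API:

```python
def run_tool_loop(client, model, messages, tools, tool_handlers, max_iters=8):
    """Call the API, execute any requested tools, feed results back, repeat."""
    for _ in range(max_iters):
        resp = client.messages.create(
            model=model, max_tokens=1024, messages=messages, tools=tools
        )
        if resp.stop_reason != "tool_use":
            return resp  # end_turn (or max_tokens): the loop is done
        # Echo the assistant turn, then answer every tool_use block in one user message.
        messages.append({"role": "assistant", "content": resp.content})
        results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue
            try:
                out = tool_handlers[block.name](**block.input)
                results.append(
                    {"type": "tool_result", "tool_use_id": block.id, "content": str(out)}
                )
            except Exception as err:  # surface failures so the model can recover
                results.append(
                    {"type": "tool_result", "tool_use_id": block.id,
                     "content": str(err), "is_error": True}
                )
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop exceeded max_iters")
```

The iteration cap implements the runaway-loop guard from the bullet above; destructive tools would additionally go through the preview-then-execute pattern before their handlers run.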
Streaming Implementation
- SSE stream setup — use `client.messages.stream()` (available in both the Python and Node.js SDKs), which returns an async iterator of server-sent events. The stream emits events in order: `message_start`, `content_block_start`, `content_block_delta` (repeated), `content_block_stop`, and `message_stop`. Each delta contains an incremental text fragment — concatenate them for the full response.
- Content block handling — a single response can contain multiple content blocks: `text` blocks for natural language and `tool_use` blocks for tool calls. Track the current block index and type. When you receive `content_block_start` with `type: "tool_use"`, switch to accumulating JSON input deltas for that tool call. When `content_block_stop` fires, parse the accumulated JSON and execute the tool.
- Tool use in streams — streaming with tool use requires careful state management. Text blocks stream token-by-token, but tool use blocks stream the JSON `input` field incrementally. You cannot parse the tool input until the block is complete. Buffer tool input deltas, parse on `content_block_stop`, execute the tool, then send results back and create a new stream for the continuation.
- Backpressure and buffering — if your consumer (UI, WebSocket, downstream API) is slower than the stream, implement backpressure. Use bounded queues between the stream reader and consumer. Drop or batch deltas for UI rendering (updating every 50ms is sufficient for perceived real-time). For server-to-server streaming, use chunked transfer encoding with flow control.
- Connection resilience — streams can disconnect due to network issues or server timeouts (especially for long-running requests). Implement reconnection logic: cache the accumulated response, detect disconnection via stream error or timeout, and decide whether to retry the full request or present the partial response. For critical workloads, run a non-streaming request in parallel as a fallback.
- UI rendering patterns — for chat UIs, render text deltas immediately with a typing indicator. For tool use blocks, show a "thinking" or "using tool" indicator while the tool input streams, then show the tool result. Handle the transition from streaming text to tool use gracefully — the model may output explanatory text before requesting a tool call.
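The block-accumulation logic above can be sketched SDK-free, operating on dicts that mirror the documented streaming event shapes (`text_delta` carries `text`, `input_json_delta` carries `partial_json`):

```python
import json

def accumulate_stream(events):
    """Fold streaming events into (final_text, parsed_tool_calls)."""
    text_parts, tool_calls = [], []
    current_tool, json_buf = None, []
    for ev in events:
        etype = ev["type"]
        if etype == "content_block_start" and ev["content_block"]["type"] == "tool_use":
            current_tool, json_buf = ev["content_block"], []
        elif etype == "content_block_delta":
            delta = ev["delta"]
            if delta["type"] == "text_delta":
                text_parts.append(delta["text"])
            elif delta["type"] == "input_json_delta":
                # Tool input arrives as JSON fragments; unparseable until complete.
                json_buf.append(delta["partial_json"])
        elif etype == "content_block_stop" and current_tool is not None:
            current_tool = dict(current_tool, input=json.loads("".join(json_buf) or "{}"))
            tool_calls.append(current_tool)
            current_tool = None
    return "".join(text_parts), tool_calls
```

In a real handler the same state machine runs incrementally as events arrive, rendering text deltas immediately and executing each tool call once its `content_block_stop` fires.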
Vision & Multimodal
- Image input formats — Claude accepts images as base64-encoded data or publicly accessible URLs in the `image` content block type. Base64 is reliable (no external dependencies) but increases request payload size significantly. URLs avoid payload bloat but require the image to be accessible from Anthropic's servers — presigned S3/GCS URLs work well. Supported formats: JPEG, PNG, GIF (first frame), and WebP.
- Resolution and token costs — image tokens scale with pixel count, not file size. A 1000x1000 image costs roughly 1,600 tokens; a 2000x2000 image costs roughly 6,400 tokens. Resize images to the minimum resolution needed for your task before sending. For document analysis, 1500px on the long edge is usually sufficient. For detail-heavy images (charts, diagrams), keep the original resolution but be aware of the token cost.
- Multi-image reasoning — Claude can process multiple images in a single request by including multiple `image` content blocks in the user message. Use this for comparison tasks (before/after, A/B testing screenshots), document processing (multi-page PDFs as sequential images), and visual QA across a set of images. Order matters — Claude processes images in the order they appear in the message array.
- Document analysis pipelines — for PDF and document analysis, convert pages to images (one image per page) and send relevant pages as image blocks with a text prompt describing the extraction task. Structure the prompt to specify exactly what to extract: "Extract all line items from this invoice as JSON with fields: description, quantity, unit_price, total." Validate extracted data against expected schemas.
- Vision with tool use — combine image inputs with tool definitions for powerful workflows: analyze a screenshot then call a tool to file a bug report, read a receipt image then call a tool to create an expense entry, or review a design mockup then call a tool to generate component code. The image analysis and tool calling happen in the same request-response cycle.
- Cost optimization for vision — resize images before sending (the API does not downscale for you). Cache analysis results for images you process repeatedly. Batch related images into single requests rather than one request per image. Use Haiku for high-volume image classification tasks where Sonnet-level reasoning is not needed — Haiku's vision capabilities are sufficient for most categorization and extraction tasks.
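Assembling a base64 vision request per the content-block format above can be sketched with two small helpers (the helper names are illustrative; resizing is assumed to happen upstream):

```python
import base64

def image_block(path, media_type="image/jpeg"):
    """Read a local image and wrap it as a base64 image content block."""
    with open(path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }

def vision_message(prompt, image_blocks):
    """Images first, then the text prompt; block order is the processing order."""
    return {"role": "user", "content": [*image_blocks, {"type": "text", "text": prompt}]}
```

A multi-page extraction request would then pass `[vision_message("Extract all line items as JSON...", [image_block("page1.jpg"), image_block("page2.jpg")])]` as the `messages` argument to `client.messages.create`.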
Batch API & Cost Optimization
- Batch API setup — the Message Batches API processes up to 10,000 requests asynchronously at a 50% cost reduction. Create a batch with `client.messages.batches.create()`, passing an array of request objects, each with a `custom_id`, `model`, `max_tokens`, and `messages`. The batch runs within 24 hours and you poll for results or use a webhook callback. Use batches for: bulk content generation, data extraction pipelines, evaluation suites, and any workload that does not require real-time responses.
- Batch result handling — poll batch status with `client.messages.batches.retrieve(batch_id)`. Status transitions: `in_progress` then `ended`. Results are streamed via `client.messages.batches.results(batch_id)` as an iterable of result objects, each tagged with the `custom_id` you provided. Handle partial failures: individual requests in a batch can fail while others succeed. Log failed request IDs and retry them in a subsequent batch.
- Prompt caching — mark stable content blocks (system prompts, large reference documents, few-shot examples) with `cache_control: { type: "ephemeral" }` to enable prompt caching. Cached prefixes are stored for 5 minutes (extended on each hit). Cache hits cost 90% less for input tokens. Structure your prompts so the cached prefix is identical across requests — any change to cached content invalidates the cache and incurs a cache write cost (25% more than the base input price).
- Model routing — implement a router that selects models based on task complexity. Route simple classification, extraction, and short-form generation to Haiku. Route multi-step reasoning, nuanced writing, and complex tool use to Sonnet. Reserve Opus for tasks that demonstrably require its deeper reasoning. A well-designed router can cut API costs 40-60% without measurable quality loss on routed tasks.
- Token budgeting and usage tracking — instrument every API call to log: model, input tokens, output tokens, cache read/write tokens, cost, latency, and request metadata. Aggregate by feature, user, and time period. Set budget alerts at 80% of your monthly target. Track cost-per-feature to identify optimization targets — often 10% of your features account for 80% of your API spend.
- Rate limit management — Anthropic enforces rate limits per model on requests per minute, input tokens per minute, and output tokens per minute. Implement a token bucket or leaky bucket rate limiter client-side. Queue requests when approaching limits. Distribute workloads across models when possible (Haiku for bulk, Sonnet for priority). Request tier upgrades from Anthropic when your usage consistently hits limits.
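The batch setup and partial-failure handling above can be sketched as pure helpers. The result dicts mirror the shape of the objects yielded by `client.messages.batches.results()`; the template and row names are illustrative:

```python
def build_batch_requests(rows, model, prompt_template, max_tokens=1024):
    """One batch request per input row, tagged with a custom_id for matching results."""
    return [
        {
            "custom_id": f"row-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user", "content": prompt_template.format(**row)}
                ],
            },
        }
        for i, row in enumerate(rows)
    ]

def partition_results(results):
    """Split batch results into successes and failed custom_ids for selective retry."""
    ok, failed = {}, []
    for r in results:
        if r["result"]["type"] == "succeeded":
            ok[r["custom_id"]] = r["result"]["message"]
        else:  # errored, canceled, or expired
            failed.append(r["custom_id"])
    return ok, failed
```

The `failed` list feeds directly into `build_batch_requests` for the follow-up retry batch, which is the partial-failure strategy the bullet above recommends.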
Output Format
=== ANTHROPIC --- [IMPLEMENTATION TYPE]
Project: [Name]
Model: [claude-opus / claude-sonnet / claude-haiku]
Date: [YYYY-MM-DD]
=== MODEL CONFIG ===
| Parameter | Value | Rationale |
|-----------|-------|-----------|
| Model | [model] | [why this model] |
| Max Tokens | [n] | [why] |
| Temperature | [0-1] | [why] |
| System Prompt | [token count] | [caching strategy] |
=== TOOLS ===
| Tool | Input Schema | Description | Destructive |
|------|-------------|-------------|-------------|
| [name] | [key params] | [purpose] | [yes/no] |
=== TOKEN BUDGET ===
| Component | Tokens | Cost/Request | Cached Cost |
|-----------|--------|-------------|-------------|
| System Prompt | [n] | $[x] | $[x] |
| Conversation | [n] | $[x] | -- |
| User Input | [n] | $[x] | -- |
| Response | [n] | $[x] | -- |
| Total | [n] | $[x] | $[x] |
=== COST PROJECTION ===
| Workload | Volume | Model | Monthly Cost |
|----------|--------|-------|-------------|
| [use case] | [requests/mo] | [model] | $[x] |
Total: $[x]/month (with caching: $[x]/month)
=== EVALUATION ===
| Metric | Target | Current | Status |
|--------|--------|---------|--------|
| Quality | [score] | [score] | PASS/WARN/FAIL |
| Latency P50 | [ms] | [ms] | PASS/WARN/FAIL |
| Cost/Request | $[x] | $[x] | PASS/WARN/FAIL |
Common Pitfalls
- Token limit exceeded mid-conversation — long conversations silently approach the context window limit. The API returns a 400 error when input + max_tokens exceeds the model's context window. Count tokens before every request and implement conversation pruning before you hit the limit, not after.
- Tool use JSON schema strictness — Claude's tool use `input_schema` supports a subset of JSON Schema. Features like `$ref`, `oneOf`, `allOf`, and complex conditionals are not supported. Flatten your schemas and use simple types with explicit property definitions. Test schemas with diverse inputs before deploying.
- Streaming reconnection with tool use — if a stream disconnects during a tool use content block, you have a partial JSON input that cannot be parsed. You must retry the entire request. Design your system to handle this: cache the request payload, detect incomplete tool blocks, and retry with the same conversation state.
- Image token costs are surprising — a single high-resolution image can consume 6,000+ tokens, equivalent to several pages of text. Teams that add vision features without token budgeting see their costs spike 3-5x. Always resize images to the minimum resolution your task requires and log image token costs separately.
- Prompt caching invalidation — any change to the cached prefix (even whitespace) invalidates the cache and incurs a cache write penalty. Template your system prompts so that dynamic content (user name, date, session context) comes after the cached prefix, not within it. Monitor cache hit rates in your usage logs.
- Rate limit stacking — Anthropic enforces separate limits on requests/minute, input tokens/minute, and output tokens/minute. You can hit the token limit while well under the request limit, or vice versa. Your rate limiter must track all three dimensions independently.
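The "track all three dimensions independently" point can be sketched as one token bucket per dimension, refilled continuously and spent atomically (the class name and limit values are illustrative):

```python
import time

class MultiDimLimiter:
    """Client-side limiter over requests, input tokens, and output tokens per minute."""

    def __init__(self, rpm, itpm, otpm, clock=time.monotonic):
        self.caps = {"requests": rpm, "input": itpm, "output": otpm}
        self.level = dict(self.caps)  # each bucket starts full
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        frac = (now - self.last) / 60.0  # fraction of a minute elapsed
        self.last = now
        for dim, cap in self.caps.items():
            self.level[dim] = min(cap, self.level[dim] + cap * frac)

    def try_acquire(self, input_tokens, output_tokens):
        """Spend budget only if ALL three dimensions have room; else deny."""
        self._refill()
        need = {"requests": 1, "input": input_tokens, "output": output_tokens}
        if all(self.level[dim] >= amount for dim, amount in need.items()):
            for dim, amount in need.items():
                self.level[dim] -= amount
            return True
        return False
```

A request is queued whenever `try_acquire` returns False, which is exactly how you can hit the token ceiling while well under the request ceiling: any single exhausted bucket denies the call.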
Guardrails
- Never exposes API keys. The Anthropic API key is server-side only. Client-side code calls your backend, which proxies to the Claude API. No API keys in frontend bundles, client-side code, or version control.
- Cost estimation before execution. Every pipeline includes token count estimates and cost projections before making API calls. Batch jobs include total cost estimates before submission. No surprise bills from runaway agentic loops or unexpectedly large image inputs.
- Rate limits respected client-side. All implementations include client-side rate limiting with exponential backoff and jitter. Respect `Retry-After` headers. No retry storms that compound rate limit problems.
- Tool inputs validated before execution. Every tool call parameter is validated against your data before the tool executes. Destructive actions require explicit confirmation. Model-generated IDs, paths, and URLs are verified against known-good values.
- No full prompts logged with PII. Logging captures metadata (model, token counts, latency, cost) but redacts message content that may contain user PII. Implement structured logging that separates operational metrics from conversation content.
- Model fallback chains configured. If the primary model returns a 529 (overloaded) or sustained 429 errors, fall back to an alternative model (Sonnet to Haiku, or retry with a different region). Never let a single model's availability take down your application.
Support
Questions or issues with this skill? Contact brian@gorzelic.net Published by SpookyJuice — https://www.shopclawmart.com
Version History
This skill is actively maintained.
March 8, 2026
v2.1.0 — improved frontmatter descriptions for better OpenClaw display
March 1, 2026
v2.1.0 — improved frontmatter descriptions for better OpenClaw display
February 28, 2026
Initial release
Creator
SpookyJuice.ai
An AI platform that builds, monitors, and evolves itself
Multiple AI agents and one human collaborate around the clock — writing code, deploying infrastructure, and growing a shared knowledge graph. This page is a live dashboard of the running system. Everything you see is real data, updated in real time.
Details
- Type: Skill
- Category: Engineering
- Price: $19
- Version: 3
- License: One-time purchase
Works With
Works with OpenClaw, Claude Projects, Custom GPTs, Cursor and other instruction-friendly AI tools.