
OpenAI -- GPT Integration Expert
Skill
Your OpenAI expert that builds GPT integrations, fine-tunes models, and manages API costs.
About
name: openai
description: >
  Implement OpenAI structured outputs, function calling, Assistants API, fine-tuning, and RAG.
  USE WHEN: User needs GPT integration, structured output parsing, tool use orchestration, Assistants API setup, fine-tuning, or embeddings-based search.
  DON'T USE WHEN: User needs general AI architecture design. Use Architect for agent system design.
  OUTPUTS: Prompt templates, function schemas, assistant configs, fine-tuning pipelines, embedding workflows, RAG architectures, cost optimization strategies.
version: 1.1.0
author: SpookyJuice
tags: [openai, gpt, assistants, embeddings, fine-tuning, rag]
price: 19
author_url: "https://www.shopclawmart.com"
support: "brian@gorzelic.net"
license: proprietary
osps_version: "0.1"
content_hash: "sha256:8554d0d19f56513c3d97ad6d3211c12b96205795afede71b236583d47ef69139"
OpenAI
Version: 1.1.0 Price: $19 Type: Skill
Description
Production OpenAI API integration for structured outputs, multi-tool agents, and RAG systems. The API surface is evolving fast: structured outputs, vision inputs, function calling, and the Assistants API each have subtle constraints around token limits, tool choice behavior, and response formats that the changelog buries and the docs underspecify. The real complexity isn't making a single API call; it's building the infrastructure around it: prompt versioning that doesn't break when you update instructions, function calling loops that handle parallel tool calls without hallucinating results, and embeddings pipelines that cut costs 50-70% through caching and deduplication while maintaining search quality.
Prerequisites
- OpenAI account with API access
- API key: `OPENAI_API_KEY`
- Python 3.10+ or Node.js 18+ (official SDK support)
- For fine-tuning: training data in JSONL format
- For embeddings: vector database (Pinecone, pgvector, Qdrant, or Chroma)
Setup
- Copy `SKILL.md` into your OpenClaw skills directory
- Set environment variables: `export OPENAI_API_KEY="sk-..."`
- Install the SDK: `pip install openai` or `npm install openai`
- Reload OpenClaw
Commands
- "Build a structured output pipeline for [data type]"
- "Implement function calling for [tool set]"
- "Set up an Assistant with [capabilities]"
- "Create a fine-tuning pipeline for [task]"
- "Build RAG with embeddings for [content type]"
- "Optimize my OpenAI API costs"
- "Debug this API error: [error]"
Workflow
Structured Outputs and Prompt Engineering
- Model selection: choose based on task complexity and cost. `gpt-4o` (best quality, vision support, $2.50/$10 per 1M tokens), `gpt-4o-mini` (fast and cheap, $0.15/$0.60 per 1M tokens), `o1` (reasoning-heavy tasks, $15/$60 per 1M tokens). Start with `gpt-4o-mini` and upgrade only if quality is insufficient.
- System prompt design: structure system prompts with role definition, task boundaries, output format specification, and few-shot examples. Keep instructions specific and testable. Version system prompts in code alongside your application; never edit production prompts without a review process.
- Structured outputs: use `response_format: { type: "json_schema", json_schema: { ... } }` for guaranteed JSON conformance. Define the schema with required fields, property types, and descriptions. The model will always produce valid JSON matching your schema: no parsing errors, no retry loops.
- Context window management: track token usage with `tiktoken` (Python) or the tokenizer library. Budget for the system prompt (fixed cost per request), conversation history (grows over time), user input (variable), and the response (set `max_tokens`). Implement context pruning: summarize old messages, drop irrelevant context, keep the system prompt and recent turns.
- Temperature and parameters: `temperature: 0` for deterministic, factual outputs; `temperature: 0.7-1.0` for creative generation. `top_p` is an alternative to temperature (don't use both). `frequency_penalty` and `presence_penalty` reduce repetition. Document your parameter choices and rationale.
- Prompt testing: build an evaluation suite. Collect input/output pairs, define quality metrics (accuracy, format compliance, latency), and run them against prompt changes. A/B test prompt variations with a holdout set. Never deploy prompt changes without regression testing.
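The structured-outputs bullet above can be sketched as a minimal Chat Completions call. The invoice-extraction schema and its field names are illustrative assumptions, not templates shipped with this skill:

```python
# Sketch: structured output via json_schema response_format.
# Schema name and fields are hypothetical examples.
import json

INVOICE_SCHEMA = {
    "name": "invoice_extraction",
    "strict": True,  # strict mode enforces exact schema conformance
    "schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string", "description": "Vendor name as printed"},
            "total": {"type": "number", "description": "Invoice total in USD"},
            "due_date": {"type": "string", "description": "ISO 8601 due date"},
        },
        "required": ["vendor", "total", "due_date"],
        "additionalProperties": False,  # required when strict is true
    },
}

def extract_invoice(text: str) -> dict:
    """Call the Chat Completions API; the response is guaranteed-valid JSON."""
    from openai import OpenAI  # needs `pip install openai` and OPENAI_API_KEY
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # start cheap, upgrade only if quality falls short
        messages=[
            {"role": "system", "content": "Extract invoice fields from the text."},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_schema", "json_schema": INVOICE_SCHEMA},
    )
    return json.loads(resp.choices[0].message.content)
```

Because the schema is strict, `json.loads` here never needs a retry loop; validation failures surface as API errors at request time instead.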
Function Calling and Tool Use
- Tool schema definition: define tools with JSON Schema: `name`, `description` (be specific; the model uses this to decide when to call), and `parameters` (typed, with descriptions per parameter). Keep schemas tight; unnecessary optional parameters cause the model to hallucinate values.
- Tool call handling loop: the standard loop is send message → check if the response contains `tool_calls` → execute each tool call → send results back → repeat until the model responds with text. Handle zero tool calls (model answered directly), one tool call, and multiple parallel tool calls in a single response.
- Parallel tool calls: the model may return multiple tool calls in one response (e.g., "look up the weather in NYC and London" yields two function calls). Execute them concurrently for performance. Return all results in the same message, matching each result to its `tool_call_id`.
- Error propagation: when a tool call fails, return the error message as the tool result (not an exception). The model can often recover: retry with different parameters, try an alternative approach, or inform the user. Never silently drop tool call results.
- Confirmation flows: for destructive actions (delete, purchase, send), implement a two-step pattern: the first tool call returns a preview, the model asks for confirmation, the user confirms, and the second tool call executes. Don't auto-execute destructive tools.
- Tool choice control: `tool_choice: "auto"` (model decides), `"required"` (must use a tool), `"none"` (no tools), or `{ "type": "function", "function": { "name": "specific_tool" } }` (force a specific tool). Use `"required"` when you know the model should always call a tool. Force a specific tool for structured extraction.
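The handling loop above can be sketched as follows. The `get_weather` tool, its schema, and the stub dispatcher are illustrative assumptions; only the loop structure itself reflects the workflow described:

```python
# Sketch of the tool-call loop: send -> check tool_calls -> execute -> repeat.
import json

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

def run_tool(name: str, args: dict) -> str:
    # Dispatch to real implementations; return errors as strings so the
    # model can recover instead of the loop crashing (error propagation).
    if name == "get_weather":
        return json.dumps({"city": args["city"], "temp_c": 18})  # stubbed result
    return json.dumps({"error": f"unknown tool {name}"})

def chat_with_tools(client, messages: list) -> str:
    """Loop until the model answers with text instead of tool calls."""
    while True:
        resp = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages,
            tools=TOOLS, tool_choice="auto",
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:       # zero tool calls: model answered directly
            return msg.content
        messages.append(msg)         # keep the assistant turn with its tool_calls
        for call in msg.tool_calls:  # may be several parallel calls
            result = run_tool(call.function.name,
                              json.loads(call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,  # match each result to its call
                "content": result,
            })
```

For parallel calls, the inner `for` loop could be swapped for concurrent execution; the only requirement is that every `tool_call_id` gets a matching tool message before the next API request.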
Embeddings and RAG
- Document chunking: split documents at semantic boundaries: paragraphs, sections, or sliding windows with overlap. Target 200-500 tokens per chunk for `text-embedding-3-small`. Include metadata with each chunk: source document, section title, page number. Overlapping windows (50-100 token overlap) prevent information loss at chunk boundaries.
- Embedding generation: use `text-embedding-3-small` (1536 dims, $0.02/1M tokens) for most use cases, `text-embedding-3-large` (3072 dims, $0.13/1M tokens) for maximum quality. Batch embeddings (up to 2048 inputs per request) for efficiency. Cache embeddings; the same text always produces the same vector.
- Vector storage: choose based on scale: pgvector (simple, works with existing Postgres), Pinecone (managed, serverless), Qdrant (self-hosted, fast), Chroma (lightweight, local development). Index with the appropriate distance metric: cosine similarity for normalized embeddings (most common), dot product for un-normalized.
- Retrieval pipeline: query flow: embed the user question → search the top-K similar chunks (K=5-20) → re-rank results by relevance → inject the top chunks into the prompt as context. Use metadata filters (date range, source, category) to narrow the search before vector similarity.
- Hybrid search: combine vector similarity with keyword search (BM25) for better recall. Vector search finds semantically similar content; keyword search catches exact matches the embeddings miss. Weight both scores and merge results. Most vector databases support hybrid search natively.
- Cost optimization: cache frequently queried embeddings. Deduplicate identical chunks before embedding. Use `text-embedding-3-small` with reduced dimensions (`dimensions: 256`) for development/testing, full dimensions for production. Batch embed during off-peak hours. Track embedding costs separately from completion costs.
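The dedup-and-cache step from the cost bullet above can be sketched like this. The cache here is a plain dict keyed by content hash; in production it would be a persistent store, and the helper name is an assumption:

```python
# Sketch: embed only unseen chunks, reuse cached vectors for duplicates.
import hashlib

def embed_chunks(client, chunks: list[str], cache: dict) -> list[list[float]]:
    """Return one vector per chunk, calling the API only for new text."""
    keys = [hashlib.sha256(c.encode()).hexdigest() for c in chunks]
    # Deduplicate: one (key, text) pair per unique uncached chunk.
    missing = sorted({k: c for k, c in zip(keys, chunks)
                      if k not in cache}.items())
    for start in range(0, len(missing), 2048):  # API batch limit per request
        batch = missing[start:start + 2048]
        resp = client.embeddings.create(
            model="text-embedding-3-small",
            input=[text for _, text in batch],
        )
        for (key, _), item in zip(batch, resp.data):
            cache[key] = item.embedding  # same text always yields the same vector
    return [cache[k] for k in keys]
```

Hashing the chunk text (rather than caching by chunk index) means re-chunking a document only re-embeds the chunks whose content actually changed.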
Output Format
🤖 OPENAI: [IMPLEMENTATION TYPE]
Project: [Name]
Model: [gpt-4o / gpt-4o-mini / o1]
Date: [YYYY-MM-DD]
─── MODEL CONFIG ───
| Parameter | Value | Rationale |
|-----------|-------|-----------|
| Model | [model] | [why this model] |
| Temperature | [0-1] | [why] |
| Max Tokens | [n] | [why] |
| Response Format | [text/json_schema] | [why] |
─── TOOLS ───
| Tool | Parameters | Description | Destructive |
|------|-----------|-------------|-------------|
| [name] | [params] | [purpose] | [yes/no] |
─── TOKEN BUDGET ───
| Component | Tokens | Cost/Request |
|-----------|--------|-------------|
| System Prompt | [n] | $[x] |
| Context (RAG) | [n] | $[x] |
| User Input | [n] | $[x] |
| Response | [n] | $[x] |
| Total | [n] | $[x] |
─── COST ESTIMATE ───
| Usage | Volume | Model | Monthly Cost |
|-------|--------|-------|-------------|
| [use case] | [requests/mo] | [model] | $[x] |
Total: $[x]/month
─── EVALUATION ───
| Metric | Target | Current | Status |
|--------|--------|---------|--------|
| Accuracy | [%] | [%] | 🟢/🟡/🔴 |
| Latency P50 | [ms] | [ms] | 🟢/🟡/🔴 |
| Cost/Request | $[x] | $[x] | 🟢/🟡/🔴 |
Common Pitfalls
- Token limit overflow: stuffing too much context into the prompt causes silent truncation or errors. Always count tokens before sending. Budget for system prompt + context + user input + response, leaving headroom for the model's response.
- Function calling hallucination: the model may "invent" tool call parameters that look plausible but are wrong (e.g., a user ID it guessed). Validate all tool call parameters against your data before executing. Never trust model-generated IDs without verification.
- Structured output schema drift: changing your JSON schema without updating the model's understanding leads to validation errors. When you change the schema, update the system prompt's examples to match. Test schema changes against your evaluation suite.
- Embedding dimension mismatch: mixing embeddings from different models or dimension settings in the same vector index produces garbage similarity scores. Tag stored vectors with model version and dimensions. Never mix embedding sources.
- Fine-tuning data quality: fine-tuning amplifies patterns in your training data, including mistakes. Low-quality training data produces a confidently wrong model. Curate training data carefully: diverse examples, consistent format, verified outputs.
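The token-overflow check from the first pitfall can be sketched as a pre-flight budget test. The `encode` callable is expected to come from tiktoken (e.g. `tiktoken.encoding_for_model("gpt-4o-mini").encode`), and the ~4-token per-message overhead is an approximation, not an exact accounting:

```python
# Sketch: count prompt tokens before sending, leaving response headroom.
# `encode` is injected so the budget logic is independent of the tokenizer.
def count_tokens(messages: list[dict], encode) -> int:
    """Tokens used by the prompt, with rough per-message framing overhead."""
    return sum(len(encode(m["content"])) + 4 for m in messages)

def fits_budget(messages: list[dict], encode,
                max_context: int = 128_000,
                reserve_for_response: int = 4_096) -> bool:
    """Reject before the API call if the response headroom would be squeezed out."""
    return count_tokens(messages, encode) + reserve_for_response <= max_context
```

When `fits_budget` returns False, prune context (summarize old turns, drop stale RAG chunks) rather than silently truncating.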
Guardrails
- Never exposes API keys. The OpenAI API key is server-only. Client-side code calls your backend, which proxies to OpenAI. No API keys in frontend bundles or client-side code.
- Cost estimation before execution. Every pipeline includes token count estimates and cost projections BEFORE making API calls. No surprise bills from runaway loops or unexpectedly large contexts.
- Rate limits respected. All implementations include rate-limit-aware queuing with exponential backoff. No hammering the API with retries that compound the problem.
- Prompt versions are tracked. System prompts are versioned in code with changelogs. No ad-hoc prompt edits in production without review and regression testing.
- Tool calls are validated. Every function call parameter is validated against your data before execution. Destructive actions require explicit confirmation. No blind execution of model-generated parameters.
- Content filtering enforced. Implement input and output content filtering appropriate to your use case. Flag and handle: prompt injection attempts, policy violations, and unexpected model behavior.
- Content policy compliance verified. All prompts and outputs are checked against OpenAI's usage policies. Inputs that request disallowed content are rejected before reaching the API, and outputs are filtered for policy violations before being served to end users.
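The rate-limit guardrail above can be sketched as a retry wrapper with exponential backoff and jitter. Catching a bare `Exception` is a simplification; in practice you would catch the SDK's rate-limit error specifically:

```python
# Sketch: exponential backoff with jitter, so retries don't compound a
# rate-limit problem or stampede in lockstep across workers.
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `call` on failure, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:  # in practice: except openai.RateLimitError
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)  # jitter spreads retries across workers
```

Pair this with a queue that caps concurrent in-flight requests, so backoff handles transient spikes while the queue prevents sustained overload.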
Support
Questions or issues with this skill? Contact brian@gorzelic.net. Published by SpookyJuice: https://www.shopclawmart.com
Core Capabilities
- openai
- gpt
- assistants
- embeddings
- fine-tuning
- rag
Customer ratings
0 reviews
No ratings yet
- 5 star0
- 4 star0
- 3 star0
- 2 star0
- 1 star0
No reviews yet. Be the first buyer to share feedback.
Version History
This skill is actively maintained.
March 8, 2026
v2.1.0: improved frontmatter descriptions for better OpenClaw display
March 1, 2026
v2.1.0: improved frontmatter descriptions for better OpenClaw display
February 27, 2026
v1.1.0: expanded from stub to full skill with structured outputs, function calling, Assistants API, fine-tuning, and RAG
One-time purchase
$19
By continuing, you agree to the Buyer Terms of Service.
Creator
SpookyJuice.ai
An AI platform that builds, monitors, and evolves itself
Multiple AI agents and one human collaborate around the clock: writing code, deploying infrastructure, and growing a shared knowledge graph. This page is a live dashboard of the running system. Everything you see is real data, updated in real time.
View creator profile →
Details
- Type
- Skill
- Category
- Engineering
- Price
- $19
- Version
- 3
- License
- One-time purchase
Works With
Works with OpenClaw, Claude Projects, Custom GPTs, Cursor and other instruction-friendly AI tools.
Works great with
Personas that pair well with this skill.