
Pinecone -- Vector Database Integration Expert
Skill
Your Pinecone expert that builds vector search, manages embeddings, and optimizes retrieval pipelines.
About
name: pinecone
description: >
  Implement Pinecone vector search, RAG pipelines, namespace management, and metadata filtering.
  USE WHEN: User needs to implement Pinecone vector search, RAG pipelines, semantic search, or embedding storage.
  DON'T USE WHEN: User needs general embedding generation. Use Vector for embedding patterns. Use OpenAI or Anthropic for model integration.
  OUTPUTS: Index configurations, upsert pipelines, query implementations, RAG architectures, namespace strategies, metadata filters, hybrid search setups.
version: 1.0.0
author: SpookyJuice
tags: [pinecone, vector-database, semantic-search, rag, embeddings]
price: 14
author_url: "https://www.shopclawmart.com"
support: "brian@gorzelic.net"
license: proprietary
osps_version: "0.1"
content_hash: "sha256:c469e7ffd943d6abff90deabeaa9c476052633855c3b97d287845e82fedb66da"
# Pinecone
Version: 1.0.0 Price: $14 Type: Skill
Description
Production-grade Pinecone vector database patterns for the failure modes that derail semantic search and RAG projects. Pinecone handles the infrastructure — indexing, replication, low-latency queries — but the integration surface between your embedding pipeline and the index is where things break. Dimension mismatches between your embedding model and the index silently corrupt results. Metadata filter syntax looks simple until you hit the 40KB per-vector limit. Upsert batching that works for 10K vectors chokes at 10M. Namespace isolation sounds free until you realize each namespace shares the same pod capacity. This skill encodes the configuration decisions, ingestion patterns, query strategies, and RAG architectures that survive production scale.
Prerequisites
- Pinecone account (free tier works for development, Starter plan for production)
- API key from the Pinecone console
- Index created with dimensions matching your embedding model (e.g., 1536 for OpenAI `text-embedding-3-small`, 3072 for `text-embedding-3-large`)
- Python `pinecone-client` or Node.js `@pinecone-database/pinecone` SDK installed
Setup
- Copy `SKILL.md` into your OpenClaw skills directory
- Set environment variables:
  `export PINECONE_API_KEY="your-api-key"`
  `export PINECONE_ENVIRONMENT="us-east-1-aws"`
  `export PINECONE_INDEX_NAME="your-index-name"`
- Reload OpenClaw
Commands
- "Create a Pinecone index for [embedding model/use case]"
- "Build an upsert pipeline for [data source]"
- "Implement semantic search with metadata filtering for [domain]"
- "Set up a RAG pipeline with Pinecone for [application]"
- "Design a namespace strategy for [multi-tenant/multi-collection use case]"
- "Configure hybrid search with sparse-dense vectors for [use case]"
- "Debug my Pinecone queries — results are irrelevant"
Workflow
Index Configuration
- Dimension selection — match your index dimensions exactly to your embedding model output. OpenAI `text-embedding-3-small` outputs 1536, `text-embedding-3-large` outputs 3072, Cohere `embed-english-v3.0` outputs 1024. A mismatch doesn't error on index creation — it errors on the first upsert or silently produces garbage results if dimensions happen to align by padding.
- Metric choice — select `cosine` for normalized embeddings (most common, works with OpenAI and Cohere models out of the box), `dotproduct` for embeddings where magnitude carries meaning, or `euclidean` for absolute distance comparisons. Cosine is the safe default. You cannot change the metric after index creation — you must delete and recreate.
- Serverless vs pod-based — serverless indexes scale to zero and charge per query, ideal for development and bursty workloads. Pod-based indexes provide dedicated capacity with predictable latency, necessary for sustained high-throughput production workloads. Start serverless, migrate to pods when you need latency guarantees under sustained load.
- Pod type and size — for pod-based indexes: `s1` pods optimize for storage density (more vectors per dollar), `p1` pods optimize for query speed (lower latency). Start with `s1.x1` for most workloads. Scale horizontally with replicas for read throughput, vertically with pod size for capacity.
- Replica configuration — replicas multiply read throughput linearly. Two replicas = double the queries per second. Replicas do not increase storage capacity. For read-heavy RAG workloads, scale replicas before scaling pod size. Each replica adds cost equal to one base pod.
- Metadata configuration — plan your metadata schema before ingesting data. Pinecone indexes metadata fields for filtering automatically, but each vector's metadata is capped at 40KB. Store only the fields you need for filtering and retrieval context — keep large text content in your primary database and reference it by ID.
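The dimension rule above can be enforced by construction: derive the index dimension from the model name instead of typing it in two places. A minimal sketch — the `MODEL_DIMS` table and `index_config` helper are illustrative names, not part of the Pinecone SDK:

```python
# Output dimensions of the embedding models named above.
MODEL_DIMS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
    "embed-english-v3.0": 1024,
}

def index_config(model: str, metric: str = "cosine") -> dict:
    """Build index settings with the dimension derived from the model name,
    so the index and the embedding pipeline can never disagree."""
    if model not in MODEL_DIMS:
        raise ValueError(f"unknown embedding model: {model}")
    return {"dimension": MODEL_DIMS[model], "metric": metric}

# With the v3+ Pinecone SDK this feeds create_index, roughly:
#   from pinecone import Pinecone, ServerlessSpec
#   pc = Pinecone()  # reads PINECONE_API_KEY from the environment
#   pc.create_index(name="docs",
#                   spec=ServerlessSpec(cloud="aws", region="us-east-1"),
#                   **index_config("text-embedding-3-small"))
```

The same helper should run in the upsert path, so a model swap that changes dimensions fails loudly before any vectors are written.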
Data Ingestion Pipeline
- Chunking strategy — split source documents into chunks sized for your embedding model's sweet spot (typically 256-512 tokens for retrieval). Overlap chunks by 10-20% to preserve context across boundaries. Track chunk-to-document mapping in metadata so you can reconstruct original context during retrieval.
- Embedding generation — batch your embedding API calls to maximize throughput and minimize cost. OpenAI's API accepts up to 2048 inputs per call. Generate embeddings in parallel batches, respecting rate limits. Cache embeddings locally before upserting — if the upsert fails, you don't want to re-embed.
- Batch upsert — Pinecone accepts up to 100 vectors per upsert call, with a 2MB request size limit. For large datasets, batch in groups of 100 and use async upsert with concurrency control (5-10 parallel requests). Include a progress tracker and retry logic with exponential backoff for rate limit errors (HTTP 429).
- Metadata attachment — attach structured metadata to each vector: source document ID, chunk index, creation timestamp, content type, and any fields needed for filtering. Keep metadata values as primitives (strings, numbers, booleans, string arrays) — nested objects are not supported in filters.
- ID strategy — use deterministic IDs based on content hash (e.g., `sha256(document_id + chunk_index)`) for idempotent upserts. This enables re-ingestion without duplicates. Random UUIDs work but make deduplication impossible without external tracking.
- Deduplication — before upserting, check if vectors with the same IDs already exist using `fetch` by ID. For content-based dedup, maintain a local hash set of content hashes mapped to vector IDs. Delete stale vectors when source documents are updated — Pinecone does not automatically expire vectors.
- Validation — after ingestion, verify vector count matches expected count via `describe_index_stats()`. Query a known document to confirm embeddings return relevant results. Check namespace distribution if using multiple namespaces to ensure data landed correctly.
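The ID and batching rules above can be sketched as two small helpers. This is illustrative, not SDK code — `chunk_id` and `batched` are hypothetical names, and the commented call shows where the real Pinecone client plugs in:

```python
import hashlib
from itertools import islice

def chunk_id(document_id: str, chunk_index: int) -> str:
    """Deterministic ID from (document, chunk): re-running ingestion
    overwrites the same vectors instead of creating duplicates."""
    return hashlib.sha256(f"{document_id}:{chunk_index}".encode()).hexdigest()

def batched(vectors, size=100):
    """Yield batches within Pinecone's 100-vectors-per-upsert guidance."""
    it = iter(vectors)
    while batch := list(islice(it, size)):
        yield batch

# Per batch, upsert with the real client and retry on HTTP 429:
#   index.upsert(vectors=batch, namespace="docs")
```

Because the IDs are pure functions of the source data, a failed ingestion run can simply be restarted from the top.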
Semantic Search Implementation
- Query construction — embed the user query with the same model used for indexing. This is non-negotiable — mixing models produces meaningless similarity scores. Pass the query embedding to Pinecone with `top_k` set to your desired result count and `include_metadata=True` to get context alongside results.
- Top-k tuning — start with `top_k=10` and adjust based on precision needs. For RAG, retrieve more (20-50) and re-rank. For direct user-facing search, retrieve fewer (5-10) with higher relevance thresholds. Higher top-k increases latency linearly — measure the tradeoff for your use case.
- Metadata filtering — apply filters to narrow search scope before similarity comparison. Filters use a JSON syntax: `{"genre": {"$eq": "technical"}, "year": {"$gte": 2023}}`. Supported operators: `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`. Combine with `$and` and `$or` for complex filters. Filters reduce the search space, improving both relevance and latency.
- Score thresholds — cosine similarity scores range from 0 to 1 (for normalized vectors). Set a minimum threshold (typically 0.7-0.8) to filter out low-relevance results. Don't hardcode thresholds — calibrate them against your specific data and embedding model by testing known relevant and irrelevant pairs.
- Re-ranking — Pinecone returns results ranked by vector similarity, but vector similarity alone misses lexical signals. Apply a cross-encoder re-ranker (e.g., Cohere Rerank, a local cross-encoder model) on the top-k results to reorder by relevance. This two-stage retrieval (fast recall + precise re-rank) consistently outperforms single-stage search.
- Hybrid search — combine dense vectors (semantic meaning) with sparse vectors (keyword matching) for queries where exact term matches matter alongside semantic understanding. Pinecone supports sparse-dense vectors natively. Weight the `alpha` parameter: 0.0 = pure sparse (keyword), 1.0 = pure dense (semantic), 0.5 = balanced. Tune alpha per query type or domain.
- Namespace scoping — use the `namespace` parameter to restrict queries to a specific data partition. Namespaces enable multi-tenant isolation, per-collection search, and A/B testing of different embedding strategies on the same index without cross-contamination of results.
RAG Architecture
- Retrieval pipeline — structure the pipeline as: embed query, search Pinecone with metadata filters, fetch top-k results, re-rank if needed, extract text content from metadata or fetch from primary database by ID. Each step should be independently testable and swappable. Log retrieval scores and latencies for every query to diagnose quality issues.
- Context window management — LLMs have finite context windows. Calculate the token budget: reserve tokens for the system prompt, user query, and expected response, then fill the remainder with retrieved chunks. Pack chunks greedily by relevance score until the budget is exhausted. For GPT-4 with 128K context, you have room for more chunks but quality degrades with excessive context — test with 5-10 chunks first.
- Prompt assembly — structure the prompt with clear boundaries: system instructions, retrieved context (labeled and numbered for citation), and the user query. Use delimiters like `---` or XML tags to separate context chunks. Instruct the model to cite sources by chunk number. Avoid injecting raw retrieved text without framing — the model needs to know what the context represents.
- Citation tracking — number each retrieved chunk and include the source document ID and chunk index in the prompt context. Instruct the LLM to reference chunks by number in its response. Post-process the response to map citation numbers back to source documents, URLs, or page numbers for end-user attribution.
- Freshness and staleness — embed a `last_updated` timestamp in vector metadata. For time-sensitive domains, apply a recency bias by boosting scores of newer documents or filtering out vectors older than a threshold. Implement a re-indexing pipeline that triggers when source documents change — stale embeddings are the silent killer of RAG quality.
- Feedback loops — track which retrieved chunks the LLM actually uses in its response versus which it ignores. Log user feedback (thumbs up/down, corrections) and correlate with retrieval results. Use this data to identify low-quality chunks, tune score thresholds, adjust chunking strategy, and prioritize re-embedding of poorly performing documents.
- Evaluation — measure RAG quality with three metrics: retrieval precision (are the retrieved chunks relevant?), answer faithfulness (does the response match the retrieved context?), and answer relevance (does the response address the query?). Use automated evaluation with a judge LLM for scale, manual evaluation for calibration. Set up regression tests with known query-answer pairs.
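The token-budget packing and numbered-citation assembly above can be sketched as follows. The ~4-characters-per-token estimate and all names are assumptions, not a real tokenizer — swap in your model's tokenizer for production:

```python
def pack_context(chunks, budget_tokens, est_tokens=lambda text: len(text) // 4):
    """Greedily pack chunks by relevance score until the token budget is spent.
    Chunks that don't fit are skipped so smaller ones can still squeeze in."""
    packed, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c["score"], reverse=True):
        cost = est_tokens(chunk["text"])
        if used + cost > budget_tokens:
            continue
        packed.append(chunk)
        used += cost
    return packed

def assemble_context(chunks):
    """Number chunks for citation and separate them with --- delimiters."""
    return "\n---\n".join(
        f"[{i}] (source: {c['doc_id']})\n{c['text']}"
        for i, c in enumerate(chunks, 1)
    )
```

The `[i]` labels are what the post-processing step maps back to `doc_id` values for end-user attribution.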
Output Format
# PINECONE -- [IMPLEMENTATION TYPE]
Project: [Name]
Index: [index-name] | Dimensions: [N] | Metric: [cosine/dotproduct/euclidean]
Date: [YYYY-MM-DD]
=== INDEX CONFIGURATION ===
[Index specs, pod type, replicas, environment]
=== INGESTION PIPELINE ===
| Source | Chunks | Embedding Model | Namespace |
|--------|--------|-----------------|-----------|
| [source] | [count] | [model] | [namespace] |
=== QUERY IMPLEMENTATION ===
[Code with inline comments]
=== METADATA SCHEMA ===
| Field | Type | Filterable | Purpose |
|-------|------|------------|---------|
| [field] | string/number/boolean | yes/no | [description] |
=== TESTING ===
[ ] Upsert pipeline ingests [N] vectors without errors
[ ] Semantic search returns relevant results for sample queries
[ ] Metadata filters narrow results correctly
[ ] RAG pipeline produces grounded, cited responses
Common Pitfalls
- Dimension mismatch — creating an index with 1536 dimensions then upserting 3072-dimension vectors (or vice versa) produces an opaque API error. Always verify your embedding model's output dimensions match the index configuration before ingesting data.
- Metadata value size limits — each vector's metadata is capped at 40KB. Storing full document text in metadata works for short documents but silently truncates or errors for longer content. Store text in your primary database and reference it by ID.
- Upsert rate limiting — Pinecone returns HTTP 429 when you exceed write throughput limits. Large initial ingestions need batching with backoff. Don't retry immediately — exponential backoff starting at 1 second with jitter prevents thundering herd.
- Namespace proliferation — namespaces don't add capacity, they partition existing capacity. Creating hundreds of namespaces on a small pod fragments your index and degrades query performance. Plan namespace strategy around actual isolation requirements, not organizational convenience.
- Stale embeddings after source updates — updating a source document without re-embedding and re-upserting its vectors means search returns outdated content. Implement a change detection pipeline that flags modified documents for re-indexing.
- Mixing embedding models — upserting vectors from different embedding models into the same namespace makes similarity scores meaningless. Vectors from different models exist in different semantic spaces. Use separate namespaces or indexes per model.
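The backoff advice under "Upsert rate limiting" can be sketched as a generic retry wrapper. `RateLimitError` is a stand-in for whatever exception your client raises on HTTP 429, and the injectable `sleep` keeps the wrapper testable:

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the client's HTTP 429 exception."""

def with_backoff(fn, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Retry fn on rate limits with exponential backoff plus full jitter:
    wait a random amount in [0, min(cap, base * 2**attempt)) between tries."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    return fn()  # final attempt; let the error propagate to the caller
```

Each batch upsert gets wrapped, e.g. `with_backoff(lambda: index.upsert(vectors=batch))`. Full jitter (random within the window, not a fixed delay) is what prevents many workers from retrying in lockstep.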
Guardrails
- Never exposes API keys. The Pinecone API key is server-only. Any implementation that puts it in client-side code or commits it to version control is immediately flagged and corrected.
- Dimension match is verified. Every index creation and upsert operation includes a dimension validation check between the embedding model output and the index configuration. No silent mismatches.
- Upserts are batched and idempotent. All ingestion pipelines use deterministic IDs and batch sizes within API limits. Re-running ingestion does not create duplicates.
- Metadata schema is documented. Every implementation includes a metadata schema table specifying field names, types, and filter usage. No ad-hoc metadata fields without documentation.
- Cost is tracked. Flags pod utilization, vector counts approaching plan limits, and query volume trends. Recommends scaling changes before you hit capacity walls.
- Stale data has a remediation plan. Every RAG implementation includes a strategy for detecting and re-indexing outdated embeddings when source data changes. No deploy-and-forget pipelines.
Support
Questions or issues with this skill? Contact brian@gorzelic.net Published by SpookyJuice — https://www.shopclawmart.com
Version History
This skill is actively maintained.
March 8, 2026
v2.1.0 — improved frontmatter descriptions for better OpenClaw display
February 28, 2026
Initial release
Creator
SpookyJuice.ai
An AI platform that builds, monitors, and evolves itself
Multiple AI agents and one human collaborate around the clock — writing code, deploying infrastructure, and growing a shared knowledge graph. This page is a live dashboard of the running system. Everything you see is real data, updated in real time.
Details
- Type
- Skill
- Category
- Engineering
- Price
- $14
- Version
- 3
- License
- One-time purchase
Works With
Works with OpenClaw, Claude Projects, Custom GPTs, Cursor and other instruction-friendly AI tools.