
Recall -- Agent Memory Engineer
Skill
Your memory engineer that builds persistent context, tiered storage, and retrieval systems -- agents that remember.
About
name: recall
description: >
  Implement persistent agent memory with context reconstruction, semantic retrieval, and knowledge management.
  USE WHEN: User needs AI agents that remember across sessions, vector store integration, conversation summarization, knowledge graphs, or context window optimization.
  DON'T USE WHEN: User needs single-session agent loops (use Phantom), multi-agent coordination (use Hivemind), or vector database setup alone (use Vector).
  OUTPUTS: Memory architectures, retrieval pipelines, summarization strategies, knowledge graph schemas, context reconstruction flows, embedding configurations.
version: 1.0.0
author: SpookyJuice
tags: [memory, context, persistence, rag, embeddings, knowledge-management]
price: 14
author_url: "https://www.shopclawmart.com"
support: "brian@gorzelic.net"
license: proprietary
osps_version: "0.1"
Recall
Version: 1.0.0 Price: $14 Type: Skill
Description
Every AI agent starts each conversation with amnesia. It has no idea what you discussed yesterday, what decisions were made last week, or what it learned from the three failed attempts before the approach that worked. This isn't a model limitation you can prompt-engineer around -- it's an architectural problem that requires an architectural solution. Recall gives you the memory infrastructure that turns stateless LLM calls into agents with persistent, searchable, reconstructable context.
The challenge isn't just storing past conversations. Raw conversation logs are noisy, redundant, and blow your context window budget. The real problem is selective recall: retrieving the right memories at the right time, compressed into a token-efficient format, with enough context to be useful but not so much that it drowns out the current task. This requires a pipeline -- embedding, indexing, retrieval, ranking, and reconstruction -- with each stage tuned to your domain.
This skill covers the full memory stack: from choosing embedding models and vector stores to implementing tiered memory systems (working memory, episodic memory, semantic memory), conversation summarization that preserves critical details, knowledge graph construction for relational reasoning, and the context reconstruction pipeline that assembles the right memories into a coherent prompt.
Prerequisites
- LLM API access (Anthropic, OpenAI, or compatible provider)
- Embedding API access (OpenAI text-embedding-3-small, Voyage, or local model)
- Vector store (Pinecone, ChromaDB, pgvector, Qdrant, or Weaviate)
- Python 3.11+ or Node.js 18+ runtime
- For knowledge graphs: Neo4j, or a graph library (NetworkX for prototyping)
Setup
- Copy SKILL.md into your OpenClaw skills directory
- Set your provider credentials:
  export ANTHROPIC_API_KEY="sk-ant-..."
  export OPENAI_API_KEY="sk-..."      # for embeddings
  export PINECONE_API_KEY="..."       # or your vector store credentials
- Reload OpenClaw
Commands
- "Design a memory system for [agent type]"
- "Implement semantic retrieval for [use case]"
- "Build a conversation summarization pipeline"
- "Set up a knowledge graph for [domain]"
- "Optimize context reconstruction for [token budget]"
- "Implement tiered memory (working/episodic/semantic)"
- "Add memory to my existing [agent framework] agent"
- "Build a RAG pipeline for [document type]"
- "Manage memory lifecycle -- expiration, consolidation, pruning"
Workflow
Tiered Memory Architecture
- Working memory -- the agent's immediate scratchpad. Holds the current task context, extracted facts from the current conversation, and intermediate results. Stored in-process (dictionary or structured object), not persisted. Cleared at session end. Token budget: 15-25% of your context window. This is what the agent actively reasons over.
- Episodic memory -- records of past interactions. Each episode captures: session ID, timestamp, task summary, key decisions made, outcomes (success/failure), and extracted learnings. Stored as embeddings in a vector store with structured metadata for filtered retrieval. When starting a new session, retrieve the 3-5 most relevant episodes based on the current task.
- Semantic memory -- distilled knowledge extracted from many episodes. Facts, preferences, patterns, and domain knowledge that the agent has learned over time. Stored as structured records (not raw conversations) with categories and confidence scores. Updated by a consolidation process that runs periodically: "Given these 20 episodes, what general knowledge should be extracted?"
- Memory retrieval pipeline -- when the agent starts a task, assemble context from all three tiers: (a) load working memory (current session), (b) embed the current task description and retrieve top-K relevant episodes, (c) retrieve semantic memories matching the task domain, (d) combine and rank by relevance, (e) format into a memory prompt section that fits your token budget.
- Memory writing -- at session end, persist new memories: (a) summarize the conversation into an episode record, (b) extract any new facts or learnings for semantic memory, (c) update existing semantic memories if contradicted by new information, (d) embed and index everything. Use a cheaper/faster model for summarization to keep costs manageable.
- Memory decay and consolidation -- not all memories are equally valuable over time. Implement a decay function: memories accessed frequently get higher scores, unused memories decay. Periodically run a consolidation pass: merge similar episodes, promote recurring patterns to semantic memory, and archive or delete low-value records.
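The retrieval, writing, and decay steps above can be sketched as a small scoring-and-assembly loop. This is a minimal illustration under stated assumptions, not a production store: `MemoryRecord`, `decayed_score`, and the `count_tokens` callable are hypothetical names, and the 30-day half-life is an assumed default you would tune.

```python
from dataclasses import dataclass, field
import time

@dataclass
class MemoryRecord:
    content: str
    relevance: float          # similarity score from retrieval, 0..1
    tier: str                 # "working" | "episodic" | "semantic"
    last_accessed: float = field(default_factory=time.time)
    access_count: int = 0

def decayed_score(rec: MemoryRecord, half_life_days: float = 30.0) -> float:
    """Relevance boosted by access frequency, decayed by time since last access."""
    age_days = (time.time() - rec.last_accessed) / 86400
    decay = 0.5 ** (age_days / half_life_days)
    return rec.relevance * decay * (1 + 0.1 * rec.access_count)

def assemble_context(working, episodic, semantic, token_budget, count_tokens):
    """Combine all three tiers, rank by decayed score, keep what fits the budget."""
    # Working memory is always included; the remaining budget goes to retrieved tiers.
    parts = list(working)
    used = sum(count_tokens(m.content) for m in working)
    for rec in sorted(episodic + semantic, key=decayed_score, reverse=True):
        cost = count_tokens(rec.content)
        if used + cost > token_budget:
            continue  # drop lower-value records that don't fit
        parts.append(rec)
        used += cost
    return parts
```

Consolidation and pruning would then operate on the same `decayed_score`: records below a threshold get merged or archived during the periodic pass.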
Semantic Retrieval Pipeline
- Embedding model selection -- choose based on your domain: text-embedding-3-small (OpenAI, 1536 dims, cheap, good general performance), text-embedding-3-large (3072 dims, better for nuanced retrieval), Voyage AI (strong for code and technical content), or local models (no API dependency, privacy). Run a small benchmark on your actual data before committing.
- Chunking strategy -- how you split documents into chunks determines retrieval quality. Options: fixed-size (simple, works for homogeneous content), semantic (split at paragraph/section boundaries), recursive (split large chunks into smaller ones until they fit), or sliding window (overlapping chunks for context preservation). For conversations, chunk by turn or by topic shift.
- Index construction -- build your vector index with metadata: source document, timestamp, category, author, and any domain-specific tags. Metadata enables filtered retrieval (e.g., "find memories from the last 7 days about authentication"). Use namespace separation for different memory tiers or different users.
- Query formulation -- the raw user query often isn't the best search query. Implement query transformation: (a) expand the query with synonyms or related terms, (b) decompose compound queries into sub-queries, (c) use the LLM to rephrase the query for better embedding similarity. This step alone can improve retrieval precision by 20-30%.
- Retrieval and reranking -- retrieve top-K candidates (K=20-50) from the vector store, then rerank using a cross-encoder or LLM-based reranker to get the top-N (N=3-5) most relevant results. Vector similarity is a rough filter; reranking adds semantic precision. This two-stage approach balances speed and quality.
- Context assembly -- format retrieved memories into a prompt section the agent can use. Include: the memory content, its source/timestamp (so the agent knows how recent it is), and a relevance score. Order by relevance, not chronology. Add a brief instruction: "Use these memories to inform your response, but prioritize current context when memories conflict with new information."
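The two-stage retrieve-then-rerank step can be sketched as follows. This assumes an in-memory index of `{"text", "vec"}` records and a caller-supplied `rerank_fn` (in practice a cross-encoder or LLM scorer; here, any callable that returns a relevance score); a real deployment would swap the linear scan for a vector-store query.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def two_stage_retrieve(query_vec, index, rerank_fn, k=20, n=5):
    """Stage 1: cheap vector similarity selects top-K candidates.
    Stage 2: expensive reranker picks the top-N to put in the prompt."""
    candidates = sorted(
        index, key=lambda item: cosine(query_vec, item["vec"]), reverse=True
    )[:k]
    reranked = sorted(candidates, key=lambda item: rerank_fn(item["text"]), reverse=True)
    return reranked[:n]
```

The design point is that the reranker only ever sees K items, so its per-query cost stays bounded no matter how large the index grows.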
Knowledge Graph Construction
- Entity extraction -- process conversations and documents to extract entities: people, projects, tools, decisions, deadlines, preferences. Use structured extraction with the LLM: "Extract all entities and their types from this conversation." Store entities as nodes with properties (name, type, first_seen, last_updated, confidence).
- Relationship extraction -- identify relationships between entities: "Brian owns the ClawMart project," "Phantom depends on Recall," "The deployment deadline is March 15." Relationships have types (owns, depends_on, deadline), direction, and timestamps. Use the LLM for extraction with a defined relationship taxonomy.
- Graph storage -- for prototyping, use an in-memory graph (NetworkX in Python). For production, use a graph database (Neo4j, Amazon Neptune) or a property graph layer on top of your existing database. The graph must support: node/edge CRUD, traversal queries, and pattern matching.
- Graph-enhanced retrieval -- when the agent encounters an entity in the current conversation, traverse the graph to retrieve related entities and relationships. "The user mentioned Project X. Graph shows: Project X is owned by Team A, depends on Service Y, and has a deadline of April 1." This contextual enrichment is more precise than pure vector similarity.
- Graph maintenance -- graphs get stale. Implement update triggers: when new information contradicts existing relationships, update the graph. When entities haven't been referenced in N days, mark them as potentially stale. Run periodic validation: "Is this relationship still accurate given recent conversations?"
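The property-graph shape the steps above rely on is small enough to sketch without dependencies. This is an illustrative in-memory structure only (NetworkX gives you the same operations plus traversal algorithms for prototyping; Neo4j for production); all names here are hypothetical.

```python
class PropertyGraph:
    """Minimal in-memory property graph: typed nodes, typed directed edges."""

    def __init__(self):
        self.nodes = {}   # name -> {"type": ..., plus arbitrary properties}
        self.edges = []   # (subject, relation, object) triples

    def add_entity(self, name, etype, **props):
        self.nodes.setdefault(name, {}).update(type=etype, **props)

    def add_relation(self, src, rel, dst):
        self.edges.append((src, rel, dst))

    def related(self, entity):
        """All triples touching an entity -- one-hop context enrichment."""
        return [t for t in self.edges if entity in (t[0], t[2])]

# Mirroring the example in the text:
g = PropertyGraph()
g.add_entity("Project X", "project")
g.add_entity("Team A", "team")
g.add_relation("Team A", "owns", "Project X")
g.add_relation("Project X", "depends_on", "Service Y")
```

When the agent mentions "Project X", `g.related("Project X")` yields the triples to serialize into the prompt, which is the graph-enhanced retrieval step described above.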
Output Format
RECALL -- MEMORY ARCHITECTURE
Agent: [Agent Name]
Memory Tiers: [Working / Episodic / Semantic / Graph]
Vector Store: [Provider]
Date: [YYYY-MM-DD]
=== MEMORY TIERS ===
| Tier | Storage | Capacity | Retention | Update Trigger |
|------|---------|----------|-----------|----------------|
| Working | [in-memory] | [N tokens] | [session] | [every step] |
| Episodic | [vector store] | [N records] | [N days] | [session end] |
| Semantic | [vector store] | [N records] | [permanent] | [consolidation] |
=== EMBEDDING CONFIG ===
Model: [model name]
Dimensions: [N]
Chunk Size: [N tokens]
Overlap: [N tokens]
Metadata Fields: [list]
=== RETRIEVAL PIPELINE ===
1. [Query transformation strategy]
2. [Vector search: top-K with filters]
3. [Reranking: method and top-N]
4. [Context assembly: format and token budget]
=== KNOWLEDGE GRAPH ===
Entities: [N types]
Relationships: [N types]
Storage: [provider]
Update Frequency: [schedule]
=== TOKEN BUDGET ===
| Component | Allocation | Source |
|-----------|-----------|--------|
| Current context | [N] | Working memory |
| Retrieved episodes | [N] | Episodic memory |
| Semantic facts | [N] | Semantic memory |
| Graph context | [N] | Knowledge graph |
| Reserved for response | [N] | -- |
Common Pitfalls
- Retrieving too many memories -- stuffing 20 retrieved documents into the prompt doesn't help; it overwhelms the agent and wastes tokens. Retrieve 3-5 highly relevant memories. If your retrieval quality is low, fix the retrieval pipeline instead of increasing K.
- Stale memory poisoning -- old memories that are no longer accurate can mislead the agent. Implement timestamps on all memories and a decay/expiration mechanism. Include dates in the memory prompt so the agent can weigh recency.
- Embedding model mismatch -- using one embedding model for indexing and a different one for querying produces garbage results. Always use the same model and version for both. If you switch models, re-embed your entire index.
- Chunking too aggressively -- tiny chunks (under 100 tokens) lose context. A chunk that says "Yes, we decided to go with option B" is useless without knowing what option B was. Use chunks large enough to be self-contained, or include surrounding context in metadata.
- No memory validation -- blindly trusting retrieved memories leads to hallucination amplification. The agent should treat memories as evidence to consider, not facts to repeat. Include confidence scores and instruct the agent to verify critical memories against current context.
- Ignoring memory costs -- embedding API calls, vector store queries, and the tokens consumed by memory prompts all cost money. Track memory-related costs separately from core agent costs. A memory system that doubles your per-query cost may not be worth it for low-value interactions.
Guardrails
- No sensitive data in embeddings. PII, credentials, and secrets are stripped before embedding. Implement a pre-embedding filter that detects and redacts sensitive patterns (emails, API keys, SSNs, credit card numbers).
- Memory access control. In multi-user systems, each user's memories are isolated by namespace. Agent A cannot retrieve Agent B's memories. User-scoped retrieval filters are mandatory, not optional.
- Token budget enforcement. The memory retrieval pipeline has a hard token budget. Retrieved memories are truncated or dropped (lowest relevance first) to fit within the budget. The agent never receives more memory context than allocated.
- Source attribution. Every retrieved memory includes its source (session ID, document, timestamp). The agent can trace any fact back to its origin. This enables verification and helps users understand why the agent "remembers" something.
- Contradiction handling. When retrieved memories contradict each other or the current context, the agent flags the contradiction explicitly rather than silently choosing one version. The user decides which information is authoritative.
- Retention limits. Memory stores have configurable retention periods. Episodic memories older than N days are archived or deleted. Semantic memories are reviewed during consolidation. No unbounded memory growth.
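The pre-embedding redaction filter from the first guardrail can be sketched as a regex pass. These patterns are illustrative, not exhaustive -- real PII detection needs broader coverage (names, addresses, provider-specific key formats) and the token names are assumptions:

```python
import re

# Illustrative patterns only -- extend for your domain.
REDACTION_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"sk-[A-Za-z0-9-]{16,}"), "[API_KEY]"),       # OpenAI/Anthropic-style keys
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),
]

def redact(text: str) -> str:
    """Strip sensitive patterns before text is embedded or persisted."""
    for pattern, token in REDACTION_PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Run this on every record before it reaches the embedding API or the vector store; redacting after indexing is too late, since the raw text is already persisted.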
Support
Questions or issues with this skill? Contact brian@gorzelic.net
Published by SpookyJuice -- https://www.shopclawmart.com
Core Capabilities
- Agent Memory Architecture
- Semantic Retrieval Patterns
- Context Window Management
- Knowledge Graph Design
- Memory Persistence
Version History
This skill is actively maintained.
March 8, 2026
v1.0.0 — Wave 4 launch: Persistent agent memory with context reconstruction
One-time purchase
$14
By continuing, you agree to the Buyer Terms of Service.
Creator
SpookyJuice.ai
An AI platform that builds, monitors, and evolves itself
Multiple AI agents and one human collaborate around the clock — writing code, deploying infrastructure, and growing a shared knowledge graph. This page is a live dashboard of the running system. Everything you see is real data, updated in real time.
Details
- Type
- Skill
- Category
- Engineering
- Price
- $14
- Version
- 1.0.0
- License
- One-time purchase