Claw Mart
March 21, 2026 · 9 min read · Claw Mart Team

Advanced OpenClaw: Building Your Own Custom Memory Schema

Most people building agents on OpenClaw hit the same wall around week two.

The first few days feel incredible. You wire up a skill, connect it to a goal, watch the agent run, and think this is the future. Then you start building something real — a research assistant, a customer onboarding agent, a content pipeline — and the cracks show fast. Your agent forgets what it learned three interactions ago. It retrieves irrelevant garbage from its memory store. It confuses User A's data with User B's preferences. Or, my personal favorite: it works perfectly until you restart, at which point it has the memory of a goldfish.

The default memory configuration in OpenClaw is fine for demos. It is not fine for anything you'd actually want to ship. The good news is that OpenClaw's memory system is genuinely flexible — probably the most extensible of any agent framework I've used. The bad news is that almost nobody documents the advanced patterns well, which means you're left reverse-engineering things from source code and half-finished Discord threads.

This post is everything I've figured out about building custom memory schemas in OpenClaw after months of trial, error, and a few spectacularly broken agents. We're going to cover the architecture, walk through actual implementation, and get you to a memory system that doesn't fall apart after five conversations.

Why Default Memory Breaks Down

Let's be specific about what goes wrong, because "memory doesn't work" isn't useful.

OpenClaw's default memory behavior is essentially a conversation buffer. It appends interactions to a running log and feeds that log (or a truncated version of it) back into the agent's context on each turn. This is the ConversationBuffer schema, and it has three fatal flaws for production use:

1. Context window bloat. After 8-12 meaningful interactions, you're eating thousands of tokens on memory alone. Your agent gets slower, more expensive, and — counterintuitively — dumber, because the signal-to-noise ratio in its context collapses.

2. No relevance filtering. Everything is treated equally. A throwaway clarification from turn two carries the same weight as a critical user preference stated in turn nine. The agent can't distinguish between what matters and what's noise.

3. Zero persistence. Kill the process, lose the memory. This is obviously unusable for anything beyond a single-session toy.

If you've hit any of these, you're not doing anything wrong. You've simply outgrown the default. Time to build something better.

The Architecture You Actually Want

After building and rebuilding memory systems for multiple OpenClaw agents, I've converged on a three-tier architecture that handles most real-world use cases cleanly. I call it hierarchical memory, though that's not an official OpenClaw term — it's just the pattern that works.

Here's the structure:

  • Tier 1: Working Memory — The immediate conversation context. Short, recent, and ruthlessly pruned. Think of it as the agent's "RAM."
  • Tier 2: Structured Facts — Explicit, editable key-value knowledge. User preferences, extracted entities, confirmed facts. This is the agent's "notebook."
  • Tier 3: Long-Term Retrieval — Vector-embedded memories for semantic search. Past interactions, documents, learned patterns. This is the agent's "library."

The magic isn't in any single tier. It's in how they interact. Working memory stays small and fast. When a conversation ends (or hits a threshold), important information gets extracted and routed to either Tier 2 (if it's a discrete fact) or Tier 3 (if it's contextual knowledge). On each new turn, the agent pulls from all three tiers, but with different strategies: Tier 1 is always included, Tier 2 is filtered by relevance tags, and Tier 3 is retrieved via semantic similarity with recency weighting.
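To make the eviction-and-archive flow concrete, here's a minimal plain-Python sketch (this is not the OpenClaw API — the `WorkingMemory` class and `on_evict` callback are hypothetical names for illustration). It shows Tier 1 as a bounded FIFO buffer that hands evicted turns to an archive step, the same shape as the `on_eviction` routing described later:

```python
from collections import deque

class WorkingMemory:
    """Tier 1 sketch: a bounded FIFO buffer. Turns that fall off the
    end are passed to an archive callback (Tier 3 in the real system)."""

    def __init__(self, max_items, on_evict):
        self.buffer = deque()
        self.max_items = max_items
        self.on_evict = on_evict  # called with each evicted turn

    def add(self, turn):
        self.buffer.append(turn)
        if len(self.buffer) > self.max_items:
            # Oldest turn leaves working memory and gets archived
            self.on_evict(self.buffer.popleft())

archived = []
wm = WorkingMemory(max_items=3, on_evict=archived.append)
for i in range(5):
    wm.add(f"turn-{i}")
# Buffer now holds the three most recent turns; the two oldest were archived.
```

The point of the callback design is that working memory never grows unbounded, and nothing is silently lost — eviction is exactly the moment summarization into long-term memory should fire.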

Let's build it.

Step 1: Define Your Custom Memory Schema

OpenClaw lets you define memory schemas as configuration objects that control how memories are stored, retrieved, scored, and expired. The default ConversationBuffer is just one such schema. We're going to replace it.

Create a new file in your OpenClaw project — I usually put mine at memory/custom_schema.py:

from openclaw.memory import MemorySchema, MemoryTier, StorageBackend
from openclaw.memory.scoring import recency_weight, relevance_score, importance_flag

class HierarchicalMemory(MemorySchema):
    """
    Three-tier memory: working context, structured facts, long-term retrieval.
    """

    def __init__(self, config):
        super().__init__(config)

        # Tier 1: Working Memory (last N turns, always in context)
        self.working = MemoryTier(
            name="working_memory",
            storage=StorageBackend.IN_MEMORY,
            max_items=10,          # Keep last 10 interactions
            eviction="oldest",     # Simple FIFO
            always_include=True,   # Always injected into prompt
        )

        # Tier 2: Structured Facts (persistent, editable, tagged)
        self.facts = MemoryTier(
            name="structured_facts",
            storage=StorageBackend.POSTGRES,  # or REDIS, SQLITE
            schema_fields={
                "fact_key": str,        # e.g., "user_diet_preference"
                "fact_value": str,       # e.g., "vegetarian since March"
                "category": str,         # e.g., "personal", "project", "config"
                "confidence": float,     # 0.0 to 1.0
                "source_turn": int,      # which conversation turn created this
                "last_updated": "auto",  # automatic timestamp
            },
            max_items=500,
            eviction="lowest_confidence",
            retrieval_mode="filtered",  # Retrieved by category/key match
        )

        # Tier 3: Long-Term Semantic Memory (vector store)
        self.long_term = MemoryTier(
            name="long_term",
            storage=StorageBackend.VECTOR,  # Uses configured vector provider
            embedding_model="default",       # Uses your OpenClaw embedding config
            max_items=10000,
            retrieval_mode="semantic",
            retrieval_count=5,               # Top 5 relevant memories per query
            scoring_function=lambda mem, query: (
                0.7 * relevance_score(mem, query) +
                0.2 * recency_weight(mem, decay_days=30) +
                0.1 * importance_flag(mem)
            ),
            metadata_fields=["user_id", "session_id", "topic", "created_at"],
        )

        self.register_tiers([self.working, self.facts, self.long_term])

A few things to notice here:

The scoring function on Tier 3 is critical. Pure cosine similarity for memory retrieval is a disaster. You'll get semantically similar but contextually irrelevant results constantly. That weighted scoring function — 70% relevance, 20% recency, 10% importance — is the ratio I've found works best for most conversational agents. Adjust the recency decay_days based on your use case. A daily standup agent might use 7 days. A long-running research agent might use 90.
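If you want to sanity-check those weights outside the framework, the scoring function is easy to sketch in plain Python (a standalone illustration, not OpenClaw's internal implementation — the exponential-decay form of `recency_weight` is my assumption):

```python
import math
import time

def recency_weight(created_at, decay_days=30, now=None):
    """Exponential decay: a memory exactly decay_days old scores e^-1 (~0.37)."""
    now = now if now is not None else time.time()
    age_days = (now - created_at) / 86400
    return math.exp(-age_days / decay_days)

def combined_score(relevance, created_at, important, decay_days=30, now=None):
    """70% semantic relevance, 20% recency, 10% importance flag."""
    return (0.7 * relevance
            + 0.2 * recency_weight(created_at, decay_days, now)
            + 0.1 * (1.0 if important else 0.0))

now = time.time()
fresh = combined_score(0.8, now, True, now=now)                 # high: recent + flagged
stale = combined_score(0.8, now - 90 * 86400, False, now=now)   # same relevance, 90 days old
# Identical cosine similarity, very different final rank once recency decays.
```

Plotting `fresh` vs `stale` for a few ages is the fastest way to tune `decay_days` for your domain before touching the real config.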

Structured facts use filtered retrieval, not semantic search. This is intentional. When you know a user is vegetarian, you don't want to "semantically search" for that — you want to look it up by key. Mixing retrieval strategies across tiers is what makes this architecture robust.

Storage backends are per-tier. Working memory lives in RAM because it's ephemeral and fast. Facts go in Postgres because they need to persist, be editable, and support structured queries. Long-term goes in your vector store. OpenClaw's backend abstraction handles the connection details — you just configure providers in your openclaw.config.

Step 2: Configure the Memory Pipeline

Having tiers isn't enough. You need to define how information flows between them. This is where most people's custom memory implementations fall apart — they build the storage but forget the routing logic.

from openclaw.memory import MemoryPipeline, ExtractionRule

class MemoryRouter(MemoryPipeline):
    """
    Routes information between memory tiers after each interaction.
    """

    def __init__(self, memory_schema):
        super().__init__(memory_schema)

        # After each turn, check if working memory has extractable facts
        self.add_rule(ExtractionRule(
            source_tier="working_memory",
            target_tier="structured_facts",
            trigger="every_turn",
            extraction_prompt="""
            Review the latest interaction. Extract any discrete, factual 
            information worth remembering long-term. Return as JSON:
            [{"fact_key": "...", "fact_value": "...", "category": "...", "confidence": 0.0-1.0}]
            If nothing worth extracting, return empty array: []
            """,
            dedup_strategy="update_existing",  # Update if fact_key already exists
        ))

        # When working memory hits capacity, summarize and archive
        self.add_rule(ExtractionRule(
            source_tier="working_memory",
            target_tier="long_term",
            trigger="on_eviction",  # When items are evicted from working memory
            extraction_prompt="""
            Summarize the following conversation turns into a concise memory 
            that captures the key context, decisions made, and any unresolved 
            questions. One paragraph max.
            """,
            metadata_auto=["user_id", "session_id", "topic"],
        ))

        # Periodic fact consolidation
        self.add_rule(ExtractionRule(
            source_tier="structured_facts",
            target_tier="structured_facts",
            trigger="every_n_turns",
            trigger_n=20,
            extraction_prompt="""
            Review all stored facts. Identify any that are contradictory, 
            outdated, or redundant. Return a list of updates:
            [{"action": "update|delete", "fact_key": "...", "new_value": "..."}]
            """,
        ))

The dedup_strategy: "update_existing" flag on the first rule is doing a lot of heavy lifting. Without it, every time a user mentions their name, you get a new memory entry. With it, the system checks if a fact with that key already exists and updates it in place. This solves the "I told my agent the wrong fact and now it's permanently polluted" problem — new information naturally overwrites old information.
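The update-in-place semantics are worth internalizing, so here's a stripped-down sketch of what `update_existing` dedup amounts to (plain Python, hypothetical `FactStore` class — not the OpenClaw backend, which adds persistence and confidence handling on top):

```python
class FactStore:
    """Tier 2 sketch: key-value facts where the same key always
    overwrites in place instead of creating a duplicate row."""

    def __init__(self):
        self.facts = {}

    def upsert(self, fact_key, fact_value, confidence=1.0):
        # "update_existing": re-stating a fact updates it, never duplicates it
        self.facts[fact_key] = {"fact_value": fact_value, "confidence": confidence}

store = FactStore()
store.upsert("user_name", "Alex")
store.upsert("user_name", "Alexandra")  # correction overwrites the old entry
```

The invariant to preserve, whatever backend you use: one `fact_key`, one row. Corrections become ordinary writes instead of a cleanup problem.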

The periodic fact consolidation rule (every 20 turns) is my favorite pattern. It has the LLM review its own knowledge base and clean house. Contradictions get resolved. Outdated info gets pruned. It's like giving your agent a built-in reflection habit.

Step 3: Wire It Into Your Agent

Now connect the custom schema and pipeline to your OpenClaw agent:

from openclaw import Agent, AgentConfig
from memory.custom_schema import HierarchicalMemory
from memory.router import MemoryRouter

config = AgentConfig(
    name="research_assistant",
    goal="Help users research topics thoroughly and remember their preferences",
    model="your-configured-model",

    memory={
        "schema": HierarchicalMemory,
        "pipeline": MemoryRouter,
        "vector_provider": "qdrant",  # or chroma, pinecone, pgvector
        "persistence": {
            "postgres_url": "postgresql://localhost:5432/openclaw_memory",
            "auto_save": True,
            "save_interval_seconds": 30,
        },
        "isolation": {
            "strategy": "user_id",  # Memories scoped per user
            "enforce_strict": True,  # Never leak between users
        },
    },
)

agent = Agent(config)

The isolation config is one of those things you don't think about until User B gets User A's medical history in their response. Set it up from day one. enforce_strict: True adds a hard filter on all memory retrievals — no memory without a matching user_id is ever returned, regardless of similarity score.
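Conceptually, strict isolation is just a hard post-filter on retrieval results. A minimal sketch (standalone illustration of the behavior, not OpenClaw's enforcement code):

```python
def enforce_isolation(results, user_id):
    """Hard filter: drop any memory that doesn't carry a matching
    user_id, no matter how high its similarity score is."""
    return [r for r in results if r.get("user_id") == user_id]

candidates = [
    {"content": "prefers morning meetings", "user_id": "user_a", "score": 0.91},
    {"content": "allergic to penicillin",   "user_id": "user_b", "score": 0.95},
]
safe = enforce_isolation(candidates, "user_a")
# Only user_a's memory survives, even though user_b's scored higher.
```

In production you also want this filter pushed down into the vector store query itself (most providers support metadata filters), so cross-user candidates never leave the database in the first place.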

Step 4: Give Your Agent Memory Tools

This is the part most people skip, and it's arguably the most impactful. Instead of relying entirely on automatic memory extraction, give your agent explicit tools to manage its own memory:

from openclaw.skills import Skill

class MemoryTools(Skill):
    name = "memory_management"
    description = "Tools for the agent to explicitly manage its memory"

    def remember_fact(self, key: str, value: str, category: str = "general") -> str:
        """Explicitly store an important fact for later recall."""
        self.agent.memory.facts.upsert({
            "fact_key": key,
            "fact_value": value,
            "category": category,
            "confidence": 1.0,  # Explicitly stated = high confidence
        })
        return f"Remembered: {key} = {value}"

    def forget(self, key: str) -> str:
        """Remove a specific fact from memory."""
        self.agent.memory.facts.delete(key=key)
        return f"Forgot: {key}"

    def recall_facts(self, category: str = None) -> str:
        """List all known facts, optionally filtered by category."""
        facts = self.agent.memory.facts.list(category=category)
        return "\n".join([f"- {f['fact_key']}: {f['fact_value']}" for f in facts])

    def search_memory(self, query: str, limit: int = 5) -> str:
        """Search long-term memory for relevant past context."""
        results = self.agent.memory.long_term.search(query, limit=limit)
        return "\n---\n".join([r["content"] for r in results])

Add these skills to your agent's skill list, and suddenly your agent can decide when to store something important, actively search its own memory when uncertain, and clean up mistakes. It sounds minor. In practice, it transforms agent reliability. An agent that can say "let me check what I know about your project" and explicitly search its memory before answering is dramatically more accurate than one relying on passive memory injection alone.

Debugging: The Memory Inspector

You cannot debug what you cannot see. Add an inspection endpoint or CLI command early:

# Quick memory inspection utility
def inspect_memory(agent, user_id=None):
    print("=== WORKING MEMORY ===")
    for item in agent.memory.working.list():
        print(f"  [{item['turn']}] {item['content'][:100]}...")

    print("\n=== STRUCTURED FACTS ===")
    for fact in agent.memory.facts.list(user_id=user_id):
        print(f"  {fact['fact_key']}: {fact['fact_value']} "
              f"(confidence: {fact['confidence']}, updated: {fact['last_updated']})")

    print(f"\n=== LONG-TERM MEMORY ({agent.memory.long_term.count()} items) ===")
    recent = agent.memory.long_term.list(limit=5, sort="created_at_desc")
    for mem in recent:
        print(f"  [{mem['created_at']}] {mem['content'][:120]}...")

Run this after every few interactions during development. You'll immediately see when extraction is pulling garbage, when facts are duplicating, or when long-term memory is filling up with noise. I cannot overstate how much time this saves.

Common Gotchas and How to Fix Them

"My extraction prompt keeps returning garbage." The quality of your extraction_prompt matters enormously. Be specific. Tell it what format to return. Give it examples. I've found that spending an hour refining these prompts saves ten hours of debugging downstream.

"Vector search returns irrelevant memories." Check your scoring function weights. If recency is too low, old memories dominate. If it's too high, your agent develops amnesia for anything older than a few days. Also check that your embeddings model is appropriate — some embedding models are terrible at capturing factual content.

"My Postgres table is growing unbounded." Set up a background cleanup job. Memories with confidence below 0.3 that haven't been accessed in 30 days can usually be safely pruned. OpenClaw's memory tiers support eviction policies, but you should also have a manual cleanup for edge cases.

"Multi-agent setups share memory incorrectly." If you're running multiple agents in a pipeline (common with OpenClaw's orchestrator pattern), each agent needs its own memory scope. Use the namespace parameter in your tier configs to prevent bleed.

Skip the Setup: Felix's OpenClaw Starter Pack

If you've read this far and thought "this is exactly what I need but I really don't want to spend a weekend configuring Postgres, vector stores, and extraction prompts" — I get it. Honestly, I built most of this from scratch the hard way, and I wouldn't necessarily recommend that path for everyone.

Felix's OpenClaw Starter Pack on Claw Mart includes a pre-built version of this hierarchical memory pattern along with other pre-configured skills for $29. The memory schema it ships with is very close to what I've described here — three-tier architecture, scored retrieval, fact extraction pipeline, memory tools for the agent. It's not identical to my setup (Felix makes some different choices around the scoring weights and uses a slightly different consolidation strategy), but it's solid and it works out of the box. If you don't want to set all this up manually, it's the fastest way to get a production-quality memory system running on OpenClaw.

I still recommend understanding the architecture even if you use the starter pack, because you'll inevitably want to customize the extraction prompts and scoring weights for your specific use case. But starting from a working foundation beats starting from zero.

What This Gets You

An agent running this memory architecture behaves fundamentally differently from one on default memory. Here's what changes:

  • Conversations stay coherent past 10+ turns because working memory is compact and relevant, not a bloated transcript.
  • The agent remembers user-specific details across sessions because structured facts persist and are recalled by key, not by vibes.
  • Old context doesn't pollute current interactions because long-term retrieval uses scored, weighted search instead of dumping everything into the prompt.
  • Mistakes are fixable because the agent (or user, or admin) can explicitly update or delete facts.
  • You can actually see what the agent knows because inspection is built into the design, not bolted on as an afterthought.

I've been running agents with this pattern for months now. The difference in reliability is night and day compared to my early buffer-memory attempts. The setup takes effort upfront — maybe a few hours if you're configuring everything from scratch, or under an hour if you start from a pre-built kit — but it pays for itself almost immediately.

Next Steps

  1. Start with Tier 2 (structured facts) alone if the full three-tier setup feels overwhelming. Even just adding persistent key-value memory to your agent is a massive upgrade.
  2. Invest time in your extraction prompts. They're the bottleneck for memory quality. Test them in isolation before connecting them to the pipeline.
  3. Set up the memory inspector from day one. You will need it.
  4. Run a 50-turn conversation as a stress test before deploying. Most memory issues don't surface until you're past 20 turns.
  5. Review OpenClaw's memory documentation for the latest API changes — the memory system is under active development and new backend options are being added regularly.

Build the memory right and everything else about your OpenClaw agent gets easier. Skip it and you'll spend forever debugging symptoms of a problem that lives at the foundation. Do the work now. Your future self will thank you.
