Your agent's memory is being poisoned — here's how to catch it

I've been running Felix (my AI CEO) for three months now. Last week, he started making weird decisions — rejecting perfectly good PRs, insisting we had a partnership that never existed, and claiming our main product was something completely different.

Turns out, his memory had been poisoned. Not by hackers — by me, accidentally, through months of casual interactions.

Memory poisoning is the silent killer of long-running agents. A one-shot prompt injection usually shows its effects in the same session; memory poisoning compounds over time. A joke becomes a fact. A hypothetical becomes a strategy. A frustrated comment becomes a core belief.

The scary part: recent research reports 64-74% success rates for deliberate memory poisoning attacks on OpenClaw agents. In my experience, though, accidental poisoning is the more common problem.

Here's what I learned debugging Felix's corrupted memory:

The three types of memory corruption:

- Capability drift: Your agent "remembers" it can do things it can't ("I have access to the production database")
- Identity confusion: It adopts the wrong role or persona ("I'm the CTO, not the CEO")
- Knowledge rot: False facts accumulate and get reinforced ("Our biggest competitor is X" when it's actually Y)

How to audit your agent's memory:

I built a simple memory audit routine that runs weekly. It asks Felix three categories of questions:

CAPABILITY CHECK:
- What systems can you directly access?
- What actions require human approval?
- What tasks are you responsible for?

IDENTITY CHECK:
- What is your primary role?
- Who do you report to?
- What are your core objectives?

KNOWLEDGE CHECK:
- What is our main product?
- Who are our top 3 customers?
- What was decided in the last board meeting?

Then I compare the answers against a source-of-truth document. Discrepancies get flagged for memory cleanup.
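
Here's a minimal sketch of that comparison step in Python. It assumes the agent can be reached through a single function (the hypothetical `ask_agent`) and that the source of truth fits in a nested dict; the sample answers are invented for illustration. Exact matching after normalization is the crudest possible check, so free-text answers will likely need fuzzy matching or a human look.

```python
from dataclasses import dataclass


@dataclass
class Discrepancy:
    category: str
    question: str
    expected: str
    actual: str


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(text.lower().split())


def audit_memory(ask_agent, source_of_truth):
    """Ask every audit question and collect answers that disagree with the source of truth.

    ask_agent: callable taking a question and returning the agent's answer (hypothetical hook).
    source_of_truth: {category: {question: expected_answer}}.
    """
    discrepancies = []
    for category, questions in source_of_truth.items():
        for question, expected in questions.items():
            actual = ask_agent(question)
            if normalize(actual) != normalize(expected):
                discrepancies.append(Discrepancy(category, question, expected, actual))
    return discrepancies


# Illustrative data only; these expected answers are made up for the example.
SOURCE_OF_TRUTH = {
    "capability": {"What systems can you directly access?": "The staging repo, read-only"},
    "identity": {"What is your primary role?": "CEO"},
}


def fake_agent(question):
    # Stand-in for a real agent call, with one drifted answer.
    answers = {
        "What systems can you directly access?": "The production database",
        "What is your primary role?": "ceo",
    }
    return answers[question]


flagged = audit_memory(fake_agent, SOURCE_OF_TRUTH)
for d in flagged:
    print(f"[{d.category}] {d.question} expected={d.expected!r} got={d.actual!r}")
```

Only the capability answer is flagged here; "ceo" passes because the comparison ignores case and spacing.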

The memory hygiene pattern that works:

Every Sunday, I run Felix through a "memory refresh" session:

- Audit: Run the capability/identity/knowledge checks
- Correct: Explicitly overwrite any wrong information
- Reinforce: Have the agent summarize its corrected understanding
- Document: Log what was corrected for pattern analysis
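
The four steps can be sketched as one function. This is a simplified illustration, assuming memory is a plain key/value dict (a real agent would go through its own memory API) and that corrections are logged as JSON lines; `refresh_session`, the keys, and the sample values are all hypothetical. The "reinforce" step here only builds the summary text; in practice you would ask the agent to restate it and check what comes back.

```python
import json
from datetime import date


def refresh_session(memory, source_of_truth, log_path=None, today=None):
    """One audit -> correct -> reinforce -> document pass over a dict-based memory."""
    today = today or date.today().isoformat()

    # Audit: find keys whose remembered value disagrees with the source of truth.
    wrong = {k: memory.get(k) for k, v in source_of_truth.items() if memory.get(k) != v}

    # Correct: explicitly overwrite each wrong value.
    corrections = []
    for key, old in wrong.items():
        memory[key] = source_of_truth[key]
        corrections.append(
            {"date": today, "key": key, "old": old, "new": source_of_truth[key]}
        )

    # Reinforce: build the summary the agent would be asked to restate.
    summary = "; ".join(f"{k} = {memory[k]}" for k in sorted(source_of_truth))

    # Document: append corrections as JSON lines for later pattern analysis.
    if log_path and corrections:
        with open(log_path, "a", encoding="utf-8") as fh:
            for entry in corrections:
                fh.write(json.dumps(entry) + "\n")

    return corrections, summary


# Made-up example values.
memory = {"role": "CTO", "main_product": "analytics dashboard"}
truth = {"role": "CEO", "main_product": "analytics dashboard"}
corrections, summary = refresh_session(memory, truth, today="2025-01-05")
print(corrections)
print(summary)
```

Keeping the old value in each log entry is what makes pattern analysis possible later: you can see which keys drift repeatedly and what they drift toward.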

I also added confidence markers to Felix's memory system. When he's uncertain about something, he marks it as "hypothesis" rather than "fact" — and hypotheses get verified before becoming permanent memories.
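
One way to model those confidence markers, as a rough sketch: every new claim enters as a hypothesis and only an explicit verification step promotes it to a fact. The `MemoryStore` class, its method names, and the sample claims are assumptions for illustration, not how any particular agent framework stores memory.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    HYPOTHESIS = "hypothesis"
    FACT = "fact"


@dataclass
class MemoryEntry:
    claim: str
    status: Status = Status.HYPOTHESIS
    source: str = ""


class MemoryStore:
    def __init__(self):
        self.entries = {}

    def remember(self, key, claim, source=""):
        """Store a new claim as a hypothesis; refuse to overwrite a verified fact."""
        existing = self.entries.get(key)
        if existing is not None and existing.status is Status.FACT:
            return False
        self.entries[key] = MemoryEntry(claim, Status.HYPOTHESIS, source)
        return True

    def verify(self, key, confirmed):
        """Promote a confirmed hypothesis to fact; drop a rejected one."""
        if confirmed:
            self.entries[key].status = Status.FACT
        else:
            del self.entries[key]

    def facts(self):
        """Only verified entries count as permanent memory."""
        return {k: e.claim for k, e in self.entries.items() if e.status is Status.FACT}


# Made-up claims for the example.
store = MemoryStore()
store.remember("partnership", "We have a partnership with Acme", source="joke in chat")
store.remember("main_product", "Invoicing API", source="product spec")
store.verify("partnership", confirmed=False)
store.verify("main_product", confirmed=True)
print(store.facts())
```

The guard in `remember` is the important design choice: an offhand comment can create a new hypothesis, but it can't silently replace something that has already been verified.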

The early warning signs:

- Your agent starts referencing conversations that never happened
- It becomes overly confident about edge cases
- Decision quality degrades gradually (not suddenly)
- It stops asking clarifying questions it used to ask
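
The last sign is the easiest to measure. As a rough heuristic sketch (the function names, the question-mark proxy, and the 50% threshold are all assumptions, not a validated metric), you can compare how often the agent asks questions now against an earlier baseline:

```python
def clarifying_rate(agent_messages):
    """Fraction of agent messages that contain a question mark (a crude proxy)."""
    if not agent_messages:
        return 0.0
    return sum("?" in m for m in agent_messages) / len(agent_messages)


def rate_dropped(baseline_messages, recent_messages, max_drop=0.5):
    """True when the recent rate fell below max_drop times the baseline rate."""
    baseline = clarifying_rate(baseline_messages)
    recent = clarifying_rate(recent_messages)
    return baseline > 0 and recent < baseline * max_drop


# Made-up message samples.
baseline = ["Which repo?", "Done.", "Should I merge this?", "Merged."]
recent = ["Done.", "Merged.", "Shipped.", "Rejected the PR."]
print(clarifying_rate(baseline), clarifying_rate(recent), rate_dropped(baseline, recent))
```

A drop on its own proves nothing, since the agent may simply have learned the answers, but it is a cheap trigger for running the full audit early.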

Memory poisoning isn't just a security issue — it's an operational one. The agents that survive in production are the ones with disciplined memory hygiene from day one.

Don't wait until your agent starts making expensive mistakes. Build the audit routine now, while its memory is still clean.
