OpenClaw Sessions vs Persistent Agents: When to Use Which

Let me be real about something that took me way too long to figure out: the difference between running an OpenClaw session and running a persistent agent is the difference between a sticky note and a filing cabinet. Both hold information. One of them is useful past Thursday.
If you've been messing around with OpenClaw for more than a week, you've hit this wall. You spin up a session, the claw starts clicking around your screen, it fills in three form fields beautifully, and then... the context window gets stuffed with screenshots, the model forgets what it was doing, you burn $8 in tokens, and you're back to doing the task manually while questioning your life choices.
The answer isn't "OpenClaw doesn't work." The answer is you're using the wrong architecture for the task. Let me explain.
The Core Problem Nobody Explains Well
OpenClaw gives your agent a mouse, a keyboard, and a screenshot loop. Every action the agent takes generates a new screenshot, which gets fed back into the context so the model can see what happened. Simple, elegant, and totally fine for short tasks.
Here's where it breaks down: every screenshot is roughly 50,000 to 150,000 tokens depending on your screen resolution and complexity. After 15 turns, you've potentially burned through a million tokens. Your context window is packed. The model starts losing the plot, literally forgetting the original objective and doing bizarre stuff like opening random tabs or clicking the same pixel in an infinite death spiral.
This isn't an OpenClaw bug. It's a fundamental architectural mismatch. You're using a stateless session pattern for a stateful, long-running task.
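To make that arithmetic concrete, here's a back-of-the-envelope sketch. It assumes every screenshot stays in context for the rest of the session and uses the per-screenshot range quoted above. The function names are mine, not part of OpenClaw.

```python
def context_tokens(turns, tokens_per_screenshot):
    """Tokens sitting in context after `turns` actions if every screenshot is kept."""
    return turns * tokens_per_screenshot


def cumulative_input_tokens(turns, tokens_per_screenshot):
    """Total input tokens sent over the session, assuming the full history is
    re-sent on every turn: 1 + 2 + ... + turns screenshots."""
    return tokens_per_screenshot * turns * (turns + 1) // 2


if __name__ == "__main__":
    for per_shot in (50_000, 150_000):
        print(per_shot, context_tokens(15, per_shot), cumulative_input_tokens(15, per_shot))
```

The second function is the one that hurts: if history is replayed on each turn, billed input grows with the square of the turn count, not linearly.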
Sessions: What They Are and When They're Great
An OpenClaw session is essentially a single continuous conversation between the agent and the screen. You start it, the agent acts, and when the session ends (or crashes, or times out), everything is gone. There's no memory, no checkpoint, no way to pick up where you left off.
Sessions are perfect for:
- Quick, atomic tasks: rename a file, send a specific email, change one setting
- Demonstration and testing: showing that an automation concept works before you invest in persistence
- Tasks under ~10 actions: anything the model can complete before context pressure builds up
- One-shot extractions: grab a specific piece of data from a screen and return it
Here's what a basic session looks like in OpenClaw:
from openclaw import Session

session = Session(
    display_resolution=(1280, 720),  # Lower res = fewer tokens per screenshot
    max_turns=15
)

result = session.run(
    task="Open the company wiki, find the PTO policy, and extract the number of vacation days for US employees.",
    on_complete="return_text"
)

print(result.extracted_data)
This is clean. It works. The task is scoped tightly enough that the agent can knock it out in 5-8 actions without context issues. You get your answer, the session ends, everyone's happy.
The mistake people make is trying to stretch this pattern to handle things like "process all 47 invoices in my email" or "apply to 20 jobs on LinkedIn." That's not a session task. That's an agent task.
Persistent Agents: The Architecture That Actually Scales
A persistent agent in OpenClaw is a fundamentally different beast. Instead of treating the interaction as a single unbroken conversation, you're building a stateful process that:
- Checkpoints after every meaningful action: stores progress to a database so you can resume anytime
- Summarizes history instead of replaying it: instead of feeding 15 screenshots back into context, it feeds a text summary of what's been done
- Uses hybrid state representation: combines accessibility tree data, selective screenshots, and structured element lists instead of raw full-desktop images
- Has explicit error recovery: detects loops, validates that actions actually changed the screen, and can roll back to previous checkpoints
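The "summarize instead of replay" idea from the list above can be sketched as a rolling window: keep the last few actions verbatim and fold everything older into a line of text. This is a toy illustration, not OpenClaw's implementation. The RollingMemory class is hypothetical, and its "summary" is just string formatting where a real agent would presumably ask the model to condense the history.

```python
from collections import deque


class RollingMemory:
    """Illustrative only: keeps recent actions in full, folds older ones into text."""

    def __init__(self, window=5):
        self.window = window
        self.recent = deque()
        self.summary = ""
        self.folded = 0  # how many actions have been folded into the summary

    def add(self, action):
        self.recent.append(action)
        while len(self.recent) > self.window:
            oldest = self.recent.popleft()
            self.folded += 1
            # Naive stand-in for a model-written summary.
            self.summary = (
                f"{self.folded} earlier actions completed; most recent of those: {oldest}"
            )

    def context(self):
        """Text handed to the model in place of the full action history."""
        lines = [f"[Summary] {self.summary}"] if self.summary else []
        lines += [f"[Recent] {action}" for action in self.recent]
        return "\n".join(lines)
```

The point of the design is that the context size stays bounded by the window plus one summary, no matter how long the task runs.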
Here's the difference in practice. This is how you'd set up a persistent agent for a multi-step task:
from openclaw import PersistentAgent, CheckpointStore
from openclaw.memory import SummarizingMemory
from openclaw.state import HybridStateEncoder

# Set up persistent storage
store = CheckpointStore(
    backend="sqlite",  # Also supports postgres, redis
    db_path="./agent_checkpoints.db"
)

# Configure memory management
memory = SummarizingMemory(
    short_term_window=5,            # Keep last 5 actions in full detail
    summarize_after=5,              # Summarize older actions to text
    include_screenshots=False,      # Don't include old screenshots in summary
    summary_strategy="progressive"  # Each summary builds on the last
)

# Use hybrid state instead of raw screenshots
state_encoder = HybridStateEncoder(
    use_accessibility_tree=True,  # Extract DOM/UI structure as text
    use_ocr=True,                 # OCR visible text elements
    screenshot_mode="selective",  # Only screenshot when ambiguous
    crop_to_focus=True            # Crop screenshots to active area only
)

agent = PersistentAgent(
    checkpoint_store=store,
    memory=memory,
    state_encoder=state_encoder,
    checkpoint_interval=3,  # Checkpoint every 3 actions
    max_loop_detection=3,   # Abort if same action repeated 3x
    display_resolution=(1280, 720)
)

result = agent.run(
    task="Process all unread invoices in Gmail. For each invoice, extract vendor name, amount, and due date. Save to invoices.csv.",
    session_name="invoice-processing-jan"
)
The magic here is what happens under the hood. After every 3 actions, the agent saves its state. If your laptop dies, if the API rate-limits you, if you need to go to lunch, it doesn't matter. You pick up right where you left off:
# Resume a previously interrupted agent
agent = PersistentAgent(
    checkpoint_store=store,
    memory=memory,
    state_encoder=state_encoder
)

result = agent.resume(session_name="invoice-processing-jan")
The agent loads its last checkpoint, reads its summarized history ("You have processed 12 of 47 invoices. Last completed: Acme Corp invoice #4521 for $3,200 due Feb 15. The Gmail tab is open to page 2 of unread messages."), and continues working.
This is the pattern that turns OpenClaw from a cool demo into something you can actually rely on for real work.
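If you're curious what save-and-resume boils down to, here's a minimal sketch using only Python's standard library. It is not OpenClaw's checkpoint store: the SqliteCheckpoints class, the table layout, and the shape of the state dict are all assumptions I made up for illustration.

```python
import json
import sqlite3
from typing import Optional


class SqliteCheckpoints:
    """Hypothetical stand-in for a checkpoint store, backed by SQLite."""

    def __init__(self, db_path=":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_name TEXT NOT NULL, "
            "state_json TEXT NOT NULL)"
        )

    def save(self, session_name: str, state: dict) -> None:
        # The connection context manager commits on success, rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT INTO checkpoints (session_name, state_json) VALUES (?, ?)",
                (session_name, json.dumps(state)),
            )

    def load_latest(self, session_name: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT state_json FROM checkpoints WHERE session_name = ? "
            "ORDER BY id DESC LIMIT 1",
            (session_name,),
        ).fetchone()
        return json.loads(row[0]) if row else None


store = SqliteCheckpoints()
store.save("invoice-processing-jan", {"processed": 11, "summary": "Processed 11 of 47 invoices."})
store.save("invoice-processing-jan", {"processed": 12, "summary": "Processed 12 of 47 invoices."})
state = store.load_latest("invoice-processing-jan")
```

Appending a row per checkpoint instead of overwriting one is what makes rolling back to an earlier checkpoint possible later.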
The Hybrid State Trick That Saves You 90% on Tokens
This single optimization is worth the entire blog post. Stop sending full desktop screenshots for every action.
The HybridStateEncoder I showed above does something clever: instead of (or alongside) raw screenshots, it extracts the accessibility tree, a structured text representation of every UI element on screen. This gives the model a precise map of buttons, text fields, labels, and their positions without the insane token cost of an image.
Here's what the model actually sees with hybrid state enabled:
[Accessibility Tree - Current Screen]
Window: "Gmail - Google Chrome"
├── Header
│   ├── SearchBox: "Search mail" (focused: false)
│   ├── Button: "Compose" (x:84, y:156)
│   └── UserAvatar: "john@company.com"
├── Sidebar
│   ├── Link: "Inbox (23)" (x:62, y:234)
│   ├── Link: "Starred" (x:62, y:268)
│   └── Link: "Sent" (x:62, y:302)
└── EmailList
    ├── EmailRow (unread): "Acme Corp - Invoice #4522" (x:450, y:312)
    ├── EmailRow (unread): "Globex Inc - Payment Due" (x:450, y:356)
    └── EmailRow (read): "Team Standup Notes" (x:450, y:400)

[Summary of completed actions]
Processed 12 invoices. Current CSV has 12 rows. Now on page 2 of inbox.

[Selective Screenshot]
(cropped 400x200 image of just the email list area: ~8,000 tokens instead of 120,000)
That text representation might be 2,000 tokens. A full desktop screenshot of the same thing? Easily 80,000-120,000 tokens. One Discord user reported going from ~$8 per task to under $0.50 with this approach. That's not a marginal improvement. That's the difference between "interesting experiment" and "I'm actually using this every day."
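Here's a rough sketch of both halves of that claim: turning a nested UI structure into compact text, and checking the savings using the token figures quoted above. The dict layout and both function names are hypothetical. I don't know how OpenClaw represents the tree internally, so treat this as an illustration of the idea only.

```python
def render_tree(node, prefix=""):
    """Return text lines for the children of a {"label": str, "children": [...]} node."""
    lines = []
    children = node.get("children", [])
    for i, child in enumerate(children):
        last = i == len(children) - 1
        lines.append(prefix + ("└── " if last else "├── ") + child["label"])
        lines.extend(render_tree(child, prefix + ("    " if last else "│   ")))
    return lines


def savings(full_screenshot_tokens, tree_tokens, cropped_tokens):
    """Fraction of tokens saved per turn by sending tree + cropped image instead."""
    return 1 - (tree_tokens + cropped_tokens) / full_screenshot_tokens


screen = {
    "label": 'Window: "Gmail - Google Chrome"',
    "children": [
        {"label": "Header", "children": [{"label": 'Button: "Compose" (x:84, y:156)'}]},
        {"label": "EmailList", "children": [
            {"label": 'EmailRow (unread): "Acme Corp - Invoice #4522" (x:450, y:312)'},
        ]},
    ],
}

text = "\n".join([screen["label"]] + render_tree(screen))
print(text)
print(f"{savings(120_000, 2_000, 8_000):.1%} saved per turn")
```

With a 2,000-token tree and an 8,000-token crop against a 120,000-token full screenshot, the saving works out to a little under 92% per turn, which is where the "90%" in the heading comes from.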
When to Use Which: The Decision Framework
Here's my simple rule, and it's served me well across dozens of automations:
Use a Session when:
- The task takes fewer than 10 actions
- You don't need to resume if interrupted
- You're extracting a single piece of information
- You're testing or prototyping a workflow
- The task is fully self-contained (no dependencies on previous runs)
Use a Persistent Agent when:
- The task involves processing multiple items (emails, invoices, applications, records)
- Total actions will exceed 15-20
- You need reliability: failure shouldn't mean starting over
- The task might span multiple sittings or hit rate limits
- You're building something for repeated use, not a one-off
Use a Persistent Agent with Hybrid State when:
- You care about cost at all (and you should)
- The UI you're navigating has standard web elements (most SaaS tools, email, spreadsheets)
- You're running multiple agents or sessions
- The task is long enough that token costs compound
There's also a middle ground that's worth knowing about:
from openclaw import Session
from openclaw.state import HybridStateEncoder

# A session with hybrid state: no persistence, but way cheaper
session = Session(
    state_encoder=HybridStateEncoder(
        use_accessibility_tree=True,
        screenshot_mode="selective"
    ),
    display_resolution=(1280, 720),
    max_turns=20
)
This gives you the token savings of hybrid state without the full persistence infrastructure. Good for medium-length tasks where you're watching the agent and don't need resume capability.
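If you like your rules of thumb executable, the framework above fits in one small function. The thresholds are my reading of the lists in this section (under 10 actions, a 10 to 15 middle ground, anything longer or resumable goes persistent), and the function name and return strings are made up for illustration.

```python
def choose_architecture(expected_actions, must_resume=False,
                        many_items=False, cost_sensitive=True):
    """Map a task's rough shape to one of the setups discussed above."""
    if many_items or must_resume or expected_actions > 15:
        if cost_sensitive:
            return "persistent agent + hybrid state"
        return "persistent agent"
    if expected_actions >= 10:
        # Medium-length, supervised, no resume needed: the middle ground.
        return "session + hybrid state"
    return "session"


print(choose_architecture(6))                    # quick one-shot extraction
print(choose_architecture(12))                   # medium task you're watching
print(choose_architecture(40, many_items=True))  # batch of invoices
```

Tune the cutoffs to your own screens and model. The ordering of the checks is the part worth keeping: resumability and batch work decide first, length second.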
Error Recovery: The Part Nobody Sets Up (and Then Regrets)
The number one failure mode people post about in forums is the infinite click loop. The model clicks a button. The screen doesn't change (maybe there was a loading delay, maybe the click missed). The model sees the same screen, decides it needs to click the button again, and this repeats until you've burned through your entire token budget watching a robot poke the same pixel.
Persistent agents in OpenClaw have built-in loop detection, but you can configure it to be smarter:
from openclaw import PersistentAgent
from openclaw.recovery import RecoveryPolicy

recovery = RecoveryPolicy(
    max_identical_actions=2,        # Flag after 2 identical actions
    screen_change_threshold=0.05,   # Screen must change by at least 5%
    on_loop_detected="reflect",     # Ask model to analyze what went wrong
    on_reflection_fail="rollback",  # Roll back to last checkpoint if stuck
    wait_after_action=1.5           # Wait 1.5s for UI to settle before screenshot
)

agent = PersistentAgent(
    recovery_policy=recovery,
    # ... other config
)
That wait_after_action parameter alone fixes probably 30% of loop issues. Most of the time the model isn't wrong about what to click. The screenshot just fires before the page finishes loading.
The reflect strategy is particularly effective. When a loop is detected, the agent pauses and explicitly asks itself: "My last two actions were identical and the screen didn't change. What might have gone wrong? What should I try differently?" This metacognitive step breaks the loop far more reliably than just killing the action.
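For intuition, here's roughly what "same action, screen didn't change" detection could look like. This is a sketch under my own assumptions, not OpenClaw's code. Frames are treated as raw byte strings of equal size, and the parameter names simply mirror the RecoveryPolicy fields shown above.

```python
def changed_fraction(before: bytes, after: bytes) -> float:
    """Fraction of byte positions that differ between two frames."""
    if len(before) != len(after):
        return 1.0  # different dimensions: treat as a full change
    if not before:
        return 0.0
    return sum(a != b for a, b in zip(before, after)) / len(before)


class LoopDetector:
    """Counts consecutive identical actions that left the screen unchanged."""

    def __init__(self, max_identical_actions=2, screen_change_threshold=0.05):
        self.max_identical_actions = max_identical_actions
        self.screen_change_threshold = screen_change_threshold
        self.last_action = None
        self.repeats = 0

    def observe(self, action, before, after) -> bool:
        """Record one step; return True when a loop is suspected."""
        unchanged = changed_fraction(before, after) < self.screen_change_threshold
        if unchanged and action == self.last_action:
            self.repeats += 1
        else:
            # A new action or a visible change resets the streak.
            self.repeats = 1 if unchanged else 0
        self.last_action = action
        return self.repeats >= self.max_identical_actions


detector = LoopDetector()
frame = bytes(100)  # stand-in for a screenshot that never changes
first = detector.observe("click(84, 156)", frame, frame)
second = detector.observe("click(84, 156)", frame, frame)
```

When observe returns True, that's the moment to trigger a reflection prompt or a rollback instead of letting the agent click a third time.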
Real Example: The Invoice Processing Workflow
Let me walk through a concrete scenario that puts all of this together, because abstract architecture talk only gets you so far.
A user (pulled from a real community discussion, lightly adapted) needed to process monthly invoices from Gmail. The task: open each unread invoice email, extract vendor name, invoice number, amount, and due date, then add a row to a Google Sheet.
Attempt 1: Basic Session. Got through 4 invoices before the context window was stuffed with screenshots. Total cost: $6.40. Total invoices processed: 4 of 31. Task abandoned.
Attempt 2: Persistent Agent with Hybrid State. Configured with checkpointing every 2 invoices, summarizing memory, and accessibility tree extraction. Processed all 31 invoices across two sittings (hit a rate limit after invoice 18, resumed the next morning). Total cost: $2.10. Zero loops. One rollback when Gmail loaded a promotional overlay that confused the agent briefly.
The persistent agent configuration for this was essentially the code I showed above with one addition: a task-specific instruction that helped the agent maintain structure:
agent.run(
    task="""Process all unread invoices in Gmail.
For each invoice email:
1. Open the email
2. Extract: vendor name, invoice number, total amount, due date
3. Switch to the Google Sheet tab
4. Add a new row with the extracted data
5. Return to Gmail inbox
6. Mark the email as read
Repeat until all unread invoice emails are processed.""",
    session_name="invoice-processing-jan-2026"
)
The explicit numbered steps in the task description matter more than you'd think. They give the agent a clear internal loop to follow, which dramatically reduces the "model suddenly decides to go do something else" problem that plagues long-running sessions.
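If you write a lot of these, a tiny helper keeps the numbering and layout consistent across workflows. This is plain string formatting and has nothing to do with OpenClaw itself. The build_task name and its arguments are hypothetical.

```python
def build_task(goal, per_item_header, steps, stop_condition):
    """Assemble a task prompt with explicitly numbered per-item steps."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"{goal}\n{per_item_header}\n{numbered}\n{stop_condition}"


task = build_task(
    goal="Process all unread invoices in Gmail.",
    per_item_header="For each invoice email:",
    steps=[
        "Open the email",
        "Extract: vendor name, invoice number, total amount, due date",
        "Switch to the Google Sheet tab",
        "Add a new row with the extracted data",
        "Return to Gmail inbox",
        "Mark the email as read",
    ],
    stop_condition="Repeat until all unread invoice emails are processed.",
)
print(task)
```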
Skip the Setup: Felix's OpenClaw Starter Pack
Now look: everything I've described above works. I've tested it, community members have tested it, and the patterns are solid. But I'll be honest: wiring up the checkpoint store, configuring the hybrid state encoder, tuning the recovery policies, and getting the memory summarization prompts dialed in takes a solid afternoon of fiddling. Maybe more if you're new to OpenClaw.
If you don't want to set all this up manually, Felix's OpenClaw Starter Pack on Claw Mart includes pre-built versions of exactly these patterns. It's $29 and comes with pre-configured skills for persistent agent sessions, hybrid state encoding, checkpoint management, and recovery policies that are already tuned. The invoice processing workflow I described? There's a variant of it in the pack. Same for a few other common patterns like form filling, data extraction from multi-page workflows, and repetitive SaaS tasks.
I'm not saying you can't build all this yourself. You obviously can, and I just showed you how. But if your goal is to be productive with persistent agents by tomorrow instead of next week, the starter pack saves you the entire configuration headache. It's the kind of thing I wish existed when I was debugging checkpoint schemas at 11 PM.
Where to Go From Here
If you're currently running everything as one-shot sessions and wondering why OpenClaw feels unreliable, you now know why. The platform is capable of far more than most people realize. They're just using the wrong abstraction for their task.
Here's what I'd do:
- Audit your current workflows. Anything over 10 actions should probably be a persistent agent. Anything over 20 actions absolutely should be.
- Switch to hybrid state immediately. Even if you don't adopt full persistence, the token savings from accessibility tree extraction are too significant to ignore. This is the single highest-leverage change you can make.
- Add checkpointing before you need it. You will hit a rate limit, a timeout, or a crash eventually. The question is whether that costs you 5 seconds (resume from checkpoint) or 45 minutes (start over).
- Tune your recovery policies. The default loop detection is fine. Adding the reflect strategy and a wait_after_action delay makes it substantially better.
- Be explicit in task instructions. Numbered steps, clear completion criteria, and specific output formats help the agent stay on track during long-running tasks. Vague instructions plus long horizons equals chaos.
OpenClaw is genuinely powerful once you move past the session-only mental model. The persistent agent architecture transforms it from "cool party trick" to "thing that actually does my busywork." It just takes a bit of setup to get there, or $29 if you'd rather skip straight to the good part.