March 21, 2026 · 8 min read · Claw Mart Team

Troubleshooting High Memory Usage in Multi-Agent OpenClaw

If you've been running multi-agent workflows in OpenClaw for more than about thirty minutes, you've probably watched your system monitor with a mixture of fascination and horror. The RAM counter climbs like it's training for Everest. 4 GB. 8 GB. 12 GB. Then your machine freezes, Docker throws an OOMKilled error, and you're left staring at a dead terminal wondering what went wrong.

You're not alone. High memory usage is the single most common complaint in the OpenClaw community, and it's the reason a lot of people give up on long-horizon agent tasks before they ever get results. The good news: this is a solvable problem. The bad news: the defaults are working against you, and nobody tells you that upfront.

I've spent the last few months running multi-agent OpenClaw setups for everything from automated web research to bulk form filling to monitoring dashboards. I've hit every memory wall there is. Here's what's actually going on, why it happens, and — step by step — how to fix it so you can run agents for hours without your machine catching fire.

The Real Reasons Your OpenClaw Agents Are Eating All Your RAM

Before you start randomly tuning things, you need to understand the five root causes. Almost every memory issue in OpenClaw traces back to one (or usually several) of these.

1. The Screenshot Pipeline Is a Memory Hog

This is the big one. OpenClaw's browser-control loop works by capturing a screenshot at every step, encoding it to base64, holding it in memory, and sending it to the vision pipeline. That's how the agent "sees" what's on screen.

The problem is that each screenshot, even at modest resolution, is a hefty chunk of data. A single 1920×1080 PNG screenshot can be 2–5 MB raw. Base64 encoding inflates that by about 33%. And Python's garbage collector is notoriously lazy about reclaiming memory from large byte strings, especially when references linger in your agent's history.

After 40 steps, you might have 100–200 MB of screenshot data just sitting in your Python process's memory, doing nothing. After 200 steps, you're looking at a gigabyte or more — and that's just the screenshots.
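That 33% figure isn't hand-waving: base64 encodes every 3 raw bytes as 4 ASCII characters, so the overhead is exactly 4/3. A quick sanity check:

```python
import base64

def b64_overhead(raw_len: int) -> float:
    """Ratio of base64-encoded size to raw size."""
    encoded = base64.b64encode(bytes(raw_len))
    return len(encoded) / raw_len

# A 3 MB raw screenshot becomes ~4 MB of text once encoded
print(f"{b64_overhead(3_000_000):.3f}")  # → 1.333
```

And that 4 MB string then gets copied into the request payload, the agent history, and often a log line, so the effective cost per screenshot is a multiple of the raw size.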

2. Unbounded Agent History

By default, most OpenClaw agent configurations keep every single observation, thought, and action in memory as part of the conversation context. This is great for short tasks. For anything longer than about 20 steps, it's a death sentence.

The history grows linearly with every step. Screenshots compound it, but even text-only observations (DOM snapshots, accessibility trees) add up fast. A single page's DOM can easily be 50–200 KB of text. Multiply that by a hundred steps and you've got tens of megabytes of context that the agent is hauling around like a backpack full of bricks.

3. Browser Instance Bloat

OpenClaw uses headless Chromium under the hood (via Playwright). A single Chromium instance typically consumes 800 MB–2.5 GB of RAM depending on the pages it's rendering. If you're running multi-agent workflows where each agent gets its own browser, you're multiplying that by the number of agents.

Three agents with their own browsers? That's potentially 7.5 GB just for the browsers before OpenClaw's own processes even start.

4. Python Memory Fragmentation

Agent loops create and destroy thousands of small objects per run: Pydantic models, tool outputs, LLM response objects, parsed HTML fragments. Python allocates memory in pools, and when you create a ton of small objects and then free most of them, the memory doesn't get returned to the OS. It stays allocated in fragmented pools.

This is why you'll see RAM usage climb steadily even when you're being careful about deleting objects. The process appears to "leak" memory, but it's actually just fragmentation.

5. Framework and Model Overhead

Before your agents even take their first action, you're carrying the weight of the loaded dependencies: the vision model (if running locally), Playwright, and the OpenClaw framework itself. Depending on your setup, idle baseline RAM can be 4–8 GB. That leaves precious little headroom on a 16 GB machine.

How to Actually Fix It

Alright, enough diagnosis. Here's the treatment plan, ordered from highest impact to lowest.

Step 1: Switch to Text-Based Observations Wherever Possible

This single change can cut your memory usage by 50–70% for most workflows. OpenClaw supports multiple observation modes, and you don't need a screenshot for every single step.

For navigation, form filling, clicking buttons, and reading page content, the accessibility tree or a simplified DOM extraction gives the agent everything it needs at a tiny fraction of the memory cost.

In your OpenClaw agent configuration, set the observation mode to hybrid:

agent_config = {
    "observation_mode": "hybrid",
    "screenshot_frequency": "on_demand",  # Only capture screenshots when the agent explicitly requests one
    "text_observation": {
        "mode": "accessibility_tree",  # Options: "accessibility_tree", "simplified_dom", "full_dom"
        "max_length": 8000  # Truncate to keep context manageable
    }
}

With screenshot_frequency set to on_demand, the agent only gets visual observations when it calls a specific "take_screenshot" tool. For most web tasks, the agent will quickly learn to rely on the text observations and only request screenshots when it's genuinely confused about layout.

This is the single biggest win. Do this first.

Step 2: Implement Aggressive History Compression

Don't let the agent carry around its entire life story. After every N steps, summarize the old history and replace it.

Here's a practical pattern that works well in OpenClaw:

import gc

class MemoryCompressor:
    def __init__(self, compress_every=15, keep_recent=5):
        self.compress_every = compress_every
        self.keep_recent = keep_recent
        self.step_count = 0
    
    def maybe_compress(self, agent):
        self.step_count += 1
        if self.step_count % self.compress_every != 0:
            return
        
        history = agent.get_history()
        if len(history) <= self.keep_recent:
            return
        
        old_history = history[:-self.keep_recent]
        recent_history = history[-self.keep_recent:]
        
        # Use the agent's own LLM to summarize.
        # (format_history is assumed to be your own helper that renders
        # history entries as plain text.)
        summary_prompt = (
            "Summarize the following agent trajectory into a concise paragraph. "
            "Include: what was accomplished, what failed, current state, and next goal.\n\n"
            + format_history(old_history)
        )
        
        summary = agent.llm.generate(summary_prompt, max_tokens=500)
        
        # Replace history with summary + recent steps
        agent.set_history([
            {"role": "system", "content": f"Previous trajectory summary: {summary}"}
        ] + recent_history)
        
        # Force cleanup of old objects
        del old_history
        gc.collect()

Plug this into your agent loop:

import gc

compressor = MemoryCompressor(compress_every=15, keep_recent=5)

for step in agent.run():
    compressor.maybe_compress(agent)
    # ... rest of your loop

This caps your effective history at roughly 5 recent steps plus a compressed summary. Memory usage stays flat instead of climbing linearly.
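If you want to convince yourself the cap holds before wiring this into a real agent, here's a toy simulation of history length under the same policy (pure arithmetic, no agent or LLM involved):

```python
def simulated_history_len(steps: int, compress_every: int = 15, keep_recent: int = 5) -> int:
    """Simulate history length when compressing every `compress_every` steps
    down to one summary message plus the `keep_recent` most recent steps."""
    history_len = 0
    for step in range(1, steps + 1):
        history_len += 1  # each step appends one observation/action entry
        if step % compress_every == 0 and history_len > keep_recent:
            history_len = 1 + keep_recent  # summary message + recent steps
    return history_len

# Bounded regardless of run length: never exceeds keep_recent + compress_every
print(simulated_history_len(30), simulated_history_len(1000))  # → 6 16
```

Uncompressed, the same run would carry 1,000 entries; under this policy it never exceeds `keep_recent + compress_every` entries.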

Step 3: Share Browser Instances Across Agents

If you're running multiple agents, do not give each one its own browser. Use a shared browser with separate contexts:

from playwright.async_api import async_playwright

async def setup_shared_browser():
    pw = await async_playwright().start()
    
    # One browser, multiple isolated contexts
    browser = await pw.chromium.launch(
        args=[
            '--disable-dev-shm-usage',
            '--disable-gpu',
            '--single-process',
            '--no-zygote',
            '--disable-extensions',
            '--disable-background-networking',
            '--disable-default-apps',
            '--no-first-run',
        ]
    )
    
    return browser

async def create_agent_context(browser):
    # Each agent gets an isolated context (like an incognito window)
    # but they share the same browser process
    context = await browser.new_context(
        viewport={'width': 1280, 'height': 720},  # Smaller viewport = smaller screenshots
        device_scale_factor=1,  # Don't use 2x retina scaling
    )
    page = await context.new_page()
    return context, page

Notice the Chromium flags. --disable-dev-shm-usage is critical in Docker environments, where /dev/shm defaults to only 64 MB. --single-process and --no-zygote prevent Chromium from spawning multiple sub-processes; note that single-process mode is officially unsupported by Chromium and can crash on complex sites, so drop it if you see instability. These flags alone can save 500 MB–1 GB per browser instance.

Also notice the viewport: 1280×720 instead of 1920×1080. Smaller viewport means smaller screenshots, which means less memory per capture. For most agent tasks, 720p is plenty.
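The viewport math is easy to make concrete. Before any PNG encoding happens, each capture starts life as an uncompressed framebuffer of width × height × bytes-per-pixel (4 for RGBA, an assumption about the internal pixel format):

```python
def raw_capture_mb(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    """Uncompressed framebuffer size in MB for a single capture."""
    return width * height * bytes_per_pixel / 1e6

print(raw_capture_mb(1920, 1080))  # ~8.3 MB per 1080p capture
print(raw_capture_mb(1280, 720))   # ~3.7 MB per 720p capture
```

Dropping from 1080p to 720p cuts every capture, encode, and in-flight copy by more than half before any other optimization kicks in.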

Step 4: Explicit Cleanup After Every Step

Python's garbage collector needs help. After each agent step, explicitly clean up the large objects:

import gc
import ctypes

def force_cleanup():
    """Aggressive memory cleanup between agent steps."""
    gc.collect()
    
    # On Linux, try to release memory back to OS
    try:
        ctypes.CDLL("libc.so.6").malloc_trim(0)
    except (OSError, AttributeError):
        pass  # Not on Linux, skip

# In your agent loop:
for step in agent.run():
    result = step.execute()
    
    # Explicitly delete large objects
    if hasattr(result, 'screenshot'):
        del result.screenshot
    
    force_cleanup()

The malloc_trim call is a Linux-specific trick that asks glibc to return freed memory to the OS. It won't help on macOS, but on Linux (and in Docker, which is Linux), it can recover hundreds of megabytes.
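To see whether malloc_trim is actually doing anything, log the process's resident set size before and after cleanup. This helper reads it straight from /proc, so it's Linux-only (the file doesn't exist on macOS):

```python
def current_rss_mb() -> float:
    """Resident set size of this process in MB, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # value is reported in kB
    raise RuntimeError("VmRSS not found in /proc/self/status")

print(f"RSS: {current_rss_mb():.1f} MB")
```

Call it on either side of force_cleanup; if the number barely moves, your growth is live references (history, caches), not fragmentation, and cleanup alone won't save you.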

Step 5: Compress Screenshots When You Do Use Them

When the agent does need a screenshot, don't send a full-quality PNG. Use JPEG compression and reduce the resolution:

from PIL import Image
import io
import base64

def compress_screenshot(raw_png_bytes, quality=40, max_width=1024):
    """Compress a screenshot to reduce memory footprint."""
    img = Image.open(io.BytesIO(raw_png_bytes))
    
    # Resize if too large
    if img.width > max_width:
        ratio = max_width / img.width
        new_size = (max_width, int(img.height * ratio))
        img = img.resize(new_size, Image.LANCZOS)
    
    # Convert to JPEG with aggressive compression.
    # JPEG has no alpha channel, so convert RGBA/paletted captures first.
    if img.mode != 'RGB':
        img = img.convert('RGB')
    buffer = io.BytesIO()
    img.save(buffer, format='JPEG', quality=quality, optimize=True)
    compressed_bytes = buffer.getvalue()
    
    # Clean up
    del img, buffer
    
    return base64.b64encode(compressed_bytes).decode('utf-8')

A 1920×1080 PNG screenshot might be 3–5 MB. After this compression (resize to 1024px wide, JPEG quality 40), you're looking at 30–80 KB. That's a 50–100x reduction, and for most web pages, the agent can still interpret the layout perfectly fine.
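You can benchmark the savings without a browser by running the same resize-then-JPEG strategy on a synthetic worst-case image. Random noise is the hardest content for both codecs, so real screenshots will shrink at least this much (assumes Pillow is installed):

```python
import io
import os

from PIL import Image

# Random noise: the worst case for both PNG and JPEG compression
raw = os.urandom(1920 * 1080 * 3)
img = Image.frombytes("RGB", (1920, 1080), raw)

png_buf = io.BytesIO()
img.save(png_buf, format="PNG")
png_len = len(png_buf.getvalue())

# Same strategy as compress_screenshot: resize to 1024px wide, JPEG quality 40
ratio = 1024 / img.width
small = img.resize((1024, int(img.height * ratio)), Image.LANCZOS)
jpg_buf = io.BytesIO()
small.convert("RGB").save(jpg_buf, format="JPEG", quality=40, optimize=True)
jpg_len = len(jpg_buf.getvalue())

print(f"PNG: {png_len / 1e6:.1f} MB  JPEG: {jpg_len / 1e3:.0f} KB  "
      f"({png_len / jpg_len:.0f}x smaller)")
```

Even on incompressible noise the ratio is dramatic; on a typical web page, with its large flat regions, it's far better.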

Step 6: Monitor and Profile

You can't fix what you can't measure. Use tracemalloc to find out exactly where memory is being allocated:

import tracemalloc

tracemalloc.start()

# ... run your agent for a while ...

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("\nTop 20 memory allocations:")
for stat in top_stats[:20]:
    print(stat)

Run this after 50 steps and you'll immediately see which lines of code are responsible for the most memory allocation. In my experience, it's almost always screenshot handling and history accumulation, but occasionally you'll find a surprise — like a logging handler that's keeping references to every LLM response, or a caching decorator that never evicts.
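Even better than a single snapshot is diffing two: take one snapshot early, one later, and let tracemalloc tell you exactly which line grew in between. Here's the pattern, with a deliberately leaky list standing in for an untrimmed agent history:

```python
import tracemalloc

tracemalloc.start()

leaky_history = []  # stands in for an agent history that never gets trimmed
before = tracemalloc.take_snapshot()

for step in range(1000):
    leaky_history.append(str(step) * 500)  # a distinct ~0.5–1.5 KB string per step

after = tracemalloc.take_snapshot()

# Sorted by size_diff: the top entry points straight at the growing line
for stat in after.compare_to(before, "lineno")[:5]:
    print(stat)
```

The `size_diff` column is the number that matters: a line whose diff keeps growing across successive snapshots is your leak.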

The "I Just Want It to Work" Starting Point

If you're new to OpenClaw and you want to skip the part where you spend a week debugging memory issues, grab Felix's OpenClaw Starter Pack. It's a preconfigured bundle that includes sensible defaults for memory management — history compression is already wired up, the observation mode defaults to hybrid, and the browser configuration includes all the low-memory Chromium flags I mentioned above.

I wish I'd had it when I started. Would have saved me about two weekends of watching htop in a cold sweat. If you eventually want to customize everything (and you probably will), the Starter Pack gives you a working baseline to modify rather than building from scratch and discovering each memory pitfall the hard way.

Putting It All Together: A Production-Ready Agent Loop

Here's what a memory-conscious multi-agent OpenClaw setup looks like when you combine everything:

import asyncio
import gc
import ctypes
from openclaw import Agent, AgentConfig

async def run_memory_efficient_agents():
    # Shared browser with low-memory flags
    browser = await setup_shared_browser()
    
    # Configure agents for minimal memory
    config = AgentConfig(
        observation_mode="hybrid",
        screenshot_frequency="on_demand",
        screenshot_compression={"format": "jpeg", "quality": 40, "max_width": 1024},
        text_observation={"mode": "accessibility_tree", "max_length": 8000},
        history_limit=None,  # We'll handle compression ourselves
    )
    
    agents = []
    compressors = []
    
    for i in range(3):  # Three agents
        ctx, page = await create_agent_context(browser)
        agent = Agent(config=config, page=page)
        agents.append(agent)
        compressors.append(MemoryCompressor(compress_every=15, keep_recent=5))
    
    # Run agents with periodic cleanup
    step_count = 0
    while not all(a.is_done() for a in agents):
        for agent, compressor in zip(agents, compressors):
            if not agent.is_done():
                await agent.step()
                compressor.maybe_compress(agent)
        
        step_count += 1
        
        # Global cleanup every 10 steps
        if step_count % 10 == 0:
            force_cleanup()
    
    await browser.close()

asyncio.run(run_memory_efficient_agents())

With this setup, I've run three concurrent agents for over four hours on a machine with 16 GB of RAM. Total memory usage stayed under 8 GB the entire time. Without these optimizations, the same workload would OOM within 30–45 minutes.

Quick Reference: Memory Savings by Technique

Here's roughly what each optimization saved in my testing with three agents running concurrently:

| Technique | Memory Saved | Effort |
| --- | --- | --- |
| Hybrid observation mode | 3–5 GB | Low (config change) |
| History compression every 15 steps | 1–3 GB | Medium (code addition) |
| Shared browser + Chromium flags | 2–4 GB | Low (config change) |
| Screenshot JPEG compression | 500 MB–2 GB | Low (utility function) |
| Explicit gc.collect + malloc_trim | 200–800 MB | Low (one function) |
| Smaller viewport (720p) | 200–500 MB | Trivial (config change) |

The observation mode change and shared browser are the highest-impact, lowest-effort wins. Start there.

When to Give Up on Local and Go Remote

There's an honest cutoff here. If you're running a local vision model (even a quantized 7B) alongside multiple browser agents, you probably need at least 32 GB of RAM. No amount of optimization will squeeze a 7B vision model into 2 GB of RAM.

Options if you're hardware-constrained:

  • Use OpenClaw's API mode to send screenshots to a remote vision endpoint instead of running a model locally. This immediately frees 4–12 GB.
  • Run the browser in a separate container with strict memory limits (docker run --memory=2g) and communicate via OpenClaw's remote browser interface.
  • Use a cloud browser service and point OpenClaw at it. The browser RAM becomes someone else's problem.

What's Next

If you've implemented all of the above and you're still having issues, the next place to look is your specific task pattern. Some tasks (like monitoring a page that constantly changes, or navigating sites with heavy JavaScript) generate much larger observations than others.

Start with tracemalloc profiling on your actual workload, not a toy example. Find the specific allocation that's growing unbounded, and you'll usually discover it's a cache, a log buffer, or a reference cycle that's unique to your setup.

And if you haven't started yet and want to avoid all of this pain from day one, seriously — Felix's OpenClaw Starter Pack from Claw Mart has these optimizations baked in. It's the difference between learning from your own suffering and learning from someone else's. Both work. One's faster.
