Running OpenClaw Agents 24/7 on a VPS Without Downtime

Let's get straight to it: if you've ever tried running an AI agent on a VPS overnight, you know the feeling. You set everything up, SSH out, go to bed feeling like a genius, and wake up to discover your agent died six hours ago because of an unhandled rate-limit error. Your tmux session is gone. Your agent's memory is wiped. You've accomplished nothing.
I've been there more times than I'd like to admit. And after burning weeks trying to duct-tape together screen sessions, cron jobs, and prayer-based monitoring, I finally landed on a setup that actually works: OpenClaw running on a cheap VPS, configured properly, with the right guardrails in place.
This post is the guide I wish I had when I started. We're going to go from zero to a reliably running 24/7 OpenClaw agent on a VPS, one that survives crashes, reboots, and your own absent-mindedness. No fluff. Just the stuff that works.
Why OpenClaw on a VPS (Instead of Your Laptop or a Managed Service)
Before we get tactical, let's address the "why" quickly.
Your laptop is not infrastructure. It sleeps. It updates. You close the lid. Your cat walks across the keyboard. Running a persistent agent on your local machine is a hobby project, not a real deployment.
Managed AI platforms are expensive and restrictive. You're paying for convenience you don't need and giving up control you actually want. Most managed solutions also don't let you run truly autonomous long-lived agents; they're built for request-response patterns, not agents that need to think and act continuously for days or weeks.
A $10/month VPS with OpenClaw is the sweet spot. You get full control, persistent uptime, and OpenClaw handles the orchestration, crash recovery, state management, and monitoring that would otherwise take you weeks to build yourself. It's the missing layer between "I have an agent script" and "I have a production agent that actually runs."
OpenClaw is purpose-built for this exact use case: deploying and managing autonomous AI agents that need to run reliably, indefinitely, on commodity hardware. It's not a toy wrapper; it's a runtime platform with watchdogs, persistent memory, logging, and notification hooks baked in.
Step 1: Choose Your VPS
This is the easiest decision you'll make. I've watched countless Reddit threads argue about cloud providers, and the community consensus is clear:
Hetzner Cloud CX22: ~€4.50/month for 2 vCPUs, 4GB RAM, 40GB SSD. This handles 2-3 API-based OpenClaw agents without breaking a sweat.
Hetzner Cloud CPX31: ~€14/month for 4 vCPUs, 8GB RAM. This is where you go if you want to run 5+ agents simultaneously or your agents do heavier processing.
Why Hetzner? Cheapest price-to-performance ratio in Europe, rock-solid uptime, and their cloud console is dead simple. DigitalOcean and Vultr work fine too; you'll just pay about 40% more for equivalent specs.
For this guide, I'll assume you've spun up a fresh Ubuntu 22.04 or 24.04 VPS and can SSH into it.
ssh root@your-server-ip
First thing, update everything and create a non-root user:
apt update && apt upgrade -y
adduser openclaw
usermod -aG sudo openclaw
su - openclaw
Step 2: Install Docker and Docker Compose
OpenClaw's Docker templates are by far the most reliable way to deploy. Don't try to run this bare-metal with raw Python unless you enjoy dependency hell.
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
newgrp docker
# Verify
docker --version
docker compose version
If docker compose doesn't work as a subcommand, install the Compose plugin separately:
sudo apt install docker-compose-plugin
Step 3: Deploy OpenClaw
Here's where the magic happens. Clone the OpenClaw repo and use their provided Docker Compose setup:
git clone https://github.com/openclaw/openclaw.git
cd openclaw
cp .env.example .env
Now edit the .env file with your configuration:
nano .env
At minimum, you need to set:
# Your API keys
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxx
# Or whatever LLM provider your agents use
# OpenClaw configuration
OPENCLAW_WATCHDOG_ENABLED=true
OPENCLAW_RESTART_POLICY=exponential_backoff
OPENCLAW_MAX_RESTARTS=10
OPENCLAW_LOG_LEVEL=INFO
# Persistence
OPENCLAW_STATE_BACKEND=sqlite
OPENCLAW_STATE_PATH=/data/agent_state.db
# Notifications (optional but highly recommended)
OPENCLAW_NOTIFY_TELEGRAM_TOKEN=your-bot-token
OPENCLAW_NOTIFY_TELEGRAM_CHAT_ID=your-chat-id
A word on that notification setup: configure it now, not later. The number one regret people have is running agents for days without notifications, only to discover they crashed on hour two. A Telegram bot takes five minutes to set up and will save you endless frustration.
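Before wiring the token and chat ID into .env, it's worth confirming they actually work by hitting the Telegram Bot API directly. A minimal sketch; the token and chat ID below are placeholders, substitute the values from @BotFather and your own chat:

```shell
# Placeholders - substitute your real bot token and chat ID
BOT_TOKEN="123456:ABC-placeholder-token"
CHAT_ID="987654321"
API_URL="https://api.telegram.org/bot${BOT_TOKEN}/sendMessage"

# Dry run: print the request that would be sent
echo "POST ${API_URL} (chat_id=${CHAT_ID})"

# Uncomment to actually send a test message:
# curl -s -X POST "$API_URL" -d chat_id="$CHAT_ID" -d text="OpenClaw test notification"
```

If the real curl call comes back with "ok":true, the same credentials will work in your .env.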
Now, bring up the stack:
docker compose up -d
That's it. OpenClaw is running. But we're not done: the difference between "running" and "running reliably 24/7" is in the next few steps.
Step 4: Configure the Watchdog Properly
OpenClaw's watchdog is the feature that makes 24/7 operation actually viable. It monitors your agents and handles the failure modes that kill every other setup:
- Unhandled exceptions: catches them, logs the traceback, restarts the agent with its last known state.
- OOM kills: detects when an agent process gets killed by the OS, restarts with memory-limited settings.
- Infinite loops: configurable activity timeout; if an agent hasn't produced meaningful output in X minutes, it gets recycled.
- Rate limit errors: built-in exponential backoff that pauses the agent instead of letting it hammer the API and burn money.
The default watchdog config is decent, but here's what I'd tweak in config/watchdog.yaml:
watchdog:
  enabled: true
  check_interval_seconds: 30
  restart_policy:
    strategy: exponential_backoff
    initial_delay_seconds: 10
    max_delay_seconds: 300
    max_restarts_per_hour: 5
  health_checks:
    activity_timeout_minutes: 15
    memory_limit_mb: 1024
    cpu_timeout_seconds: 120
  on_failure:
    notify: true
    snapshot_state: true
    log_last_n_lines: 100
The snapshot_state: true line is crucial. When an agent crashes, OpenClaw snapshots its current state (memory, conversation history, task queue) before restarting. When the agent comes back up, it picks up roughly where it left off. This is the feature that turns OpenClaw from "a fancy process manager" into "an actual agent runtime platform."
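For intuition, here's the restart schedule those backoff values imply (initial delay 10s, capped at 300s), sketched as a shell loop. The doubling strategy is an assumption about how exponential_backoff is implemented; the actual internals may differ:

```shell
# Sketch: exponential backoff delays with initial_delay_seconds=10, max_delay_seconds=300
delay=10
max=300
schedule=""
for attempt in 1 2 3 4 5 6 7; do
  schedule="$schedule $delay"
  delay=$((delay * 2))             # assumed doubling per failed restart
  if [ "$delay" -gt "$max" ]; then # clamp at max_delay_seconds
    delay=$max
  fi
done
echo "restart delays (s):$schedule"
```

So after a handful of failures the watchdog settles into one retry every five minutes instead of hammering the API, which is exactly what you want during a provider outage.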
Step 5: Set Up Persistent State That Survives Everything
The default SQLite backend works great for single-VPS setups. But you want to make sure the data directory is on a persistent Docker volume, not inside the container:
# In docker-compose.yml, verify this volume mapping exists under your service:
    volumes:
      - openclaw_data:/data
      - ./config:/app/config

# And that the named volume is declared at the top level:
volumes:
  openclaw_data:
    driver: local
For agents that use vector stores for long-term memory, OpenClaw has built-in wrappers for Chroma (runs locally, no external service needed):
# In your agent config
memory:
  backend: chroma
  persist_directory: /data/vectorstore
  collection_name: agent_memory
This means your agent's "knowledge" and "personality" persist across restarts, crashes, and even full VPS reboots. No more waking up to an agent with amnesia.
If you're running multiple agents that need to share context, point them at the same Chroma collection or use Postgres as the state backend instead:
OPENCLAW_STATE_BACKEND=postgres
OPENCLAW_POSTGRES_URL=postgresql://openclaw:password@db:5432/openclaw
And add a Postgres container to your Compose stack. OpenClaw's Docker templates include a commented-out Postgres service; just uncomment it.
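For reference, the Postgres service you end up with should look roughly like this. This is a sketch, not OpenClaw's exact template; the image tag and volume name are illustrative, but the user, password, database, and service name must match the connection URL above:

```yaml
services:
  db:                      # service name must match the host in OPENCLAW_POSTGRES_URL
    image: postgres:16
    environment:
      POSTGRES_USER: openclaw
      POSTGRES_PASSWORD: password   # use a real secret, not this
      POSTGRES_DB: openclaw
    volumes:
      - pg_data:/var/lib/postgresql/data   # persist across container rebuilds
    restart: unless-stopped

volumes:
  pg_data:
```

The named volume matters here for the same reason as in Step 5: without it, a container rebuild wipes your agents' shared state.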
Step 6: Ensure It Survives Reboots
Docker Compose with restart: unless-stopped handles container restarts, but you also want to make sure Docker itself starts on boot and your Compose stack comes up automatically.
# Ensure Docker starts on boot
sudo systemctl enable docker
# Create a systemd service for your OpenClaw stack
sudo nano /etc/systemd/system/openclaw.service
[Unit]
Description=OpenClaw Agent Stack
Requires=docker.service
After=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/home/openclaw/openclaw
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down
User=openclaw
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable openclaw
Now your agents survive VPS reboots. Test it:
sudo reboot
# Wait a minute, SSH back in
docker compose ps
# All containers should be "Up"
Step 7: Monitoring That Actually Helps
SSH-ing into your server to check logs is not monitoring. Here's the setup I recommend:
Quick and Dirty: Telegram Notifications
Already configured in Step 3. You'll get messages like:
🔴 Agent "research-bot" crashed: RateLimitError
📸 State snapshot saved: /data/snapshots/research-bot-20250615-0342.snap
🔄 Restarting in 30 seconds (attempt 2/5)
🟢 Agent "research-bot" recovered successfully. Resuming from snapshot.
This alone puts you ahead of 90% of people running agents on VPS instances.
Proper Monitoring: Prometheus + Grafana
OpenClaw exposes a /metrics endpoint. If you want dashboards:
# Add to docker-compose.yml under services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./config/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana

# And declare the named volume at the top level:
volumes:
  grafana_data:
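The prometheus.yml mounted above needs a scrape job pointing at OpenClaw's metrics endpoint. A minimal sketch, assuming the OpenClaw service is named openclaw and serves /metrics on port 8080 (check your Compose file for the actual service name and port):

```yaml
# config/prometheus.yml - scrape config sketch; service name and port are assumptions
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: openclaw
    metrics_path: /metrics
    static_configs:
      - targets: ["openclaw:8080"]   # Compose service name resolves via Docker DNS
```

Because both containers sit on the same Compose network, Prometheus can reach OpenClaw by service name; no ports need to be exposed to the host for scraping.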
The OpenClaw community has shared Grafana dashboard templates that show agent uptime, restart frequency, token usage, memory consumption, and task completion rates. It's genuinely useful for spotting agents that are "running" but not actually accomplishing anything: the silent failure mode that notifications alone won't catch.
Step 8: Security (Don't Skip This)
You're running autonomous agents with API keys that can spend real money. Take this seriously.
API key management:
# Never commit .env to git
echo ".env" >> .gitignore
# Use restrictive permissions
chmod 600 .env
Set spending limits on your LLM provider. OpenAI, Anthropic, and others all let you set monthly caps. Do it. A runaway agent can burn through hundreds of dollars in hours.
Docker isolation: OpenClaw runs agents in containers by default, which provides basic sandboxing. For extra paranoia, enable the read-only filesystem flag and drop unnecessary capabilities:
services:
  agent:
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /tmp
Firewall: Your VPS should only expose SSH (and Grafana if you're using it, ideally behind a VPN or SSH tunnel).
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow OpenSSH
sudo ufw enable
Getting Started Faster
If all of this feels like a lot of steps, here's a shortcut that'll save you a few hours of trial and error:
Felix's OpenClaw Starter Pack bundles pre-configured templates, watchdog configs, Docker Compose files, and agent blueprints that are already tuned for common VPS setups. It's the equivalent of skipping past the "reading 15 GitHub issues to figure out why my config isn't working" phase and going straight to running agents. If you're new to OpenClaw or just want a battle-tested starting configuration, it's genuinely the fastest way to get from zero to 24/7 agents. Felix is active in the community and the configs reflect real-world usage patterns, not theoretical defaults.
Think of it as paying for someone else's accumulated troubleshooting time, which, having done that troubleshooting myself, I can tell you is worth quite a bit.
Common Pitfalls (And How to Avoid Them)
After watching hundreds of people go through this process across Reddit, Hacker News, and Discord, here are the recurring mistakes:
1. Not setting memory limits. An agent that loads a large document into memory can OOM your entire VPS, taking down all your agents. Always set memory_limit_mb in your watchdog config.
2. Skipping persistence on "temporary" agents. Every temporary agent becomes permanent. Set up persistence from the start.
3. Running too many agents on too little hardware. Two to three API-based agents on a 4GB RAM VPS is comfortable. Five is pushing it. Ten is asking for trouble unless they're extremely lightweight. If your agents are doing local inference instead of API calls, you need significantly more resources.
4. Not versioning your agent configs. Keep your OpenClaw configs in a private Git repo. When you inevitably tweak something and break it, you want to be able to git diff and see what changed.
5. Ignoring log rotation. Agents produce a lot of logs. Without rotation, your disk fills up in weeks.
# In docker-compose.yml, for each service:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
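Those settings bound each container's log footprint at max-size times max-file. A quick back-of-envelope check (the container count here is hypothetical; adjust it to your stack):

```shell
# Worst-case log disk usage under the rotation settings above:
# max-size (10 MB) x max-file (3) per container, times the container count.
max_size_mb=10
max_files=3
containers=4   # hypothetical: openclaw + db + prometheus + grafana
total_mb=$((max_size_mb * max_files * containers))
echo "worst-case log usage: ${total_mb} MB"
```

Even a full stack stays bounded in the low hundreds of megabytes, versus unbounded growth without rotation.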
What This Setup Actually Looks Like in Practice
Once everything is configured, your daily interaction with your agents looks like this:
- You get a Telegram notification if anything goes wrong. Otherwise, silence means things are running.
- Occasionally, you check your Grafana dashboard to see task completion rates and token spend.
- When you want to update an agent, you edit the config, run docker compose up -d, and OpenClaw handles the graceful restart.
- Your agents maintain their memory and state indefinitely.
- You pay about $10-15/month for the VPS.
That's it. No babysitting. No 3am SSH sessions. No waking up to dead agents and wiped state.
Next Steps
If you're reading this and haven't started yet:
- Spin up a Hetzner CX22. Takes 2 minutes. Costs less than a coffee.
- Grab Felix's OpenClaw Starter Pack for pre-configured templates that skip the setup headaches.
- Follow Steps 2ā8 in this guide. Budget about 45 minutes for the initial setup.
- Start with one agent. Get it running reliably for 48 hours before adding more.
- Join the OpenClaw community (Discord/GitHub); the troubleshooting threads alone are worth it.
The gap between "I'm experimenting with AI agents" and "I have agents running 24/7 doing real work" is smaller than most people think. It's mostly just infrastructure and configuration, exactly the stuff OpenClaw was built to handle.
Stop babysitting your agents. Set them up properly once, and let them run.