February 18, 2026 · 8 min read · Claw Mart Team

How to Run OpenClaw on Hetzner — Europe's Best Value VPS

Hetzner's auction dedicated servers put AMD GPU hardware within reach at prices no hyperscaler can match. Europe's best value for running your AI agent. Here's the complete setup guide.


If you've been paying attention to European hosting, you already know Hetzner is the move. German engineering, absurd pricing, data centers that actually respect GDPR — it's the default choice for anyone who wants serious infrastructure without the AWS sticker shock.

And if you've been paying attention to self-hosted AI, you know OpenClaw is the move. It's the open-source inference engine built specifically for Mixture-of-Experts models — Grok-1, Mixtral 8x22B, DeepSeek-V2 — running on AMD Instinct GPUs with ROCm. No NVIDIA tax. No API rate limits. No sending your data to someone else's servers.

The combination of these two is, frankly, one of the best value propositions in self-hosted AI right now. You get dedicated AMD GPU hardware at auction prices that would make AWS blush, running an inference engine that was purpose-built for exactly this hardware.

Let me walk you through the entire setup, start to finish. By the end, you'll have OpenClaw serving an OpenAI-compatible API on Hetzner dedicated hardware, connected to your Telegram bot, running 24/7 for a fraction of what you'd pay for managed API access.


Why Hetzner for OpenClaw (and Why Not Hetzner Cloud)

First, let's clear up a common confusion. Hetzner has two products that matter here:

Hetzner Cloud — VPS instances with NVIDIA GPUs (A4000, A100, H100). These are great for a lot of things, but they're completely useless for OpenClaw. OpenClaw runs exclusively on AMD Instinct GPUs via ROCm. No CUDA. No NVIDIA. If you try to run OpenClaw on an NVIDIA box, nothing will happen except frustration.

Hetzner Dedicated Servers — Physical servers you rent monthly through robot.hetzner.com. This is where the magic happens. Hetzner's server auction regularly has AMD Instinct machines (MI210, MI250X, and increasingly MI300X configs) at prices that don't make sense. We're talking 50–80% cheaper than comparable hardware elsewhere.

Here's what you're looking at cost-wise:

Configuration          | GPUs                  | CPU/RAM           | Monthly (€, est.)
Auction MI210          | 1x MI210 64GB         | EPYC 64c / 256GB  | €300–600
Auction MI250X         | 4x MI250X 128GB ea.   | EPYC 128c / 1TB   | €1,500–3,000
Auction/New MI300X     | 4x MI300X 192GB ea.   | EPYC 384c / 3TB   | €4,000–8,000
Custom MI300X/MI325X   | 8x MI300X/MI325X      | EPYC 768c / 6TB+  | €10k–20k+

Add a Hetzner Volume for model storage at €0.048/GB/month (a 1TB volume for your models costs €48/month), and your starter setup comes in around €450/month total. That's for a dedicated AMD GPU server running your own inference engine with zero per-token costs.

Compare that to API pricing for Mixtral 8x22B through any managed provider, and the math gets very obvious very fast — especially if you're running any kind of volume.
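To make "very obvious very fast" concrete, here's a quick break-even sketch. The per-token API price is an illustrative assumption for a Mixtral-class managed endpoint, not a quote from any provider:

```python
# Break-even point: flat dedicated-server cost vs hypothetical per-token API pricing.
# The €1.20 per million tokens rate is an assumption for illustration, not a real quote.
server_monthly_eur = 450                  # the ~€450/month starter setup estimated above
api_eur_per_million_tokens = 1.20         # assumed blended rate for a Mixtral-class API
breakeven_mtok = server_monthly_eur / api_eur_per_million_tokens
print(f"Break-even at ~{breakeven_mtok:.0f}M tokens/month")
```

Past that volume, every additional token on the dedicated box is free; below it, the managed API wins on pure cost (though not on privacy or rate limits).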

Traffic is basically free: inbound is unlimited, outbound is €1/TB. For an inference API that's mostly receiving short prompts and returning text, your bandwidth costs will be negligible.


Step 0: Get Your Hetzner Server

Head to robot.hetzner.com and go to the Server Auction section.

Filter for "AMD", "GPU", or "Instinct." Availability varies — these are physical machines, and the good ones go fast. If you don't see what you need, check back daily. Hetzner refreshes auction inventory regularly.

When you find a suitable machine (minimum MI210 64GB for running quantized Mixtral-class models), reserve it and select Ubuntu 22.04 LTS as your installation image.

Important: Ubuntu 22.04 specifically. Not 24.04, not Debian, not Fedora. ROCm is certified for 22.04 (Jammy), and trying to get it working on anything else will cost you hours of debugging kernel module issues. Don't be clever here. Use 22.04.

Add your SSH key during provisioning. Once the server is ready (usually 15–60 minutes for auction servers), SSH in:

ssh root@your-server-ip
apt update && apt upgrade -y && reboot

Wait for the reboot, SSH back in, and you're ready to build.


Step 1: Install ROCm

This is where most people get tripped up. ROCm installation on Hetzner dedicated servers has a few quirks, but if you follow these steps exactly, it works cleanly.

First, install the prerequisites:

apt install -y wget gnupg2 curl software-properties-common ca-certificates linux-headers-$(uname -r)

That linux-headers package is critical. Without matching kernel headers, the AMD GPU driver modules won't compile, and you'll get cryptic errors that send you down a two-hour rabbit hole.

Now install the AMD GPU installer package:

wget https://repo.radeon.com/amdgpu-install/6.1.2/ubuntu/jammy/amdgpu-install_6.1.60202-1_all.deb
dpkg -i amdgpu-install_6.1.60202-1_all.deb
apt update

Run the actual installation with the use cases we need:

amdgpu-install --usecase=rocm,hiplibsdk,mlsdk,docker:rocm --no-dkms

The --no-dkms flag is important for Hetzner's kernel setup. If you're running their default generic kernel, DKMS sometimes fights with pre-built modules. Skip it and save yourself the headache.

Add your user to the required groups:

usermod -a -G render,video $USER

Reboot:

reboot

After reboot, verify everything is working:

/opt/rocm/bin/rocminfo

You should see your AMD Instinct GPU(s) listed with their agent info. If you see "HSA Agent" entries for your GPUs, you're golden.

Check GPU status:

rocm-smi

This shows temperature, utilization, and memory usage. Bookmark this command — you'll use it constantly.

Gotcha: If rocminfo doesn't list your GPUs, check if Secure Boot is enabled. Access Hetzner's IPMI console (robot.hetzner.com → your server → Console) and disable Secure Boot in BIOS. This trips up about 30% of first-time Hetzner AMD setups.


Step 2: Docker and GPU Passthrough

Docker should have been installed as part of the ROCm setup (the docker:rocm use case). Make sure it's running and enabled at boot:

systemctl start docker
systemctl enable docker

Test GPU passthrough with the official ROCm PyTorch container:

docker run --rm --device=/dev/kfd --device=/dev/dri -it rocm/pytorch:rocm6.1_ubuntu22.04_py3.10_pytorch_release-2.4.0 /bin/bash

Inside the container, run:

python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.device_count())"

Yes, it says "cuda" — PyTorch's ROCm builds reuse the torch.cuda namespace, with HIP handling the translation to AMD hardware underneath. If this prints True and the correct number of GPUs, your Docker GPU passthrough is working perfectly.

Exit the container (exit) and move on.


Step 3: Prepare Model Storage

Large models need large storage. Mixtral 8x22B quantized (Q4_K_M) is around 80–140GB depending on the quantization. You need space.

If you added a Hetzner Volume during provisioning, format and mount it. Run lsblk first and confirm the device name — it won't necessarily be /dev/nvme2n1 on your machine:

mkfs.ext4 /dev/nvme2n1
mkdir -p /mnt/models
mount /dev/nvme2n1 /mnt/models

Add it to /etc/fstab so it persists across reboots:

echo '/dev/nvme2n1 /mnt/models ext4 defaults 0 2' >> /etc/fstab

Now download your model. For Mixtral 8x7B Instruct (a good starting point — runs comfortably on a single MI210):

apt install -y git-lfs
cd /mnt/models
# Using huggingface-cli or direct download
pip install huggingface-hub
huggingface-cli download TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --local-dir .

This will take a while on Hetzner's 1Gbps connection. Go make coffee. Or lunch.
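If you're wondering how long "a while" is, the arithmetic is simple. File sizes below are approximate, and the ~80% effective-throughput figure is an assumption, not a measurement:

```python
def download_minutes(size_gb: float, link_gbps: float = 1.0, efficiency: float = 0.8) -> float:
    # GB -> gigabits, divided by effective throughput in Gbps, seconds -> minutes
    return size_gb * 8 / (link_gbps * efficiency) / 60

# Approximate GGUF sizes: Mixtral 8x7B Q4_K_M ~26 GB, Mixtral 8x22B Q4_K_M ~85 GB
print(f"Mixtral 8x7B Q4_K_M (~26 GB): ~{download_minutes(26):.0f} min")
print(f"Mixtral 8x22B Q4_K_M (~85 GB): ~{download_minutes(85):.0f} min")
```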


Step 4: Deploy OpenClaw

Now the actual deployment. Pull the OpenClaw container:

docker pull ghcr.io/open-claw/open-claw:latest

For a single GPU setup, run:

docker run -d \
  --name openclaw \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --group-add render \
  -p 8080:8080 \
  -v /mnt/models:/models \
  --shm-size=32g \
  ghcr.io/open-claw/open-claw:latest \
  --model /models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf \
  --host 0.0.0.0 --port 8080 \
  --tp 1

The --tp 1 flag sets tensor parallelism to 1 (single GPU). If you have 4 GPUs, set --tp 4 to split the model across all of them. OpenClaw handles this natively — one of its big advantages over cobbling together your own inference stack.

The --shm-size=32g is important for inter-process communication when working with large models. Don't skip it.

For production deployments, use docker-compose. Create a docker-compose.yml:

version: '3.8'
services:
  openclaw:
    image: ghcr.io/open-claw/open-claw:latest
    container_name: openclaw
    restart: unless-stopped
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - video
      - render
    ports:
      - "8080:8080"
    volumes:
      - /mnt/models:/models
    shm_size: 32g
    command: >
      --model /models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
      --host 0.0.0.0
      --port 8080
      --tp 1

Start it:

docker-compose up -d

The restart: unless-stopped policy ensures OpenClaw comes back up after server reboots. True always-on inference.
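If you'd rather not guess when the model has finished loading, a small poll loop works. This assumes OpenClaw, like most OpenAI-compatible servers, answers on the standard /v1/models listing endpoint once it's ready:

```shell
# Wait until the API responds before sending real traffic
until curl -sf http://localhost:8080/v1/models > /dev/null; do
  echo "waiting for OpenClaw to load the model..."
  sleep 5
done
echo "OpenClaw is ready"
```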


Step 5: Test Your Endpoint

Give OpenClaw 30–60 seconds to load the model into GPU memory, then test:

curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mixtral",
    "prompt": "Explain quantum computing in one paragraph.",
    "max_tokens": 150
  }'

You should get back a JSON response with a completion. On MI300X hardware, expect 50–150 tokens/second with Q4_K_M quantization. MI210 will be slower (20–50 t/s) but still very usable for real-time applications.
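Eyeballing throughput from curl output is fiddly; a tiny helper makes the measurement repeatable. This assumes the response follows the OpenAI schema, where the generated-token count lives under usage.completion_tokens:

```python
def tokens_per_second(response_json: dict, elapsed_s: float) -> float:
    # OpenAI-style completion responses report generated tokens in usage.completion_tokens
    return response_json["usage"]["completion_tokens"] / elapsed_s

# Live usage, with the server running:
#   import time, requests
#   start = time.time()
#   r = requests.post("http://localhost:8080/v1/completions",
#                     json={"model": "mixtral", "prompt": "Count to ten.", "max_tokens": 100})
#   print(f"{tokens_per_second(r.json(), time.time() - start):.1f} t/s")
```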

Check the logs if anything looks off:

docker logs openclaw

And monitor GPU utilization while generating:

rocm-smi

You should see VRAM usage jump and GPU utilization spike during inference. If VRAM is maxed out, you need a smaller model or more GPUs with higher --tp.
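When in doubt about whether a model fits, a back-of-envelope estimate helps. The bits-per-weight and overhead figures here are rough assumptions, not measurements:

```python
def vram_gb(params_billions: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    # weights = params * bits / 8 bits-per-byte, plus ~20% for KV cache and activations
    return params_billions * bits_per_weight / 8 * overhead

# Mixtral 8x7B has ~47B total parameters; Q4_K_M averages roughly 4.5 bits/weight
print(f"Mixtral 8x7B at Q4_K_M: ~{vram_gb(47, 4.5):.0f} GB")
```

A result around 32 GB fits comfortably in a single MI210's 64 GB, which matches the article's recommendation of 8x7B as the starter model.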


Step 6: Lock Down the Server

Your inference API is now exposed on port 8080. Before you do anything else, secure it:

ufw allow 22/tcp
ufw allow 8080/tcp
ufw enable

Also configure the Hetzner Firewall through the robot console — add rules for ports 22 and 8080, and restrict 8080 to your known IPs if possible.

For production, put nginx in front as a reverse proxy with basic auth:

apt install -y nginx apache2-utils
htpasswd -c /etc/nginx/.htpasswd apiuser

Configure nginx (/etc/nginx/sites-available/openclaw):

server {
    listen 443 ssl;
    server_name your-domain.com;

    ssl_certificate /etc/letsencrypt/live/your-domain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/your-domain.com/privkey.pem;

    location / {
        auth_basic "OpenClaw API";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

Get a free SSL cert with certbot, enable the site, reload nginx. Now your API is behind HTTPS and basic auth. Not Fort Knox, but enough to keep the bots out.
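Those last three steps, spelled out. Package names are for Ubuntu 22.04; swap in your own domain:

```shell
# Issue the certificate, enable the site, and reload nginx
apt install -y certbot python3-certbot-nginx
certbot certonly --nginx -d your-domain.com
ln -s /etc/nginx/sites-available/openclaw /etc/nginx/sites-enabled/
nginx -t && systemctl reload nginx
```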


Step 7: Connect Your Telegram Bot

This is where OpenClaw becomes genuinely useful as a personal AI agent. Since OpenClaw exposes an OpenAI-compatible API, any client that speaks the OpenAI format works out of the box.

Here's a minimal Python Telegram bot that uses your self-hosted OpenClaw endpoint:

import requests
from telegram import Update
from telegram.ext import Application, MessageHandler, filters, ContextTypes

OPENCLAW_URL = "https://your-domain.com/v1/chat/completions"
OPENCLAW_AUTH = ("apiuser", "your-password")

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    user_message = update.message.text

    response = requests.post(
        OPENCLAW_URL,
        auth=OPENCLAW_AUTH,
        json={
            "model": "mixtral",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": user_message}
            ],
            "max_tokens": 500
        }
    )

    reply = response.json()["choices"][0]["message"]["content"]
    await update.message.reply_text(reply)

app = Application.builder().token("YOUR_TELEGRAM_BOT_TOKEN").build()
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
app.run_polling()

Add this as another service in your docker-compose, and you've got a personal AI agent running on your own hardware, responding to Telegram messages in real-time.
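As a sketch, that extra compose service might look like this. The ./bot build context and the environment variable name are assumptions; adapt them to however you package the script:

```yaml
  telegram-bot:
    build: ./bot                  # directory with the bot script and a small Dockerfile
    restart: unless-stopped
    depends_on:
      - openclaw
    environment:
      - TELEGRAM_BOT_TOKEN=your-telegram-bot-token
```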


The Real Cost Breakdown

Let's get specific about what this costs monthly for a starter setup:

Item                              | Monthly Cost (€)
Hetzner Auction MI210 Server      | ~€400
1TB Hetzner Volume                | €48
Domain + SSL (Cloudflare)         | Free
Bandwidth (minimal for text API)  | ~€1
Total                             | ~€449/mo

For that €449, you get unlimited inference. No per-token costs. No rate limits. No data leaving your server. Run it 24/7, serve thousands of requests per day, and the cost stays flat.

If you're currently spending more than €449/month on API calls to any managed LLM provider, this setup pays for itself immediately. And it scales — add a second server, bump up to MI300X, run bigger models. The economics only get better with volume.


Find What You Need on Claw Mart

Here's where Claw Mart comes in. Whether you need pre-configured OpenClaw containers, optimized model weights for AMD hardware, Telegram bot templates that plug into OpenClaw's API, or monitoring dashboards for tracking inference performance — the Claw Mart marketplace has listings from builders who've already solved these problems.

Check the OpenClaw category for ready-to-deploy configs, quantized models tested on Instinct hardware, and integration tools that save you hours of setup time. Sellers on Claw Mart specialize in exactly this stack.


What's Next

You've got OpenClaw running on Hetzner. Your inference API is live. Your Telegram bot is responding. Here's where to go from here:

  1. Scale up: Move to MI300X for 3–5x the throughput. Set --tp 4 for four-GPU tensor parallelism.
  2. Add monitoring: Prometheus + Grafana for tracking tokens/second, latency, GPU temps. Essential for production.
  3. Try bigger models: DeepSeek-V2, Mixtral 8x22B, Grok-1. OpenClaw's MoE optimization means these run far more efficiently than dense models of equivalent quality.
  4. Browse Claw Mart: Find optimized configs, model weights, and integration tools from other OpenClaw operators who've already figured out the edge cases.

The whole setup takes about an hour if you follow the steps above. No hand-holding needed. No vendor lock-in. Just your hardware, your models, your data.

That's the point.
