February 18, 2026 · 8 min read · Claw Mart Team

How to Run OpenClaw on Contabo — Ultra-Cheap European VPS

Contabo offers ultra-cheap GPU VPS plans in Europe, making it a budget-friendly option for running OpenClaw. Here's the full deployment guide.

You want to self-host a fast, OpenAI-compatible LLM inference API without hemorrhaging money on AWS or GCP. Contabo is the answer nobody talks about — dirt-cheap European VPS plans with dedicated GPUs, no sharing, and enough VRAM to run serious models. Pair that with OpenClaw, the open-source inference engine that punches way above its weight, and you've got a production-ready AI API for a fraction of what the big cloud providers charge.

I'm going to walk you through the entire thing: picking the right Contabo plan, deploying your VPS, installing OpenClaw, loading a model, and hitting your first API endpoint. You can realistically have this running in under 30 minutes.

Let's go.

Why This Combination Works

Before we get into the weeds, here's the pitch in one sentence: Contabo gives you dedicated GPU hardware at budget prices, and OpenClaw turns that hardware into a blazing-fast inference server with an OpenAI-compatible API.

OpenClaw is a high-performance inference engine for large language models. Think of it as the lightweight, self-hosted alternative to vLLM or TensorRT-LLM. It supports tensor parallelism, FP8/INT4 quantization, continuous batching, and multi-GPU setups. It's primarily CUDA-based for NVIDIA GPUs but has experimental ROCm support for AMD — which matters because Contabo offers both.

The key features that make OpenClaw ideal for this setup:

  • OpenAI-compatible API — drop-in replacement for any app already using the OpenAI SDK
  • Quantization support — run 70B parameter models on hardware that shouldn't be able to handle them
  • Continuous batching — handle multiple concurrent requests without falling over
  • Docker-first deployment — no dependency hell, no build nightmares

And Contabo's edge? They don't do the "shared GPU" nonsense. When you rent a GPU VPS, that GPU is yours. Full VRAM, full compute. Their images come pre-configured with the correct drivers (ROCm for AMD, CUDA for NVIDIA), so you skip the worst part of any GPU setup.

The Cost Breakdown

This is where Contabo gets interesting. Here's what you're looking at as of late 2026:

| Plan | GPU | VRAM | vCPU / RAM / SSD | Hourly | Monthly | Best For |
|---|---|---|---|---|---|---|
| GPU M | 1x AMD MI210 | 64 GB | 32 vCPU / 128 GB / 1.6 TB NVMe | €0.14 | ~€99 | 7B–30B models, testing, dev |
| GPU L | 1x AMD MI300X | 192 GB | 96 vCPU / 512 GB / 3.8 TB NVMe | €0.41 | ~€299 | 70B+ models, production |
| GPU XL | 1x H100 | 80 GB | 48 vCPU / 256 GB / 3.8 TB NVMe | €0.69 | ~€499 | High-throughput FP8/INT4 |
| GPU XXL | 2x H100 | 160 GB | 96 vCPU / 512 GB / 7.6 TB NVMe | €1.37 | ~€999 | Multi-GPU tensor parallel |

Prices exclude VAT. Traffic is essentially unlimited (fair use around 10 TB/month). No setup fees.

For context: an equivalent H100 instance on AWS runs you roughly $3–4/hour. Contabo's €0.69/hour is roughly a fifth of that. RunPod is somewhere in between. If you're running inference 24/7, the monthly pricing makes it even more aggressive.

My recommendation for most people: Start with the GPU M (MI210, 64 GB VRAM) at €99/month. That's enough to run Llama 3.1 70B in Q4 quantization comfortably. If you need NVIDIA stability and ecosystem support, jump to the GPU XL with the H100.

If you're just testing or doing a proof of concept, use the hourly billing. Spin up, test, tear down. You'll spend a few euros.
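To see where hourly billing stops making sense, the break-even arithmetic is simple. A quick Python sketch, using the GPU XL numbers from the table above (`break_even_hours` is just an illustrative name; plug in your own plan's rates):

```python
def break_even_hours(monthly_eur: float, hourly_eur: float) -> float:
    """Hours of hourly billing at which the flat monthly price becomes cheaper."""
    return monthly_eur / hourly_eur

# GPU XL: ~EUR 499/month vs EUR 0.69/hour
hours = break_even_hours(499, 0.69)
print(f"{hours:.0f} hours (~{hours / 24:.0f} days)")  # ~723 hours, roughly a full month
```

In other words, the flat monthly rate only pays off if the box runs essentially 24/7; for bursty testing, hourly billing wins.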

Step 1: Deploy Your Contabo GPU VPS

First things first — you need a Contabo account. Head to contabo.com, sign up, verify your email and phone. It's free to create an account.

Then:

  1. Log into my.contabo.com
  2. Navigate to VPS → Add VPS → GPU VPS
  3. Select your plan (I'll use the GPU M / MI210 for this walkthrough)
  4. Choose Ubuntu 22.04 LTS as your OS — this is the recommended image, and it comes with ROCm 6.0+ pre-installed for AMD GPUs (or CUDA 12.4 for NVIDIA plans)
  5. Pick your datacenter location — Frankfurt if you want the lowest latency for European users
  6. Set a root password or upload your SSH key (SSH key is strongly preferred)
  7. Hit deploy

The VPS provisions almost instantly. You'll get an IP address and your credentials. SSH in:

ssh root@YOUR_IP_ADDRESS

You're in. Let's set up the environment.

Step 2: Prepare the System

Contabo's GPU images come with drivers pre-installed, which saves you the single worst experience in all of computing (manually installing CUDA or ROCm). But we still need to update the system and install some basics.

# Update everything
apt update && apt upgrade -y

# Install essentials
apt install -y curl wget git build-essential python3 python3-pip python3-venv

# Verify your GPU is visible
# For AMD (MI210/MI300X):
rocm-smi

# For NVIDIA (H100/A100):
# nvidia-smi

You should see your GPU listed with its full VRAM. If you're on the MI210 plan, you'll see 64 GB HBM2e. If the GPU doesn't show up, reboot the VPS (reboot) and try again — occasionally the first boot after provisioning needs a kick.

That's it for system prep. No driver installation, no kernel module compilation, no existential dread. Contabo handled it.

Step 3: Install OpenClaw

Docker is the way to go here. The OpenClaw docs recommend it, and it eliminates the "works on my machine" problem entirely.

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh && sh get-docker.sh
systemctl start docker && systemctl enable docker

# Verify Docker is running
docker --version

Now clone the OpenClaw repository and build the Docker image:

# Clone OpenClaw
git clone https://github.com/open-claw/open-claw.git
cd open-claw

# Build the Docker image
# For NVIDIA GPUs:
docker build -t openclaw:latest -f docker/Dockerfile.cuda .

# For AMD GPUs (ROCm):
docker build -t openclaw:latest -f docker/Dockerfile.rocm .

Alternatively, if pre-built images are available on the GitHub Container Registry:

# Pull pre-built (NVIDIA)
docker pull ghcr.io/open-claw/open-claw:latest-cuda12

# Pull pre-built (AMD)
docker pull ghcr.io/open-claw/open-claw:latest-rocm6

Building from source takes 10–15 minutes on Contabo's hardware. Pulling a pre-built image is faster if one's available for your platform. Check the OpenClaw GitHub releases for the latest tags.

Step 4: Download Your Model

You need actual model weights to serve. Let's grab Llama 3.1 70B in a quantized format — this is one of the best open-weight models available and runs well within the MI210's 64 GB VRAM when quantized to Q4.

# Install the Hugging Face CLI
pip install huggingface-hub

# Log in (required for gated models like Llama)
huggingface-cli login
# Paste your HF token when prompted — get one at huggingface.co/settings/tokens

# Create a models directory
mkdir -p /models

# Download the model
huggingface-cli download meta-llama/Llama-3.1-70B \
  --local-dir /models/llama-3.1-70b \
  --local-dir-use-symlinks False

This is a big download: the full-precision 70B weights come to roughly 140 GB. On Contabo's 1 Gbps connection, expect it to take 20–30 minutes. Go make coffee.
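Whatever size you end up pulling, the transfer time is simple arithmetic. A sketch, assuming you can actually saturate the 1 Gbps link (real-world throughput will be somewhat lower):

```python
def download_minutes(size_gb: float, link_gbps: float = 1.0) -> float:
    """Rough transfer time: gigabytes -> gigabits, divided by link speed, in minutes."""
    return size_gb * 8 / link_gbps / 60

print(f"{download_minutes(140):.0f} min")  # e.g. a ~140 GB full-precision 70B repo
print(f"{download_minutes(16):.0f} min")   # e.g. a ~16 GB 8B model
```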

Tip: If you're running smaller experiments first, grab a 7B or 8B model instead. Something like meta-llama/Llama-3.1-8B will download in a few minutes and lets you verify everything works before committing to the big model.

Step 5: Launch OpenClaw

Here's where it all comes together. One Docker command and you have a live inference API:

docker run -d \
  --name openclaw-server \
  --gpus all \
  -p 8000:8000 \
  -v /models:/models \
  openclaw:latest \
  python -m open_claw.serve \
    --model /models/llama-3.1-70b \
    --host 0.0.0.0 \
    --port 8000 \
    --quantization q4_k_m

Breaking this down:

  • --gpus all — exposes all GPUs to the container (NVIDIA only; for ROCm use --device=/dev/kfd --device=/dev/dri --group-add video instead)
  • -p 8000:8000 — maps port 8000 to the host
  • -v /models:/models — mounts your model directory
  • --quantization q4_k_m — Q4_K_M quantization, which cuts VRAM usage by 50–75% with minimal quality loss
  • --host 0.0.0.0 — listens on all interfaces (not just localhost)

Watch the logs to make sure everything loads:

docker logs -f openclaw-server

You'll see it loading model shards, allocating VRAM, and eventually printing something like Serving on 0.0.0.0:8000. Model loading takes 1–3 minutes depending on size.

Now test it:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "max_tokens": 100
  }'

You should get a JSON response with the model's answer. Congratulations — you have a self-hosted, OpenAI-compatible inference API running on a €99/month VPS.

From any machine, you can now hit this endpoint:

curl http://YOUR_IP:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b",
    "messages": [{"role": "user", "content": "Explain quantum computing in simple terms."}]
  }'
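If you'd rather script it than curl by hand, the same request is a few lines of standard-library Python. A sketch (`build_payload` and `chat` are hypothetical helper names, and YOUR_IP stands in for your server's address as above):

```python
import json
from urllib import request

def build_payload(model: str, prompt: str, max_tokens: int = 100) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(base_url: str, model: str, prompt: str) -> str:
    """POST to /v1/chat/completions and return the reply text."""
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("http://YOUR_IP:8000", "llama-3.1-70b", "What is the capital of France?")
```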

Or use it with the OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(
    base_url="http://YOUR_IP:8000/v1",
    api_key="not-needed"  # OpenClaw doesn't require auth by default
)

response = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

This is the beauty of OpenClaw's OpenAI-compatible API. Any application, library, or framework that works with the OpenAI API works with your self-hosted endpoint. Just change the base_url.
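OpenAI-compatible servers typically stream responses as server-sent events when you pass "stream": true: each line looks like data: {...}, with the text arriving in choices[0].delta.content and a [DONE] sentinel at the end. Assuming OpenClaw follows that convention, extracting the text deltas is a one-liner per chunk (a sketch):

```python
import json

def parse_sse_chunk(line):
    """Extract the text delta from one 'data: {...}' SSE line, or None if there is none."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":  # OpenAI-style end-of-stream sentinel
        return None
    delta = json.loads(payload)["choices"][0].get("delta", {})
    return delta.get("content")

chunk = 'data: {"choices": [{"delta": {"content": "Hel"}}]}'
print(parse_sse_chunk(chunk))  # Hel
```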

Step 6: Productionize It

A raw Docker container on an open port is fine for testing. For production, you need a few more things.

Firewall

ufw allow 22/tcp    # SSH
ufw allow 8000/tcp  # OpenClaw API
ufw enable

Nginx Reverse Proxy with SSL

Install Nginx and Certbot for HTTPS:

apt install -y nginx certbot python3-certbot-nginx

# Create Nginx config
cat > /etc/nginx/sites-available/openclaw <<EOF
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}
EOF

ln -s /etc/nginx/sites-available/openclaw /etc/nginx/sites-enabled/
nginx -t && systemctl reload nginx

# Get SSL cert
certbot --nginx -d your-domain.com

Now your API is accessible at https://your-domain.com/v1/chat/completions.

Docker Compose for Auto-Restart

Create a docker-compose.yml so your server survives reboots:

version: "3.8"
services:
  openclaw:
    image: openclaw:latest
    container_name: openclaw-server
    restart: always
    ports:
      - "8000:8000"
    volumes:
      - /models:/models
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command: >
      python -m open_claw.serve
      --model /models/llama-3.1-70b
      --host 0.0.0.0
      --port 8000
      --quantization q4_k_m

Then bring it up:

docker compose up -d

Monitoring

OpenClaw exposes Prometheus-compatible metrics. Pair with Grafana for dashboards:

# Check metrics endpoint
curl http://localhost:8000/metrics

You'll get request counts, latency percentiles, VRAM usage, tokens/second — everything you need to know if your server is healthy.
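The metrics endpoint returns plain Prometheus text exposition: one name{labels} value line per sample, with # HELP and # TYPE comments interleaved. If you just want a quick number without standing up Prometheus, parsing it takes a few lines (a sketch; the metric names below are hypothetical, not OpenClaw's actual names):

```python
def parse_prom(text):
    """Parse Prometheus text exposition into {metric_name: value}, ignoring labels."""
    out = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        name_part, _, value = line.rpartition(" ")
        name = name_part.split("{", 1)[0]  # drop any {label="..."} block
        out[name] = float(value)
    return out

sample = """# HELP requests_total Total requests
requests_total{route="/v1/chat/completions"} 1042
vram_used_bytes 5.1e10"""
print(parse_prom(sample)["requests_total"])  # 1042.0
```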

Performance Expectations

Based on community benchmarks and user reports from places like r/LocalLLaMA:

  • Llama 70B Q4 on MI210 (64 GB): ~25–35 tokens/second, sub-1-second time to first token
  • Llama 70B Q4 on H100 (80 GB): ~80–120 tokens/second
  • Llama 8B FP16 on MI210: ~150+ tokens/second

Contabo's GPUs may be slightly power-limited compared to bare-metal colocation, but the difference in real-world inference is marginal, maybe 5–10%. At these prices, nobody's complaining.
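Those per-second figures add up to a lot of daily capacity. A back-of-the-envelope conversion (single sustained stream; with continuous batching, aggregate throughput across concurrent requests will be higher):

```python
def tokens_per_day(tok_per_sec: float) -> float:
    """Sustained generation rate extrapolated over 24 hours."""
    return tok_per_sec * 60 * 60 * 24

print(f"{tokens_per_day(30) / 1e6:.1f}M tokens/day")   # 70B Q4 on MI210, low end
print(f"{tokens_per_day(100) / 1e6:.1f}M tokens/day")  # 70B Q4 on H100, midrange
```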

Troubleshooting the Common Gotchas

GPU not detected in Docker: Make sure you're using --gpus all for NVIDIA, or --device=/dev/kfd --device=/dev/dri for ROCm. Reboot the VPS if it was just provisioned.

Out of memory errors: Use heavier quantization (q4_k_m instead of fp16), reduce --max-seq-len to 4096 or 2048, or step up to a bigger plan. Monitor with watch -n1 rocm-smi or watch -n1 nvidia-smi.

Gated model download fails: You need to accept the model's license on Hugging Face's website first, then use huggingface-cli login with a valid token.

Slow Docker build: Contabo's bandwidth is 1 Gbps, which is fine, but building from source means compiling GPU kernels, which is CPU-bound. Use pre-built images when available.

High latency from US/Asia: Choose a Contabo datacenter closer to your users, or put Cloudflare Tunnel in front of your endpoint so traffic enters Cloudflare's network at the nearest edge.

The Real Cost Comparison

Let's put this in perspective. Running Llama 70B inference 24/7:

  • Contabo GPU M (MI210): €99/month
  • RunPod equivalent: ~$200/month
  • AWS g5.4xlarge (A10G, 24GB — not even enough VRAM): ~$1,200/month
  • OpenAI API (equivalent throughput): $2,000–5,000/month depending on usage
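Another way to frame it is cost per million generated tokens. A rough sketch using the MI210 plan and throughput numbers from earlier, and assuming the server is fully loaded (idle time pushes the real figure higher):

```python
def eur_per_million_tokens(monthly_eur: float, tok_per_sec: float) -> float:
    """Flat monthly price divided by a 30-day month of sustained generation."""
    tokens_per_month = tok_per_sec * 60 * 60 * 24 * 30
    return monthly_eur / (tokens_per_month / 1e6)

print(f"EUR {eur_per_million_tokens(99, 30):.2f} per 1M tokens")  # ~EUR 1.27
```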

Contabo is not the fastest. It's not the most polished cloud provider. The control panel looks like it was designed in 2010. But for raw price-to-performance on GPU compute, nothing touches it in Europe.

What to Build With This

Now that you have a self-hosted OpenClaw API, the possibilities open up. Check out the Claw Mart listings for pre-built tools and integrations that plug directly into OpenAI-compatible endpoints — chatbots, RAG pipelines, agent frameworks, and more. Since OpenClaw serves the standard /v1/chat/completions endpoint, anything in the Claw Mart ecosystem that works with OpenAI works with your server.

A few ideas to start:

  • Private AI assistant for your team — no data leaving your infrastructure
  • RAG pipeline with your company's documents
  • Multi-agent system where cost-per-token is zero after the VPS bill
  • Fine-tuned model serving for domain-specific applications

Next Steps

  1. Sign up for Contabo and deploy a GPU M instance — use hourly billing to test first
  2. Follow the steps above to get OpenClaw running with a small model (8B) for validation
  3. Scale up to your target model once everything checks out
  4. Browse Claw Mart for OpenClaw-compatible tools to build on top of your new inference server
  5. Join the OpenClaw community on GitHub for updates, benchmarks, and support

You now have a production-grade LLM inference API running on your own hardware, in a European datacenter, for less than a Netflix family plan. That's the kind of asymmetric advantage that makes self-hosting worth the 30 minutes of setup.
