How to Run OpenClaw on Contabo — Ultra-Cheap European VPS
Contabo offers ultra-cheap VPS hosting in Europe, which makes it a compelling budget option for running OpenClaw. Here's the full deployment guide.

You want to self-host a fast, OpenAI-compatible LLM inference API without hemorrhaging money on AWS or GCP. Contabo is the answer nobody talks about — dirt-cheap European VPS plans with dedicated GPUs, no sharing, and enough VRAM to run serious models. Pair that with OpenClaw, the open-source inference engine that punches way above its weight, and you've got a production-ready AI API for a fraction of what the big cloud providers charge.
I'm going to walk you through the entire thing: picking the right Contabo plan, deploying your VPS, installing OpenClaw, loading a model, and hitting your first API endpoint. You can realistically have this running in under 30 minutes.
Let's go.
Why This Combination Works
Before we get into the weeds, here's the pitch in one sentence: Contabo gives you dedicated GPU hardware at budget prices, and OpenClaw turns that hardware into a blazing-fast inference server with an OpenAI-compatible API.
OpenClaw is a high-performance inference engine for large language models. Think of it as the lightweight, self-hosted alternative to vLLM or TensorRT-LLM. It supports tensor parallelism, FP8/INT4 quantization, continuous batching, and multi-GPU setups. It's primarily CUDA-based for NVIDIA GPUs but has experimental ROCm support for AMD — which matters because Contabo offers both.
The key features that make OpenClaw ideal for this setup:
- OpenAI-compatible API — drop-in replacement for any app already using the OpenAI SDK
- Quantization support — run 70B parameter models on hardware that shouldn't be able to handle them
- Continuous batching — handle multiple concurrent requests without falling over
- Docker-first deployment — no dependency hell, no build nightmares
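The continuous-batching point is easy to exploit from the client side: fire requests concurrently and let the server interleave them on the GPU. Here's a minimal stdlib-only sketch; the endpoint path and port match what OpenClaw serves later in this guide, and the model name is the one deployed below.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed OpenClaw endpoint

def chat_payload(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build a standard OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt: str, model: str = "llama-3.1-70b") -> str:
    """Send one chat request and return the assistant's reply text."""
    body = json.dumps(chat_payload(model, prompt)).encode()
    req = request.Request(API_URL, data=body,
                         headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

def ask_many(prompts: list[str], workers: int = 8) -> list[str]:
    """Fire all prompts at once; continuous batching interleaves them on the
    GPU instead of serving them strictly one after another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ask, prompts))

# Against a live server:
#   replies = ask_many([f"Count to {n}." for n in range(1, 9)])
```

With sequential calls you pay full latency per request; with a batching server, eight in-flight requests finish in far less than eight times the single-request latency.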
And Contabo's edge? They don't do the "shared GPU" nonsense. When you rent a GPU VPS, that GPU is yours. Full VRAM, full compute. Their images come pre-configured with the correct drivers (ROCm for AMD, CUDA for NVIDIA), so you skip the worst part of any GPU setup.
The Cost Breakdown
This is where Contabo gets interesting. Here's what you're looking at as of late 2026:
| Plan | GPU | VRAM | vCPU / RAM / SSD | Hourly | Monthly | Best For |
|---|---|---|---|---|---|---|
| GPU M | 1x AMD MI210 | 64 GB | 32 vCPU / 128 GB / 1.6 TB NVMe | €0.14 | ~€99 | 7B–30B models, testing, dev |
| GPU L | 1x AMD MI300X | 192 GB | 96 vCPU / 512 GB / 3.8 TB NVMe | €0.41 | ~€299 | 70B+ models, production |
| GPU XL | 1x H100 | 80 GB | 48 vCPU / 256 GB / 3.8 TB NVMe | €0.69 | ~€499 | High-throughput FP8/INT4 |
| GPU XXL | 2x H100 | 160 GB | 96 vCPU / 512 GB / 7.6 TB NVMe | €1.37 | ~€999 | Multi-GPU tensor parallel |
Prices exclude VAT. Traffic is essentially unlimited (fair use around 10 TB/month). No setup fees.
For context: an equivalent H100 instance on AWS runs you roughly $3–4/hour. Contabo's €0.69/hour is roughly a fifth of that. RunPod sits somewhere in between. If you're running inference 24/7, the monthly pricing makes it even more aggressive.
My recommendation for most people: Start with the GPU M (MI210, 64 GB VRAM) at €99/month. That's enough to run Llama 3.1 70B in Q4 quantization comfortably. If you need NVIDIA stability and ecosystem support, jump to the GPU XL with the H100.
If you're just testing or doing a proof of concept, use the hourly billing. Spin up, test, tear down. You'll spend a few euros.
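The hourly-vs-monthly choice comes down to simple break-even math. Using the GPU M figures from the table above (€0.14/hour, ~€99/month):

```python
# Back-of-envelope break-even between Contabo's hourly and monthly billing,
# using the GPU M figures quoted above (EUR 0.14/h, ~EUR 99/month).
HOURLY_EUR = 0.14
MONTHLY_EUR = 99.0

break_even_hours = MONTHLY_EUR / HOURLY_EUR
print(f"Break-even: {break_even_hours:.0f} hours (~{break_even_hours / 24:.0f} days)")
# -> Break-even: 707 hours (~29 days)

# A 30-day month is 720 hours, so hourly billing only loses out
# if the box runs essentially nonstop.
print(f"Full month at the hourly rate: EUR {HOURLY_EUR * 720:.2f}")
# -> Full month at the hourly rate: EUR 100.80
```

In other words: anything short of round-the-clock usage favors hourly billing, and even a full month at hourly rates costs barely more than the flat plan.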
Step 1: Deploy Your Contabo GPU VPS
First things first — you need a Contabo account. Head to contabo.com, sign up, verify your email and phone. It's free to create an account.
Then:
- Log into my.contabo.com
- Navigate to VPS → Add VPS → GPU VPS
- Select your plan (I'll use the GPU M / MI210 for this walkthrough)
- Choose Ubuntu 22.04 LTS as your OS — this is the recommended image, and it comes with ROCm 6.0+ pre-installed for AMD GPUs (or CUDA 12.4 for NVIDIA plans)
- Pick your datacenter location — Frankfurt if you want the lowest latency for European users
- Set a root password or upload your SSH key (SSH key is strongly preferred)
- Hit deploy
The VPS provisions almost instantly. You'll get an IP address and your credentials. SSH in:
ssh root@YOUR_IP_ADDRESS
You're in. Let's set up the environment.
Step 2: Prepare the System
Contabo's GPU images come with drivers pre-installed, which saves you the single worst experience in all of computing (manually installing CUDA or ROCm). But we still need to update the system and install some basics.
# Update everything
apt update && apt upgrade -y
# Install essentials
apt install -y curl wget git build-essential python3 python3-pip python3-venv
# Verify your GPU is visible
# For AMD (MI210/MI300X):
rocm-smi
# For NVIDIA (H100/A100):
# nvidia-smi
You should see your GPU listed with its full VRAM. If you're on the MI210 plan, you'll see 64 GB HBM2e. If the GPU doesn't show up, reboot the VPS (reboot) and try again — occasionally the first boot after provisioning needs a kick.
That's it for system prep. No driver installation, no kernel module compilation, no existential dread. Contabo handled it.
Step 3: Install OpenClaw
Docker is the way to go here. The OpenClaw docs recommend it, and it eliminates the "works on my machine" problem entirely.
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh && sh get-docker.sh
systemctl start docker && systemctl enable docker
# Verify Docker is running
docker --version
Now clone the OpenClaw repository and build the Docker image:
# Clone OpenClaw
git clone https://github.com/open-claw/open-claw.git
cd open-claw
# Build the Docker image
# For NVIDIA GPUs:
docker build -t openclaw:latest -f docker/Dockerfile.cuda .
# For AMD GPUs (ROCm):
docker build -t openclaw:latest -f docker/Dockerfile.rocm .
Alternatively, if pre-built images are available on the GitHub Container Registry:
# Pull pre-built (NVIDIA)
docker pull ghcr.io/open-claw/open-claw:latest-cuda12
# Pull pre-built (AMD)
docker pull ghcr.io/open-claw/open-claw:latest-rocm6
Building from source takes 10–15 minutes on Contabo's hardware. Pulling a pre-built image is faster if one's available for your platform. Check the OpenClaw GitHub releases for the latest tags.
Step 4: Download Your Model
You need actual model weights to serve. Let's grab Llama 3.1 70B in a quantized format — this is one of the best open-weight models available and runs well within the MI210's 64 GB VRAM when quantized to Q4.
# Install the Hugging Face CLI
pip install huggingface-hub
# Log in (required for gated models like Llama)
huggingface-cli login
# Paste your HF token when prompted — get one at huggingface.co/settings/tokens
# Create a models directory
mkdir -p /models
# Download the model
huggingface-cli download meta-llama/Llama-3.1-70B \
  --local-dir /models/llama-3.1-70b \
  --local-dir-use-symlinks False
This is a big download. The base meta-llama/Llama-3.1-70B repo ships full-precision weights (roughly 140 GB); pre-quantized Q4 variants come in around 40 GB. On Contabo's 1 Gbps connection, budget anywhere from around 10 minutes to half an hour depending on which you grab. Go make coffee.
Tip: If you're running smaller experiments first, grab a 7B or 8B model instead. Something like meta-llama/Llama-3.1-8B downloads in a few minutes and lets you verify everything works before committing to the big model.
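If you want a sanity check before kicking off a download, the math is simple: size in gigabits divided by effective link speed. The 60% efficiency factor below is an assumption to account for real-world throughput (CDN limits, TCP overhead) landing well below the 1 Gbps line rate.

```python
def download_minutes(size_gb: float, link_gbps: float = 1.0,
                     efficiency: float = 0.6) -> float:
    """Rough wall-clock estimate for a model download.

    efficiency is an assumed fudge factor: real-world throughput from the
    Hugging Face CDN usually lands well below the 1 Gbps line rate.
    """
    gigabits = size_gb * 8
    return gigabits / (link_gbps * efficiency) / 60

# Full-precision Llama 3.1 70B is on the order of 140 GB;
# a Q4-quantized variant is closer to 40 GB.
print(f"~{download_minutes(40):.0f} min for 40 GB")    # -> ~9 min for 40 GB
print(f"~{download_minutes(140):.0f} min for 140 GB")  # -> ~31 min for 140 GB
```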
Step 5: Launch OpenClaw
Here's where it all comes together. One Docker command and you have a live inference API:
docker run -d \
  --name openclaw-server \
  --gpus all \
  -p 8000:8000 \
  -v /models:/models \
  openclaw:latest \
  python -m open_claw.serve \
    --model /models/llama-3.1-70b \
    --host 0.0.0.0 \
    --port 8000 \
    --quantization q4_k_m
Breaking this down:
- --gpus all — exposes all GPUs to the container (use --device flags for ROCm if needed)
- -p 8000:8000 — maps port 8000 to the host
- -v /models:/models — mounts your model directory
- --quantization q4_k_m — Q4_K_M quantization, which cuts VRAM usage by 50–75% with minimal quality loss
- --host 0.0.0.0 — listens on all interfaces (not just localhost)
Watch the logs to make sure everything loads:
docker logs -f openclaw-server
You'll see it loading model shards, allocating VRAM, and eventually printing something like Serving on 0.0.0.0:8000. Model loading takes 1–3 minutes depending on size.
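If you'd rather script the wait than watch logs, polling the API port works well, since the server only starts accepting connections once it's ready. A small stdlib sketch:

```python
import socket
import time

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_for_server(host: str = "127.0.0.1", port: int = 8000,
                    max_wait_s: float = 300.0) -> bool:
    """Poll until the API port accepts connections.

    Model loading can take a few minutes, so the default budget is generous.
    """
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        if port_open(host, port):
            return True
        time.sleep(5)
    return False

# wait_for_server() blocks until port 8000 answers or five minutes pass,
# then returns True/False -- handy in deployment scripts.
```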
Now test it:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-70b",
"messages": [{"role": "user", "content": "What is the capital of France?"}],
"max_tokens": 100
}'
You should get a JSON response with the model's answer. Congratulations — you have a self-hosted, OpenAI-compatible inference API running on a €99/month VPS.
From any machine, you can now hit this endpoint:
curl http://YOUR_IP:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-70b",
"messages": [{"role": "user", "content": "Explain quantum computing in simple terms."}]
}'
Or use it with the OpenAI Python SDK:
from openai import OpenAI

client = OpenAI(
    base_url="http://YOUR_IP:8000/v1",
    api_key="not-needed"  # OpenClaw doesn't require auth by default
)

response = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
This is the beauty of OpenClaw's OpenAI-compatible API. Any application, library, or framework that works with the OpenAI API works with your self-hosted endpoint. Just change the base_url.
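Assuming OpenClaw also implements the standard OpenAI streaming protocol (stream=True, with content arriving as per-chunk deltas), you can stream tokens as they generate. A sketch of the delta-assembly logic, with stand-in chunk objects that mirror the SDK's shape:

```python
from types import SimpleNamespace
from typing import Iterable

def assemble(chunks: Iterable) -> str:
    """Concatenate content deltas from an OpenAI-style streaming response.

    Each chunk mirrors the SDK shape chunk.choices[0].delta.content, where
    content is None on the role/stop chunks at either end of the stream.
    """
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:
            parts.append(delta.content)
    return "".join(parts)

# Against a live server this would be:
#   stream = client.chat.completions.create(
#       model="llama-3.1-70b", messages=[...], stream=True)
#   text = assemble(stream)

# Stand-in chunks demonstrating the shape:
def fake_chunk(text):
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

demo = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)]
print(assemble(demo))  # -> Hello
```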
Step 6: Productionize It
A raw Docker container on an open port is fine for testing. For production, you need a few more things.
Firewall
ufw allow 22/tcp # SSH
ufw allow 8000/tcp # OpenClaw API
ufw enable
Nginx Reverse Proxy with SSL
Install Nginx and Certbot for HTTPS:
apt install -y nginx certbot python3-certbot-nginx
# Create Nginx config
cat > /etc/nginx/sites-available/openclaw <<EOF
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host \$host;
        proxy_set_header X-Real-IP \$remote_addr;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}
EOF
ln -s /etc/nginx/sites-available/openclaw /etc/nginx/sites-enabled/
nginx -t && systemctl reload nginx
# Get SSL cert
certbot --nginx -d your-domain.com
Now your API is accessible at https://your-domain.com/v1/chat/completions.
Docker Compose for Auto-Restart
Create a docker-compose.yml so your server survives reboots:
version: "3.8"
services:
  openclaw:
    image: openclaw:latest
    container_name: openclaw-server
    restart: always
    ports:
      - "8000:8000"
    volumes:
      - /models:/models
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command: >
      python -m open_claw.serve
      --model /models/llama-3.1-70b
      --host 0.0.0.0
      --port 8000
      --quantization q4_k_m
Then bring it up:
docker compose up -d
Monitoring
OpenClaw exposes Prometheus-compatible metrics. Pair with Grafana for dashboards:
# Check metrics endpoint
curl http://localhost:8000/metrics
You'll get request counts, latency percentiles, VRAM usage, tokens/second — everything you need to know if your server is healthy.
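Prometheus text exposition is simple enough to scrape ad hoc without a full monitoring stack. A sketch of a parser for plain gauges and counters (the metric names below are hypothetical, for illustration; check /metrics for what OpenClaw actually exports):

```python
def parse_metric(text: str, name: str) -> dict:
    """Pull one metric family out of Prometheus text exposition format.

    Returns {label_string: value}, with '' as the key for an unlabeled
    sample. Skips HELP/TYPE comment lines. A sketch for simple gauges and
    counters, not histograms, and it assumes label values without spaces.
    """
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        metric, _, value = line.rpartition(" ")
        if metric == name:
            out[""] = float(value)
        elif metric.startswith(name + "{"):
            labels = metric[len(name) + 1:-1]
            out[labels] = float(value)
    return out

# Hypothetical sample in the standard exposition format:
sample = """\
# HELP openclaw_tokens_per_second Generation throughput
# TYPE openclaw_tokens_per_second gauge
openclaw_tokens_per_second 31.4
openclaw_requests_total{status="ok"} 128
"""
print(parse_metric(sample, "openclaw_tokens_per_second"))  # -> {'': 31.4}
```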
Performance Expectations
Based on community benchmarks and user reports from places like r/LocalLLaMA:
- Llama 70B Q4 on MI210 (64 GB): ~25–35 tokens/second, sub-1-second time to first token
- Llama 70B Q4 on H100 (80 GB): ~80–120 tokens/second
- Llama 8B FP16 on MI210: ~150+ tokens/second
Contabo's GPUs may run at slightly lower clocks than a tuned bare-metal colocation setup, but the difference in real-world inference is marginal (maybe 5–10%). At these prices, nobody's complaining.
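Those throughput numbers translate directly into a cost-per-token figure. Taking the low end of the MI210 estimate (~30 tokens/second) and assuming continuous, fully saturated operation, which makes this a best case, not a typical one:

```python
# Rough cost-per-token math from the MI210 figures above:
# ~30 tok/s sustained, EUR 99/month flat. Assumes 100% utilization
# around the clock, so treat this as a floor on unit cost.
TOK_PER_SEC = 30
MONTH_SECONDS = 30 * 24 * 3600

tokens_per_month = TOK_PER_SEC * MONTH_SECONDS
cost_per_million = 99 / (tokens_per_month / 1e6)

print(f"{tokens_per_month / 1e6:.0f}M tokens/month")
# -> 78M tokens/month
print(f"EUR {cost_per_million:.2f} per million tokens")
# -> EUR 1.27 per million tokens
```

Even at a fraction of full utilization, that undercuts per-token API pricing for a 70B-class model by a wide margin.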
Troubleshooting the Common Gotchas
GPU not detected in Docker: Make sure you're using --gpus all for NVIDIA or the correct --device flags for ROCm. Reboot the VPS if it just provisioned.
Out of memory errors: Use heavier quantization (q4_k_m instead of fp16), reduce --max-seq-len to 4096 or 2048, or step up to a bigger plan. Monitor with watch -n1 rocm-smi or watch -n1 nvidia-smi.
Gated model download fails: You need to accept the model's license on Hugging Face's website first, then use huggingface-cli login with a valid token.
Slow Docker build: Contabo's bandwidth is 1 Gbps, which is fine, but building from source compiles CUDA kernels which is CPU-bound. Use pre-built images when available.
High latency from US/Asia: Choose a Contabo datacenter region closer to your users, or put a Cloudflare Tunnel in front of your endpoint so traffic rides Cloudflare's edge network instead of the public internet the whole way.
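For the out-of-memory case specifically, it helps to estimate VRAM needs before picking a quantization level. A crude rule of thumb: parameter count times bits per weight, plus a flat allowance for KV cache and runtime buffers (the 6 GB overhead below is an assumption; real usage grows with context length and batch size):

```python
def approx_vram_gb(params_billion: float, bits_per_weight: float,
                   overhead_gb: float = 6.0) -> float:
    """Crude VRAM estimate: weight storage plus a flat allowance for
    KV cache, activations, and runtime buffers. The overhead figure is
    an assumption; real usage varies with context length and batch size."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb

# Q4_K_M averages roughly 4.5 bits per weight.
print(f"70B @ Q4:   ~{approx_vram_gb(70, 4.5):.0f} GB")  # -> ~45 GB
print(f"70B @ FP16: ~{approx_vram_gb(70, 16):.0f} GB")   # -> ~146 GB
```

This is why 70B at Q4 fits comfortably in the MI210's 64 GB while FP16 needs the MI300X or a multi-GPU plan.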
The Real Cost Comparison
Let's put this in perspective. Running Llama 70B inference 24/7:
- Contabo GPU M (MI210): €99/month
- RunPod equivalent: ~$200/month
- AWS g5.4xlarge (A10G, 24GB — not even enough VRAM): ~$1,200/month
- OpenAI API (equivalent throughput): $2,000–5,000/month depending on usage
Contabo is not the fastest. It's not the most polished cloud provider. The control panel looks like it was designed in 2010. But for raw price-to-performance on GPU compute, nothing touches it in Europe.
What to Build With This
Now that you have a self-hosted OpenClaw API, the possibilities open up. Check out the Claw Mart listings for pre-built tools and integrations that plug directly into OpenAI-compatible endpoints — chatbots, RAG pipelines, agent frameworks, and more. Since OpenClaw serves the standard /v1/chat/completions endpoint, anything in the Claw Mart ecosystem that works with OpenAI works with your server.
A few ideas to start:
- Private AI assistant for your team — no data leaving your infrastructure
- RAG pipeline with your company's documents
- Multi-agent system where cost-per-token is zero after the VPS bill
- Fine-tuned model serving for domain-specific applications
Next Steps
- Sign up for Contabo and deploy a GPU M instance — use hourly billing to test first
- Follow the steps above to get OpenClaw running with a small model (8B) for validation
- Scale up to your target model once everything checks out
- Browse Claw Mart for OpenClaw-compatible tools to build on top of your new inference server
- Join the OpenClaw community on GitHub for updates, benchmarks, and support
You now have a production-grade LLM inference API running on your own hardware, in a European datacenter, for less than a Netflix family plan. That's the kind of asymmetric advantage that makes self-hosting worth the 30 minutes of setup.