March 20, 2026 · 9 min read · Claw Mart Team

OpenClaw Configuration File Explained (config.yml)

If you've ever stared at a YAML file for forty-five minutes wondering why your entire OpenClaw agent refuses to boot — only to discover you used a tab instead of two spaces on line 47 — welcome to the club. You're not alone, and no, you're not stupid. The config.yml file in OpenClaw is simultaneously the most powerful and most frustrating part of the entire platform.

Here's the thing: once you actually understand what every section of this file does and why it's there, OpenClaw goes from "mysterious framework that sometimes works" to "incredibly flexible tool that does exactly what you tell it to." The problem isn't OpenClaw. The problem is that nobody has sat down and explained the configuration file in plain language.

That's what we're doing today. Every section, every key decision point, every place where you're likely to shoot yourself in the foot. Let's fix this once.

Why the Config File Matters More Than You Think

OpenClaw gives your AI agent control over a real computer environment — mouse movements, keyboard input, screenshots, scrolling, clicking, the works. That's an enormous amount of power, and the config.yml is the single file that governs all of it. It determines:

  • What actions your agent is allowed to take
  • How it sees the screen (resolution, compression, format)
  • Which LLM backbone powers the decision-making
  • How the environment is set up (display, scaling, connection)
  • Security boundaries and safety rails
  • Retry behavior, timeouts, and error handling

Skip the config or half-understand it, and you'll get agents that burn through tokens doing nothing, try actions that aren't permitted, or literally can't see the screen properly. Get it right, and you have a precisely scoped, efficient, safe agent that does real work.

The Top-Level Structure

A well-organized OpenClaw config.yml breaks down into five major sections. Here's the skeleton:

environment:
  # Display, resolution, connection settings

observations:
  # How the agent "sees" the screen

actions:
  # What the agent is allowed to do

llm:
  # Model configuration and API settings

agent:
  # Behavior, retry logic, memory, and orchestration

Every section talks to the others. Your observations settings need to match your environment resolution. Your actions list determines what the agent section can actually execute. Think of it as a contract: the config defines the rules, and OpenClaw enforces them at runtime.

Let's break each one down.

Section 1: environment

This is where you define the actual computing environment your agent operates in.

environment:
  display:
    width: 1920
    height: 1080
    scaling_factor: 1.0
    color_depth: 24
  connection:
    type: vnc
    host: localhost
    port: 5900
    password: ${VNC_PASSWORD}
  capture:
    method: screenshot
    format: png
    compression: 6
  headless: false

The stuff that trips people up:

scaling_factor — This is the single most common source of "my agent can't click on the right thing" bugs. If your host machine runs at 2x DPI (like a Retina MacBook) but the VNC target is at 1x, your coordinates will be off by a factor of two. The agent will click on empty space, you'll waste thirty minutes, and then you'll find this setting. Set it explicitly. Always. If you're running in a Docker container, it's almost certainly 1.0.
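
The coordinate mismatch is simple arithmetic, and seeing it written out makes the bug obvious. Here's a rough sketch of the conversion (not OpenClaw's internals, just the math a `scaling_factor` correction has to do):

```python
def to_target_coords(x, y, scaling_factor):
    """Map coordinates produced against a scaled (e.g. 2x-DPI) display
    to the target's unscaled pixel space by dividing out the factor."""
    return round(x / scaling_factor), round(y / scaling_factor)

# On a 2x host, a click intended at (800, 600) must land at (400, 300)
# in the 1x VNC target -- without the correction it hits empty space.
```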

compression — PNG compression level from 0-9. Higher means smaller files but slower encoding. For most setups, 6 is the sweet spot. If you're running on a beefy machine and want the agent to process screenshots faster, drop it to 3. If bandwidth is constrained (remote VNC), push it to 8.

Environment variables — See that ${VNC_PASSWORD}? OpenClaw supports environment variable interpolation. Use it. Never hardcode passwords or API keys in your YAML. I've seen people commit secrets to public repos because they "were going to change it later." They didn't.
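
If you're curious what interpolation like this does under the hood, here is a minimal sketch of `${VAR}` substitution. This is an illustration of the pattern, not OpenClaw's actual implementation; a real loader would also handle `.env` files and escaping:

```python
import os
import re

def interpolate_env(text):
    """Replace ${VAR} placeholders with values from the environment.

    Fails loudly on unset variables instead of silently inlining '',
    which would otherwise produce a blank password at runtime.
    """
    def lookup(match):
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]

    return re.sub(r"\$\{(\w+)\}", lookup, text)
```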

headless — Set to true when running in CI/CD or a cloud container with no physical display. OpenClaw will use a virtual framebuffer (Xvfb) automatically. Set to false when you're developing locally and want to actually watch what the agent does — which you should, at least early on.

Section 2: observations

This controls how OpenClaw captures and processes what the agent "sees."

observations:
  type: screenshot
  format: base64
  resize:
    enabled: true
    max_width: 1280
    max_height: 720
  annotate:
    enabled: true
    show_cursor: true
    grid_overlay: false
    element_labels: true
  frequency: after_action
  history:
    enabled: true
    max_frames: 5

Key decisions here:

format: base64 vs format: path — If your LLM supports vision (and it should, if you're doing computer use), base64 sends the screenshot directly as a base64-encoded image in the API call. path saves it to disk and passes the file path, which is useful if you're doing custom processing or logging but adds latency. For most people: use base64.

resize — This is critical for token management. A raw 1920x1080 screenshot is huge. Resizing to 1280x720 before sending to the LLM cuts your token usage significantly with minimal loss of visual fidelity. I'd argue 1280x720 should be the default for almost everyone. Only go full resolution if your agent needs to read tiny text or interact with dense UIs.
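
The resize math is worth internalizing: both limits apply together and aspect ratio is preserved, so the effective scale is the smaller of the two ratios (capped at 1.0 so small screens are never upscaled). A quick sketch of that calculation:

```python
def fit_within(width, height, max_width=1280, max_height=720):
    """Scale (width, height) down to fit inside the box, preserving
    aspect ratio; never upscale."""
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)

# 1920x1080 scales by 2/3 on both axes and lands exactly on 1280x720.
```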

annotate — One of OpenClaw's best features. When element_labels is enabled, OpenClaw overlays numbered labels on interactive elements before sending the screenshot to the LLM. Instead of the agent needing to figure out pixel coordinates for "the blue button that says Submit," it can say "click element 14." This dramatically improves accuracy and reduces token waste from the agent describing what it sees.

frequency: after_action — Takes a new screenshot after every action. You can also set this to on_demand (agent explicitly requests a screenshot) or timed with an interval. after_action is the safest default — the agent always has fresh visual context.

history.max_frames: 5 — Keeps the last 5 screenshots in context. More frames = better context for the agent but higher token costs. For straightforward tasks (fill out a form, navigate a site), 3 is fine. For complex multi-step workflows, bump it to 5-8.

Section 3: actions

This is where security meets functionality. You're explicitly whitelisting what your agent can do.

actions:
  allowed:
    - mouse_move
    - left_click
    - right_click
    - double_click
    - type_text
    - press_key
    - hotkey
    - scroll_up
    - scroll_down
    - screenshot
    - wait
  restricted:
    - shell_command
    - file_delete
    - system_shutdown
  parameters:
    mouse_move:
      speed: normal
      smooth: true
    type_text:
      delay_between_keys: 50
      clear_before_type: false
    scroll_up:
      amount: 3
    scroll_down:
      amount: 3
    wait:
      default_seconds: 2
      max_seconds: 10

This is the section where most people screw up, and it costs them real money.

Scenario you will encounter: You set up an agent to do web research. You define 12 actions but forget to include scroll_down. The agent loads a page, sees the content is below the fold, tries to scroll, gets an "action not permitted" error, retries, fails again, tries to find another way to see the content, burns 40 API calls, and eventually gives up. You just spent $3 in tokens because you forgot one line in your YAML.

My recommendation: Start with the full list above. It covers 95% of use cases. Only remove actions when you have a specific security reason to do so (like removing shell_command when the agent shouldn't have terminal access, which is almost always the case).

The parameters sub-section is where you fine-tune behavior. smooth: true on mouse movements makes the cursor travel in a human-like arc instead of teleporting — important if you're interacting with sites that detect bot-like behavior. delay_between_keys at 50ms simulates natural typing speed. clear_before_type: false means the agent will append to existing text in a field rather than clearing it first — set to true if your agent keeps appending to form fields instead of replacing them.
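
To make the type_text parameters concrete, here's a sketch of how a per-key delay and clear_before_type might behave. The `send_key` hook is a hypothetical stand-in for the real keystroke backend, not an OpenClaw API:

```python
import time

def type_text(text, delay_between_keys=50, clear_before_type=False,
              send_key=print):
    """Illustrates the type_text parameters: optionally clear the field
    first, then emit one key at a time with a delay in milliseconds."""
    if clear_before_type:
        send_key("<ctrl+a>")   # select existing field contents
        send_key("<delete>")   # and remove them before typing
    for ch in text:
        send_key(ch)
        time.sleep(delay_between_keys / 1000)
```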

The restricted list is a blocklist that overrides everything. Even if something somehow gets requested, these actions will be hard-rejected. Use this for dangerous operations you never want the agent to perform.
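
The precedence rule described above fits in a few lines. A sketch (my own illustration of the allowlist-plus-blocklist logic, not OpenClaw's source):

```python
def is_permitted(action, allowed, restricted):
    """An action must be in the allowlist AND absent from the blocklist;
    the restricted list wins even if an action appears in both."""
    if action in restricted:
        return False
    return action in allowed
```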

Section 4: llm

Your model configuration.

llm:
  provider: openai
  model: gpt-4o
  api_key: ${OPENAI_API_KEY}
  temperature: 0.1
  max_tokens: 4096
  timeout: 60
  retry:
    max_attempts: 3
    backoff: exponential
    initial_delay: 1
  vision:
    enabled: true
    detail: high
  fallback:
    provider: openai
    model: gpt-4o-mini
    trigger: rate_limit

temperature: 0.1 — For computer use, you want low temperature. The agent needs to be precise and deterministic, not creative. I've tested values from 0.0 to 0.5, and 0.1 gives the best balance of reliability and the occasional ability to problem-solve when something unexpected happens. At 0.0, agents sometimes get stuck in loops. At 0.3+, they start making weird decisions.

vision.detail: high — This controls image token resolution for vision-capable models. high sends more image tokens but gives the model better ability to read text and identify small UI elements. low is cheaper but the agent will struggle with dense interfaces. Use high unless you're on a tight budget and working with simple, large-element UIs.

fallback — Incredibly useful. If your primary model hits a rate limit, OpenClaw automatically falls back to the specified model. You can set trigger to rate_limit, timeout, or error. This keeps your agent running instead of crashing at 2 AM when you're not watching.
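
The fallback pattern itself is simple, and worth having in your mental model even outside OpenClaw. A sketch with stand-in callables (the exception and call signatures are hypothetical, for illustration only):

```python
class RateLimitError(Exception):
    """Stand-in for a provider's rate-limit exception."""

def call_with_fallback(primary, fallback, prompt):
    """Try the primary model; if it is rate-limited, retry the same
    prompt once on the cheaper fallback model instead of crashing."""
    try:
        return primary(prompt)
    except RateLimitError:
        return fallback(prompt)
```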

A note on API keys: Use ${OPENAI_API_KEY} syntax and set it in your environment or a .env file. OpenClaw loads .env files automatically from the project root. Never, ever put your actual key in the YAML.

Section 5: agent

The orchestration layer.

agent:
  name: web-researcher
  max_steps: 100
  step_timeout: 30
  system_prompt: |
    You are a precise web research agent. You interact with a computer
    desktop through mouse and keyboard actions. Always take a screenshot
    after navigation to verify the page loaded correctly. Be methodical
    and verify each step before proceeding.
  memory:
    type: sliding_window
    window_size: 20
  planning:
    enabled: true
    replan_frequency: 10
  safety:
    confirm_before: 
      - file_delete
      - shell_command
    max_cost_usd: 5.00
    stop_on_error_count: 5

max_steps: 100 — Hard limit on how many actions the agent can take. This is your financial safety net. Without it, a confused agent can loop indefinitely, racking up API costs. 100 is reasonable for most tasks. Simple tasks (log in, click a button, extract text) might need 20. Complex multi-page workflows might need 200.

system_prompt — This is your agent's personality and instruction set. Be specific. "You are a helpful agent" is garbage. Tell it exactly what it's doing, how it should verify its work, and what to do when confused. The pipe (|) syntax preserves newlines in YAML, which makes multi-line prompts readable.

memory.type: sliding_window — Keeps the last N steps in the LLM context. This is the most token-efficient option. Other options include full (entire conversation history, expensive) and summary (periodically summarizes older steps, good balance for long tasks). Start with sliding_window at size 20.
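
A sliding window is essentially a fixed-capacity queue: new steps push old ones out. A minimal sketch of the idea using Python's deque (illustrative, not OpenClaw's memory class):

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the most recent window_size steps in the LLM context."""

    def __init__(self, window_size=20):
        # deque with maxlen silently evicts the oldest entry on overflow
        self.steps = deque(maxlen=window_size)

    def add(self, step):
        self.steps.append(step)

    def context(self):
        return list(self.steps)
```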

planning.enabled: true — When enabled, the agent creates a high-level plan before acting and re-evaluates every replan_frequency steps. This prevents the agent from getting lost in tactical details. It costs a few extra tokens per planning cycle but dramatically improves success rates on multi-step tasks.

safety.max_cost_usd: 5.00 — OpenClaw tracks estimated API costs in real-time and stops the agent if this threshold is hit. Set this. Always. I don't care if you're a funded startup with money to burn. One runaway agent at 3 AM can cost you hundreds of dollars.

The Complete File

Here's a production-ready config.yml you can copy and adapt:

environment:
  display:
    width: 1920
    height: 1080
    scaling_factor: 1.0
    color_depth: 24
  connection:
    type: vnc
    host: localhost
    port: 5900
    password: ${VNC_PASSWORD}
  capture:
    method: screenshot
    format: png
    compression: 6
  headless: false

observations:
  type: screenshot
  format: base64
  resize:
    enabled: true
    max_width: 1280
    max_height: 720
  annotate:
    enabled: true
    show_cursor: true
    grid_overlay: false
    element_labels: true
  frequency: after_action
  history:
    enabled: true
    max_frames: 5

actions:
  allowed:
    - mouse_move
    - left_click
    - right_click
    - double_click
    - type_text
    - press_key
    - hotkey
    - scroll_up
    - scroll_down
    - screenshot
    - wait
  restricted:
    - shell_command
    - file_delete
    - system_shutdown
  parameters:
    mouse_move:
      speed: normal
      smooth: true
    type_text:
      delay_between_keys: 50
      clear_before_type: false
    scroll_up:
      amount: 3
    scroll_down:
      amount: 3
    wait:
      default_seconds: 2
      max_seconds: 10

llm:
  provider: openai
  model: gpt-4o
  api_key: ${OPENAI_API_KEY}
  temperature: 0.1
  max_tokens: 4096
  timeout: 60
  retry:
    max_attempts: 3
    backoff: exponential
    initial_delay: 1
  vision:
    enabled: true
    detail: high
  fallback:
    provider: openai
    model: gpt-4o-mini
    trigger: rate_limit

agent:
  name: web-researcher
  max_steps: 100
  step_timeout: 30
  system_prompt: |
    You are a precise web research agent. You interact with a computer
    desktop through mouse and keyboard actions. Always take a screenshot
    after navigation to verify the page loaded correctly. Be methodical
    and verify each step before proceeding.
  memory:
    type: sliding_window
    window_size: 20
  planning:
    enabled: true
    replan_frequency: 10
  safety:
    confirm_before:
      - file_delete
      - shell_command
    max_cost_usd: 5.00
    stop_on_error_count: 5

Validation: Don't Skip This

Before you run anything, validate your config. If OpenClaw ships with a validation command, use it:

openclaw config validate --file config.yml

If it doesn't, or you want an extra layer, use a YAML linter:

pip install yamllint
yamllint config.yml

This catches indentation errors, tab/space mixing, and structural issues before they become runtime mysteries. Make this part of your workflow. Every time you edit the config, lint it before running the agent.
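
You can also do a quick structural sanity check in a few lines of Python with PyYAML (`pip install pyyaml`). This only verifies the five top-level sections exist; it's a supplement to linting, not a substitute for OpenClaw's own validator:

```python
import yaml  # pip install pyyaml

REQUIRED_SECTIONS = {"environment", "observations", "actions", "llm", "agent"}

def validate_config(text):
    """Parse the YAML and confirm all five top-level sections are present."""
    config = yaml.safe_load(text)
    missing = REQUIRED_SECTIONS - set(config or {})
    if missing:
        raise ValueError(f"missing sections: {sorted(missing)}")
    return config
```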

The Shortcut: Felix's OpenClaw Starter Pack

I'll be honest — writing this post took me back to my first week with OpenClaw, where I burned an embarrassing amount of time troubleshooting config issues that turned out to be minor parameter mismatches. If you want to skip that entire phase, Felix's OpenClaw Starter Pack on Claw Mart is $29 and comes with pre-configured skills and config templates that already have all of this dialed in. The action parameters are tuned, the observation settings are optimized for common use cases, and the safety rails are set to sane defaults. It's the config I wish I'd had on day one. You can always customize from there once you understand what each setting does — which, after reading this post, you now do.

What to Do Next

  1. Copy the complete config above and save it as config.yml in your OpenClaw project root.
  2. Create a .env file with your actual API keys and VNC password.
  3. Run the linter to make sure everything is clean.
  4. Start with a simple task — have the agent open a browser and navigate to a URL. Watch it work. Watch the screenshots it takes. Check that clicks land where they should.
  5. Tune from there. Adjust resize dimensions, tweak temperature, modify the system_prompt for your specific use case.

The config file isn't the exciting part of building computer-use agents. But it's the foundation that everything else depends on. Get it right, and the rest of OpenClaw just works. Get it wrong, and you'll spend more time debugging YAML than building anything useful.

Now go build something.
