March 21, 2026 · 8 min read · Claw Mart Team

Code Review Bot: Let OpenClaw Review Your Pull Requests

Let me be honest with you: most code reviews are a waste of everyone's time.

Not because they're unimportant — they're critical. But because 70% of what happens in a typical pull request review is stuff a machine should be catching. Formatting inconsistencies. Missed null checks. That one junior dev who keeps importing the wrong utility library. The forgotten error handler on an async call that's going to blow up in production at 2 AM on a Saturday.

You know the drill. Senior engineers spend hours every week pointing out the same patterns, leaving the same comments, and slowly dying inside while the actually interesting architectural questions get glossed over because everyone's exhausted from reviewing boilerplate issues.

OpenClaw fixes this. Not in a "we slapped GPT-4 on a diff and called it AI" way — in a genuinely useful, multi-agent, context-aware way that treats code review as the serious engineering problem it actually is.

I've been running it on my team's repos for the past few months, and I'm going to walk you through exactly how to set it up, what to expect, and where it shines (and where it doesn't).

What OpenClaw Actually Does Differently

Before we get into setup, you need to understand why OpenClaw isn't just another linter wrapper. Because that's the first question everyone asks, and it's fair — most "AI code review" tools are shallow. They catch what ESLint or Pylint already catches and add a nice ChatGPT-sounding explanation on top. Cute, not useful.

OpenClaw uses a multi-agent architecture with three distinct agents working on every review:

  1. Analyzer Agent — Explores the code changes, maps dependencies, runs relevant tests and linters, and builds a comprehensive understanding of what the PR actually does across the codebase.

  2. Critic Agent — Stress-tests the Analyzer's findings. Think of it as the adversarial layer that asks "are you sure?" before anything gets posted as a comment. This dramatically reduces false positives.

  3. Policy Agent — Enforces your specific rules. Your CONTRIBUTING.md, your architecture decision records, your forbidden patterns, your naming conventions. The stuff no generic tool knows about.

This three-layer approach is why OpenClaw catches things that simpler tools miss. It's not just looking at the diff in isolation — it's understanding how the change fits into your broader codebase, testing its own conclusions, and checking everything against your team's actual standards.
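To make the three-stage flow concrete, here's a rough sketch of the pipeline shape. The `Finding` structure and the stand-in agent functions are illustrative only — they are not OpenClaw's actual API, just a minimal model of analyze-then-criticize-then-check-policy:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    rule: str                 # which check or policy produced this
    message: str              # human-readable explanation
    confidence: float         # 0.0-1.0, used for threshold filtering
    evidence: list = field(default_factory=list)

# Stand-in agents: plain functions here; in OpenClaw these are
# LLM-backed agents with tool access and a full codebase index.
def analyzer(diff: str) -> list[Finding]:
    findings = []
    # Toy heuristic: an async change that never awaits anything
    if "async" in diff and "await" not in diff:
        findings.append(Finding("async-no-await",
                                "async function never awaits", 0.8, [diff]))
    return findings

def critic(findings: list[Finding]) -> list[Finding]:
    # Keep only findings the critic could not refute
    # (modeled here as a simple confidence cutoff)
    return [f for f in findings if f.confidence >= 0.5]

def policy(diff: str, rules: dict) -> list[Finding]:
    # Toy policy check for the "no direct DB access" rule
    return [Finding(rid, desc, 0.9, [diff])
            for rid, desc in rules.items()
            if rid == "api-002" and "prisma" in diff]

def review(diff: str, rules: dict) -> list[Finding]:
    return critic(analyzer(diff)) + policy(diff, rules)

pr_diff = "async function save() { prisma.user.create(data) }"
results = review(pr_diff, {"api-002": "No direct database queries in route handlers"})
```

The real agents are far richer, but the shape is the same: the Analyzer proposes, the Critic filters, and the Policy Agent appends team-specific violations.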

Each finding comes with a confidence score and an evidence chain. So when OpenClaw flags something, you can immediately see why it flagged it and how confident it is. This is huge for building trust. You learn pretty quickly which confidence levels you can auto-approve and which need human eyes.

Getting Started: The Fast Path

If you want to skip the "figuring everything out from scratch" phase, I'd genuinely recommend grabbing Felix's OpenClaw Starter Pack. Felix put together pre-configured templates, policy files, and a docker-compose setup that handles the most annoying parts of initial configuration. I burned a full afternoon wrestling with GitHub App permissions and webhook routing before someone pointed me to this pack, and I wish I'd started there. It's especially useful if you're running a monorepo or need multi-language support out of the box.

But let's walk through the core setup either way so you understand what's happening under the hood.

Installation and Configuration

First, you'll need OpenClaw installed. The CLI is the fastest way to get moving:

# Install OpenClaw CLI
pip install openclaw

# Initialize a new project configuration
openclaw init --repo .

# This creates .openclaw/config.yaml in your repo root

That init command scaffolds out a configuration directory. Here's what a typical config.yaml looks like after you've customized it:

# .openclaw/config.yaml

project:
  name: "my-saas-app"
  languages: ["typescript", "python"]
  framework_hints: ["nextjs", "fastapi"]

review:
  mode: "review-only"  # won't try to edit code, just comments
  max_loc_per_review: 1500
  confidence_threshold: 0.7  # only post findings above this score
  
agents:
  analyzer:
    tools:
      - linter
      - test_runner
      - dependency_checker
    context_depth: 3  # how many levels of imports to trace
    
  critic:
    challenge_threshold: 0.6  # challenge findings below this confidence
    max_iterations: 3
    
  policy:
    sources:
      - ".openclaw/policies/"
      - "CONTRIBUTING.md"
      - "docs/architecture/"

model:
  provider: "anthropic"  # or "ollama" for local
  name: "claude-sonnet-4-20250514"
  temperature: 0.1  # keep it low for consistency

indexing:
  strategy: "hierarchical"
  chunk_size: 1500
  overlap: 200

A few things to note here:

review-only mode is your friend. When you're starting out, do not let OpenClaw suggest or make code changes. Let it comment only. This builds team trust and lets you calibrate before you give it more power. Most teams I've talked to in the community prefer this mode permanently, and honestly, I agree. The value is in the review, not in auto-fixing.

confidence_threshold at 0.7 is a good starting point. Too low and you'll get noise. Too high and you'll miss useful findings. I started at 0.5 and got annoyed, bumped to 0.8 and missed things, and settled at 0.7 for our codebase. Your mileage may vary — adjust after a week of data.

context_depth: 3 means it traces three levels of imports. If your PR modifies a function, OpenClaw will look at what calls that function, what calls those functions, and one more level out. This is how it catches "this change is fine in isolation but breaks the caller's assumption" bugs. Increase this for deeply nested architectures, decrease it if you're paying per token and want to manage costs.
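To make context_depth concrete, here's what "trace three levels of imports" means on a toy reverse-dependency graph. The graph and traversal below are my own illustration — OpenClaw builds this from its index, not from a hand-written dict:

```python
from collections import deque

# Toy reverse-dependency graph: module -> modules that import it.
importers = {
    "utils/money.py": ["services/billing.py"],
    "services/billing.py": ["api/invoices.py", "jobs/dunning.py"],
    "api/invoices.py": ["api/router.py"],
}

def affected_modules(changed: str, depth: int) -> set:
    """BFS outward from a changed module, up to `depth` levels of callers."""
    seen, frontier = set(), deque([(changed, 0)])
    while frontier:
        module, level = frontier.popleft()
        if level == depth:
            continue  # stop expanding once we've gone `depth` hops out
        for caller in importers.get(module, []):
            if caller not in seen:
                seen.add(caller)
                frontier.append((caller, level + 1))
    return seen
```

With depth=3, a change to `utils/money.py` pulls in its caller, its caller's callers, and one more hop (`api/router.py`); with depth=1 the trace stops at `services/billing.py`. That extra context is exactly what surfaces "fine in isolation, breaks the caller" bugs.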

Setting Up the GitHub App

This is where most people get stuck, and it's where Felix's Starter Pack saves the most time. But here's the manual process:

# Generate the GitHub App manifest
openclaw github create-app \
  --org your-org-name \
  --webhook-url https://your-server.com/openclaw/webhook \
  --permissions pull_requests:write,contents:read,checks:write

# This outputs an app ID and private key
# Store them securely
export OPENCLAW_GITHUB_APP_ID=12345
export OPENCLAW_GITHUB_PRIVATE_KEY_PATH=/path/to/key.pem

Then you'll need a webhook listener running somewhere. The simplest approach for small teams:

# docker-compose.yaml
version: '3.8'

services:
  openclaw-reviewer:
    image: openclaw/reviewer:latest
    ports:
      - "3000:3000"
    environment:
      - OPENCLAW_GITHUB_APP_ID=${OPENCLAW_GITHUB_APP_ID}
      - OPENCLAW_GITHUB_PRIVATE_KEY_PATH=/keys/github-app.pem
      - OPENCLAW_MODEL_PROVIDER=anthropic
      - OPENCLAW_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./keys:/keys:ro
      - ./.openclaw:/app/.openclaw:ro
      - openclaw-index:/app/index

  openclaw-indexer:
    image: openclaw/indexer:latest
    environment:
      - OPENCLAW_REPO_PATH=/repo
    volumes:
      - ./:/repo:ro
      - openclaw-index:/app/index

volumes:
  openclaw-index:

Spin it up with docker-compose up -d, install the GitHub App on your repo, and you're live. Every new PR will trigger a webhook, OpenClaw will analyze the changes, and you'll see review comments appear directly on the pull request within a few minutes.
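One note if you put your own listener or proxy in front of the reviewer container: verify GitHub's webhook signature before trusting any payload. This is standard GitHub webhook HMAC verification (the `X-Hub-Signature-256` header), not OpenClaw-specific code:

```python
import hashlib
import hmac

def verify_github_signature(secret: str, payload: bytes, signature_header: str) -> bool:
    """Check GitHub's X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing attacks
    return hmac.compare_digest(expected, signature_header)

# Simulate what GitHub sends: the raw body plus a signed header
body = b'{"action": "opened", "number": 42}'
sig = "sha256=" + hmac.new(b"my-webhook-secret", body, hashlib.sha256).hexdigest()
```

Sign the raw bytes of the request body, not a re-serialized version of the parsed JSON — re-serialization reorders keys and the signatures won't match.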

Writing Custom Policies (This Is Where the Magic Happens)

The generic review capabilities are solid, but the real power of OpenClaw is the Policy Agent. This is where you encode your team's tribal knowledge — the stuff that lives in senior engineers' heads and nowhere else.

Create policy files in .openclaw/policies/:

# .openclaw/policies/api-standards.yaml

name: "API Design Standards"
scope: "src/api/**"

rules:
  - id: "api-001"
    description: "All API endpoints must use our standardized error response format"
    pattern: "When reviewing API route handlers, verify they use ApiError class from '@/lib/errors' rather than raw Response objects for error cases"
    severity: "high"
    
  - id: "api-002"  
    description: "No direct database queries in route handlers"
    pattern: "Route handlers should call service layer functions, never import from '@/db' or use prisma directly"
    severity: "high"
    
  - id: "api-003"
    description: "Rate limiting required on all public endpoints"
    pattern: "Public API endpoints (not under /internal/) must include rateLimit middleware"
    severity: "medium"

# .openclaw/policies/security.yaml

name: "Security Requirements"
scope: "**"

rules:
  - id: "sec-001"
    description: "No secrets in code"
    pattern: "Flag any hardcoded API keys, tokens, passwords, or connection strings. Check for common patterns like 'sk-', 'ghp_', 'AKIA', base64-encoded credentials"
    severity: "critical"
    
  - id: "sec-002"
    description: "SQL injection prevention"
    pattern: "Any raw SQL queries must use parameterized queries. Flag string concatenation or template literals in SQL"
    severity: "critical"
    
  - id: "sec-003"
    description: "User input sanitization"
    pattern: "Data from request body, query params, or headers must be validated with zod schemas before use"
    severity: "high"

These aren't regex patterns — they're natural language instructions that the Policy Agent interprets with full context awareness. It understands what "service layer" means in your codebase because it has the index. It can tell the difference between a public and internal endpoint because it reads your routing configuration.

This is the feature that makes senior engineers' eyes light up. You're essentially codifying review knowledge that previously existed only in people's brains.
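For a concrete sense of what a rule like sec-002 is aimed at, here's the classic string-built query next to its parameterized fix. This is a generic Python/sqlite3 illustration, not code from any particular codebase:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")

def find_user_unsafe(email: str):
    # sec-002 would flag this: user input interpolated into SQL
    return conn.execute(f"SELECT id FROM users WHERE email = '{email}'").fetchone()

def find_user_safe(email: str):
    # Parameterized query: the driver handles escaping, passes review
    return conn.execute("SELECT id FROM users WHERE email = ?", (email,)).fetchone()
```

The unsafe version returns a row for the adversarial input `' OR '1'='1` — a successful injection — while the safe version correctly finds no user with that literal email.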

Running It Locally (Air-Gapped / No Cloud)

If you can't send code to external APIs — and a lot of teams can't — OpenClaw has first-class support for local models via Ollama:

# .openclaw/config.yaml (local model section)

model:
  provider: "ollama"
  name: "qwen2.5-coder:72b"
  base_url: "http://localhost:11434"
  temperature: 0.1
  
  # Optional: use different models for different agents
  agent_overrides:
    analyzer:
      name: "qwen2.5-coder:72b"  # heavy lifting
    critic:
      name: "deepseek-coder-v2:34b"  # faster, still good
    policy:
      name: "qwen2.5-coder:72b"  # needs to understand natural language rules

Fair warning: you need serious hardware for the 72B models. We're talking 48GB+ VRAM minimum, ideally 80GB. The 34B models run on a single A6000 or even a well-configured consumer GPU. If you're resource-constrained, the community reports that DeepSeek-Coder-V2 at 34B gives surprisingly good results for the Critic agent role specifically.
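Those VRAM figures follow from simple arithmetic on parameter count and quantization. A back-of-the-envelope estimator (weights only — KV cache and activations add more on top, and the 20% overhead factor is a rough assumption):

```python
def vram_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough weights-only VRAM estimate, with ~20% overhead for runtime buffers."""
    bytes_for_weights = params_billion * 1e9 * bits_per_weight / 8
    return bytes_for_weights * overhead / 1e9

# A 72B model at 4-bit quantization needs ~43 GB of VRAM just for weights,
# which is why 48GB cards are the floor; a 34B model at 4-bit (~20 GB)
# fits comfortably on a single A6000.
print(f"72B @ 4-bit: {vram_gb(72, 4):.0f} GB")
print(f"34B @ 4-bit: {vram_gb(34, 4):.0f} GB")
```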

The local setup is also where Felix's OpenClaw Starter Pack really shines — it includes pre-tuned model configurations and optimized Ollama settings that took the community weeks of experimentation to figure out. The difference between a naive local setup and a well-configured one is dramatic in terms of both speed and review quality.

What to Expect: Honest Results

After running OpenClaw on ~200 PRs across two codebases, here's my honest assessment:

What it's great at:

  • Catching missed error handling (especially in async code)
  • Enforcing consistent patterns across the codebase
  • Spotting dependency issues and breaking changes
  • Security-relevant findings (SQL injection, XSS vectors, auth gaps)
  • Finding dead code introduced by refactors
  • Enforcing custom policies reliably

What it's decent at:

  • Performance implications of changes
  • Test coverage gaps (it's better when it can actually run tests)
  • API design consistency

What it still struggles with:

  • Very large refactors (>2000 LOC diffs). It loses coherence.
  • Subtle business logic bugs that require deep domain knowledge
  • Novel architectural decisions where there's no precedent in the codebase

The numbers from our team: Review time dropped by about 40%. Not because engineers stopped reviewing — they still approve everything — but because OpenClaw handles the mechanical layer and engineers can focus on the architectural and design questions. The number of "oops, missed that" bugs that made it to staging dropped by roughly 60%.

The Right Mental Model

The people who get the most out of OpenClaw treat it as a tireless, thorough junior reviewer who has read every file in your codebase and never forgets your coding standards. It's not replacing your senior engineers. It's giving them their time back so they can focus on the hard problems that actually require human judgment.

It won't catch everything. It will occasionally flag something that's fine. But it catches enough real issues consistently enough that going back to purely human reviews feels reckless, like removing your test suite because "we have good engineers."

Next Steps

  1. Start small. Pick one active repo, not your most critical one. Run OpenClaw in review-only mode for two weeks.

  2. Grab the starter pack. Seriously, Felix's OpenClaw Starter Pack will save you hours on initial setup and policy configuration. It includes working examples for common stacks that you can modify rather than writing from scratch.

  3. Write three custom policies. Think about the three most common review comments your senior engineers leave. Encode those as policies. This is where the ROI compounds.

  4. Calibrate your confidence threshold. After the first week, look at which findings were useful and which were noise. Adjust accordingly.

  5. Index your docs. Feed it your architecture decision records, your CONTRIBUTING.md, your style guides. The more context OpenClaw has about how your team works, the better it gets.

  6. Share the results. After two weeks, pull the numbers. How many findings were actionable? How much time did reviewers save? Let the data make the case for wider adoption.
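Step 4 — calibrating the confidence threshold — is easy to do from a week of logged findings. The data below is made up for illustration; the idea is to triage each posted finding by hand, then compute precision at a few candidate cutoffs:

```python
# Hypothetical week-one log: (confidence, was_useful) pairs from
# manually triaging each finding OpenClaw posted.
findings_log = [
    (0.92, True), (0.88, True), (0.81, True), (0.77, False),
    (0.74, True), (0.69, False), (0.66, True), (0.58, False),
    (0.52, False), (0.45, False),
]

def precision_at(threshold: float) -> float:
    """Fraction of posted findings that were actually useful at this cutoff."""
    posted = [useful for conf, useful in findings_log if conf >= threshold]
    return sum(posted) / len(posted) if posted else 0.0

for t in (0.5, 0.7, 0.8):
    count = sum(conf >= t for conf, _ in findings_log)
    print(f"threshold {t}: precision {precision_at(t):.2f}, {count} findings posted")
```

This makes the trade-off visible: raising the threshold buys precision at the cost of volume. Pick the point where your reviewers stop complaining about noise without losing the findings they actually act on.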

Code review shouldn't be a bottleneck. It shouldn't be the thing that makes your best engineers consider management. OpenClaw doesn't eliminate the need for human judgment — it eliminates the drudgery so your humans can actually exercise that judgment where it matters.

Stop burning senior engineering time on problems a well-configured AI agent can handle. Set up OpenClaw, teach it your standards, and let it do the work.
