Email Triage Agent: Automatically Sort and Reply to Emails

Let me be honest with you: your inbox is a disaster.
Not because you're bad at email. Because email is bad at being email. Every morning you open it up and there's a swamp of newsletters you forgot you subscribed to, three genuinely urgent messages buried under eleven that aren't, a cold pitch from someone who clearly didn't read your LinkedIn profile, and a reply-all thread that should have died six messages ago.
You've tried filters. You've tried labels. You've tried that thing where you only check email twice a day (lasted about a day and a half). The problem isn't discipline. The problem is that email triage is genuinely tedious cognitive work, and you're doing it manually, hundreds of times a week, like some kind of digital-age sharecropper.
Here's the good news: this is a solved problem now. Not "solved" in the way that tech blogs usually mean, where you need to duct-tape together four APIs, write a thousand lines of glue code, and pray nothing breaks. Actually solved, in the sense that you can build an AI agent that reads your email, understands what matters, sorts it, drafts replies for the routine stuff, and flags the important things for your attention, all running on your own infrastructure, without shipping your private emails to some third-party server.
The tool that makes this possible is OpenClaw, and I'm going to walk you through exactly how to set it up.
Why Most Email Automation Fails
Before we get into the build, let's talk about why the obvious approaches don't work.
Gmail filters are static. They match keywords. They can't understand that an email from your biggest client saying "no rush" actually means "I need this by Thursday." They can't detect that a message labeled "FYI" from your CEO is actually a veiled request for action. Context is everything in email, and filters have zero context.
Generic AI agents (the kind you see in YouTube tutorials) have the opposite problem. They're too powerful and too dumb at the same time. They can understand context beautifully, but they have no guardrails. There's a recurring horror story on r/AI_Agents about someone whose email agent sent a hallucinated reply to a client with made-up pricing. Another person's agent archived an email from their landlord about a lease renewal because it "looked like spam." These aren't edge cases; they're the natural result of giving an LLM unrestricted access to your inbox without purpose-built safety controls.
The real requirements for email triage are specific:
- The agent needs to understand email threads, not just individual messages.
- It needs a permission system: some actions (like archiving newsletters) are low-risk, while others (like replying to your boss) are high-risk.
- It needs to run locally or on your own infrastructure if you're dealing with anything remotely sensitive.
- It needs to explain its decisions so you can audit and trust it.
- It needs human-in-the-loop for anything that could blow up in your face.
This is exactly the problem OpenClaw was built to solve. Not as a demo, not as a side feature: email triage is a first-class use case in OpenClaw's architecture.
What OpenClaw Actually Is
OpenClaw is an open-source AI agent framework designed for real-world automation tasks where safety and privacy matter. Unlike generic agent frameworks that treat email as just another tool to bolt on, OpenClaw has purpose-built components for email: thread-aware fetching, smart reply drafting, label management, attachment handling, and, critically, a granular permission system that lets you control exactly what the agent can and can't do.
It supports local models out of the box (Ollama, LM Studio, vLLM) so your emails never have to leave your machine. It also works with hosted providers like Groq if you prefer speed over absolute privacy. The key point is: you choose, and the framework doesn't force your hand.
It connects to Gmail, Outlook, IMAP, and Exchange. Authentication is handled through a clean OAuth flow that, while still requiring some setup (OAuth is OAuth; nobody's made it fun yet), is significantly less painful than doing it from scratch.
The Architecture
Here's what we're building:
┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
│    Email    │────▶│    Triage    │────▶│     Action      │
│  Ingestion  │     │  Reasoning   │     │    Execution    │
│ (IMAP/API)  │     │    (LLM)     │     │   (Sandboxed)   │
└─────────────┘     └──────┬───────┘     └────────┬────────┘
                           │                      │
                           ▼                      ▼
                    ┌──────────────┐     ┌─────────────────┐
                    │    Rules     │     │  Human-in-the-  │
                    │    Engine    │     │   Loop Queue    │
                    └──────────────┘     └─────────────────┘
The clean separation between triage reasoning (the LLM deciding what to do) and action execution (actually doing it) is what makes OpenClaw fundamentally safer than wiring up a LangChain agent with Gmail tools. The LLM proposes actions. The execution layer validates them against your permission rules. If an action exceeds the agent's permissions, it gets routed to a human review queue instead of being executed.
This is the architecture pattern that keeps showing up in discussions among people who've actually deployed email agents in production. OpenClaw just bakes it in from the start.
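To make the propose/validate/route idea concrete, here's a minimal sketch in plain Python. It is not OpenClaw's API: `ProposedAction`, `Executor`, and the two permission sets are names I made up for illustration, and the "execution" is just appending to a list.

```python
from dataclasses import dataclass, field

# Hypothetical permission sets for illustration; not OpenClaw's real config.
AUTO_ALLOWED = {"archive", "label"}             # low-risk: run without asking
NEEDS_REVIEW = {"reply", "forward", "delete"}   # high-risk: a human approves


@dataclass
class ProposedAction:
    kind: str         # e.g. "archive" or "reply"
    message_id: str
    reason: str       # the model's stated rationale, kept for auditing


@dataclass
class Executor:
    executed: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

    def submit(self, action: ProposedAction) -> str:
        """Check a proposal against the permission sets before acting on it."""
        if action.kind in AUTO_ALLOWED:
            self.executed.append(action)
            return "executed"
        if action.kind in NEEDS_REVIEW:
            self.review_queue.append(action)
            return "queued_for_review"
        # Unknown action kinds are refused rather than guessed at.
        self.rejected.append(action)
        return "rejected"


executor = Executor()
print(executor.submit(ProposedAction("archive", "msg-1", "newsletter")))
print(executor.submit(ProposedAction("reply", "msg-2", "direct question")))
print(executor.submit(ProposedAction("wire_money", "msg-3", "unclear")))
```

The point of the shape is that the model never calls the mailbox directly. It can only hand a proposal to something that checks it first.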
Setting Up: Step by Step
Step 1: Get Your Environment Running
If you're starting from scratch and want the fastest path to a working setup, I'd genuinely recommend grabbing Felix's OpenClaw Starter Pack. Felix put together a bundle of pre-configured templates, environment configs, and example workflows specifically for getting OpenClaw running without the typical "spend two hours debugging dependency issues" experience. It's worth it just for the time saved on the initial setup, and it includes email triage as one of the core example workflows.
Once you have the base environment:
# Clone the OpenClaw repo
git clone https://github.com/openclaw/openclaw.git
cd openclaw
# Install dependencies
pip install -r requirements.txt
# Copy and configure your environment
cp .env.example .env
Step 2: Configure Your Email Connection
In your .env file, you'll set up your email provider. Here's what it looks like for Gmail:
EMAIL_PROVIDER=gmail
GMAIL_CLIENT_ID=your_client_id
GMAIL_CLIENT_SECRET=your_client_secret
GMAIL_REDIRECT_URI=http://localhost:8080/callback
# Choose your LLM backend
LLM_PROVIDER=ollama
LLM_MODEL=llama3.1:70b
# Or use a hosted provider
# LLM_PROVIDER=groq
# LLM_MODEL=llama-3.1-70b-versatile
# GROQ_API_KEY=your_key
For the Gmail OAuth setup, you'll need to create a project in Google Cloud Console, enable the Gmail API, and create OAuth credentials. This is the most annoying part of the entire process. It takes about 15 minutes. The OpenClaw docs walk through it step by step, and the Starter Pack includes a script that automates most of it.
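Before you start the OAuth dance, it can save a confusing error or two to check that your .env is actually filled in. The sketch below is my own helper, not something that ships with OpenClaw. It assumes the key names from the example above and treats any value that still starts with `your_` as a placeholder you forgot to replace.

```python
# Hypothetical pre-flight check; the key names come from the .env example above.
PROVIDER_KEYS = {
    "gmail": ["GMAIL_CLIENT_ID", "GMAIL_CLIENT_SECRET", "GMAIL_REDIRECT_URI"],
}


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_settings(settings: dict) -> list:
    """Return required keys that are absent, empty, or still placeholders."""
    required = ["EMAIL_PROVIDER", "LLM_PROVIDER", "LLM_MODEL"]
    required += PROVIDER_KEYS.get(settings.get("EMAIL_PROVIDER", ""), [])
    return [
        key for key in required
        if not settings.get(key) or settings[key].startswith("your_")
    ]


sample = """
EMAIL_PROVIDER=gmail
GMAIL_CLIENT_ID=your_client_id
GMAIL_CLIENT_SECRET=abc123
# GMAIL_REDIRECT_URI=http://localhost:8080/callback
LLM_PROVIDER=ollama
LLM_MODEL=llama3.1:70b
"""
print(missing_settings(parse_env(sample)))
# ['GMAIL_CLIENT_ID', 'GMAIL_REDIRECT_URI']
```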
Step 3: Define Your Triage Rules
This is where it gets fun. OpenClaw uses a combination of YAML configuration and natural language rules to define how the agent should categorize and handle email. Here's an example triage_rules.yaml:
triage:
  categories:
    - name: urgent
      description: "Emails requiring immediate attention"
      rules:
        - "From my direct manager or C-level executives"
        - "Contains words like 'deadline', 'ASAP', 'critical', 'outage'"
        - "Client emails mentioning contract, renewal, or cancellation"
      action: flag_and_notify
    - name: requires_reply
      description: "Emails that need a response but aren't urgent"
      rules:
        - "Direct questions addressed to me"
        - "Meeting requests or scheduling discussions"
        - "Requests for information I can provide"
      action: draft_reply
    - name: informational
      description: "FYI emails, no action needed"
      rules:
        - "CC'd emails where I'm not the primary recipient"
        - "Status updates and progress reports"
        - "Internal announcements"
      action: label_and_archive
    - name: low_priority
      description: "Newsletters, marketing, non-essential"
      rules:
        - "Marketing emails and newsletters"
        - "Automated notifications from tools"
        - "Cold outreach and sales pitches"
      action: archive
permissions:
  auto_archive: true
  auto_label: true
  auto_reply: false    # Require human approval for all replies
  auto_delete: false   # Never auto-delete
safety:
  dry_run: true        # Start in dry-run mode
  require_confirmation:
    - reply
    - forward
    - delete
  max_actions_per_hour: 50
Notice a few things about this config:
Natural language rules. You don't need regex patterns or exact keyword matches. The LLM interprets rules like "Client emails mentioning contract, renewal, or cancellation" with actual comprehension. It'll catch "we're reconsidering our agreement" even though the word "contract" never appears.
Explicit permissions. Auto-reply is off by default. Auto-delete is off. The agent can sort and label freely, but anything that sends data out of your inbox requires human confirmation.
Dry-run mode. When you first set this up, turn on dry_run: true. The agent will process your email and log what it would do without actually doing it. Run it for a day or two, review the logs, and tune your rules before going live. This is the single most important piece of advice I can give you. Do not skip this.
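If you're wondering what dry-run actually changes under the hood, the idea is simple enough to sketch in a few lines. This is my own illustration, not OpenClaw's implementation: `FakeMailbox` and `apply_action` are made-up names, and the only action supported here is archiving.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("triage.dry_run")


class FakeMailbox:
    """Stand-in for a real mailbox client; it only records what was archived."""

    def __init__(self):
        self.archived = []

    def archive(self, message_id: str) -> None:
        self.archived.append(message_id)


def apply_action(mailbox, action: str, message_id: str, dry_run: bool = True) -> bool:
    """Return True if the action was really performed, False if only logged."""
    if action != "archive":
        raise ValueError(f"unsupported action in this sketch: {action}")
    if dry_run:
        # Same decision path, but the side effect is replaced by a log line.
        log.info("[DRY-RUN] would archive %s", message_id)
        return False
    mailbox.archive(message_id)
    log.info("[LIVE] archived %s", message_id)
    return True


mailbox = FakeMailbox()
apply_action(mailbox, "archive", "msg-101", dry_run=True)    # logged only
apply_action(mailbox, "archive", "msg-102", dry_run=False)   # actually archived
print(mailbox.archived)  # ['msg-102']
```

Note that `dry_run` defaults to `True` here. Making the safe mode the default means you have to opt in to side effects, which is the behavior you want on day one.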
Step 4: Build the Agent
Here's the core agent code:
from openclaw import Agent, EmailToolkit, TriageEngine
from openclaw.models import OllamaProvider
from openclaw.safety import PermissionGuard, HumanReviewQueue

# Initialize the LLM
llm = OllamaProvider(model="llama3.1:70b")

# Set up email tools with permission guards
email_tools = EmailToolkit(
    provider="gmail",
    permissions_config="triage_rules.yaml"
)

# Initialize the triage engine
triage = TriageEngine(
    rules_path="triage_rules.yaml",
    llm=llm
)

# Set up human review for high-risk actions
review_queue = HumanReviewQueue(
    notification_method="slack",  # or "email", "webhook", "terminal"
    slack_webhook="https://hooks.slack.com/your-webhook"
)

# Build the agent
agent = Agent(
    name="email_triage",
    llm=llm,
    tools=email_tools,
    triage_engine=triage,
    safety=PermissionGuard(
        config="triage_rules.yaml",
        review_queue=review_queue
    ),
    verbose=True  # See what the agent is thinking
)

# Run the agent
agent.run(
    mode="continuous",  # or "single_pass" for one-time processing
    poll_interval=300,  # Check every 5 minutes
    batch_size=20       # Process 20 emails per cycle
)
The verbose=True flag is important for the first few days. It outputs the agent's reasoning for every decision:
[TRIAGE] Processing: "Re: Q4 Budget Review" from sarah@company.com
[REASONING] Sender is in contacts list (CFO). Subject mentions budget.
Thread has 4 prior messages. Latest message asks "Can you confirm
the marketing allocation by Wednesday?"
[DECISION] Category: requires_reply (direct question, deadline implied)
[ACTION] Draft reply queued for human review
[DRAFT] "Hi Sarah, I'll confirm the marketing allocation numbers and
have them to you by end of day Tuesday. Let me know if you need
anything else in the meantime."
[STATUS] Awaiting human approval in review queue
This transparency is what separates a tool you can trust from a black box you're terrified of.
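Reading those logs by eye gets old after the first evening. Here's a small sketch that parses them into dictionaries you can filter. It assumes the log really does follow the `[TAG] text` shape shown above, with untagged lines continuing the previous tag. The parser is my own; OpenClaw may well offer structured output that makes this unnecessary.

```python
import re

TAG_LINE = re.compile(r"^\[([A-Z]+)\]\s*(.*)$")

SAMPLE_LOG = """
[TRIAGE] Processing: "Re: Q4 Budget Review" from sarah@company.com
[REASONING] Sender is in contacts list (CFO). Subject mentions budget.
Thread has 4 prior messages. Latest message asks "Can you confirm
the marketing allocation by Wednesday?"
[DECISION] Category: requires_reply (direct question, deadline implied)
[ACTION] Draft reply queued for human review
[STATUS] Awaiting human approval in review queue
"""


def parse_log(text: str) -> list:
    """Group tagged lines into one dict per email, keyed by tag name."""
    records, current, last_tag = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG_LINE.match(line)
        if match:
            tag, body = match.group(1), match.group(2)
            if tag == "TRIAGE":          # a new email starts a new record
                current = {}
                records.append(current)
            if current is None:
                continue                 # tagged line before any [TRIAGE]
            current[tag] = body
            last_tag = tag
        elif current is not None and last_tag:
            # Untagged line: continuation of the previous tag's text.
            current[last_tag] += " " + line
    return records


records = parse_log(SAMPLE_LOG)
pending = [r for r in records if "Awaiting" in r.get("STATUS", "")]
for record in pending:
    print(record["TRIAGE"], "->", record["DECISION"])
```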
Step 5: Running It for Real
After a day or two in dry-run mode, you'll want to tune your rules. Common adjustments:
Thread handling. By default, OpenClaw reads the full thread for context. If your threads are very long (50+ messages), you might want to cap how far back it reads with thread_handling.max_depth to keep token usage under control:
triage:
  thread_handling:
    max_depth: 10           # Only read last 10 messages in a thread
    summarize_older: true   # Summarize anything beyond that
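In case the two settings aren't self-explanatory, here's roughly what they imply, sketched in plain Python. `trim_thread` and `naive_summary` are hypothetical names of mine. A real setup would presumably ask the LLM for the summary instead of emitting a placeholder string.

```python
def naive_summary(older: list) -> str:
    """Placeholder summarizer; a real one would call a model."""
    return f"[summary of {len(older)} earlier messages]"


def trim_thread(messages: list, max_depth: int = 10,
                summarize_older: bool = True, summarize=naive_summary) -> list:
    """Keep the newest max_depth messages, optionally prefixed by a summary."""
    if max_depth <= 0:
        raise ValueError("max_depth must be positive")
    if len(messages) <= max_depth:
        return list(messages)            # short thread: nothing to trim
    older, recent = messages[:-max_depth], messages[-max_depth:]
    if not summarize_older:
        return list(recent)
    return [summarize(older)] + list(recent)


thread = [f"message {i}" for i in range(1, 26)]   # a 25-message thread
context = trim_thread(thread, max_depth=10)
print(len(context))   # 11: one summary line plus the last 10 messages
print(context[0])     # [summary of 15 earlier messages]
```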
Sender-based rules. You'll quickly realize that who sent the email matters more than what's in it. OpenClaw supports contact-based routing:
contacts:
  vip:
    - "*@bigclient.com"
    - "boss@company.com"
    - "ceo@company.com"
  trusted_auto_reply:
    - "scheduling@calendly.com"
    - "noreply@github.com"
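The wildcard patterns in that config behave like shell globs. I don't know exactly how OpenClaw matches them internally, so treat this as an assumption, but Python's standard `fnmatch` module gives you the same feel if you want to test your patterns before trusting them:

```python
from fnmatch import fnmatchcase

# Same groups as the YAML above, copied into a dict for the sketch.
CONTACTS = {
    "vip": ["*@bigclient.com", "boss@company.com", "ceo@company.com"],
    "trusted_auto_reply": ["scheduling@calendly.com", "noreply@github.com"],
}


def contact_groups(sender: str, contacts: dict = CONTACTS) -> list:
    """Return every group whose patterns match the sender, case-insensitively."""
    address = sender.strip().lower()
    return [
        group for group, patterns in contacts.items()
        if any(fnmatchcase(address, pattern.lower()) for pattern in patterns)
    ]


print(contact_groups("Jane@BigClient.com"))      # ['vip']
print(contact_groups("noreply@github.com"))      # ['trusted_auto_reply']
print(contact_groups("stranger@example.com"))    # []
```

Lowercasing both sides before calling `fnmatchcase` keeps the match behavior identical across operating systems, which plain `fnmatch` does not guarantee.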
Cost optimization. If you're using a hosted LLM provider, every email costs tokens. A few tricks that people in the OpenClaw community have figured out:
- Use a small, fast model for initial classification (is this spam? yes/no)
- Only invoke the full model for emails that need nuanced judgment
- Cache classifications for recurring senders
agent = Agent(
    # Fast model for simple classification
    classifier_llm=OllamaProvider(model="llama3.2:3b"),
    # Full model for nuanced triage and reply drafting
    reasoning_llm=OllamaProvider(model="llama3.1:70b"),
    two_stage=True
)
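To see why the two-stage setup saves money, it helps to look at the control flow stripped of any real models. In this sketch the "models" are stub functions and `TwoStageClassifier` is a name I invented. The caching policy (only remember senders the fast stage confidently marks low priority) is one reasonable choice, not necessarily what OpenClaw does.

```python
class TwoStageClassifier:
    """Cheap check first; escalate to the expensive model only when unsure."""

    def __init__(self, fast, full):
        self.fast = fast          # callable(sender, body) -> label or "unsure"
        self.full = full          # callable(sender, body) -> label
        self.cache = {}           # sender -> cached low-priority label
        self.full_calls = 0       # how often the expensive model was used

    def classify(self, sender: str, body: str) -> str:
        if sender in self.cache:
            return self.cache[sender]
        label = self.fast(sender, body)
        if label == "unsure":
            self.full_calls += 1
            return self.full(sender, body)   # content-dependent: never cached
        self.cache[sender] = label
        return label


def fast_stub(sender: str, body: str) -> str:
    # Stand-in for a small model: bulk mail usually carries an unsubscribe link.
    return "low_priority" if "unsubscribe" in body.lower() else "unsure"


def full_stub(sender: str, body: str) -> str:
    # Stand-in for the large model.
    return "requires_reply" if "?" in body else "informational"


clf = TwoStageClassifier(fast_stub, full_stub)
print(clf.classify("news@shop.example", "Big sale! Unsubscribe here"))
print(clf.classify("news@shop.example", "Another sale"))        # cache hit
print(clf.classify("sarah@company.com", "Can you confirm by Wednesday?"))
print(clf.full_calls)   # 1
```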
What About Privacy?
This is the question that comes up more than any other in every community discussion I've seen. And it's the right question.
If you're running OpenClaw with Ollama or LM Studio, your emails never leave your machine. Period. The LLM runs locally, the email connection is direct, and nothing is logged to any external service. This is the setup I'd recommend for anyone dealing with work email, legal correspondence, financial information, or really anything you wouldn't want a random company's employees to potentially read.
For personal email where the stakes are lower, using Groq or another hosted provider is fine and will be faster. Just understand the tradeoff you're making.
OpenClaw's architecture makes this a configuration choice, not an architectural decision. You can switch between local and hosted models by changing the LLM_PROVIDER and LLM_MODEL lines in your .env file (plus an API key line for a hosted provider).
Common Pitfalls
After reading through hundreds of posts from people building email agents, these are the mistakes that keep coming up:
Not starting with dry-run. I said it before, I'll say it again. Run in dry-run mode first. Review the logs. Adjust your rules. Then go live. The people who skip this step are the same people posting "my agent sent a weird reply to my boss" on Reddit.
Too many categories. Start with 4-5 categories max. You can always add more later. People who start with 15 categories end up with an agent that's confused about edge cases and misclassifies constantly.
Ignoring thread context. A single email that says "Sure, sounds good" means nothing without the thread. Make sure thread_aware is enabled (it is by default in OpenClaw, but I've seen people turn it off to "save tokens" and then wonder why classification is garbage).
Not setting rate limits. The max_actions_per_hour setting exists for a reason. If something goes wrong, you want the blast radius to be limited.
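For a feel of what an hourly cap does, here's a sliding-window limiter in plain Python. It's a sketch of the general technique under my own naming (`ActionRateLimiter`), not a claim about how OpenClaw enforces `max_actions_per_hour`. The injectable clock is only there so the behavior can be checked without waiting an hour.

```python
import time
from collections import deque


class ActionRateLimiter:
    """Allow at most max_actions_per_hour actions in any rolling 60 minutes."""

    WINDOW_SECONDS = 3600

    def __init__(self, max_actions_per_hour: int = 50, clock=time.monotonic):
        self.max_actions = max_actions_per_hour
        self.clock = clock
        self._stamps = deque()    # timestamps of recently allowed actions

    def allow(self) -> bool:
        now = self.clock()
        # Forget actions that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.WINDOW_SECONDS:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_actions:
            return False          # over budget: caller should defer or queue
        self._stamps.append(now)
        return True


fake_now = [0.0]                  # mutable holder so the lambda sees updates
limiter = ActionRateLimiter(max_actions_per_hour=2, clock=lambda: fake_now[0])
print(limiter.allow(), limiter.allow(), limiter.allow())   # True True False
fake_now[0] = 3600.0              # an hour later the budget is available again
print(limiter.allow())                                     # True
```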
Is This Actually Worth It?
Let me give you the math. If you spend 30 minutes a day on email triage (reading subject lines, deciding what matters, archiving junk, flagging things to reply to later), that's 2.5 hours over a five-day work week. Over a year, that's about 130 hours, or more than three full work weeks, spent on deciding what to do with email rather than actually doing anything.
An OpenClaw email triage agent, properly configured, handles about 80-90% of that automatically. The remaining 10-20% (the genuinely ambiguous, high-stakes emails) get routed to your review queue with a draft response and the agent's reasoning. You spend 5 minutes a day instead of 30.
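If you want to plug in your own numbers, the arithmetic is short enough to script. The assumptions are mine: a five-day work week, 52 weeks a year, and a 40-hour week when converting hours into "work weeks".

```python
WORKDAYS_PER_WEEK = 5      # assumption: triage only happens on workdays
WEEKS_PER_YEAR = 52        # assumption: no vacation, for simplicity
HOURS_PER_WORK_WEEK = 40   # assumption: used to express hours as work weeks


def yearly_hours(minutes_per_day: float) -> float:
    """Hours per year spent on a daily task of the given length."""
    return minutes_per_day * WORKDAYS_PER_WEEK / 60 * WEEKS_PER_YEAR


manual = yearly_hours(30)       # doing triage by hand
assisted = yearly_hours(5)      # reviewing the agent's queue instead

print(f"manual:   {manual:.1f} h/year ({manual / HOURS_PER_WORK_WEEK:.2f} work weeks)")
print(f"assisted: {assisted:.1f} h/year")
print(f"saved:    {manual - assisted:.1f} h/year")
```

With the defaults above, 30 minutes a day comes out to 130 hours a year, which is 3.25 forty-hour weeks.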
The setup takes an afternoon. Maybe a full day if you're fighting with OAuth. If you use Felix's OpenClaw Starter Pack, you'll cut that setup time roughly in half since the common configuration headaches are already solved.
Next Steps
Here's what I'd do this week:
- Grab the Starter Pack. Felix's OpenClaw Starter Pack has the email triage workflow as a ready-to-customize template. It's the fastest path from "I want this" to "I have this."
- Set up Ollama locally. Install it, pull llama3.1:70b (or llama3.1:8b if your machine can't handle 70B), and make sure it's running.
- Connect your email. Follow the OAuth setup. Swear a couple times. Get through it.
- Write your rules. Start simple. Four categories. Conservative permissions. Dry-run mode on.
- Run it for two days in dry-run. Review the logs every evening. Adjust rules where the agent got it wrong.
- Go live with auto-archive and auto-label only. Keep auto-reply gated behind human approval for at least two more weeks.
- Gradually expand permissions as you build trust in the system.
The whole point of this approach is that you're in control the entire time. The agent earns trust incrementally. It starts as a sorter, graduates to a drafter, and eventually handles the routine stuff end-to-end while you focus on the emails that actually need a human brain.
Your inbox has been running your schedule long enough. Time to flip that around.