Automate Failed Login Analysis: Build an AI Agent That Investigates Brute Force Attempts

Every security team has the same dirty secret: a huge chunk of analyst time goes to staring at failed login alerts that turn out to be nothing. Someone fat-fingered their password three times. A contractor forgot they changed credentials last week. An old service account is hammering a deprecated endpoint.
Mixed in with all that noise are the actual attacks — credential stuffing campaigns, brute force attempts, password spraying from residential proxies designed to look benign. And your analysts have to treat every alert like it could be the real thing, because the one time they don't is the time it is.
This is a textbook automation problem. Not "automate everything and fire the humans" automation — the kind where you take 80% of the repetitive, soul-crushing triage work off your analysts' plates so they can focus on the cases that actually require a brain.
Here's how to build an AI agent on OpenClaw that investigates brute force and failed login attempts, enriches them with context, makes initial risk assessments, and either resolves them automatically or hands them to a human with a full briefing already done.
The Manual Workflow Today
Let's be honest about what actually happens when a failed login alert fires in most organizations.
Step 1: Alert receipt (1–3 minutes). Your SIEM — Splunk, Sentinel, Elastic, whatever — fires a notification. "50 failed logins for user jsmith@company.com in 5 minutes." An analyst picks it up from the queue, assuming they're not already buried in twelve other alerts.
Step 2: Log aggregation (5–15 minutes). The analyst pulls authentication logs from Active Directory, your identity provider (Okta, Entra ID), application logs, maybe WAF or CDN logs if the attempts came through a web app. These are often in different systems with different query languages and different retention policies. Copy-paste between tabs. Export CSVs. The glamorous life of security operations.
Step 3: IP and threat intelligence enrichment (5–10 minutes). For each source IP, check AbuseIPDB, VirusTotal, maybe Shodan. Look up geolocation, ASN, whether it's a known VPN or Tor exit node or residential proxy. Cross-reference with any commercial threat intel feeds you're paying for.
Step 4: User and entity context (5–10 minutes). Who is jsmith? What department? What's their normal login pattern? Have they recently reset their password? Are they traveling? Do they have MFA enabled? What devices do they normally use? Pull this from your identity provider, HR system, and endpoint management platform.
Step 5: Correlation (5–15 minutes). Are other accounts being targeted from the same IP? Has jsmith successfully logged in from a different IP recently (possible compromise)? Any password reset requests? Any lateral movement after a successful login? Any data exfiltration indicators?
Step 6: Risk assessment (3–5 minutes). Based on everything above, the analyst decides: benign user error, automated attack against this specific account, broad credential stuffing campaign, or something weirder that needs deeper investigation.
Step 7: Response (5–10 minutes). Depending on the assessment — block the IP at the firewall, force a password reset, temporarily suspend the account, require step-up MFA, notify the user, or escalate to incident response.
Step 8: Documentation (5–15 minutes). Update the case in ServiceNow or Jira. Write a summary. Attach evidence. Close the ticket or escalate it. This is the part everyone skips when they're drowning, which means your compliance posture slowly degrades.
Total time per alert: 15–75 minutes. That's for one alert. A mid-sized enterprise might generate dozens to hundreds of these daily. A large enterprise? Thousands.
A 2023 Exabeam survey found that 68% of security teams still manually investigate most risky login events, even when they have automation tools available. Ponemon and Splunk research puts it at roughly 20,000 analyst hours per year spent on false positive investigations in mid-to-large enterprises.
That's not a security problem. That's a labor economics problem.
Why This Hurts
The costs here are concrete and compounding.
Direct labor cost. A security analyst in the US costs $90K–$140K fully loaded. If 25% of their time goes to failed login triage, you're spending $22K–$35K per analyst per year on work that's mostly mechanical. Scale that across a team of 8–15 analysts and you're looking at $175K–$500K+ annually on what's essentially pattern matching and data lookup.
Alert fatigue kills detection. When analysts see hundreds of benign alerts, they start skimming. They stop pulling the full context. They close alerts faster with less investigation. This is how real attacks get missed. IBM's Cost of a Data Breach report consistently finds that identity-related breaches take 243–258 days to identify and contain. That number isn't because the technology can't detect faster — it's because humans are overwhelmed.
Error rates increase with volume. Manual enrichment means manual mistakes. Wrong IP looked up. Context missed because the analyst didn't check the HR system. Correlation missed because the query was scoped too narrowly. Every step done by hand is a step that can be done wrong.
Response delays compound risk. If it takes 45 minutes to investigate and respond to a credential stuffing campaign, the attacker has 45 minutes of runway. In that window, they may have already found valid credentials, logged in, and started exfiltrating data. Speed of response directly correlates with blast radius.
Documentation debt. When analysts are underwater, documentation is the first thing cut. Six months later, when compliance needs evidence of your detection and response capability, you're reconstructing investigations from memory and fragmented logs.
What AI Can Handle Right Now
Not everything. But most of the mechanical steps. Here's what's realistic today — no hand-waving, no "AI will magically solve security" nonsense:
Automated log aggregation and normalization. An AI agent can query multiple log sources, normalize the data into a consistent schema, and build a timeline of events. This is deterministic work that doesn't require judgment — it requires speed and consistency.
IP enrichment at scale. Checking IPs against threat intelligence feeds, geolocating them, identifying VPNs/proxies/Tor, and checking ASN reputation — all API calls that an agent can execute in parallel in seconds instead of the minutes it takes an analyst clicking through tabs.
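The fan-out itself is ordinary concurrency. Here is a minimal sketch in plain Python, with a stubbed `enrich_ip` standing in for real feed API calls; the function body and returned fields are placeholders, not any vendor's actual schema:

```python
from concurrent.futures import ThreadPoolExecutor

def enrich_ip(ip: str) -> dict:
    # Placeholder: a real implementation would fan out to AbuseIPDB,
    # VirusTotal, GreyNoise, etc. over their REST APIs.
    return {"ip": ip, "reputation_score": 0, "is_vpn": False, "is_tor": False}

def enrich_ips(ips: list[str], max_workers: int = 10) -> list[dict]:
    # Parallel lookups: N IPs resolve in roughly one round-trip,
    # not N sequential round-trips through browser tabs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(enrich_ip, ips))

results = enrich_ips(["203.0.113.5", "198.51.100.7"])
```

The same pattern applies whether the agent platform runs the lookups or you wrap them in a standalone tool.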
User context assembly. Pulling user role, department, MFA status, recent password changes, normal login patterns, and device history from identity and HR systems. Again, API calls. An agent can compile a complete user profile in seconds.
Pattern recognition across events. Is this one user getting brute-forced, or are 200 accounts being sprayed from the same IP range? Is the source IP part of a known botnet? Does the timing pattern match credential stuffing (rapid sequential attempts across many accounts) versus a user who just can't type their password? An AI agent can identify these patterns across millions of events in ways that are impossible for a human working in a SIEM query window.
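As a rough illustration of that distinction, here is a toy heuristic in plain Python. The thresholds (10 accounts, 20 attempts) are arbitrary example values, not recommended detection settings:

```python
from collections import defaultdict

def classify_pattern(events):
    """events: list of (source_ip, username) tuples for failed logins.

    Many distinct accounts from one IP suggests password spraying;
    many attempts against one account suggests brute force.
    """
    accounts_per_ip = defaultdict(set)
    attempts_per_user = defaultdict(int)
    for ip, user in events:
        accounts_per_ip[ip].add(user)
        attempts_per_user[user] += 1
    if any(len(users) >= 10 for users in accounts_per_ip.values()):
        return "password_spray"
    if any(n >= 20 for n in attempts_per_user.values()):
        return "brute_force"
    return "user_error"

spray = [("203.0.113.5", f"user{i}") for i in range(25)]
print(classify_pattern(spray))  # password_spray
```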
Risk scoring and classification. Based on all the enriched context, an agent can classify events into categories — benign (user error), low risk (known scanner, already blocked), medium risk (suspicious but not conclusive), high risk (active attack campaign or likely compromise) — with a confidence score and full reasoning chain.
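A simplified sketch of the shape of that scoring logic, with made-up signal names, weights, and confidence values purely for illustration (a real agent reasons over far richer context than a weighted sum):

```python
def score_event(signals: dict) -> tuple[str, float]:
    # Illustrative weights only; every name and number here is an assumption.
    score = 0
    if signals.get("ip_reputation_bad"):
        score += 40
    if signals.get("many_accounts_targeted"):
        score += 25
    if signals.get("successful_auth_from_source"):
        score += 35
    if signals.get("mfa_enabled"):
        score -= 15
    if score >= 70:
        return ("HIGH_RISK", 0.9)
    if score >= 40:
        return ("MEDIUM_RISK", 0.75)
    if score >= 15:
        return ("LOW_RISK", 0.7)
    return ("BENIGN", 0.8)
```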
Automated response for low-to-medium risk events. For clear-cut cases: auto-block the IP, require step-up MFA on next login, send the user a "was this you?" notification, or simply close the alert with full documentation. No human needed.
Investigation summaries. For events that do need human review, the agent can present a complete, structured briefing — timeline, enrichment results, risk assessment, recommended actions — so the analyst starts at "make a decision" instead of "start investigating."
Step-by-Step: Building the Agent on OpenClaw
Here's how to actually build this. The examples below use OpenClaw as the agent platform, pulling from pre-built components available on Claw Mart and custom tool integrations.
Step 1: Define Your Data Sources and Integrations
Before writing any agent logic, map out every system the agent needs to query:
- Authentication logs: Your SIEM (Splunk, Sentinel, Elastic) or identity provider (Okta, Entra ID) API
- Threat intelligence: AbuseIPDB, VirusTotal, GreyNoise, or your commercial feed
- User directory: Okta, Entra ID, or your HR system API
- Endpoint context: CrowdStrike, SentinelOne, or your EDR's API
- Ticketing: ServiceNow, Jira, or PagerDuty for case creation and escalation
In OpenClaw, each of these becomes a tool that your agent can invoke. You define the tool interface — what inputs it needs, what outputs it returns — and the agent decides when and how to use them based on the investigation context.
```yaml
tools:
  - name: query_siem_failed_logins
    description: "Query SIEM for failed login events by user, IP, or time range"
    parameters:
      username: string (optional)
      source_ip: string (optional)
      time_range: string (default: "last_1h")
    returns: list of authentication events with timestamps, IPs, user agents, status codes

  - name: enrich_ip
    description: "Get threat intel, geolocation, ASN, and proxy/VPN status for an IP"
    parameters:
      ip_address: string
    returns: reputation_score, geolocation, asn, is_vpn, is_tor, is_proxy, abuse_reports

  - name: get_user_context
    description: "Get user profile, MFA status, recent activity, and baseline behavior"
    parameters:
      username: string
    returns: department, role, mfa_enabled, last_password_change, normal_login_locations, devices

  - name: check_related_events
    description: "Check for related activity - other targeted accounts, successful logins, lateral movement"
    parameters:
      source_ip: string (optional)
      username: string (optional)
      time_range: string
    returns: related_events list with classifications

  - name: block_ip
    description: "Add IP to firewall blocklist"
    parameters:
      ip_address: string
      duration: string
      reason: string

  - name: require_step_up_auth
    description: "Force MFA challenge on next login for user"
    parameters:
      username: string
      reason: string

  - name: create_security_ticket
    description: "Create incident ticket with investigation summary"
    parameters:
      severity: string
      summary: string
      evidence: object
      recommended_actions: list
```
Step 2: Build the Investigation Agent
The core agent is an OpenClaw agent with a system prompt that defines its role, investigation methodology, and decision framework. This isn't a simple rule engine — it's an agent that reasons about which tools to use and in what order based on the specific alert.
```python
from openclaw import Agent, Tool, Trigger

failed_login_agent = Agent(
    name="failed_login_investigator",
    model="openclaw-reasoning-v2",
    system_prompt="""
You are a security analyst agent investigating failed login alerts.

For each alert, follow this methodology:

1. GATHER: Query SIEM for the full scope of failed login events related
   to this alert. Identify all source IPs and all targeted accounts.
2. ENRICH: For each unique source IP, get threat intelligence, geolocation,
   and proxy/VPN status. For each targeted user, get their profile and
   baseline behavior.
3. CORRELATE: Check for related events - are other accounts targeted from
   the same source? Has the user successfully logged in from other locations?
   Any post-authentication suspicious activity?
4. CLASSIFY: Based on all evidence, classify the event:
   - BENIGN: User error, known service account issue, expected behavior
   - LOW_RISK: Automated scanning from known bad IP, no successful auth
   - MEDIUM_RISK: Targeted attack on specific account(s), suspicious patterns,
     but no successful compromise detected
   - HIGH_RISK: Likely credential compromise, successful auth from attacker IP,
     high-value target, or novel attack pattern
   - CRITICAL: Confirmed compromise with post-auth malicious activity
5. RESPOND:
   - BENIGN/LOW_RISK: Auto-close with documentation. Block IP if malicious.
   - MEDIUM_RISK: Block IP, require step-up auth for user, create ticket
     for analyst review within 4 hours.
   - HIGH_RISK/CRITICAL: Block IP, suspend account, create urgent ticket,
     page on-call analyst immediately.

Always provide your full reasoning chain. Never guess - if you lack data
to make a classification, escalate to a human with what you've found so far.

For high-value accounts (executives, domain admins, finance, engineering leads),
always escalate to human review regardless of classification.
""",
    tools=[
        query_siem_failed_logins,
        enrich_ip,
        get_user_context,
        check_related_events,
        block_ip,
        require_step_up_auth,
        create_security_ticket,
        page_oncall_analyst,
        close_alert_with_documentation,
    ],
)
```
Step 3: Set Up the Trigger
The agent needs to activate when a failed login alert fires. This is typically a webhook from your SIEM or a polling mechanism against your alert queue.
```python
trigger = Trigger(
    name="siem_failed_login_alert",
    type="webhook",
    endpoint="/alerts/failed-login",
    payload_schema={
        "alert_id": "string",
        "username": "string",
        "source_ip": "string",
        "failed_count": "integer",
        "time_window": "string",
        "siem_link": "string",
    },
)

failed_login_agent.attach_trigger(trigger)
```
In your SIEM, configure an alert rule that fires when failed logins exceed your threshold (e.g., 10+ failures for a single account in 5 minutes, or 3+ accounts targeted from the same IP in 10 minutes) and sends a webhook to your OpenClaw endpoint.
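The alert rule's logic can be sketched as a plain sliding-window check. In this illustrative version, timestamps are Unix seconds and the thresholds mirror the example numbers above (10+ failures for one account in 5 minutes, or 3+ accounts hit from one IP in 10 minutes); your SIEM would express the same rule in its own query language:

```python
from collections import defaultdict

def should_alert(events, now):
    """events: list of (ts_seconds, username, source_ip) failed-login records."""
    per_user = defaultdict(int)
    per_ip_users = defaultdict(set)
    for ts, user, ip in events:
        if now - ts <= 300:   # 5-minute window per account
            per_user[user] += 1
        if now - ts <= 600:   # 10-minute window per source IP
            per_ip_users[ip].add(user)
    if any(n >= 10 for n in per_user.values()):
        return True
    return any(len(users) >= 3 for users in per_ip_users.values())
```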
Step 4: Add Guardrails and Escalation Logic
This is where you keep the agent from doing something stupid. OpenClaw's guardrail framework lets you define hard constraints:
```python
from openclaw import Guardrail

# Never auto-suspend executive or admin accounts
failed_login_agent.add_guardrail(
    Guardrail(
        name="protect_high_value_accounts",
        rule="If user role is in ['executive', 'domain_admin', 'finance_admin', 'engineering_lead'], "
             "never take automated destructive actions (suspend, force reset). Always escalate to human.",
        enforcement="hard_block",
    )
)

# Rate limit automated IP blocks to prevent self-DoS
failed_login_agent.add_guardrail(
    Guardrail(
        name="ip_block_rate_limit",
        rule="Do not block more than 50 unique IPs in a 10-minute window without human approval. "
             "If threshold reached, pause and escalate.",
        enforcement="hard_block",
    )
)

# Require confidence threshold for automated responses
failed_login_agent.add_guardrail(
    Guardrail(
        name="confidence_threshold",
        rule="Only take automated response actions (block, suspend, reset) when classification "
             "confidence is above 85%. Below that, escalate with findings.",
        enforcement="hard_block",
    )
)
```
These guardrails are non-negotiable constraints. The agent cannot override them regardless of its reasoning. This is critical — you don't want an AI agent auto-suspending your CEO's account because they were logging in from a hotel in Tokyo.
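Stripped of the framework, the same three constraints amount to a pre-action check that runs before any response fires. A plain-Python sketch, with hypothetical action and role names:

```python
HIGH_VALUE_ROLES = {"executive", "domain_admin", "finance_admin", "engineering_lead"}
DESTRUCTIVE_ACTIONS = {"suspend", "force_reset"}

def action_allowed(action, user_role, confidence, recent_ip_blocks):
    """Returns (allowed, reason). Mirrors the three guardrails above."""
    if action in DESTRUCTIVE_ACTIONS and user_role in HIGH_VALUE_ROLES:
        return (False, "high-value account: escalate to human")
    if action == "block_ip" and recent_ip_blocks >= 50:
        return (False, "IP block rate limit reached: escalate")
    if confidence < 0.85:
        return (False, "confidence below threshold: escalate with findings")
    return (True, "ok")
```

The point of expressing guardrails declaratively in the platform rather than in code like this is that they apply uniformly, no matter what reasoning path the agent takes.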
Step 5: Deploy and Test with Historical Data
Before going live, run the agent against historical alerts. OpenClaw's replay mode lets you feed past alerts through the agent and compare its classifications and actions against what your analysts actually did.
```python
from openclaw import Replay

replay = Replay(
    agent=failed_login_agent,
    dataset="historical_failed_login_alerts_q4_2024.json",
    mode="dry_run",  # No actual actions taken
)

results = replay.run()

# Compare agent classifications vs. analyst classifications
print(results.confusion_matrix())
print(results.average_investigation_time())
print(results.escalation_rate())
print(results.false_positive_rate())
```
You're looking for:
- Classification accuracy above 90% compared to analyst decisions
- Escalation rate between 10–25% (if it's escalating everything, it's not helping; if it's escalating nothing, it's probably missing things)
- Zero false negatives on high-risk and critical events (it should never auto-close something that an analyst flagged as a real attack)
Tune the system prompt, thresholds, and guardrails based on replay results. Iterate until you're confident.
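If you want to sanity-check these numbers outside the platform, the core metrics reduce to a few lines over (analyst_label, agent_label) pairs. An illustrative sketch:

```python
def replay_metrics(pairs):
    """pairs: list of (analyst_label, agent_label) for replayed alerts."""
    total = len(pairs)
    agree = sum(1 for analyst, agent in pairs if analyst == agent)
    # Dangerous false negative: analyst called it high-risk/critical,
    # but the agent would have auto-closed it as benign/low-risk.
    missed = [
        (analyst, agent) for analyst, agent in pairs
        if analyst in {"HIGH_RISK", "CRITICAL"} and agent in {"BENIGN", "LOW_RISK"}
    ]
    return {"accuracy": agree / total, "high_risk_false_negatives": len(missed)}

pairs = [
    ("BENIGN", "BENIGN"),
    ("HIGH_RISK", "LOW_RISK"),   # the kind of miss that must reach zero
    ("LOW_RISK", "LOW_RISK"),
    ("CRITICAL", "CRITICAL"),
]
```

The `high_risk_false_negatives` count is the one that must be exactly zero before you grant any autonomy.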
Step 6: Go Live with Human-in-the-Loop
Start in "supervised" mode where the agent runs its full investigation but requires analyst approval before taking any response actions. This lets you validate its reasoning in production before giving it autonomy.
```python
failed_login_agent.deploy(
    mode="supervised",  # Agent investigates and recommends, human approves actions
    auto_approve_classifications=["BENIGN", "LOW_RISK"],  # Auto-close obvious noise
    require_approval_for=["MEDIUM_RISK", "HIGH_RISK", "CRITICAL"],
)
```
After 2–4 weeks of supervised operation with good results, expand the auto-approve scope to include medium-risk automated responses (IP blocks, step-up auth). Keep high-risk and critical always requiring human approval.
Step 7: Leverage Claw Mart for Pre-Built Components
You don't have to build every tool integration from scratch. Claw Mart has pre-built tool packages for common security operations integrations:
- SIEM connectors (Splunk, Sentinel, Elastic) with pre-built query templates for authentication events
- Threat intel enrichment bundles that parallelize lookups across multiple feeds
- Identity provider integrations (Okta, Entra ID) with user context and response action capabilities
- Ticketing integrations with investigation summary templates
Browse Claw Mart, install the packages you need, and wire them into your agent. It's the difference between spending two weeks building API integrations and spending an afternoon configuring them.
What Still Needs a Human
Let's be clear about the limits. AI agents don't replace security analysts — they replace the mechanical parts of security analysis. Humans are still essential for:
High-value account decisions. When the CFO's account shows anomalous login patterns, you need a human making the call on whether to suspend it. The blast radius of getting that wrong is too high for automation.
Novel attack patterns. AI agents classify based on patterns they've been trained to recognize. A genuinely new attack technique — something your agent hasn't seen before — will likely get classified as "ambiguous" and escalated. That's the correct behavior. Your senior analysts are the ones who identify new TTPs.
Ambiguous context. Executive traveling internationally and logging in from an unusual country. Developer working at 3 AM from a coffee shop. New employee who hasn't established a behavioral baseline yet. These cases require judgment that considers organizational context an AI agent doesn't have.
Response decisions with business impact. Blocking a /16 IP range that's also used by a major customer. Suspending an account that runs a critical automated process. Anything where the response itself could cause an outage or business disruption needs human oversight.
Compliance and legal decisions. When an investigation reveals insider threat indicators, the path forward involves HR, legal, and potentially law enforcement. AI has no business making those calls.
Root cause analysis. After a confirmed compromise, the deep forensic investigation — understanding exactly what happened, how the attacker got in, what they accessed, what needs to be remediated — is human work.
The right mental model: your AI agent is a highly competent junior analyst who works 24/7, never gets tired, never skims an alert, and always follows the investigation playbook. But they know their limits and escalate to senior analysts when things get complicated.
Expected Time and Cost Savings
Based on published data from organizations using similar automation (and being conservative with estimates):
Investigation time reduction. From 15–75 minutes per alert down to under 2 minutes for automated cases. That's a 90–95% reduction in time for the 70–80% of alerts that are benign or low-risk. For escalated cases, the analyst starts with a complete briefing instead of a raw alert, cutting their investigation time by 40–60%.
Analyst capacity. If a team of 10 analysts currently spends 25% of their time on failed login triage, that's 2.5 FTE equivalent. Automating 80% of that work frees up 2 FTE worth of capacity — either to handle a growing alert volume without hiring, or to focus on proactive threat hunting and higher-value security work.
Dollar savings. At $120K fully loaded per analyst, 2 FTE of recovered capacity is $240K annually. Against an OpenClaw subscription and the engineering time to build and maintain the agent (estimate 2–4 weeks of initial build, ongoing tuning), the ROI is typically positive within the first quarter.
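The arithmetic behind those numbers, spelled out with the example figures above:

```python
analysts = 10
triage_share = 0.25       # fraction of analyst time on failed-login triage
automation_rate = 0.80    # share of that triage work the agent absorbs
loaded_cost = 120_000     # fully loaded annual cost per analyst, USD

fte_recovered = analysts * triage_share * automation_rate  # 2.0 FTE
annual_savings = fte_recovered * loaded_cost               # 240,000 USD
```

Swap in your own headcount, triage share, and loaded cost to model your environment.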
Mean time to respond. From 45+ minutes to under 5 minutes for automated cases. For credential stuffing campaigns, that's the difference between the attacker having 45 minutes of runway and having essentially none.
Documentation quality. Every investigation is fully documented automatically — timeline, enrichment results, classification reasoning, actions taken. Compliance teams love this. You go from patchy, inconsistent case notes to complete, standardized investigation records for every single alert.
Coverage. Your AI agent works nights, weekends, and holidays. It doesn't call in sick. It investigates the 3 AM Sunday alert with the same thoroughness as the Tuesday 10 AM alert. For many organizations, this is the biggest practical improvement — consistent coverage during off-hours when attacks disproportionately occur.
Where to Start
Don't try to build the full system in one shot. Start with the highest-volume, lowest-risk automation:
- Week 1–2: Build the enrichment pipeline. Get the agent pulling logs, enriching IPs, and assembling user context. No automated responses yet — just structured summaries delivered to analysts.
- Week 3–4: Add classification logic. Let the agent classify events and compare its assessments against analyst decisions. Tune accuracy.
- Week 5–6: Enable automated responses for benign and low-risk events (auto-close with documentation, auto-block obviously malicious IPs). Keep everything else in supervised mode.
- Ongoing: Gradually expand automation scope based on confidence. Add new data sources and enrichment tools as you identify gaps.
The hardest part isn't the AI. It's getting clean API access to all your data sources and getting stakeholder buy-in to let an agent take automated response actions. Start those conversations early.
If you want to skip the infrastructure work and get to the interesting part faster, check out the security operations packages on Claw Mart — pre-built integrations, investigation playbook templates, and response action modules that plug directly into OpenClaw agents.
Ready to stop your analysts from drowning in failed login noise? Start building on OpenClaw, or hire a Clawsourcer to build your security automation agent for you. Most teams have a working prototype in under two weeks.