Agentic Red Teams Are Here

Autonomous Vulnerability Discovery Ushers in a New Security Paradigm

AI is transforming cybersecurity, transitioning from speculative research to impactful, practical applications. Yet, despite notable advancements in AI-driven auto-remediation and design-phase security solutions, significant gaps remain—particularly in autonomously identifying, confirming, and exploiting unknown vulnerabilities. Recent breakthroughs indicate that agentic red teams, AI-powered multi-agent systems capable of offensive security testing, have arrived as a timely and necessary innovation to fill this critical gap in the cybersecurity landscape.

written by
Mahesh Babu
published on
April 1, 2025
topic
Application Security

Current Market Landscape: Overcrowded and Unbalanced

In 2024–2025, AI-focused application security solutions proliferated, particularly around design-phase security reviews and automated vulnerability remediation. Products such as Prime Security’s design-phase threat modeling platform and the auto-remediation engines from Mobb and Pixee have made significant inroads by integrating large language models (LLMs) to proactively address vulnerabilities before deployment or to quickly patch issues afterward (The Hacker News, 2024; Medium, 2024). However, as valuable as these solutions have been, the market is notably skewed toward either proactive design-phase identification or reactive automated fixes for well-known vulnerabilities. The critical step in between (autonomous discovery, triage, and confirmation of unknown, exploitable vulnerabilities) remains underaddressed.

This imbalance creates a critical unmet need: while organizations are becoming adept at addressing vulnerabilities flagged by traditional static and dynamic scanners, fewer effective automated tools exist to discover unknown vulnerabilities in complex, real-world software environments. Thus, despite heavy investments in AI-enabled secure coding practices, organizations remain vulnerable to attacks exploiting unknown (zero-day) flaws.

Autonomous Vulnerability Discovery: Academic Proof-of-Concept

Addressing this gap, groundbreaking academic research by Fang et al. (2024) at the University of Illinois Urbana-Champaign demonstrated a practical AI-driven approach to autonomous vulnerability discovery. Their research introduced HPTSA (Hierarchical Planning and Task-Specific Agents), a hierarchical multi-agent LLM framework composed of specialized agents—including exploration planners, exploitation strategists, and vulnerability triage agents—working collaboratively to autonomously discover and exploit previously unknown vulnerabilities (Fang et al., 2024).

When tested against a benchmark of 15 real-world web application vulnerabilities (CVEs) with no prior knowledge or vulnerability hints provided, HPTSA autonomously discovered and successfully exploited over half of them—a remarkable achievement compared to single-agent baseline approaches, which had negligible success under similar conditions (Fang et al., 2024). This empirical validation underscores both the offensive potential of coordinated LLM-driven multi-agent systems and their efficacy in discovering previously unknown vulnerabilities that conventional tools and single-agent approaches consistently miss.
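The hierarchical pattern described above can be sketched in a few lines: a planner explores, then routes each finding to a task-specific expert agent. The sketch below is purely illustrative; the class names are invented here, and rule-based keyword matching stands in for the LLM reasoning and browser tooling a real system like HPTSA would use.

```python
# Conceptual sketch of an HPTSA-style hierarchy: a planner agent routes
# exploration findings to task-specific expert agents. Keyword matching
# is a stand-in for real LLM-driven exploitation attempts.
from dataclasses import dataclass

@dataclass
class Finding:
    url: str
    hint: str            # observation from the exploration phase
    confirmed: bool = False

class ExpertAgent:
    """A task-specific agent specialized for one vulnerability class."""
    def __init__(self, vuln_class: str, keywords: list[str]):
        self.vuln_class = vuln_class
        self.keywords = keywords

    def attempt(self, finding: Finding) -> bool:
        # A real agent would drive an LLM plus HTTP/browser tools here;
        # we stand in with a keyword match on the exploration hint.
        return any(k in finding.hint for k in self.keywords)

class Planner:
    """Hierarchical planner: surveys findings, dispatches expert agents."""
    def __init__(self, experts: list[ExpertAgent]):
        self.experts = experts

    def run(self, findings: list[Finding]) -> list[tuple[str, str]]:
        exploited = []
        for f in findings:
            for expert in self.experts:
                if expert.attempt(f):
                    f.confirmed = True
                    exploited.append((f.url, expert.vuln_class))
                    break
        return exploited

planner = Planner([
    ExpertAgent("XSS", ["reflected", "script"]),
    ExpertAgent("SQLi", ["sql error", "query"]),
])
results = planner.run([
    Finding("/search", "form input reflected in response"),
    Finding("/login", "sql error on quote in username"),
    Finding("/about", "static page"),
])
print(results)  # [('/search', 'XSS'), ('/login', 'SQLi')]
```

The key design idea, per the paper's framing, is the division of labor: the planner never needs deep expertise in any single vulnerability class, and each expert never needs to reason about the whole application.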

Google's Big Sleep Agent: Real-World AI Discovery of a Zero-Day

Confirming that the theoretical promise of academic research can translate into impactful real-world outcomes, Google’s Project Zero, in collaboration with DeepMind, revealed their LLM-driven vulnerability research agent, Big Sleep, in late 2024. Big Sleep autonomously discovered a critical memory safety vulnerability—a buffer underflow—in the widely deployed SQLite database. Google identified this finding as the first publicly reported real-world zero-day discovered entirely by an AI agent, marking a significant milestone in cybersecurity (The Hacker News, 2024).

Google emphasized that Big Sleep, leveraging generative agents to intelligently explore codebases, identified a flaw that conventional fuzzing had failed to surface, catching it before it shipped in an official SQLite release. Such capability underscores AI’s potential to enhance traditional vulnerability discovery by autonomously exploring complex code patterns beyond the practical scope of manual review or traditional automated scans. This event represents a turning point, heralding a new era in cybersecurity where AI-driven bug hunting could dramatically enhance and accelerate vulnerability research efforts (Medium, 2024).
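The underlying loop, stripped to its essentials, resembles guided fuzzing: propose inputs, run the target, triage crashes. The sketch below is a toy analogue, not Google's implementation; the buggy parser is invented for illustration, and random byte mutations stand in for the model-guided exploration that makes an agent like Big Sleep effective.

```python
# Toy sketch of an AI-guided discovery loop: propose inputs, run the
# target, collect crashes. Random mutation stands in for model guidance.
import random

def parse_record(data: bytes) -> int:
    """Toy target with a latent bug: the first byte is trusted as an
    offset into the payload with no bounds check."""
    header = data[0]
    payload = data[1:]
    return payload[header]  # IndexError when header >= len(payload)

def propose_inputs(corpus, rng, n=16):
    # Stand-in for model-guided input generation: a real agent would use
    # code understanding to steer mutations; here we flip random bytes.
    out = []
    for _ in range(n):
        cand = bytearray(rng.choice(corpus))
        cand[rng.randrange(len(cand))] = rng.randrange(256)
        out.append(bytes(cand))
    return out

def fuzz(target, seeds, iterations=50, rng=None):
    rng = rng or random.Random(0)
    crashes = []
    for _ in range(iterations):
        for candidate in propose_inputs(list(seeds), rng):
            try:
                target(candidate)
            except IndexError:          # triage: an out-of-bounds access
                crashes.append(candidate)
    return crashes

crashes = fuzz(parse_record, [bytes([0, 1, 2, 3])])
print(len(crashes) > 0)  # True: the loop surfaces the missing bounds check
```

The interesting difference from classical fuzzing is the proposal step: where a fuzzer mutates blindly, an LLM agent can read the code, hypothesize which invariants are unchecked, and construct inputs that target them directly.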

The Strategic Value of AI-Powered Red Teams: Filling the Critical Gap

The emergence of autonomous vulnerability discovery systems, exemplified by academic prototypes like HPTSA and real-world applications like Google’s Big Sleep agent, signals the rise of a new cybersecurity paradigm: agentic red teaming. Unlike the crowded market of design-stage or auto-remediation solutions, agentic red teams offer capabilities largely unmet by current industry tools—specifically the ability to autonomously:

  • Discover unknown vulnerabilities in complex, realistic software environments without prior vulnerability signatures or heuristic guidance.
  • Confirm exploitability through autonomous testing, rather than relying solely on static analysis or heuristic alerts.
  • Coordinate multiple specialized agents—such as explorers, exploit strategists, and triage specialists—to systematically approach complex security tasks beyond single-agent capabilities.
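The second bullet, confirmation by testing rather than heuristics, is worth making concrete. The sketch below is a hypothetical illustration: an agent injects a canary payload and checks whether it comes back unescaped, so only genuinely exploitable reflection is reported. The `render_*` functions are toy stand-ins for a real application's responses.

```python
# Hypothetical sketch of exploit confirmation: inject a canary and test
# the actual behavior, rather than flagging on static heuristics alone.
import html

CANARY = "<canary-7f3a>"

def render_vulnerable(user_input: str) -> str:
    return f"<p>Results for {user_input}</p>"               # no escaping

def render_safe(user_input: str) -> str:
    return f"<p>Results for {html.escape(user_input)}</p>"  # escaped

def confirm_xss(render) -> bool:
    """Exploitable only if the canary survives verbatim in the output."""
    return CANARY in render(CANARY)

print(confirm_xss(render_vulnerable))  # True: reflected unescaped
print(confirm_xss(render_safe))        # False: escaped, so not exploitable
```

A static scanner might flag both endpoints because both echo user input; the behavioral test separates the true positive from the false one, which is precisely the triage burden agentic confirmation removes.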

These unique strengths position AI-driven red teams as a strategic necessity, addressing gaps in the current AppSec market that traditional security tools and existing AI applications do not fully satisfy.

Implications and the Path Forward

The implications of this paradigm shift are profound for both security researchers and industry practitioners. While AI-based design-phase security and automated patching solutions help reduce known vulnerabilities and streamline fixes, organizations remain exposed without effective autonomous vulnerability discovery capabilities. Agentic red teams now offer a compelling solution to close this gap, enabling continuous, autonomous security assessment capable of discovering, verifying, and exploiting vulnerabilities proactively.

As cybersecurity threats grow increasingly sophisticated, organizations must recognize that relying solely on existing AI-driven auto-remediation or design-focused security strategies is insufficient. Integrating autonomous vulnerability discovery systems powered by multi-agent AI will be crucial for effectively mitigating zero-day risks and enhancing overall resilience against advanced threats.

Agentic Red Teams: An Emerging Cybersecurity Necessity

The research by Fang et al. (2024) and Google's practical success with Big Sleep signal the arrival of agentic red teams as both feasible and necessary cybersecurity innovations. While significant investments have flooded into AI-powered vulnerability remediation and secure design solutions, agentic autonomous vulnerability discovery remains a critical yet underserved area. These pioneering efforts provide clear proof-of-concept that multi-agent AI frameworks can—and inevitably will—play a fundamental role in the future of cybersecurity.

Agentic red teams represent not just a novel research direction, but a strategic imperative for the industry. By adopting multi-agent AI frameworks capable of autonomously discovering and verifying unknown vulnerabilities, organizations will be better equipped to defend against increasingly complex cyber threats. The future of cybersecurity is not just AI-enhanced defense but also AI-driven proactive offense, one where autonomous agents actively uncover vulnerabilities before attackers do. That future is no longer hypothetical; it's here, and security teams must prepare accordingly.

References

Fang, R., Bindu, R., Gupta, A., Zhan, Q., & Kang, D. (2024). Teams of LLM agents can exploit zero-day vulnerabilities. University of Illinois Urbana-Champaign. arXiv preprint arXiv:2406.01637.

The Hacker News. (2024). Google unveils first AI-discovered zero-day vulnerability in SQLite. Retrieved from https://thehackernews.com

Medium. (2024). Google’s Big Sleep agent discovers critical SQLite vulnerability. Retrieved from https://medium.com
