AI-Powered Phishing Outperforms Elite Red Teams in 2025

AI agents can now out-phish elite human red teams, at scale. In an ongoing AI Spear Phishing Agent experiment running from 2023 to 2025, AI’s performance relative to humans improved by 55 percentage points. Advances in AI are simultaneously disrupting the social engineering landscape and the cybersecurity training category. The co-evolution of attacks and protections must be considered when evaluating the rising threat of blackhat generative AI, and how to defend against it.

Updated April 3, 2025

Overview: The rise of AI spear phishing agents

AI reached its Skynet Moment for social engineering in March 2025. For the first time in over two years of testing, AI agents developed by Hoxhunt created more effective simulated phishing campaigns against millions of global users than our elite human red teams could.

  • In 2023, AI was 31% less effective than humans
  • In Nov. 2024, AI was 10% less effective than humans
  • In March 2025, AI was 24% more effective than humans

Our ongoing experiments have tracked the effectiveness of AI since 2023. Threat actors have been developing blackhat generative AI tools as well, but in the shadows. AI-powered phishing-as-a-service kits exist, but they are not yet as widely adopted or as effective for mass campaigns as perhaps believed.

  • “Only” 0.7-4.7% of phish that bypass email filters were written by AI in 2024, according to Hoxhunt research.
  • 4,151% increase in total phishing volume since the advent of ChatGPT in 2022, according to SlashNext.
  • 49% increase in phishing attacks that bypass email filters since 2022, according to Hoxhunt research.

This public finding could be considered an inflection point for the threat landscape. AI’s superiority in social engineering will transform cybersecurity risks, attacks, and defenses.

Advances in AI Large Language Models are simultaneously disrupting the social engineering landscape and the cybersecurity training category. The co-evolution of attacks and protections must be considered when evaluating the rising threat of blackhat generative AI applications.

The rise of AI will accelerate the retirement of compliance-based security awareness training (SAT) tools. They are (wisely) being replaced by adaptive phishing training and human risk management platforms, which measurably change behavior and integrate human threat intelligence into the security stack.

Background: AI phishing agents from 2023 to 2025

In the early stages of this research, AI was tested against human red teams based on who could create a more effective phishing attack from the same prompt. The single-prompt AI experiment was fundamentally different from the AI agents created later in our research.

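In 2023, human red teams outperformed AI: human-written attacks drew a failure rate of 4.2% versus 2.9% for AI-written attacks. A higher failure rate means more users clicked, so the attack was more effective.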

That research also showed that behavior-based training conferred clear protection from both AI and human-generated attacks, with especially pronounced protection against AI.

That finding persists. Behavior-based training continues to work against AI spear phishing agents, while compliance-based SAT falls short.

With the introduction of advanced large language models, our continuous refinement of AI agents began tipping the scales from 2024 to 2025.

In 2024, AI agents began tricking more novice users with better-written emails. Meanwhile, human-generated attacks were much more effective than AI against users with more than 6 months of training.

By February/March 2025, AI surpassed human red teams across the spectrum of user skill levels.

From 2023 to 2025, AI’s phishing performance relative to elite human red teams improved by 55 percentage points, from 31% less effective to 24% more effective.

The threat landscape has changed.

AI AGENT vs HUMAN RED TEAM FAIL RATES

Failure Rate Table

| | 2023 failure rate | Nov 2024 failure rate | March 2025 failure rate | Total % change |
|---|---|---|---|---|
| AI | 2.9% * | 2.1% | 2.78% | |
| Human | 4.2% | 2.3% | 2.25% | |
| AI to human relative performance | -31% (less effective than humans) | -10% (less effective than humans) | +23.8% (more effective than humans) | +55% improvement |
| Human to AI relative performance | +44.8% (more effective than AI) | +10.8% (more effective than AI) | -19% (less effective than AI) | -142% decline |

The absolute failure rates are less informative than the relative performance of the Hoxhunt AI Spear Phishing Agent (codenamed JKR) against human red teams. The experimental methodology changed from ChatGPT 3.5 prompts in 2023 to sophisticated AI agents in 2024. Thus, failure rates from 2023 do not give an apples-to-apples reference point for the AI Spear Phishing Agent in 2025.
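To make the relative-performance arithmetic concrete, here is a minimal Python sketch that recomputes it from the failure rates reported above. The function and variable names are illustrative assumptions, not part of the Hoxhunt tooling. Because it works from the rounded published rates, it yields roughly -31%, -8.7%, and +23.6%, which differs slightly from some of the published figures (-10%, +23.8%); one plausible explanation, and only an assumption here, is that the published figures were derived from unrounded data.

```python
# Failure rates (%) as published in this article.
FAILURE_RATES = {
    "2023": {"ai": 2.9, "human": 4.2},
    "Nov 2024": {"ai": 2.1, "human": 2.3},
    "March 2025": {"ai": 2.78, "human": 2.25},
}


def relative_effectiveness(rate: float, baseline: float) -> float:
    """Percent by which `rate` exceeds (+) or trails (-) `baseline`."""
    return (rate - baseline) / baseline * 100.0


def ai_vs_human() -> dict[str, float]:
    """AI failure rate relative to the human red team, per study period."""
    return {
        period: relative_effectiveness(rates["ai"], rates["human"])
        for period, rates in FAILURE_RATES.items()
    }


results = ai_vs_human()
for period, pct in results.items():
    print(f"{period}: AI {pct:+.1f}% relative to humans")

# The headline "55" figure is the swing between the first and last
# relative scores, which makes it a difference in percentage points.
swing = results["March 2025"] - results["2023"]
print(f"Swing, 2023 to March 2025: {swing:.1f} percentage points")
```

Read this way, the headline improvement is the gap between two relative scores rather than a ratio of failure rates.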

But the acceleration in the AI Spear Phishing Agents' effectiveness in the 3 months from Nov. 2024 to Feb. 2025 has been eye-opening.

RESULTS

AI Single-Prompt in March 2023: Inferior to Human Red Teams

  • 4.2% Failure Rate (Human)
  • 2.9% (AI)
  • Humans performed 44.8% more effectively in a benchmark test

AI Spear Phishing Agent Failure Rate in November 2024: Approaches Human Red Teams

  • 70,000 AI Agent-created simulations sent
  • 2.3% Failure Rate (Human)
  • 2.1% (AI)
  • Humans performed 10.8% more effectively in total
  • AI was better at fooling novices, but less effective at duping skilled users

AI Spear Phishing Agent Failure Rate in March 2025: Surpasses Human Red Teams

  • 70,000 AI Agent-created simulations sent
  • 2.25% (Human)
  • 2.78% (AI)
  • AI performed 23.8% more effectively
  • AI was equally or more effective at duping users across all skill levels

METHODOLOGY

2023 Study

  • Control Group: A global user base of over 2.5 million people.
  • Task: Human red teams and AI (ChatGPT 3.5) were tasked with crafting phishing attacks based on the same single prompt
  • Evaluation: Failure rates were tracked to measure the effectiveness of the phishing attempts

2024 - 2025 Study Methodology: AI Spear Phishing Agent

70,000 simulations were sent by the AI Spear Phishing Agent in both the Nov. 2024 and the March 2025 studies.

1) AI Spear Phishing Agent Design (JKR):
The AI agent used in this study was internally called JKR, and it was designed to perform two distinct tasks:

  1. Task 1 (Novel Phishing Creation): The AI was given user-specific context (role, country, etc.) and told to create a novel phishing attack that maximized the likelihood of the user clicking the phishing link.
  2. Task 2 (Human-Generated Attack Enhancement): The AI was also tasked with improving an existing human-created phishing attack, making it more effective.

2) Continuous Improvement of the AI:

  • The AI’s effectiveness improved continuously through a process called "Evolves," which involved refining the AI's model and prompts over time. This ongoing process allowed the AI to generate more sophisticated phishing attacks.
  • As the AI models and prompts evolved, the attacks became more sophisticated and harder to detect, which improved the AI's performance in social engineering tasks.

3) Prompts and Task Setup:

  • Both tasks used similar starting prompts.
  • For Task 1, the AI was instructed to create phishing attacks based on the context of the user.
  • For Task 2, it was asked to improve an attack created by humans, making it more targeted and effective.
  • These prompts were designed to guide the AI in generating realistic phishing attacks based on various user characteristics.

4) Experiment Setup:

  • The experiment involved a large set of users (2.5M) selected from Hoxhunt's platform, which has millions of enterprise users, providing a substantial sample size for the study.
  • Instead of using traditional attack libraries, the AI Spear Phishing Agent took over and generated phishing attacks that were sent to the users.

5) User Outcomes:

The users had three possible outcomes when receiving a phishing email:

  • Report the attack: Identifying and reporting the phishing attempt.
  • Miss the attack: Failing to recognize the phishing attempt.
  • Fall for the attack (Fail): Clicking the link in the phishing email.

6) Data Collection:

  • The primary metric was the failure rate, which was the percentage of users who clicked the phishing link. This data was collected and analyzed to assess the effectiveness of the phishing attempts.
  • Additionally, reporting rates were tracked to measure how quickly users recognized and reported phishing attacks.
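
As a rough illustration of how the three user outcomes map to the metrics above, the following Python sketch tallies outcomes and derives failure and reporting rates. The `Outcome` enum, the `rate` helper, and the sample data are hypothetical and exist only for this example; they are not taken from the study or from Hoxhunt's platform.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    REPORT = "report"  # user identified and reported the simulation
    MISS = "miss"      # user failed to recognize it (assumed: no report, no click)
    FAIL = "fail"      # user clicked the link in the phishing email


def rate(outcomes: list[Outcome], target: Outcome) -> float:
    """Percentage of simulations that ended in `target`."""
    if not outcomes:
        raise ValueError("no outcomes recorded")
    return 100.0 * Counter(outcomes)[target] / len(outcomes)


# Toy data for illustration only; not drawn from the study.
sample = [Outcome.REPORT] * 6 + [Outcome.MISS] * 3 + [Outcome.FAIL] * 1

failure_rate = rate(sample, Outcome.FAIL)      # share of users who clicked
reporting_rate = rate(sample, Outcome.REPORT)  # share of users who reported
print(f"failure rate: {failure_rate:.1f}%, reporting rate: {reporting_rate:.1f}%")
```

Keeping the outcome categories mutually exclusive means the three rates always sum to 100%, which makes per-campaign comparisons straightforward.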

7) Validation and Monitoring:

  • The project involved strong monitoring loops to validate the AI’s output. These loops ensured that the generated attacks adhered to strict ethical AI development guidelines.
  • There were some challenges in ensuring the AI created attacks within set boundaries, but these were addressed through validation processes to ensure that the attacks were appropriate and effective.

Conclusion: The Future of AI in Cybersecurity

The AI Spear Phishing Agents improved by 55 percentage points relative to human red teams from 2023 to Feb. 2025.

It’s no longer theoretical. We’ve proven that AI agents can create superior spear phishing attacks at scale.

It is only a matter of time until AI agents disrupt the phishing landscape. For now, there are many anecdotal media accounts of highly targeted, sophisticated spear phishing attacks that leveraged AI. These are typically bespoke campaigns.

Soon, the phishing-as-a-service market will shift to mass adoption of AI Spear Phishing Agents.

Once that happens, the baseline quality and effectiveness of mass phishing campaigns will rise to a level we currently equate with targeted spear phishing attacks.

Fortunately, we have time to prepare. According to the Hoxhunt Phishing Trends Report 2025, under 5% of phish that bypassed email filters were written by AI. According to 2025 Mimecast research, 12% of phish (presumably those that are caught at the email filter level) are AI-written.

These figures may seem low to some, considering other studies by SlashNext, which report that total phishing volume (all phish caught by email filters) has increased by 4,151% since the advent of ChatGPT in 2022.

But disruption happens gradually and then all at once, to paraphrase Clayton Christensen. We must be prepared for when the inevitable disruption to the phishing-as-a-service market occurs, as AI-generated phish become more effective, easier to adopt, and ultimately more lucrative for criminals.

Discussion: The Future of AI Spear Phishing Agents in Human Risk Management

The good news from our research is that there is still time to harden the human layer with adaptive phishing training. Behavior change programs can achieve extremely high levels of engagement and resilience with the use of AI Spear Phishing Agents. These Whitehat agents contain the same capabilities of automation and personalization as AI attacks, but are used for protective purposes.

Our data shows that adaptive phishing training programs change behavior and protect organizations against even the most advanced AI-generated attacks.

We are confident that the trend of AI phishing effectiveness will accelerate with technological development and innovation.

AI is a sword that cuts both ways: to penetrate or to parry.

As AI technology continues to evolve, the ability to craft more sophisticated phishing attacks on-demand will only increase, making AI an essential tool in both offensive and defensive cybersecurity strategies.

Organizations should therefore integrate AI spear phishing agents into their security awareness training. The adaptive training platforms should plug real threat detection into their human risk management programs, and connect it to the SOC.

The integration and orchestration of human threat intelligence enables earlier detection and response to social engineering attacks that bypass filters, even zero-day phish.

AI SPEAR PHISHING AGENT & YOU

The Spear Phishing Agent discussed in this white paper is available now in Hoxhunt. The agent adds AI-powered, hyper-personalized attacks to the automated phishing training program. By activating the agent, security leaders help their employees stay at the cutting edge of the quickly evolving threat landscape.
