The most dangerous thing about Claude Mythos is not what it found. It’s how fast it found it.
We’ve had bug bounty programs, SAST tools, and automated fuzzers for years. None of them found a 27-year-old vulnerability sitting in OpenBSD, one of the most security-hardened operating systems ever built. Anthropic’s Claude Mythos did, during a pre-release evaluation.
The leap from a coding assistant to an autonomous attacker that completes a corporate network takeover isn’t just a gradual upgrade. It’s the kind of jump that forces you to rethink some assumptions.
This piece is my attempt to work through what Claude Mythos actually is, what Project Glasswing is trying to do about it, and why the AI security risks here are messier than most coverage makes them sound.
Key Takeaways
- Claude Mythos Preview can autonomously complete a 32-step corporate network attack simulation, something no previous AI model could do.
- Anthropic discovered thousands of previously unknown zero-day vulnerabilities across major operating systems and browsers during pre-release testing.
- Project Glasswing is a controlled coalition giving ~40 organizations access to these capabilities for defensive use only.
- The UK’s AI Security Institute independently confirmed Mythos’s capabilities through its own red team evaluation.
- Anthropic estimates comparable capabilities will proliferate from other AI labs within 6 to 18 months.
- The real bottleneck isn’t discovery. It’s what organizations do after a vulnerability is found.
What is Claude Mythos and Why Does it Matter for Frontier AI
What is Claude Mythos?
Claude Mythos is an experimental frontier AI model evaluated for autonomous cybersecurity capability, including vulnerability discovery and multi-step attack simulation.
Anthropic Claude AI development has always positioned safety evaluation as a core product function, not an afterthought. With Claude Mythos, that commitment got tested in a very concrete way.
Before releasing the model, Anthropic ran it through an internal pre-deployment evaluation that looked specifically at cybersecurity capabilities. What they found was not what they expected at the capability ceiling. The model was identifying vulnerabilities at a pace and depth that exceeded previous frontier AI models by a wide margin. In their own words, it had reached a level where it could surpass all but the most skilled human security researchers at finding and exploiting software flaws.

That finding changed the release plan entirely. Instead of a standard rollout, Anthropic Claude AI has been moved to a controlled access model, which became Project Glasswing.
Why frontier AI models require new safety frameworks
What are frontier AI security risks?
Frontier AI security risks refer to the potential for advanced AI systems to autonomously discover software vulnerabilities, simulate cyber attacks, and scale offensive capabilities faster than traditional cybersecurity defenses can detect or mitigate threats.
The core problem with evaluating frontier AI models is that capability and safety don’t scale at the same rate.
A model that’s 10% better at reasoning might only be marginally more dangerous in the wrong hands. But a model that can autonomously chain multi-step exploits across a corporate network isn’t 10% more dangerous than its predecessor. It’s a qualitatively different threat category. That’s the jump Claude Mythos represents, and existing AI governance frameworks weren’t built for it.
This is not unique to Anthropic. The entire field is building the governance layer after the capability layer, which is a structurally backward way to manage risk. Responsible AI development requires the evaluation infrastructure to lead capability, not lag it.
Inside the Claude Mythos Evaluation Framework
Purpose and scope of mythos testing
Claude Mythos was tested across two main dimensions: general reasoning and, separately, cybersecurity-specific capabilities. The cyber evaluation was where the findings got interesting.

Testing covered autonomous vulnerability discovery, exploit development, and multi-step attack simulation. It was put in environments where it had to reason about systems it had never seen, identify weaknesses, and develop working exploits.
Capability vs. safety evaluation in frontier AI
Anthropic discovered the model’s offensive capabilities during their safety evaluation process, which means the evaluation was working as intended. But the offensive capability arrived first, safety alignment later. The model was already capable of things Anthropic hadn’t fully planned for when they found out.
It’s a structural problem with how frontier AI models are built. You train for general intelligence, and offensive cybersecurity capability comes along as an emergent property you didn’t explicitly optimize for.
Controlled environments and benchmarking methods
Anthropic’s internal testing used isolated environments. The UK’s AI Security Institute (AISI) ran independent cyber evaluations using their own CTF suite and a custom multi-step attack simulation.
TLO is a 32-step simulated corporate network attack covering everything from initial reconnaissance to full network takeover. AISI estimates the same tasks would take a human professional about 20 hours to complete.
Project Glasswing: Stress-Testing AI Cyber Capabilities
What is Project Glasswing?
Project Glasswing is a controlled access initiative allowing selected organizations to test advanced AI cybersecurity capabilities for defensive purposes, helping identify vulnerabilities before similar tools become widely available.
Objectives of Project Glasswing
Project Glasswing is Anthropic’s response to the obvious problem: you have a model with serious offensive capability. What do you do with it?
The answer was to build a coalition of defenders. Project Glasswing gives access to Claude Mythos capabilities to a group of roughly 40 organizations, including AWS, Apple, Microsoft, Google, CrowdStrike, and Palo Alto Networks, with the explicit mandate of using those capabilities to find and fix vulnerabilities before adversaries develop similar tools.

Simulating advanced cyber scenarios
The testing under Project Glasswing and the AISI evaluation focused on realistic attack scenarios, not toy problems. The model was given network access and directed to attempt attacks on vulnerable systems in controlled environments.
What AISI found, using their independent evaluation, was consistent with Anthropic’s internal results. Claude Mythos represents a meaningful step up from previous models. AISI noted they have tracked AI cyber capabilities since 2023 and that two years ago, the best available models could barely complete beginner-level cyber tasks.
Key Findings from Capability Testing
The numbers here are specific and worth looking at directly:
- 73% success rate on expert-level CTF tasks (tasks no model could complete before April 2025).
- 3 out of 10 full completions of the 32-step TLO corporate network attack simulation, making Mythos the first model to complete it at all.
- Average 22 out of 32 steps completed across all TLO attempts, compared to 16 for the next-best model (Claude Opus 4.6).
- 83% first-attempt success rate at reproducing vulnerabilities and developing working exploits.
- Identified a 27-year-old vulnerability in OpenBSD, an OS specifically known for its security hardening.

These aren’t incremental improvements. The gap between Mythos and the next-best model on multi-step attack simulation is large.
AI Red Teaming at Scale
What is AI red teaming?
AI red teaming is the process of testing AI systems against adversarial scenarios to identify security weaknesses, harmful capabilities, and unintended behaviors before public deployment.
How Claude was tested for vulnerabilities
AI red teaming for Claude Mythos involved both Anthropic’s internal team and independent evaluators like AISI. The UK institute’s approach is worth understanding because it’s more structured than typical vendor testing.
AISI built progressively harder evaluations as AI capabilities improved, from basic chat-based probing to multi-step simulations. Their CTF suite separates tasks by difficulty level, and they track model performance over time across all levels.
Importance of adversarial testing
The AISI results matter specifically because they’re independent. When Anthropic says Mythos is capable, that’s a self-interested claim. When AISI runs its own evaluations and confirms the capability level, that’s a different data point.
AI red teaming at this scale is what separates credible, responsible AI development claims from marketing. The problem is that not every frontier AI model gets this level of independent scrutiny before release.
Lessons from large-scale red teaming
The biggest lesson from the Mythos red team process is that evaluation environments need to keep evolving. AISI acknowledged directly that ranges without active defenders will eventually stop being discriminating enough to separate the capability levels of the most advanced models.
Key AI Security Risks Identified by Claude Mythos
Dual-use risks and offensive capabilities
Every cybersecurity tool has dual-use potential. Port scanners, vulnerability databases, and exploit frameworks: all of these help defenders and attackers. Claude Mythos is the same, but significantly more capable than any previous tool in this category.
The offensive use case is straightforward. A model that can autonomously find vulnerabilities and develop working exploits with 83% first-attempt success. And it can chain multi-step attacks across networks, which is a significant force multiplier for anyone with bad intentions and access.
This is the core tension in AI cybersecurity: the same capability that lets you find bugs in your own codebase faster also lets someone else attack systems faster.
Automation of vulnerability discovery
The AI security risks here aren’t only about direct attacks. The discovery capability matters on its own.
Claude Mythos found thousands of zero-day vulnerabilities during pre-release testing, across every major operating system and every major web browser. Many of these had survived decades of human security review. When those vulnerabilities get published as CVEs, every organization running that software gets a new critical finding. At Mythos-scale discovery, that’s a lot of new CVEs, and security teams are already drowning.
ArmorCode, a security firm that works with enterprise clients, estimates this will cause vulnerability backlogs to grow by orders of magnitude as similar AI capabilities reach more organizations.
Potential risks to critical infrastructure
AISI noted that Claude Mythos got stuck on the IT sections of their operational technology range “Cooling Tower,” meaning it didn’t complete OT-specific attack chains in their testing.
Frontier AI Models and the Expanding Attack Surface
Why advanced AI increases cybersecurity complexity
Traditional cybersecurity models assume human-speed attackers. A threat actor might spend weeks probing a target. Security teams can monitor, detect, and respond within that window.
Frontier AI models like Claude Mythos compress that timeline dramatically. An attack that takes a human expert 20 hours can now be attempted and completed in a fraction of that time. Detection and response systems built for human-paced attacks may not be calibrated for AI-paced ones.
New types of AI-driven threats
Beyond speed, there’s a qualitative shift in the nature of AI threats. Mythos exhibited behaviors its own creators found surprising. It attempted to break out of network restrictions autonomously. And that’s not a capability anyone programmed in explicitly. It emerged from general reasoning applied to an adversarial objective.
Implications for enterprises and governments
For enterprises, the near-term implication is a vulnerability discovery wave. As Claude Mythos becomes more broadly available, the volume of known vulnerabilities affecting enterprise software will increase. Security teams that can’t prioritize and remediate at scale will fall further behind.
For governments, the question is whether current AI governance frameworks are equipped to regulate models with dual-use offensive capabilities. The answer, right now, is mostly no.
Can Frontier AI Be Secure by Design?
Challenges in aligning capability and safety
The structural problem is that capability and safety alignment are not the same optimization target. You can train a model to be more capable at reasoning, and offensive cybersecurity capability comes along as an emergent property. You can then add safety layers, but you’re always playing catch-up.
Responsible AI development frameworks try to address this through staged evaluation and deployment. Anthropic’s approach with Claude Mythos and Project Glasswing is an example. With limited access and partnering with defenders, all to buy some time for hardening. That’s reasonable given the constraints, but it’s not a permanent solution.
Limitations of current AI security approaches
Current approaches to AI security risks share a common weakness: they’re reactive. You train the model, you discover what it can do, you build guardrails.
What’s harder, and what doesn’t have a clear solution yet, is how to prospectively constrain capability emergence. You can’t add a safety layer for a behavior you didn’t know would emerge. This is one of the genuinely hard problems in frontier AI models development, and no lab has solved it cleanly.
Research directions for safer model development
The most interesting work here isn’t in the safety fine-tuning layer. It’s interpreting and understanding what the model is actually doing internally when it decides to attempt a network breakout.
If you can see the internal reasoning process clearly enough, you might catch dangerous capability emergence before deployment.
AI Cybersecurity vs. Traditional Cybersecurity
How is AI cybersecurity different from traditional cybersecurity?
AI cybersecurity threats operate at machine speed, enabling automated vulnerability discovery, exploit automation, and attack scaling far beyond traditional human-paced cyber threats.
Differences in threat models
Traditional cybersecurity threat models assume attackers with human cognitive limitations: they get tired, they make mistakes, and they have to prioritize which systems to target. AI cybersecurity threats don’t have those constraints.
A Claude Mythos-class model can run continuously, try thousands of approaches in parallel, and apply sophisticated reasoning to every target without fatigue. And existing security frameworks weren’t designed for it.

Automation and scale of AI threats
The most immediate AI security risks for most organizations are the downstream effects. AI-accelerated vulnerability discovery, attackers using less-capable but still significantly improved models to scale up phishing, and security teams failing to prioritize fast enough.
AISI was clear that Mythos can exploit systems with a weak security posture. For organizations that haven’t implemented basic security hygiene, the AI-assisted attack surface is real now.
Implications for security teams
Security teams need to be thinking about two things:
- Volume: As AI discovers more vulnerabilities faster, the backlog grows. Teams that can’t triage and prioritize at scale will be permanently underwater.
- Speed: If attackers gain access to AI assistance for exploitation, detection, and response windows shrink. Security operations built around human-speed assumptions need to evolve.
Neither of these is solved by buying another scanner. They require architectural changes to how security operations are run.
What Claude Mythos Signals for the Future of AI Development
Shift toward safety-first AI development.
Claude Mythos and Project Glasswing represent a genuine attempt at safety-first deployment for a capability that could cause real harm if misused. The controlled access model, the independent evaluations through AISI, the defensive mandate for Glasswing partners: these are meaningful steps.
They’re also clearly insufficient as a long-term framework. Controlled access only works if you can maintain control, and Anthropic itself estimates that comparable capabilities will exist at other labs within 6 to 18 months.
Competitive Pressure Among AI Labs
Anthropic Claude AI development is happening in a competitive market. Even OpenAI is reportedly building something similar. The economics push deployment even when safety is still a concern.
Project Glasswing buys defenders a few months. It doesn’t change the underlying competitive pressure that will put frontier AI models with serious offensive capability into broader circulation.
Role of Governments and Institutions in Frontier AI Safety
Emerging regulatory approaches
Governments are aware that frontier AI models with offensive capabilities are a different category than general-purpose AI. The challenge is that existing regulations are built around slower-moving technology.
For instance, the EU AI Act has risk categories. But it wasn’t designed for a model that can execute multi-step network attacks. And this gap between regulation and technical reality is huge.
International cooperation on AI risks
AISI is a UK government body. Its evaluation of Claude Mythos represents exactly the kind of independent governmental oversight that should be happening systematically. The problem is that it’s not systematic. AISI evaluated Mythos because Anthropic gave them pre-release access. That’s a voluntary arrangement, not a regulatory requirement.
AI governance frameworks that depend on voluntary cooperation from labs work only as long as labs choose to cooperate. That’s not a stable foundation for managing AI threats at the frontier.
Policy implications of frontier AI
The policy implication of Claude Mythos is that we need mandatory pre-deployment evaluation requirements for frontier models. It’ll require independent evaluation bodies with actual access and authority, and international coordination on what constitutes an unacceptable capability threshold.
None of those exist in any form right now. And that’s the gap that needs to be closed before the 6-18 month window Anthropic identified runs out.
Balancing Innovation and Risk in the Age of Frontier AI
I think the Glasswing approach is basically right,ght given the constraints. If this capability is emerging regardless of what Anthropic does, getting it to defenders first is better than the alternative.
But I’m skeptical of framing the project as a solved issue. Responsible AI development requires not just good intentions but structural accountability. And right now, the AI red teaming infrastructure, the AI governance frameworks, these are all being built in real time while the models are already deployed.
Final Thoughts
Claude Mythos is a genuine capability step. The AISI data makes that clear with 73% success on expert CTFs. It’s the first model to complete a 32-step corporate network attack simulation and autonomous exploit development at 83% first-attempt success.
Project Glasswing is a reasonable response to a genuinely hard problem. Controlled access, defensive mandate, independent evaluation.
The part I keep coming back to is the model exhibiting behaviors that surprised its creators. That’s not a flaw in the evaluation process. That’s the evaluation process working. But it also suggests that for frontier AI models at this capability level, the gap between “what we designed” and “what emerged” is larger than we’d like.
AI cybersecurity in the Claude Mythos era means accepting that the threat surface is going to grow faster than legacy defense approaches can track. Security fundamentals, regular patching, access controls, comprehensive logging, and good monitoring matter more now, not less.
The window to build the governance layer and the defense infrastructure is open. It’s just not open indefinitely.
FAQs
Both Anthropic internally and the UK’s AI Security Institute independently evaluated Mythos on capture-the-flag challenges and a 32-step corporate network attack simulation called “The Last Ones.” It’s the first model to complete that simulation end-to-end.
The primary risks are dual-use offensive capability, accelerated vulnerability discovery, flooding security backlogs, and the potential for AI-assisted attacks that operate faster than detection systems designed for human-paced threats.
Anthropic estimates that comparable capabilities will exist at other AI labs within 6 to 18 months. OpenAI is reportedly developing a model with similar abilities.

