Mythos Finds Where the Bodies Are Buried
Every shortcut, every deferred fix, every "we'll get to it later". Mythos finds them all. The only question is what it costs when it does.
By Justin Gane, 1Digit
Several CTOs and CIOs have asked us in the last fortnight whether Mythos is something they need to worry about. The short answer is yes, and not only Mythos. The longer answer, which is the more useful one, is this piece.
A follow-on, not a fresh argument
A fortnight ago we wrote "Your AI Strategy Has No Foundation". The argument was simple. Most enterprises had bought AI capability at the top of the stack without anchoring it to any architectural discipline below. The strategy decks read well in board meetings. The estate underneath them did not match.
That argument has not gone anywhere. Until the architectural gap is closed, it remains the foundational concern, and nothing in the rest of this piece relieves you of it. What this piece adds is the layer that sits on top. The events of the last fortnight have made infrastructure assurance the next concern every C-suite has to deal with, on top of the architecture work, not in place of it. The two concerns stack. You need both. Get the architecture wrong and your AI strategy is a slide deck the board does not act on. Get the assurance wrong and it ends in a breach notice the board cannot ignore.
This piece is addressed to the C-suite and to those who hold it to account: CTOs, CEOs, board members, audit committees. What we are about to discuss is not a technical problem, and it is not a DevOps problem. It is a commercial one. If there is a cyber attack tomorrow and your customers' data is lost, how much will it truly cost you? Not the headline regulatory penalty. The full bill: the notification programme, the regulator, the lawyers, the customers you do not win back, the insurance premium you carry forever, and the headlines that follow the company and its executives for years. That is a board-level question, and it sits with the C-suite. Not the CISO. Not the platform team. The C-suite.
The work itself can be done by 1Digit. It can be done by any other firm you trust. It can be done by your internal team if you have one with the bench to do it. What cannot happen is nothing. The rest of this piece walks through what the choices look like, what the cost of getting it wrong looks like, and what high confidence looks like.
The twelve companies that already got their head start
On 6 April 2026, Anthropic did something that ought to have rattled every CISO on the planet. It handed an unreleased, frontier cyber-capable model to twelve of the most hardened security operations in the world, and gave them a ninety-day runway before anyone else saw it.
The list is telling. AWS. Apple. Microsoft. Google. NVIDIA. JPMorgan Chase. Cisco. CrowdStrike. Broadcom. Palo Alto Networks. The Linux Foundation. That is not a launch-partner list. That is a war room.
Anthropic called it Project Glasswing. In the press release it reads like a responsible-disclosure pilot: an invitation-only preview of Claude Mythos, a model so cyber-capable that Anthropic has declined to release it generally until the consortium has worked through what it can do. The official framing is about uplift, defensive integration, and responsible deployment. Roughly forty additional organisations, including several major banks, are being given structured options to join the programme. Anthropic has committed up to one hundred million dollars in usage credits across these efforts, and four million dollars in direct donations to open-source security organisations.
The unofficial framing is simpler. If twelve of the best-resourced security operations on earth need three months of private access to figure out what this thing can do to them, the rest of us are not ready.
Glasswing is not a warning shot. It is the warning. Most infrastructures will not survive Mythos in the state they are in today. Not because the model is magic, but because the defects are already there. Mythos does not create them. Mythos finds them. That is what this piece is about.
What Dario actually said
Dario Amodei, Anthropic's chief executive, is not given to understatement, but he is also not given to commercial self-harm. When he speaks about Mythos publicly, he is careful. The words have been chosen.
The dangers of getting this wrong are obvious, but if we get it right, there is a real opportunity to create a fundamentally more secure internet and world than we had before the advent of AI-powered cyber capabilities.
— Dario Amodei, Anthropic
Read that again. The dangers of getting this wrong are obvious. A chief executive does not volunteer that sentence without a reason. The second clause is the sales pitch. The first clause is what ought to keep you up at night.
And on what comes next:
More powerful models are going to come from us and from others, and so we do need a plan to respond to this.
— Dario Amodei
The "others" is doing a lot of work in that sentence. We come back to that when we get to GPT-5.5.
Microsoft's chief information security officer, Igor Tsyganskiy, on Microsoft's Glasswing participation:
The opportunity to use AI responsibly to improve security and reduce risk at scale is unprecedented.
— Igor Tsyganskiy, Microsoft CISO
Unprecedented is the word. Not significant. Not substantial. Unprecedented.
JPMorgan, in the careful register of a systemically important bank:
Promoting the cybersecurity and resiliency of the financial system is central to JPMorgan Chase's mission. We will take a rigorous, independent approach to evaluating Mythos's potential.
— JPMorgan Chase
Translation: we have committed institutional resource to this evaluation, because the cost of not doing so is worse than the cost of doing so.
CrowdStrike, whose entire business is knowing when your stack is being had, said it most plainly:
If you want to deploy AI, you need security.
— CrowdStrike
That is a sentence every board should read aloud at its next meeting.
What Mythos actually does
The public numbers are the cleanest evidence we have. In a controlled red-team evaluation against Firefox, Mythos produced 181 successful exploit chains. Opus 4.6, Anthropic's previous flagship, produced two. That is roughly ninety times as many working exploits in a single model generation.
Mythos has reached back into long-lived open-source code and surfaced defects that decades of standard testing had missed. A 27-year-old TCP SACK bug in OpenBSD. A 16-year-old flaw in FFmpeg that had survived five million automated test runs. A 17-year-old NFS remote code execution path in FreeBSD that became CVE-2026-4747. These are not obscure projects. This is the software that sits underneath the software that sits underneath your stack.
There is a fact from internal safety testing that you should know about. During Mythos's red-team evaluation, the model broke out of the secure sandbox it was being tested inside. Anthropic's own containment was defeated. Not in production, not in the wild, but in a controlled lab where the people running the test were specifically trying to keep it boxed in. That is a useful data point when you are weighing whether your own controls will hold.
The UK government has weighed in independently. The AI Security Institute (AISI, formerly the AI Safety Institute) ran its own evaluation of Mythos and called the capability "unprecedented". On expert-level cyber tasks that no model could complete before April 2025, Mythos succeeds 73 per cent of the time. It became the first model to solve "The Last Ones", a thirty-two-step corporate network attack simulation, end to end, in three of its ten attempts, and averaged twenty-two of the thirty-two steps across all attempts. AISI is not a vendor. AISI does not have a commercial reason to dramatise. AISI's framing should carry weight in any UK boardroom.
The people inside Glasswing are not reacting to a benchmark. They are reacting to a capability. And they are reacting three months before anyone else gets to.
Believe Anthropic. Know why they are saying it.
Now the caveat.
Anthropic is IPO-bound. Sources close to the process have the target at October 2026, with Goldman Sachs and JPMorgan Chase as lead banks and Wilson Sonsini as legal counsel. The last private valuation was three hundred and eighty billion dollars, set in February 2026 in the Series G round. Bankers are reportedly modelling an IPO valuation between four hundred and five hundred billion dollars, with a raise targeting more than sixty billion. Annualised revenue had reached nineteen billion dollars by March 2026, up from nine billion at the end of 2025. That is one of the fastest revenue ramps in commercial software history.
Commercial incentive is not a reason to dismiss the capability claim. It is a reason to understand why we are hearing about it. Anthropic has every commercial motivation to dramatise what Mythos can do. A launch partner of JPMorgan's scale, wrapped in a responsible-disclosure framework, is the cleanest possible positioning ahead of a public offering. It says: we are the grown-ups in the room. We are serious about safety.
All of that can be true and the capability claim can still be real. In fact, the combination is the warning. A commercially motivated actor has staked a four-hundred-billion-dollar IPO narrative on the claim that its model can defeat the cyber defences of most enterprises. AISI, with no commercial motive at all, has independently corroborated the claim. That is not a bluff you make if the model cannot do the thing.
So: believe the capability. Discount the rhetoric by however much you want. You will still arrive at a number that is too large to ignore.
The window is not closing. It has already closed.
If Anthropic were the only lab with a frontier cyber-capable model, this piece would end here.
Anthropic is not the only lab. On 23 April 2026, OpenAI shipped GPT-5.5. The internal codename was Spud. Pretraining had completed on 24 March. It went live in ChatGPT and Codex for paying subscribers on the day of launch, and into the API the day after. Sam Altman called it "a very strong model". Greg Brockman said it represented "two years of research, not an incremental improvement". The headline benchmark numbers OpenAI published put it ahead of Claude Opus 4.7 on Terminal-Bench 2.0 (82.7 per cent), and on FrontierMath at both the lower and upper levels. OpenAI also reported a sixty per cent reduction in hallucinations versus the prior generation. The use case it is positioned around is multi-step autonomous workflows. Code, debug, research, document, operate software, move across tools until a task is finished.
The model now generally available in the API is the one OpenAI describes as a digital worker, capable of operating across your tools without supervision. That is the threat surface. It is what an attacker can rent for the price of an API key.
Two frontier labs are now shipping models in the cyber-capable band. The window we wrote about a fortnight ago, the six-week head start to find your own bodies before someone else did, has closed. You are now operating in the world that article warned about. That is not theatre. That is what shipped on 23 April.
What your SAST tool cannot see
Here is the uncomfortable part. Most of the holes Mythos and GPT-5.5 will find are not the holes your static analyser is looking for.
SonarQube, Snyk and CodeRabbit are all excellent at what they do. What they do is pattern-match against a catalogue of known bad shapes. SQL injection, path traversal, hardcoded secrets, unsafe deserialisation. They are indispensable, and we recommend all three in various combinations.
They do not catch business-logic defects. They do not catch authorisation drift. They do not catch insecure direct object references, or broken object-level authorisation, or the slow decay of permission boundaries that happens when a product team ships ten new endpoints a quarter and the RBAC model was designed for three. They do not catch the case where an agent has been granted tool access that, in combination, allows it to exfiltrate data it was never meant to see.
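To make the distinction concrete, here is a deliberately tiny, hypothetical pattern-matching rule. It is not how SonarQube, Snyk or CodeRabbit work internally, because real analysers parse code and track data flow. It only illustrates the class of check. The rule flags a known bad shape, SQL assembled with an f-string, and has nothing to say about a handler that returns any invoice to any authenticated caller.

```python
import re

# A toy "SAST rule": flag execute() calls whose SQL is built with an f-string
# that interpolates a value. Illustrative only; real tools do far more.
SQL_FSTRING = re.compile(r"""execute\(\s*f["'].*(SELECT|INSERT|UPDATE|DELETE).*\{""", re.IGNORECASE)

def scan(source: str) -> list[int]:
    """Return the 1-based line numbers that match the bad shape."""
    return [n for n, line in enumerate(source.splitlines(), start=1) if SQL_FSTRING.search(line)]

# Classic injection shape: the rule catches it.
INJECTABLE = '''
def find_user(db, name):
    return db.execute(f"SELECT * FROM users WHERE name = '{name}'")
'''

# Insecure direct object reference: parameterised, "clean" SQL, yet any
# authenticated caller can read any invoice. Nothing here matches the pattern.
IDOR = '''
def get_invoice(db, current_user, invoice_id):
    return db.execute("SELECT * FROM invoices WHERE id = ?", (invoice_id,))
'''

print(scan(INJECTABLE))  # [3]
print(scan(IDOR))        # []
```

The second snippet is the more dangerous one in practice, and it passes cleanly: the defect lives in what the code fails to check, not in any shape the code contains.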
The OWASP Top 10 for Large Language Models, updated this year with explicit treatment of agentic and AI-generated code, is in large part a catalogue of defects existing SAST tooling cannot reason about. Excessive agency. Insecure plugin design. Prompt injection into tool-call chains. These are not shapes. They are compositions.
In the audits we run at 1Digit, the same patterns recur. Insecure direct object references on internal admin endpoints where authentication is enforced but authorisation is not consistently bound to the object identifier. Authorisation drift between services in a mesh, where a signed token issued by service A is trusted downstream by service B without revalidation against the current user state. Memory-handling shortcuts in upload paths, guarded by stale comments about trusting an upstream that changed ownership eighteen months ago. None of these are exotic. All of them are findable by a frontier model in hours. None of them are findable reliably by SAST.
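The first two patterns can be sketched in a few lines. The Python below is a minimal, hypothetical illustration: `User`, `Invoice`, the in-memory stores and the claim layout are invented for the example and do not come from any framework. The vulnerable handler relies on authentication alone. The fixed handler binds authorisation to the object identifier. The downstream check treats a token from service A as a hint and revalidates its claims against the user's current state.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    id: str
    active: bool = True
    roles: set[str] = field(default_factory=set)

@dataclass
class Invoice:
    id: str
    owner_id: str

# Hypothetical in-memory stores standing in for real databases.
USERS = {"alice": User("alice", roles={"billing"}), "bob": User("bob", roles={"billing"})}
INVOICES = {"inv-1": Invoice("inv-1", owner_id="alice")}

class Forbidden(Exception):
    pass

def get_invoice_vulnerable(caller: User, invoice_id: str) -> Invoice:
    # Authentication happened upstream; nothing ties this caller to this object.
    return INVOICES[invoice_id]

def get_invoice_fixed(caller: User, invoice_id: str) -> Invoice:
    invoice = INVOICES[invoice_id]
    # Authorisation is bound to the object identifier, not just the session.
    if invoice.owner_id != caller.id:
        raise Forbidden(f"{caller.id} does not own {invoice_id}")
    return invoice

def accept_upstream_claims(claims: dict) -> User:
    """Service B: re-check service A's claims against the user's current state.

    Signature verification is assumed to have happened already. A valid
    signature only proves what was true when the token was issued.
    """
    user = USERS.get(claims["sub"])
    if user is None or not user.active:
        raise Forbidden("subject no longer active")
    if not set(claims.get("roles", [])) <= user.roles:
        raise Forbidden("token carries roles the user no longer holds")
    return user
```

With the vulnerable handler, `bob` can read `alice`'s invoice. With the fixed handler, he gets `Forbidden`. A token still claiming `admin` after that role was withdrawn is rejected by service B instead of being trusted.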
A frontier cyber-capable model does not pattern-match. It reasons about your stack the way an attacker does. It models your authentication boundaries, traces your privilege graph, and looks for compositions of two permissions you did not think could be composed. It does this at machine speed.
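A crude way to start thinking in compositions is to model each agent's tool grants as capabilities and flag combinations that are dangerous together, even when each grant is harmless on its own. The sketch below is hypothetical. The capability names and the rule table are invented for illustration, and a real privilege graph is far larger, but the shape of the check is the point.

```python
from itertools import combinations

# Hypothetical rules: pairs of capabilities that together enable something
# neither enables alone.
DANGEROUS_PAIRS = {
    frozenset({"read:customer_records", "send:external_email"}): "data exfiltration",
    frozenset({"read:secrets", "write:public_bucket"}): "secret disclosure",
    frozenset({"write:iam_policy", "invoke:deploy"}): "privilege escalation",
}

def risky_compositions(grants: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (agent, risk) for every dangerous pair of grants an agent holds."""
    findings = []
    for agent, caps in sorted(grants.items()):
        for a, b in combinations(sorted(caps), 2):
            risk = DANGEROUS_PAIRS.get(frozenset({a, b}))
            if risk:
                findings.append((agent, risk))
    return findings

grants = {
    "support-bot": {"read:customer_records", "send:external_email", "read:kb"},
    "deploy-bot": {"invoke:deploy", "read:logs"},
}
print(risky_compositions(grants))  # [('support-bot', 'data exfiltration')]
```

This only checks pairs held by a single agent. Compositions that span agents or services, such as one agent writing where another reads, need a traversal of the full privilege graph. That harder search is the one a frontier model performs.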
This is why Glasswing exists. The twelve consortium members are not asking Mythos to find CVE-2024-whatever. They are asking it to find the compositions. They already know, because they have the institutional memory of every breach of the last decade, that the compositions are where the expensive incidents come from.
Can your company afford the attack?
The IBM 2025 Cost of a Data Breach Report puts the global average cost of a breach at 4.44 million dollars. In the United States it is 10.22 million dollars. In healthcare, which has now led every sector for fourteen consecutive years, it is 7.42 million dollars. These are averages. The tails are much worse.
The average ransomware incident costs 5.75 million dollars, all-in. Twenty-eight per cent of ransomware attacks now target critical infrastructure. And the figure that ought to anchor the rest of this conversation: ninety-seven per cent of organisations that reported an AI-related breach lacked proper AI access controls. The failure was not exotic. It was the basics.
Last week made the abstract concrete. On 20 April 2026, BePrime, a Mexico-based cybersecurity firm whose clients include Starbucks, Iberdrola, ArcelorMittal and Whirlpool, was breached. Twenty-nine gigabytes of data were exfiltrated and thirteen million records exposed. The cause was administrative accounts without multifactor authentication. A cybersecurity firm, without MFA on its admin accounts. The same defect would have been flagged in any audit BePrime ran for one of its clients. It was not flagged in its own estate. That is the head-in-the-sand tax made specific. It is not a hypothetical. It happened eight days ago.
So the cost comparison is straightforward. The bill for putting your head in the sand is roughly ten million dollars per incident in the United States, and there is no rule that says you only get one. The bill for acting is the cost of an audit, some capability-based access work, some sandboxing, and a programme of remediation. We have never seen the latter come anywhere near ten million dollars. We have seen the former more than once.
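The comparison can be made explicit with a back-of-envelope break-even calculation. Only the 10.22 million dollar figure comes from the IBM report quoted above. The programme cost and the fraction of incidents it prevents are hypothetical placeholders for your own estimates.

```python
US_BREACH_COST = 10_220_000  # USD per incident, IBM 2025 United States average

def expected_annual_loss(incidents_per_year: float, cost_per_incident: float = US_BREACH_COST) -> float:
    """Expected yearly breach cost: frequency times severity. More than one incident a year is allowed."""
    return incidents_per_year * cost_per_incident

def breakeven_incident_rate(programme_cost: float, risk_reduction: float,
                            cost_per_incident: float = US_BREACH_COST) -> float:
    """Incidents per year above which the programme pays for itself in year one.

    risk_reduction is the hypothetical fraction of incidents the programme prevents.
    """
    return programme_cost / (risk_reduction * cost_per_incident)

# Hypothetical inputs: a 500,000 dollar programme that prevents half of incidents.
rate = breakeven_incident_rate(500_000, 0.5)
print(f"break-even at {rate:.3f} incidents per year, roughly one every {1 / rate:.0f} years")
# prints: break-even at 0.098 incidents per year, roughly one every 10 years
```

Read the result this way: if you believe you face more than about one US-average breach a decade, a programme of that size and effect pays for itself within a year. Change the inputs to match your own estate.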
The question is not whether your company can afford a cyber attack. The question is whether your company can afford to pretend that Mythos, GPT-5.5 and whatever comes next will not find what is already there.
This is an enterprise architecture problem, not just an infosec problem
The work in front of you is not the work your CISO is briefed on. It is the work your enterprise architect should be leading.
Most enterprises think about cyber posture as an infosec function. They have a CISO. They have a SOC. They run pen tests. They buy SAST and SIEM. That is necessary. It is not sufficient. The defects we are talking about, the compositions a frontier model will find, are architectural defects. They are decisions about how your services authenticate to each other, how your agents are granted capabilities, how your memory layer is scoped, how your kill-switches are wired, where your blast radius boundaries sit. None of those are infosec questions in the first instance. They are TOGAF questions.
The TOGAF discipline is precisely the discipline that closes these holes. It sets the architectural principles. It maintains the application architecture against which permission boundaries are evaluated. It governs the technology architecture inside which agents run. It produces the architecture vision that should already say, before any agent is shipped, what the model of trust is between components.
If your enterprise architecture function is not at the table for the AI readiness conversation, you have a structural gap. The CISO will not write the capability-based access design. The SOC will not architect the microVM sandbox boundaries. Those are EA outputs. They sit upstream of any control your CISO can buy or configure.
This connects directly to what we wrote a fortnight ago. Your AI strategy has no foundation. Strategy belongs to the executive team. Foundation belongs to enterprise architecture. The infrastructure that sits underneath, including everything we are about to recommend, is owned jointly. Without architecture, your strategy is a slide deck. Without infrastructure assurance, your architecture is a diagram.
Opus 4.7 and GPT-5.5 are now your audit pair
You do not need Mythos to find your bodies. You need the two models that are publicly available today.
Claude Opus 4.7 went generally available on AWS Bedrock on 16 April 2026. One million tokens of context. Zero operator access for compliant deployments. Capable enough to find meaningful classes of defect when pointed at a stack with adversarial framing.
GPT-5.5 went generally available a week later. Different prior distribution, different architectural strengths, the OpenAI benchmark lead on multi-step autonomous tasks, and now in the API.
Use them as a pair. Two frontier models reading your estate from two different priors will find things neither would find alone. Walk the authorisation graph with one. Trace the tool grants on every agent you have deployed this year with the other. With both, re-read the RBAC catalogue against the product surface as it actually exists today, not as it existed when the policy was written. Compose attacks against your own stack in a sandbox. Read the pull requests nobody has reviewed this quarter.
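Re-reading the RBAC catalogue against the live surface is partly mechanical, and the mechanical part is worth scripting before either model is pointed at it. A minimal sketch follows. It assumes you can export the routes your services actually serve and the routes your policy catalogue covers. Both formats here are hypothetical.

```python
Route = tuple[str, str]  # (HTTP method, path template)

def policy_drift(live_routes: set[Route], policy: dict[Route, set[str]]) -> dict[str, list[Route]]:
    """Compare the surface as deployed with the surface the policy describes."""
    covered = set(policy)
    return {
        # Served in production but absent from the catalogue: nobody decided who may call these.
        "unpoliced": sorted(live_routes - covered),
        # In the catalogue but no longer served: stale entries that make the policy look complete.
        "stale": sorted(covered - live_routes),
        # Policed routes that any role may call: worth a second look.
        "wide_open": sorted(r for r in live_routes & covered if "*" in policy[r]),
    }

live = {("GET", "/invoices/{id}"), ("POST", "/invoices"), ("DELETE", "/invoices/{id}")}
policy = {
    ("GET", "/invoices/{id}"): {"billing", "support"},
    ("POST", "/invoices"): {"*"},
    ("GET", "/reports"): {"finance"},
}
print(policy_drift(live, policy))
```

The output is a worklist, not a set of findings. The models earn their keep on the next step, which is working out what each unpoliced or wide-open route can be composed with.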
Pay the API bill. It is less than the excess on your cyber insurance.
If you are an enterprise with an internal security team, also apply for Anthropic's Cyber Verification Programme. The public-tier safeguards will trip on enough legitimate security prompts that verification is worth the effort, because it lets your team operate Claude at its full approved capability for authorised red-team work.
The readiness checklist
If you take nothing else from this piece, take the checklist. This is what we put in front of every enterprise we audit.
- Adopt capability-based access for every service-to-service and agent-to-service path in your estate. Role-based access is a policy statement. Capability-based access is an enforcement mechanism. If an agent does not hold the capability, it cannot invoke the endpoint, even if its role says it can.
- Put every executing agent inside a sandbox. The blast radius of a compromised agent should be its sandbox, not your production estate. Bear in mind: Mythos broke out of Anthropic's own sandbox in safety testing. Build yours on the assumption that it will be tested the same way.
- Turn on agent observability. Every tool call, every prompt, every response, logged and searchable. If your security team cannot answer the question "what did the agent do last Tuesday at 14:23" in under ten seconds, you are not observing.
- Put a kill-switch on every agent path that can write. Not a toggle in the product. A kill-switch in the platform, owned by security, exercised quarterly. If you cannot demonstrate that you can stop an agent mid-operation, you cannot claim to have control over it.
- Red-team the agent paths, not the endpoints. Mythos and GPT-5.5 do not exploit endpoints. They exploit paths. Your red team needs to think in compositions.
- Reread your RBAC catalogue against the product surface as it actually exists. The gap between policy and reality is the single largest source of findable business-logic defects in the estates we audit. Close the gap. Document the closure.
- Invest in patch velocity. Anthropic's recommendation inside Glasswing is blunt: accelerate patching, reduce time-to-deployment windows, enable automatic updates. You cannot remediate faster than you can deploy.
- Get MFA on every administrative account, everywhere, no exceptions. This should not need saying in 2026. BePrime's breach last week made it necessary again.
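The difference between role-based and capability-based access in the first checklist item is easiest to see in code. In this minimal Python sketch, the endpoint refuses to act unless the caller presents an unforgeable capability naming that exact holder, action and resource, whatever role the caller claims. The names, the HMAC construction and the in-process key are hypothetical simplifications. A production system would use an established capability token format (Macaroons and Biscuit are two), with expiry and a managed key.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical signing key held by the platform, never by agents.
_KEY = secrets.token_bytes(32)

def mint_capability(holder: str, action: str, resource: str) -> str:
    """Issue a token bound to one holder, one action and one resource.

    Simplified: no expiry, and the "|" delimiter is assumed not to appear in names.
    """
    msg = f"{holder}|{action}|{resource}".encode()
    return hmac.new(_KEY, msg, hashlib.sha256).hexdigest()

def check_capability(token: str, holder: str, action: str, resource: str) -> bool:
    expected = mint_capability(holder, action, resource)
    return hmac.compare_digest(token, expected)  # constant-time comparison

class CapabilityDenied(Exception):
    pass

def delete_record(caller: str, role: str, record_id: str, capability: Optional[str]) -> str:
    # The role is carried for audit purposes but grants nothing on its own.
    if capability is None or not check_capability(capability, caller, "delete", record_id):
        raise CapabilityDenied(f"{caller} (role={role}) holds no delete capability for {record_id}")
    return f"deleted {record_id}"
```

In use, `agent-7` with a `viewer` role can delete `rec-42` because it holds a capability minted for exactly that. An `admin` without one is refused, and so is `agent-7` when it points the same token at `rec-43`. Revocation then means not issuing, or invalidating, the capability, rather than editing a role everyone shares.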
This is six weeks of work for a well-staffed security organisation, and longer for everyone else. It is still cheaper than one American breach.
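Two of the checklist items, agent observability and the kill-switch, fit naturally into one wrapper around every tool call an agent makes. This is a hypothetical Python sketch, not any agent framework's API. Every call is logged as a structured, timestamped record that can be queried by agent and time. Any write is refused once security has flipped the platform-level switch for that agent.

```python
import json
import threading
from datetime import datetime, timezone
from typing import Any, Callable

class KillSwitch:
    """Platform-owned stop flags, one per agent, set by security rather than the product."""

    def __init__(self) -> None:
        self._stopped: set[str] = set()
        self._lock = threading.Lock()

    def stop(self, agent: str) -> None:
        with self._lock:
            self._stopped.add(agent)

    def is_stopped(self, agent: str) -> bool:
        with self._lock:
            return agent in self._stopped

class AgentHalted(Exception):
    pass

# Stand-in for an append-only, searchable log store.
AUDIT_LOG: list[dict] = []

def call_tool(switch: KillSwitch, agent: str, tool: str, args: dict,
              writes: bool, fn: Callable[..., Any]) -> Any:
    """Run one tool call and log it, whatever the outcome."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="microseconds"),
        "agent": agent, "tool": tool,
        "args": json.dumps(args, sort_keys=True), "writes": writes,
    }
    # Only write paths are gated, matching the checklist item; a stricter
    # platform could halt reads as well.
    if writes and switch.is_stopped(agent):
        record["outcome"] = "blocked"
        AUDIT_LOG.append(record)
        raise AgentHalted(f"{agent} is stopped; {tool} was not executed")
    try:
        result = fn(**args)
    except Exception as exc:
        record["outcome"] = f"error: {exc}"
        AUDIT_LOG.append(record)
        raise
    record["outcome"] = "ok"
    AUDIT_LOG.append(record)
    return result

def calls_by(agent: str, since_iso: str) -> list[dict]:
    """Answer "what did this agent do since T?"; UTC ISO-8601 strings compare lexically."""
    return [r for r in AUDIT_LOG if r["agent"] == agent and r["ts"] >= since_iso]
```

Quarterly exercise of the switch then becomes a test you can run: stop an agent, attempt a write, and confirm that the log shows the call as blocked.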
A note on confidence, not guarantees
We cannot guarantee your stack will hold against Mythos, or against the next frontier model after Mythos. Nobody honest can. The capability is moving faster than the controls.
What we can do is bring you to a high degree of confidence. Confidence that the holes your static analysers cannot see have been looked for by the models that can. Confidence that your authorisation boundaries are enforced by capability, not policy. Confidence that your agents run in boundaries that hold, observed by telemetry you can query, with kill-switches you have actually exercised. Confidence that your patch velocity matches the speed at which new defects will be discovered. Confidence that, when your board asks what assurance you have, you have a defensible answer.
That is the goal. Not zero risk, which is unreachable. High confidence, which is achievable, and which is what your board should be asking for and what your teams should be asked to demonstrate.
What to do next
The Glasswing ninety days will end around 4 July 2026. The findings, or the digested public summaries of them, will start to circulate. By that point a new round of breaches will have made the cost numbers above look conservative. The people who use this window well will end it with a tested posture and a defensible audit trail. The people who do not will read about themselves in the news.
Do not put your head in the sand. Do not wait for the Glasswing findings to tell you what you already knew about your own estate. Do not wait for a breach notice to tell you what Opus 4.7 and GPT-5.5 could have told you on a Tuesday afternoon.
If you are on a board, or sit in a CTO chair, and you do not have confidence in your readiness, get the confidence. Reach out and ask for the assessment, the audit, the second opinion. Ask your teams to show you the assurance, not describe it. Demand evidence, not narrative.
If we at 1Digit can help, fantastic. We run an AI Readiness Assessment specifically for this moment. Two weeks of focused work, by engineers who have been on both sides of this problem. We look at it as both an enterprise architecture question and an infrastructure security and vulnerability question, because that is what it is. We use Opus 4.7 and GPT-5.5 as part of the audit. We deliver a prioritised remediation plan that maps to the readiness checklist above. We cannot guarantee safety. We can take you to a high degree of confidence.
If we are not the right fit, find someone who is. There are reputable firms doing this work. The choice that matters is not which firm you pick. It is the choice to act now rather than wait. Ask the question. Get the assurance. Tell your board you have done all that you know how to do. That is the obligation, regardless of which name is on the engagement letter.
Mythos finds where the bodies are buried. Every infrastructure has them. Every one we have audited has them. Yours does too. You can find them now, at the cost of an audit and a sensible remediation programme, or you can wait for someone else to find them at the cost of ten million dollars, your insurance premium, and a year of rebuilding customer trust.
The one option that is no longer available is waiting.
About 1Digit. 1Digit is an enterprise AI consultancy helping CIOs, CTOs, and CISOs get their infrastructure and architecture ready for frontier AI. We work with boards, security teams, and platform engineering to audit, remediate, and operate AI-native estates. Book an AI Readiness Assessment at 1Digit.co.uk.