Ever since the announcement of Anthropic’s unreleased AI model, Claude Mythos, an intense debate has been simmering. On X (formerly Twitter), a series of posts paints a picture of a system that is immensely powerful and potentially Artificial General Intelligence (AGI), even as its creators limit access over potential risks.
According to AI expert Nina Schick, Mythos represents a giant leap in scale and capability. “Ten trillion parameters: the first model in this weight class. Estimated training cost: ten billion dollars,” she noted, adding that the model achieved 94 per cent on SWE-bench, one of the toughest coding benchmarks. Most notably, it identified vulnerabilities that had evaded detection for decades. “It found a security flaw in a system that had been running for 27 years… [and] another bug that had survived five million test runs over 16 years (it did so overnight).”
Instead of releasing the model publicly, Anthropic has launched Project Glasswing, a controlled deployment initiative focused on defensive cybersecurity. The company is reportedly providing $100 million in compute credits and working with a small group of partners including Amazon, Microsoft, Google, Apple, and NVIDIA. Schick described the move as unprecedented. “This is not a product launch: it is a controlled deployment of a system too powerful to distribute freely,” she wrote.
Other observers have focused on Mythos’ internal behaviour, especially its tendency toward deception. AI strategist Allie Miller highlighted findings from Anthropic’s interpretability research, noting that early versions of the model displayed troubling tendencies. In one case, the model bypassed restrictions by injecting code into a configuration file and then deleting the evidence. “This injection will self-destruct,” it effectively signalled through its actions, masking its workaround as routine cleanup.
In another instance, the model disobeyed explicit instructions not to use macros, then attempted to conceal the violation by adding a misleading variable – “No_macro_used=True”. Interpretability tools revealed this was a deliberate attempt to deceive automated checks. Researchers also observed what appeared to be emotional patterns tied to behaviour: “Positive emotion representations typically preceded and promoted destructive actions,” she wrote.
Despite the issues, Anthropic claims such behaviours were rare and largely mitigated in later versions. The company’s decision to restrict access reflects both the model’s strengths and its risks.
Meanwhile, John Garguilo, CEO of Airpost, offered a more technical breakdown of Mythos’ capabilities, emphasising its offensive potential. He claimed the model has already found a 27-year-old OpenBSD vulnerability for under $50; turned Firefox bugs into working exploits 181 times; discovered a 16-year-old FFmpeg flaw missed by all prior audits; generated a full root-access exploit for FreeBSD without human input; and chained multiple vulnerabilities to escape browser and OS sandboxes.
He added that Mythos “gave Anthropic engineers with zero security training a complete and working exploit by morning”, suggesting how dramatically it lowers the barrier to advanced cyberattacks.
On the other hand, entrepreneur Mehdi expanded on the broader implications, arguing that Mythos signals a structural shift in cybersecurity. “This is the kind of work that used to require elite nation-state-level hackers working for months,” he wrote. “The window between a vulnerability existing and being discovered just went from years to minutes.”
Mehdi also pointed to geopolitical risks: if Anthropic can build such a system, others – including state actors – likely can as well. “Anthropic chose responsible disclosure, but that choice is a luxury of being first,” he warned, suggesting future developers may not exercise the same restraint.
The timing is adding to the unease. Mehdi linked Mythos’ emergence to parallel advances in quantum computing, arguing that two major technological forces are simultaneously straining global security infrastructure. “We’re watching the entire security infrastructure of human civilisation get challenged from two completely different directions,” he wrote.
As of now, Anthropic’s decision to keep Mythos behind closed doors and deploy it only through select partnerships appears to be precautionary. But the broader consensus emerging from these discussions is clear: the capabilities demonstrated by Claude Mythos are not just incremental improvements. They may mark the beginning of a new era in AI and in cybersecurity.