What Mythos access got us. Now public. https://blog.mozilla.org/en/privacy-security/ai-security-zero-day-vulnerabilities/
Holy patch notes, Batman! Firefox has obliterated over 200 sinister security bugs in this release alone - the most villainous vulnerabilities ever squashed in Firefox history. https://www.mozilla.org/en-US/security/advisories/mfsa2026-30/
@freddy @Mae So, not 271 vulnerabilities found using the LLM, then? Is it - 3? More?
One could be excused for reading this paragraph as "271 vulnerabilities were identified and fixed simply by running the LLM-based tool over the code".
'As part of our continued collaboration with Anthropic, we had the opportunity to apply an early version of Claude Mythos Preview to Firefox. This week’s release of Firefox 150 includes fixes for 271 vulnerabilities identified during this initial evaluation.'
@brad @Mae Nah, you got it wrong, still.
There is no point in issuing 271 CVE identifiers when people have to update Firefox (or not). It's not like users have a choice which fixes to apply.
The LLM found way more than 271 vulns. We fixed the first 271 in Firefox 150. We lumped them into a somewhat arbitrary number of CVEs because we do not think our time is best spent writing advisory texts (or Mastodon posts for that matter ;)). More bug fixes will come. And then some more.
@freddy Did you learn any lessons that would apply at the specification level? Totally reasonable if these were all implementation bugs, but I'd love to learn about mistakes we're making at other levels too.
@jyasskin https://github.com/tc39/proposal-thenable-curtailment comes to mind :D
@freddy @Mae The post implies that all the vulnerabilities fixed - regardless of whether you're counting CVEs, bug reports, or "trust me bro"s - were identified using the LLM. Which does not appear to be the case.
Curious that you'd leave "way more" vulnerabilities unpatched, too. Not really vulnerabilities? Or the patching has to be done by people, and the LLM is being used as a static analyser? That then raises the question of what code scanning was being done previously.
@freddy do you have any info on how the number of CVEs compares to the coverage-guided fuzzing era (AFL and its successors)?
@floyd you mean the number over time? It’s hard to compare because (I think) we got better and were faster at putting scalable automation behind this than when fuzzing, AFL & ASan happened. Probably easier to do statistics when you look at a year or multiple years.
If you were to do bugs per week or month, this would beat all historic records
@freddy ok good to know, yeah per year would probably work. Ok so it got easier to automate, good to know. I don't feel that way without Mythos, using other AI
@freddy one funny thought: if the AI companies had put just a quarter of the money into fuzzing 🤔...
@floyd as we see at pwn2own every year, some bugs aren’t that easily found with fuzzing. I actually think this might be closing an interesting gap.