Open source LLM tool primed to sniff out Python zero-days

Researchers with Seattle-based Protect AI plan to release a free, open source tool that can find zero-day vulnerabilities in Python codebases with the help of Anthropic's Claude AI model.

The software, called Vulnhuntr, was announced at the No Hat security conference in Italy on Saturday.

"The tool does not simply paste some code from the project and ask for analysis," explained Dan McInerney, lead AI threat researcher at Protect AI, who developed the software with colleague Marcello Salvati.

"It automatically finds project files that are likely to handle remote user input, Claude analyzes that for potential vulnerabilities, then for each potential vulnerability Claude is given a vulnerability-specific highly optimized prompt and enters a loop."

"In this loop it intelligently requests functions/classes/variables from elsewhere in the code continually until it completes the entire call chain from user input to server output without blowing up its context window. The advantage of this over current static code analyzers is a massive reduction in false positives/negatives since it can read the entire call chain, not just little code snippets one at a time."

This approach, McInerney claims, can reveal complex, multi-step vulnerabilities, as opposed to flagging functions like eval() with known security implications.
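The loop McInerney describes can be pictured with a toy sketch. This is not Vulnhuntr's code: the `CODEBASE` dictionary, `stub_analyzer`, and `trace_call_chain` are hypothetical names invented for illustration, and a regex-based stub stands in for the Claude call. It shows only the control flow: start from the entry point, fetch whatever symbols the analyzer asks for, and stop once nothing more is requested or a round limit is reached.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy "codebase": symbol name -> source text.
CODEBASE: Dict[str, str] = {
    "handle_request": "def handle_request(req): return render(load_file(req.args['path']))",
    "load_file": "def load_file(path): return open(resolve(path)).read()",
    "resolve": "def resolve(path): return '/srv/data/' + path",
    "render": "def render(text): return text",
}


def stub_analyzer(context: Dict[str, str]) -> Tuple[List[str], Optional[str]]:
    """Stand-in for the LLM call (assumption): request any called symbol not yet seen."""
    called = set()
    for source in context.values():
        # Collect every identifier that is immediately followed by "(".
        called.update(re.findall(r"\b([a-z_]+)\(", source))
    missing = sorted(n for n in called if n in CODEBASE and n not in context)
    if missing:
        return missing, None
    return [], "call chain complete: " + " -> ".join(context)


def trace_call_chain(entry: str, max_rounds: int = 10) -> Tuple[Dict[str, str], Optional[str]]:
    """Grow the context one request at a time instead of pasting in the whole project."""
    context = {entry: CODEBASE[entry]}
    for _ in range(max_rounds):
        needs, finding = stub_analyzer(context)
        if not needs:
            return context, finding
        for name in needs:
            context[name] = CODEBASE[name]
    # The round limit bounds how much context can accumulate.
    return context, None


context, finding = trace_call_chain("handle_request")
print(list(context))
print(finding)
```

In the real tool the analyzer step is a model response rather than a regex, so which symbols get requested, and in what order, is up to the model.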

"The tool was originally designed using Claude and used Claude's best practices in prompt engineering so it performs by far the best using Claude," said McInerney. "We included the option to use [OpenAI's] GPT-4 and we tested it with GPT-4o but got poorer results. Modifying the prompts to better fit GPT-4o is very straightforward and using the GPT-4o model is just a change in 1 line of code. By open sourcing it, we hope to encourage modifications such as these as new models come out."

So far, McInerney says, Vulnhuntr has found more than a dozen zero-day vulnerabilities in large, open source Python projects.

"All of these vulnerabilities were not previously known or reported to the project maintainers," he said.

The tool presently focuses on seven types of remotely exploitable vulnerabilities.

Affected projects include:

Other projects with vulnerable code spotted less than 90 days ago have not been identified to give maintainers time to fix things.

Ragflow, said McInerney, is the only project he's aware of that has fixed its identified bug.

Vulnhuntr has some limitations. It only works on Python code at the moment and it depends on access to a Python static analyzer. As a result, the tool is more likely to generate false positives when scanning Python projects that incorporate code in other languages (e.g. TypeScript).

When generating a proof-of-concept (PoC) exploit, the software generates a confidence score ranging from 1 to 10. A score of 7 means it's probably a valid vulnerability, though the PoC code may need some refinement. A score of 8 or more is highly likely to be valid. Scores of 6 or less are unlikely to be valid.
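For triage purposes, those thresholds could be encoded as a small helper. The function name and label strings below are hypothetical and not part of Vulnhuntr; only the score bands come from the description above.

```python
def interpret_confidence(score: int) -> str:
    """Map a 1-10 confidence score to the bands described in the article."""
    if not 1 <= score <= 10:
        raise ValueError("confidence score must be between 1 and 10")
    if score >= 8:
        return "highly likely valid"
    if score == 7:
        return "probably valid, PoC may need refinement"
    return "unlikely to be valid"


for s in (5, 7, 9):
    print(s, interpret_confidence(s))
```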

The output looks something like this:

Another issue is that LLMs aren't deterministic - they may provide different results for the same prompt at different times - so multiple runs may be required. Nonetheless, McInerney says that Vulnhuntr is a significant improvement over the current generation of static analyzers.
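One way to cope with run-to-run variation is to merge the findings from several scans and note how often each one recurs. The sketch below assumes each run has already been reduced to a mapping from a finding label to its confidence score. That data shape, the `merge_runs` name, and the sample labels are illustrative assumptions, not Vulnhuntr's output format.

```python
from collections import defaultdict
from typing import Dict, List


def merge_runs(runs: List[Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Combine per-run findings: count how many runs saw each one and keep its top score."""
    scores = defaultdict(list)
    for run in runs:
        for finding, score in run.items():
            scores[finding].append(score)
    return {
        finding: {"seen_in_runs": len(vals), "max_score": max(vals)}
        for finding, vals in scores.items()
    }


# Made-up results from three hypothetical runs over the same file.
runs = [
    {"LFI in /download": 8, "XSS in /search": 5},
    {"LFI in /download": 7},
    {"LFI in /download": 9, "SSRF in /fetch": 6},
]
for finding, summary in merge_runs(runs).items():
    print(finding, summary)
```

A finding that shows up in every run with a high score is a better candidate for manual review than one reported once with a low score.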

There's also some cost involved since the Claude API isn't free.

"My average use of it is to identify the one or two files in a project that handle remote user input and tell the tool to do analysis on just those couple files," said McInerney. "When used this way, it averages less than $0.50 of token usage. It will automatically find these network-related files as well, but it's a broad search that often sees it scanning 10-20 files instead of the 1-2 that give the best results usually. Depending on project size, scanning all the network-related files will still only cost ~$1-$3."

McInerney says he believes Vulnhuntr's discoveries represent the first time actual zero-day vulnerabilities have been identified in public projects by an AI-assisted tool.

"There are multiple papers purporting this and all are misleading because their AI did not discover zero-days, it was merely fed known vulnerable targets or code that it wasn't trained on and then said this was evidence their AI can find zero-days," he said. "As far as our research can tell, the release of Vulnhuntr will be the first time LLMs have actually found zero-days in the wild."

As an example, he pointed to a paper by academic researchers whose work we've covered previously.

Daniel Kang, assistant professor of computer science at the University of Illinois Urbana-Champaign, and a co-author on the cited paper and similar ones, told The Register that relying on simulated data is a common practice in security research.

"It is widely accepted that simulations of real-world environments are acceptable proxies for the real world," he said. "I can link to hundreds of security papers and press releases where security tools are used in simulated environments or on past real-world vulnerabilities and no one disputes these findings. The correct thing to say is that we simulate the zero-day setting, but again, this is widely accepted as common practice."

Kang, whose paper describes using teams of LLM agents to exploit zero-day vulnerabilities, noted that Vulnhuntr doesn't handle exploitation. He also said that in the absence of an analysis of false positives or a comparison to tools like ZAP, Metasploit, or BurpSuite, it's difficult to say how the tool compares to existing open source or proprietary alternatives.

According to McInerney, the vulnerabilities identified by Vulnhuntr are very easy to exploit once identified.

"The tool gives you a proof-of-concept exploit once it finds a vulnerability," he said. "It's not uncommon to need to make some kind of minor adjustment to the PoC to make it work, but it's obvious what adjustments to make after reading the analysis the LLM gives you as to why it's vulnerable."

We're told Vulnhuntr will be released on GitHub, presumably through a repo associated with Protect AI. The biz is also encouraging budding bug hunters to try the tool on open source projects listed on its bug bounty website, huntr.com. ®
