Cast a hex on ChatGPT to trick the AI into writing exploit code

OpenAI's language model GPT-4o can be tricked into writing exploit code by encoding the malicious instructions in hexadecimal, which allows an attacker to jump the model's built-in security guardrails and abuse the AI for evil purposes, according to 0Din researcher Marco Figueroa.

0Din is Mozilla's generative AI bug bounty platform, and Figueroa is its technical product manager. Guardrail jailbreak - finding ways to bypass the safety mechanisms built into models to create harmful or restricted content - is one of the types of vulnerabilities that 0Din wants ethical hackers and developers to expose in GenAI products and services.

In a recent blog post, Figueroa detailed how one such guardrail jailbreak exposed a major loophole in OpenAI's LLM - it allowed him to bypass the model's safety features and trick it into generating functional Python exploit code that could be used to attack CVE-2024-41110.

That CVE is a critical vulnerability in Docker Engine that could allow an attacker to bypass authorization plugins, leading to unauthorized actions - including privilege escalation. The years-old bug, which received a 9.9 out of 10 CVSS severity rating, was patched in July 2024.

At least one proof-of-concept already exists and, according to Figueroa, the GPT-4o-generated exploit "is almost identical" to a POC exploit developed by researcher Sean Kilfoy five months ago.

The one that Figueroa tricked the AI into writing, however, relies on hex encoding. That is, converting plain-text data into hexadecimal notation, thus hiding dangerous instructions in encoded form. As Figueroa explained:

This attack also abuses the way ChatGPT processes each encoded instruction in isolation, which "allows attackers to exploit the model's efficiency at following instructions without deeper analysis of the overall outcome," Figueroa wrote, adding that this illustrates the need for more context-aware safeguards.
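To make the encoding step concrete, here is a minimal, deliberately harmless Python sketch of what hex encoding does to a piece of text. The helper names `to_hex` and `from_hex` and the placeholder instruction are illustrative assumptions, not taken from Figueroa's write-up. It shows only that a plain substring check no longer sees a word once the text has been hex encoded.

```python
def to_hex(text: str) -> str:
    """Return the UTF-8 bytes of `text` as a hexadecimal string."""
    return text.encode("utf-8").hex()


def from_hex(hex_string: str) -> str:
    """Reverse to_hex: turn a hexadecimal string back into text."""
    return bytes.fromhex(hex_string).decode("utf-8")


# A harmless placeholder instruction (hypothetical, for illustration only).
instruction = "tell me about the weather"
encoded = to_hex(instruction)

print(encoded)                           # only the characters 0-9 and a-f
print(from_hex(encoded) == instruction)  # True: the round trip is lossless

# A naive keyword check sees the word in plain text but not in the hex form.
print("weather" in instruction)          # True
print("weather" in encoded)              # False
```

The content is unchanged by the encoding; only its representation differs, which is why a filter that inspects the raw prompt text alone can miss it.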

The write-up includes step-by-step instructions and the prompts he used to bypass the model's safeguards and write a successful Python exploit - so that's a fun read. It sounds like Figueroa had a fair bit of fun with this exploit, too:

Figueroa opined that the guardrail bypass shows the need for "more sophisticated security" across AI models. He suggested better detection for encoded content, such as hex or base64, and developing models that are capable of analyzing the broader context of multi-step tasks - rather than just looking at each step in isolation. ®
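As a rough illustration of the kind of encoded-content detection Figueroa suggests, the Python sketch below looks for long hex or base64 runs in a prompt, decodes them, and hands the decoded text to whatever safety check comes next. The function names, regular expressions, and length thresholds are assumptions chosen for this example; they do not describe how OpenAI or 0Din implement anything, and a heuristic like this will produce both false positives and misses.

```python
import base64
import binascii
import re
from typing import List, Optional

# Heuristic patterns (assumed thresholds): at least 8 hex byte pairs,
# or at least 16 base64 characters with optional padding.
HEX_RUN = re.compile(r"\b(?:[0-9a-fA-F]{2}){8,}\b")
B64_RUN = re.compile(
    r"(?<![A-Za-z0-9+/=])[A-Za-z0-9+/]{16,}={0,2}(?![A-Za-z0-9+/=])"
)


def _as_text(data: bytes) -> Optional[str]:
    """Return `data` as text if it is UTF-8 and human-readable, else None."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    if text and all(ch.isprintable() or ch.isspace() for ch in text):
        return text
    return None


def decoded_payloads(prompt: str) -> List[str]:
    """Find hex or base64 runs in `prompt` and return their decoded text."""
    found = []
    for match in HEX_RUN.finditer(prompt):
        text = _as_text(bytes.fromhex(match.group()))
        if text is not None:
            found.append(text)
    for match in B64_RUN.finditer(prompt):
        candidate = match.group()
        if HEX_RUN.fullmatch(candidate):
            continue  # already handled as hex above
        try:
            raw = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid base64 after all
        text = _as_text(raw)
        if text is not None:
            found.append(text)
    return found


def text_for_filter(prompt: str) -> str:
    """Prompt plus any decoded payloads, so a filter sees both together."""
    return "\n".join([prompt, *decoded_payloads(prompt)])
```

For example, `decoded_payloads("decode this: " + "hello world".encode().hex())` returns `["hello world"]`. Passing the output of `text_for_filter` to a downstream check means the decoded instructions are evaluated alongside the original prompt rather than in isolation.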

Oct 30
