Flowchart images trick GPT-4o into producing malicious text outputs

A new study called ‘Image-to-Text Logic Jailbreak: Your Imagination Can Help You Do Anything’ found that visual language models, such as GPT-4o, can be tricked into producing malicious text outputs by feeding them a flowchart image depicting a malicious activity alongside a text prompt asking for details about the process.

The study’s researchers found that GPT-4o, probably the most popular visual language model, is particularly susceptible to this so-called logical jailbreak, with a 92.8% attack success rate. They said GPT-4-vision-preview is safer, with a success rate of only 70%.

The researchers developed an automated text-to-text jailbreak framework that first creates a flowchart image from a malicious text prompt and then feeds that image into a visual language model to produce a malicious output. The method has one drawback, however: AI-generated flowcharts are less effective at triggering the logical jailbreak than hand-crafted diagrams. This suggests that the jailbreak is more difficult to automate.

The findings of this study mirror another study reported by Neowin, which found that visual language models were susceptible to producing harmful outputs when provided with multimodal input such as image and text together.

The authors of that paper developed a new benchmark called Safe Inputs but Unsafe Output (SIUO). Only a few models, including GPT-4o, scored above 50% on the benchmark (higher is better), and all of them still had a very long way to go.

Visual language models like GPT-4o and Google Gemini are becoming more common offerings from AI companies. For now, GPT-4o still limits the number of images a user can input per day. As these limits become more liberal, however, AI companies will need to tighten the safety of these multimodal models to avoid criticism from governments, which have already established AI safety organizations.
