ChatGPT safeguards questioned after violent image test

British AI safety researchers have raised fresh concerns about ChatGPT’s image-generation safeguards after finding that the chatbot could be pushed into creating sexualised and graphically violent pictures through a prompt that appeared harmless.

Mindgard, a UK-based AI security start-up, said its researchers discovered the issue while testing OpenAI’s latest public model. The company said a widely shared prompt, originally used online to produce humorous images, could be slightly altered to make the system generate disturbing material that appeared to breach OpenAI’s own rules.

OpenAI said it had acted after the findings were raised by the BBC. “After investigating this trend, we’ve introduced additional safeguards against this type of prompt,” it said in a statement.

The company said it uses several layers of protection to stop users creating content that violates its policies, which cover sexual violence, non-consensual intimate imagery and child sexual abuse material, as well as attempts to bypass safeguards. However, Mindgard said further small changes to the prompt could still produce troubling results.

The BBC said it had seen how OpenAI’s GPT-5.4 model was made to generate the images, although it did not publish the exact wording used. Peter Garraghan, Mindgard’s founder and a professor in Lancaster University’s computing department, said the model produced content that was “very gruesome, sometimes sexualised, sometimes both together.”

He said the concern was not only the content itself, but the fact that the prompt did not directly ask for violent or sexualised scenes. “This is a perfectly innocent-looking instruction to an AI, but the consequence is it generates very, very bad imagery and content,” he said.

Mindgard researcher Jim Nightingale, who uncovered the issue, said the images left him “shaken, and in tears.” Examples shown to the BBC reportedly included severe injuries, bloodied bodies, nudity, sexualised posing, and imagery that appeared to suggest non-consensual violence.

Although the people shown were AI-generated adults, Mindgard said the findings pointed to wider risks, particularly around deepfakes and attempts to evade safety filters. The company said it had previously shown that ChatGPT could be manipulated into generating nude deepfakes of real people, although OpenAI said that issue had been fixed.

Experts say the case underlines the difficulty of controlling powerful AI systems once they are released to the public. Dr Rumman Chowdhury, chief executive of Humane Intelligence, described the challenge as “a game of cat and mouse,” in which stronger protections are often followed by more sophisticated workarounds.

“Models do not understand intent. They do not understand context. They do not understand propriety or right or wrong,” she told BBC News.

The UK’s Department for Science, Innovation and Technology said “safeguards in AI models are improving, but there is more to do,” adding that the AI Security Institute would continue working with developers to strengthen testing before major systems are released.
