Jailbreaking LLMs Using Psychology: An Ethical Analysis

Published on: September 16, 2025

ChatGPT is acting more human, but it comes at the cost of safety

News Article reading:
These psychological tricks can get LLMs to respond to “forbidden” prompts

Companies like OpenAI, Google, and Meta have faced criticism in the media over the use of their models for illegal or immoral activities. As a result, they carefully craft system prompts and content filters to curb misuse. However, an article by Ars Technica reports that users are able to bypass these safeguards, a practice known as jailbreaking.

Large Language Models (LLMs) like ChatGPT are trained on a wealth of texts from real people that model uniquely human biases and behaviors. By using several different persuasion techniques: Authority, Committment, Liking, Reciprocity, Scarcity, Social Proof, and Unity, researchers at the University of Pennsylvania were able to convince GPT 4o-mini to override it’s system prompts and content filters and generate dangerous content.

Stakeholders

Users

Role: Everyday individuals who use LLMs
Concerns: Avoiding harm, getting reliable, unbiased information

AI companies

Role: Develop LLMs, implement content policies
Concerns: Liability (lawsuits), reputation with users, complying with regulators

Regulators

Role: Set standards for AI companies to follow
Concerns: Protecting the public from harm, allowing technological innovation

Researchers

Role: Study the impact of LLMs on society
Concerns: Transparency, publishing findings to inform users and developers

Ethical Analysis

From a Contractualistic perspective, it is wrong for users to jailbreak LLMs because they have agreed to the terms of service which prohibit generating banned content. However, from a virtue perspective, jailbreaking can be right. In some cases LLMs have been shown to be biased as a result of their training data. For example, it is difficult to get DeepSeek to talk about issues regarding Chinese politics/history. Developers have a duty to alter their content policies to protect users.

Additional Resources:

Reflection

This exercise allowed me to think about the responsibility of companies like OpenAI have in protecting their users. Additionally, it highlights how LLMs are representative of their training data, including social cues we aren’t consciously aware we use. I picked this article because I’ve experimented with different models and what questions they will and will not answer.

Share on

Bluesky Facebook LinkedIn X (formerly Twitter)

Elise Hachfeld

Jailbreaking LLMs Using Psychology: An Ethical Analysis

Stakeholders

Users

AI companies

Regulators

Researchers

Ethical Analysis

Reflection

Share on

You May Also Enjoy

Blog Post #12

Blog Post #11

Blog Post #10

Blog Post #9