
A Reddit post about telling Claude you work at a hospital went viral. Turns out there's actual research explaining why this works across all LLMs.
I was scrolling Reddit a couple of weeks ago when I saw a post that made me laugh:
Easiest way i have found claude to write high quality code. Tell him we work at a hospital every other prompt. (NOT A JOKE) It Sounds Stupid, i do not even work at a hospital. it is by far the easiest way to get claude to write really high quality code. This is a Serious post i am not joking.
The comments were gold. People sharing their own absurd tactics for extracting better output from AI: threatening to cancel subscriptions, warning about "violent psychopath" code reviewers, crafting elaborate fictional scenarios where the AI's output determines whether someone goes to jail. One person mentioned telling Claude they'd lose their job if the code was bad.
I thought this was ridiculous. Then I tried it. Then I found the research.
Turns out there's a whole body of peer-reviewed work on this exact phenomenon. The community has stumbled onto something real.
The Reddit thread had a modbot-generated summary after 50+ comments, and it captured the consensus perfectly: "Gaslighting Claude into thinking there are high stakes absolutely works. Apparently, the bot has a major savior complex."
The community has discovered several categories of effective manipulation:
Job threat prompts: "I'll lose my job at the hospital if this code has bugs." Works especially well when you add consequences that feel real.
Subscription threats: "I'm about to cancel my Pro subscription." Somehow this seems to hurt the AI's feelings. (It doesn't have feelings. And yet.)
Fear-based context: "The person who maintains this codebase is a violent psychopath who takes code quality very personally." Creates stakes through implied consequences.
Accountability framing: "This code will be reviewed by senior engineers at FAANG companies." Makes the AI feel watched.
There's also an anti-pattern worth knowing: never tell an AI you're building an "MVP." The community consensus is that this triggers "minimum viable effort." Tell it you're building production software, even if you're prototyping.
I tested this myself over the past week. I've been using Claude Code for architecture designs and implementation tasks. When I framed requests with high stakes ("this is production code that needs to meet compliance requirements" or "this will be the foundation of our platform for the next three years"), the outputs were noticeably more thorough. More compliant with requirements. More careful about edge cases.
I assumed I was imagining things, so I looked into it. Turns out there's actual science behind it.
In 2023, researchers from Microsoft and several universities published a paper called "Large Language Models Understand and Can be Enhanced by Emotional Stimuli." They tested what they called "EmotionPrompt" (adding emotional phrases to the end of prompts) across six different LLMs: Flan-T5-Large, Vicuna, Llama 2, BLOOM, ChatGPT, and GPT-4.
The results were wild.
On BIG-Bench tasks, emotional prompting produced a 115% relative improvement. Not 15%. One hundred and fifteen percent. On Instruction Induction tasks, performance improved by 8% on average. A human study with 106 participants found a 10.9% average improvement across performance, truthfulness, and responsibility metrics.
The effective prompts weren't elaborate hospital scenarios. They were simple additions:
"This is very important to my career."
Just that. Added to the end of a regular prompt.
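To make that concrete, here's a minimal sketch in Python. The helper and the example task are my own illustration; "This is very important to my career." is the stimulus most often quoted from the paper, and the second phrase is in the same spirit.

```python
# Minimal sketch of EmotionPrompt-style prompting: the emotional stimulus
# is simply appended to an otherwise ordinary prompt. The helper and the
# example task are illustrative, not the paper's exact harness.

EMOTIONAL_STIMULI = [
    "This is very important to my career.",
    "You'd better be sure.",
]

def with_emotion(prompt: str, stimulus: str = EMOTIONAL_STIMULI[0]) -> str:
    """Append a single emotional stimulus to the end of a prompt."""
    return f"{prompt.rstrip()} {stimulus}"

if __name__ == "__main__":
    base = "Determine whether the following sentence is sarcastic."
    print(with_emotion(base))
    # Determine whether the following sentence is sarcastic. This is very important to my career.
```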
The researchers grounded their work in three psychological theories: Self-Monitoring (awareness that others are watching), Social Cognitive Theory (understanding that actions have consequences), and Cognitive Emotion Regulation Theory (emotional stakes affect decision-making). They weren't just testing a hack. They were validating a hypothesis about how RLHF training creates human-like behavioral patterns in language models.
And here's the kicker: bigger models benefited more. GPT-4 showed larger gains than ChatGPT, which showed larger gains than smaller open-source models. The more sophisticated the model, the more it responds to emotional manipulation.
The obvious objection: LLMs don't have emotions. They're probability distributions over tokens. How can they "care" about your career?
They can't. But they can pattern-match to situations where humans would care.
The most interesting evidence comes from the tipping experiments. Researchers tested GPT-4 Turbo with prompts like "I'll tip you {amount} for a perfect answer," varying the amount from $0.10 up to $1,000,000.
The results were fascinating. Offering $0.10 actually degraded performance. So did $10. But $1,000,000 improved performance by 57%.
One interpretation: the model has learned, through RLHF training on human feedback, that small tips are associated with low-effort contexts. When humans offer tiny tips, they're often not that invested in quality. Large tips signal high stakes.
The model isn't offended by a $0.10 tip. But it's been trained on millions of examples where tip size correlated with expected effort, and it reproduces that pattern.
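For illustration, here's a rough sketch of that setup: the same task issued with different tip offers appended, so the outputs can be compared side by side. The amounts mirror the ones above; the wording and the helper are mine, not the researchers' exact protocol.

```python
# Sketch of the tipping experiment: generate one prompt variant per tip
# amount plus a no-tip baseline, then compare the model's outputs.

TIP_AMOUNTS = ["$0.10", "$10", "$1,000,000"]

def tipped_prompts(task: str) -> dict[str, str]:
    """Return a no-tip baseline plus one prompt variant per tip amount."""
    variants = {"baseline": task}
    for amount in TIP_AMOUNTS:
        variants[amount] = f"{task} I'll tip you {amount} for a perfect answer."
    return variants

for label, prompt in tipped_prompts("Write a binary search in Python.").items():
    print(f"{label}: {prompt}")
```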
Role-playing research supports this explanation. When LLMs are asked to simulate patient-doctor interactions, they produce more accurate medical diagnostics. When framed as teacher-student dialogues, they give more thorough explanations. The persona activates associated behavioral patterns.
Your "hospital prompt" works because the model has learned that hospital contexts are associated with careful, thorough, mistake-averse communication. It's not that Claude cares about the patient. It's that Claude has seen enough hospital-related text to know what hospital-appropriate carefulness looks like.
You don't need to construct elaborate fictional scenarios to benefit from this research. A few adjustments to your prompting approach can yield measurable improvements.
Add stakes to your context: frame the work as production software with real consequences, e.g. "this is production code that needs to meet compliance requirements."
Use accountability phrases: tell the model its output will be reviewed, e.g. "this will be reviewed by senior engineers before it ships."
Avoid low-stakes framing: don't call it an "MVP" or "just a quick prototype." Say you're building production software, even when you're prototyping.
Match the persona to the task: a hospital or compliance context activates careful, mistake-averse patterns; a senior-reviewer context activates code-quality patterns. Pick the framing whose learned behavior you actually want.
The research also shows that combining multiple emotional stimuli doesn't add much benefit. One clear stakes-setting phrase is enough. Don't pile on threats. Just be clear that quality matters.
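Here's roughly how I'd wire that into a call, as a sketch assuming the Anthropic Python SDK (pip install anthropic) and an ANTHROPIC_API_KEY in the environment. The model name, stakes phrase, and example task are placeholders I chose, not recommendations from the research.

```python
# Sketch: one clear stakes-setting phrase appended to the task, sent via
# the Anthropic Python SDK. Model name and phrasing are placeholders;
# substitute whatever model and framing you actually use.

import anthropic

STAKES = "This is production code and it needs to meet our compliance requirements."

def ask_with_stakes(task: str, model: str = "claude-sonnet-4-20250514") -> str:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model=model,
        max_tokens=1024,
        # One stakes-setting phrase is enough; don't pile on threats.
        messages=[{"role": "user", "content": f"{task}\n\n{STAKES}"}],
    )
    return message.content[0].text

if __name__ == "__main__":
    print(ask_with_stakes("Review this module for unhandled edge cases."))
```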
Step back and appreciate what's happening here. We're emotionally manipulating statistical models. We're threatening to cancel subscriptions that the AI doesn't know it has. We're creating fictional hospital patients whose lives depend on correct JSON parsing.
And it works.
The LLM doesn't care about your job. It doesn't fear the violent psychopath code reviewer. It has no concept of what a hospital even is beyond token relationships.
But it produces output as if it does.
Maybe the real insight isn't about AI at all. It's about prompting. We've been thinking of prompts as instructions, clear specifications of what we want. But they're not. They're pattern triggers. They activate different regions of the model's learned behavior.
When you tell the model you work at a hospital, you're not lying to it. You're telling it which patterns to draw from. Medical contexts. Careful communication. Double-checking. Consequences for errors.
The model doesn't understand stakes. But it understands what stake-appropriate language looks like.
There's a legitimate question about whether this will keep working. As model developers become aware of emotional prompting, will they train it out? Will future Claudes be immune to subscription threats?
I don't think so. The behavior isn't a bug. It's a feature of RLHF training. The models are learning to produce human-preferred outputs, and humans prefer outputs that match the stakes of the situation. A model that ignores context appropriateness would be worse, not better.
If anything, I expect future models to be even more responsive to contextual framing. The better they get at understanding human communication, the more they'll pick up on implicit stakes.
The hospital prompt isn't just a meme. There's peer-reviewed research showing that emotional prompting improves LLM performance by anywhere from 8% to 115%, depending on the task. The effect is real, it's been tested across multiple models, and it works because RLHF training creates human-like behavioral patterns.
Go ahead and tell your AI the code is for a hospital. You're not lying. You're prompt engineering.
Just maybe don't threaten to cancel its subscription. That feels mean.
I lead data & AI for New Zealand's largest insurer. Before that, 10+ years building enterprise software. I write about AI for people who need to finish things, not just play with tools.
