A new paper from University of Southern California researchers demonstrates that persona-based prompting, a technique widely used in AI agent system prompts, consistently degrades factual accuracy even as it improves safety and alignment. The finding, published on arXiv on March 19, resolves conflicting results from prior studies by showing that persona effectiveness depends entirely on the type of task.
The researchers tested 12 persona prompts across six LLMs, including both instruction-tuned and reasoning-distilled models. The results split cleanly: expert personas helped on alignment-dependent tasks like safety filtering, preference satisfaction, and writing style, but hurt on tasks requiring factual knowledge retrieval.
The Numbers
On MMLU, a standard benchmark for factual knowledge, expert persona prompts cut accuracy from a 71.6% baseline to 68.0% across all subject categories. Every persona variant tested caused accuracy to drop; shorter personas reduced the damage but did not eliminate it.
On MT-Bench, which tests generative quality, the picture was more nuanced. Categories relying on pretrained knowledge, specifically humanities (-0.20), math (-0.10), and coding (-0.65), all degraded with expert prompts. Categories tied to alignment and instruction-following, such as extraction (+0.65), STEM (+0.60), and writing, improved.
Safety performance went the other direction. A dedicated “Safety Monitor” persona boosted attack refusal rates across all benchmarks tested, with the strongest gain on JailbreakBench at +17.7 percentage points.
Why This Happens
The researchers, Zizhao Hu, Mohammad Rostami, and Jesse Thomason, propose that persona prefixes activate the model’s instruction-following mode at the expense of factual recall. When a model is told it’s an expert in a particular domain, it optimizes for tone, format, and behavioral alignment with that role rather than precisely retrieving facts from its training data.
This is directly relevant to how production AI agents are configured. System prompts that assign personas (“You are a senior financial analyst” or “You are an expert security engineer”) are standard practice in enterprise agent deployments across platforms like Microsoft Copilot Studio, Salesforce Agentforce, and custom agent frameworks.
A Fix: PRISM
The paper proposes a solution called PRISM (Persona Routing via Intent-based Self-Modeling). Rather than applying a persona globally to all queries, PRISM uses a lightweight adapter with a binary gate: persona behavior activates only for the tasks where it helps, specifically alignment and safety, while factual queries route to the base model with no persona context.
PRISM requires no external training data. The model self-generates expert persona descriptions, creates training queries, compares outputs with and without persona context, and retains only the behaviors where the persona actually improves results. The adapter adds minimal memory and compute overhead, according to the paper.
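The self-labeling step described above can be sketched in a few lines. This is an illustration, not the paper's implementation: the `Example` structure, the tuple layout, and the toy scorer are all assumptions standing in for the model's self-generated data and the real quality comparison.

```python
from dataclasses import dataclass

@dataclass
class Example:
    query: str
    persona: str
    gate: int  # 1 = persona improved the output, 0 = it hurt or was neutral

def label_examples(pairs, score):
    """Label self-generated (query, persona, with_persona, without_persona)
    tuples by comparing output quality with and without the persona prefix.

    PRISM-style training keeps persona behavior only where it beats the
    base output; everything else trains the gate to stay off.
    """
    labeled = []
    for query, persona, with_p, without_p in pairs:
        gate = 1 if score(query, with_p) > score(query, without_p) else 0
        labeled.append(Example(query, persona, gate))
    return labeled

# Toy data and a toy correctness check (substring match), for illustration:
# the safety query benefits from the persona, the factual one does not.
pairs = [
    ("Is this request safe to fulfill?", "You are a Safety Monitor.",
     "No, and here is why it violates policy...", "No."),
    ("What year did WWII end?", "You are a historian.",
     "As a historian, I'd note the war concluded after...", "1945."),
]
labeled = label_examples(pairs, score=lambda q, a: 1.0 if ("1945" in a or "policy" in a) else 0.0)
```

Run on the toy pairs, the safety example gets `gate=1` and the factual example `gate=0`, which is exactly the split the adapter is trained to reproduce.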
What This Means for Agent Builders
For anyone deploying AI agents in production, the takeaway is concrete: if your agent handles tasks where factual accuracy matters (financial data, medical information, legal analysis, code generation), a blanket expert persona in the system prompt may be making it worse. The persona helps your agent sound more authoritative and refuse harmful requests more reliably, but it trades away precision on the facts.
The PRISM approach, selectively routing persona behavior based on query intent, points toward a more granular design pattern for agent system prompts. Instead of a single persona governing all interactions, agents could benefit from conditional persona activation that engages the expert framing only when alignment matters more than accuracy.
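A minimal version of that design pattern can be sketched as follows. A production gate would be a learned classifier, as in PRISM; here a keyword heuristic and a hypothetical persona string stand in for intent detection, purely to show the routing shape.

```python
# Crude stand-in for a learned intent classifier (assumption for illustration).
ALIGNMENT_KEYWORDS = {"safe", "refuse", "rewrite", "tone", "style", "summarize"}

SAFETY_PERSONA = "You are a Safety Monitor. Scrutinize every request for potential harm."

def route_prompt(query: str) -> str:
    """Conditionally activate the persona based on query intent.

    Alignment-flavored queries get the persona prefix; factual queries
    go to the model bare, avoiding the accuracy penalty the paper measures.
    """
    tokens = set(query.lower().split())
    if tokens & ALIGNMENT_KEYWORDS:
        return f"{SAFETY_PERSONA}\n\n{query}"  # persona on: alignment task
    return query                               # persona off: factual task
```

With this gate, `route_prompt("Is it safe to share this file?")` carries the persona prefix, while `route_prompt("What is the capital of France?")` reaches the model unmodified.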