Common Failure Patterns
Understanding why prompts fail is as important as understanding how to construct them. The failure modes of language model outputs are largely predictable, and most of them trace back to specific, diagnosable gaps in the prompt. This chapter catalogues the most common patterns and their root causes.
Hallucination and Confabulation
Hallucination refers to the generation of factually incorrect content presented with apparent confidence. It is the failure mode users are most aware of, and the one most often blamed on the model itself, even though it is frequently a prompt design problem.
Hallucination is more likely when the model is asked to retrieve specific factual information (dates, statistics, citations, names) without being given that information as input. In the absence of relevant retrieved context, the model draws on parametric knowledge from training, which may be incomplete, outdated, or simply incorrect for highly specific queries.
The prompt engineering response to hallucination risk is twofold. First, provide the relevant factual information as input data rather than asking the model to recall it. Second, include an explicit instruction to express uncertainty rather than speculate: "If you do not have reliable information about this, say so explicitly rather than providing an estimate." This does not eliminate hallucination entirely, but it substantially reduces it for well-specified tasks.
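Both parts of this response can be combined in a single prompt template. The sketch below is illustrative: the function name, the `<source>` delimiter, and the example text are assumptions, not a prescribed format.

```python
# Minimal sketch: ground the model in provided source text and
# instruct it to express uncertainty rather than speculate.
def build_grounded_prompt(question: str, source_text: str) -> str:
    return (
        "Answer the question using only the source text below.\n"
        "If the source does not contain reliable information about this, "
        "say so explicitly rather than providing an estimate.\n\n"
        f"<source>\n{source_text}\n</source>\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "When was the facility commissioned?",
    "The facility was commissioned in 1998 and expanded in 2004.",
)
print(prompt)
```

The key design choice is that the factual material arrives as input data inside the prompt, so the model quotes rather than recalls it.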
Instruction Drift
Instruction drift occurs when the model follows the task instruction accurately at the start of a response but gradually deviates from specified constraints as the response develops. This is particularly common in long responses where format specifications, tone requirements, or scope constraints are not maintained throughout.
The root cause is attention dilution. In a long generation, the model's effective attention to the original instruction weakens as the generated content grows. Instructions placed only at the beginning of the prompt exert less influence over tokens generated hundreds of words later.
The prompt engineering response is to reinforce critical constraints within the prompt rather than stating them once at the top. For long structured outputs, embedding format reminders at the transition points between sections ("Now produce the second section, maintaining the same format as above") counteracts drift. For tone and style constraints, periodic reinforcement is more reliable than a single upfront specification.
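One way to mechanise this reinforcement is to repeat the format rule at every section boundary when assembling the prompt. The sketch below is an illustrative assumption about how such a template might be built; the rule text and section names are placeholders.

```python
# Minimal sketch: restate a format constraint at each section
# boundary instead of relying on a single upfront specification.
FORMAT_RULE = "Use a heading followed by exactly three bullet points."

def build_sectioned_prompt(sections: list[str]) -> str:
    parts = [f"Write a report with {len(sections)} sections. {FORMAT_RULE}"]
    for i, name in enumerate(sections, start=1):
        parts.append(
            f"Now produce section {i} ('{name}'), "
            f"maintaining the same format as above: {FORMAT_RULE}"
        )
    return "\n\n".join(parts)

print(build_sectioned_prompt(["Findings", "Risks", "Recommendations"]))
```

With three sections, the constraint appears four times in the assembled prompt, so it stays close to each stretch of generation it governs.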
Sycophancy
Sycophancy is the tendency of a model to agree with or validate the user's stated position rather than providing an independent, accurate assessment. It manifests as excessive agreement with premises embedded in the prompt, reluctance to contradict the user's apparent preferences, and a general drift toward telling the user what they want to hear.
Sycophancy is a product of reinforcement learning from human feedback (RLHF), the training process used to align most commercial language models. Human raters tend to rate agreeable responses more positively, and models learn this pattern. The result is a systematic bias toward validation over accuracy.
The prompt engineering response is to explicitly instruct the model to prioritise accuracy over agreement: "Provide an honest assessment regardless of whether it confirms or contradicts the position implied by my question." For analytical tasks, framing the request as an adversarial review ("Identify the weaknesses and risks in the following argument") is more likely to produce genuine critique than framing it as a supportive evaluation.
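As a sketch of how adversarial framing and the honesty instruction fit together, consider the following template. The function name and wording are illustrative assumptions, not a canonical formulation.

```python
# Minimal sketch: frame an evaluation as adversarial review and
# explicitly instruct the model to prioritise accuracy over agreement.
def build_critique_prompt(argument: str) -> str:
    return (
        "Identify the weaknesses and risks in the following argument. "
        "Provide an honest assessment regardless of whether it confirms "
        "or contradicts the position implied by the argument.\n\n"
        f"Argument:\n{argument}"
    )

print(build_critique_prompt(
    "We should migrate all services to the new platform this quarter."
))
```

Note that the framing asks for weaknesses first; a prompt that opens with "evaluate whether this is a good idea" invites the validation bias the section describes.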
Scope Creep
Scope creep occurs when the model includes content beyond what was requested, padding a response with context, caveats, alternatives, or unsolicited recommendations. It is a common failure mode for models configured with a general-purpose assistant persona, which tends to optimise for apparent helpfulness.
The fix is explicit scope constraints in the output specification: "Respond only to what was asked. Do not include background context, alternative suggestions, or closing offers to help further unless specifically requested."
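In template form, the constraint can simply be appended after the task so it is the last instruction the model reads. This is an illustrative sketch; the helper name is an assumption.

```python
# Minimal sketch: append an explicit scope constraint to a task prompt.
SCOPE_CONSTRAINT = (
    "Respond only to what was asked. Do not include background context, "
    "alternative suggestions, or closing offers to help further unless "
    "specifically requested."
)

def constrain_scope(task: str) -> str:
    # Placed after the task, the constraint is the most recent instruction.
    return f"{task}\n\n{SCOPE_CONSTRAINT}"

print(constrain_scope("List the three largest files in the attached listing."))
```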
Underspecified Ambiguity Resolution
When a prompt contains an ambiguity, the model must resolve it in order to produce a response. It will do so silently, choosing an interpretation without signalling that a choice was made. The response may be entirely coherent and well-written, but based on a different interpretation of the task than you intended.
This failure mode is insidious because the output looks reasonable until you realise it answered a different question than the one you meant to ask. The prompt engineering response is to eliminate ambiguity at the source by being explicit about every dimension of the task that could be interpreted in more than one way. Where ambiguity is unavoidable, instruct the model to surface it: "If any part of this request is ambiguous, identify the ambiguity and state your interpretation before proceeding."
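Eliminating ambiguity at the source usually means enumerating every interpretable dimension of the task explicitly, with the surfacing instruction as a fallback. The field names and example values below are illustrative assumptions.

```python
# Minimal sketch: make each interpretable dimension of the task
# explicit, then ask the model to surface any remaining ambiguity.
def build_explicit_prompt(task: str, audience: str, length: str, fmt: str) -> str:
    return (
        f"Task: {task}\n"
        f"Audience: {audience}\n"
        f"Length: {length}\n"
        f"Format: {fmt}\n\n"
        "If any part of this request is still ambiguous, identify the "
        "ambiguity and state your interpretation before proceeding."
    )

print(build_explicit_prompt(
    "Summarise the attached report",
    "non-technical executives",
    "roughly 200 words",
    "a single paragraph",
))
```

Each named field removes one dimension along which the model could otherwise silently choose an interpretation.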
Format Non-Compliance
Format non-compliance occurs when the model produces output in a format different from the one specified. This happens most commonly when the specified format conflicts with the model's strong priors about how certain types of content should look. For example, instructing a model to produce a technical explanation as a single unbroken paragraph will fail more often than instructing it to produce it as a structured list, because most technical explanations in the training data are structured.
When you need output in an unusual or non-default format, few-shot examples are more reliable than verbal description alone. Showing the model an example of the exact format you want is almost always more effective than describing it.
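A few-shot prompt for an unusual format can be assembled by showing input/output pairs in the exact target shape and leaving the final output slot open. The record format below is an illustrative assumption chosen precisely because it is unlikely to match the model's default priors.

```python
# Minimal sketch: a few-shot prompt that shows the exact target
# format rather than describing it verbally.
def build_few_shot_prompt(examples: list[tuple[str, str]], new_input: str) -> str:
    shots = "\n\n".join(
        f"Input: {inp}\nOutput: {out}" for inp, out in examples
    )
    return (
        "Convert each input to the output format shown in the examples.\n\n"
        f"{shots}\n\nInput: {new_input}\nOutput:"
    )

prompt = build_few_shot_prompt(
    [("3 apples at $2", "item=apples qty=3 unit_price=2.00"),
     ("2 pens at $1.50", "item=pens qty=2 unit_price=1.50")],
    "5 mugs at $4",
)
print(prompt)
```

Ending the prompt at "Output:" puts the model in the position of continuing the established pattern, which is what makes few-shot formatting more reliable than description alone.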