What Goes Into the Prompt - Prompt Engineering for LLMs

Key Principle

Prompt content falls into two categories: static content (instructions, examples, and boilerplate that stay the same across users) and dynamic content (user-specific context gathered at runtime via RAG). The critical insight is that precision matters as much as recall: irrelevant context doesn't just waste tokens, it actively misleads the model through the Chekhov's Gun fallacy, because the model assumes everything you include is there for a reason and tries to make use of it.

Why This Matters

Teams typically focus on retrieving enough relevant context (recall) while ignoring the damage done by irrelevant context (precision). But the model assumes every detail must serve a purpose; it has internalized Chekhov's dramatic principle from the well-written documents in its training data. An irrelevant snippet doesn't sit idle: the model works it into the answer, degrading output quality in ways that are hard to trace back to retrieval.
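To make the precision/recall distinction concrete, here is a minimal Python sketch that scores one retrieval result against a hand-labeled set of relevant snippets. The snippet IDs and the function name `retrieval_precision_recall` are hypothetical. In this example recall is perfect while precision is only 0.40, so three of the five snippets placed in the prompt would be Chekhov's guns.

```python
def retrieval_precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a single retrieval result.

    precision: fraction of retrieved snippets that are actually relevant
    recall:    fraction of relevant snippets that were retrieved
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical snippet IDs: the retriever returned 5 snippets, only 2 of which matter.
retrieved = ["refund_policy", "shipping_times", "ceo_bio", "office_dog", "refund_form"]
relevant = {"refund_policy", "refund_form"}

p, r = retrieval_precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # prints: precision=0.40 recall=1.00
```

A recall-only evaluation would call this retrieval perfect; the precision number is what exposes the misleading context.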

Good Examples

Few-shot prompting communicates implicitly. "LLMs have a compulsion to continue patterns, so if your Q&A pairs contain any, chances are the LLM will be more likely to follow them than if you had stated them as rules outright. Implicit is often better than explicit." (Chapter 5) Use few-shot examples primarily for output format demonstration. Include edge cases to communicate exception handling.
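As a minimal sketch of the implicit-format idea, the Python below lays few-shot examples out as a document for the model to continue. The classification task, the labels, and the name `build_few_shot_prompt` are hypothetical, and the prompt is only built here, not sent to a model. No rule about the output format is stated anywhere: the examples alone carry it, and the last example demonstrates how an edge case should be handled.

```python
# Hypothetical few-shot examples: each one demonstrates the output format
# (a single lowercase label); the last one shows an edge case.
EXAMPLES = [
    ("The package arrived two days late.", "complaint"),
    ("Thanks, the replacement works perfectly!", "praise"),
    ("asdf qwerty", "unclear"),  # edge case: unintelligible input
]


def build_few_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Lay out the examples as a document the model will continue.

    The final 'Label:' is left open, so the most natural completion
    is a label in the same format as the examples above it.
    """
    parts = [f"Message: {msg}\nLabel: {label}\n" for msg, label in examples]
    parts.append(f"Message: {question}\nLabel:")
    return "\n".join(parts)


print(build_few_shot_prompt(EXAMPLES, "Where is my order?"))
```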

RAG bridges the knowledge gap. Two retrieval approaches with distinct trade-offs:

Lexical (Jaccard, BM25): Fast, debuggable, and requires no extra infrastructure, but fails on synonyms. GitHub Copilot uses Jaccard similarity to pick snippets from open IDE tabs, because speed matters for real-time completion.
Neural (embeddings + vector stores): Handles synonyms and cross-language queries. Requires infrastructure; failures are opaque and hard to diagnose. (Chapter 5)
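A minimal sketch of lexical retrieval with Jaccard similarity (size of the intersection of two word sets divided by the size of their union). It illustrates the general technique and is not Copilot's actual implementation; the tokenizer, the snippets, and the function names are hypothetical. Note that the `load_settings` snippet scores 0.0 for a "config" query even though it is relevant: the synonym failure of lexical methods.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; splits identifiers like parse_config."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_snippets(query: str, snippets: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Rank snippets by Jaccard similarity to the query and keep the top k."""
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(s)), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical snippets, e.g. taken from open editor tabs.
SNIPPETS = [
    "def parse_config(path): read config file",
    "def load_settings(path): read settings",  # relevant, but uses a synonym
    "config file format notes",
]

for score, snippet in top_snippets("parse config file", SNIPPETS):
    print(f"{score:.2f}  {snippet}")
```

Every score can be explained by listing the shared tokens, which is what makes lexical retrieval easy to debug compared with embedding similarity.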

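Context scoring prevents crowding. Each candidate piece of content gets a priority score, so that low-value context cannot crowd out high-value content when the token budget runs short. Static clarification items get the highest priority because "while you want as much context for the question as possible, it's more important to make sure the model actually understands the question." (Chapter 5)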
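One way to sketch priority-based context packing, assuming each item already carries a priority score and a precomputed token count. Both numbers, the `PromptItem` class, and `pack_prompt` are hypothetical, and the book's actual scoring scheme may differ. Items are admitted greedily from highest to lowest priority, then emitted in their original order so the assembled prompt still reads coherently.

```python
from dataclasses import dataclass


@dataclass
class PromptItem:
    text: str
    priority: int  # higher = more important (hypothetical scale)
    tokens: int    # assumed precomputed with the target model's tokenizer


def pack_prompt(items: list[PromptItem], budget: int) -> list[PromptItem]:
    """Greedily keep the highest-priority items that fit in the token budget."""
    chosen: set[int] = set()
    used = 0
    # Visit items from highest to lowest priority; skip any that would overflow.
    for idx in sorted(range(len(items)), key=lambda i: items[i].priority, reverse=True):
        if used + items[idx].tokens <= budget:
            chosen.add(idx)
            used += items[idx].tokens
    # Preserve the original document order of the surviving items.
    return [item for i, item in enumerate(items) if i in chosen]


ITEMS = [
    PromptItem("You are a support assistant. Answer concisely.", priority=10, tokens=40),
    PromptItem("Retrieved: old FAQ about shipping", priority=3, tokens=300),
    PromptItem("Clarification: 'ticket' means a support ticket, not a travel ticket.", priority=9, tokens=30),
    PromptItem("Retrieved: refund policy", priority=5, tokens=400),
]

for item in pack_prompt(ITEMS, budget=500):
    print(item.text)
```

With a 500-token budget, the instructions, the clarification, and the refund policy fit (470 tokens); the low-priority FAQ is dropped even though it appears earlier in the list.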

Counterpoints

Three few-shot failure modes follow from the document completion mechanism:

Context window saturation: Rich examples consume tokens needed for the actual question's context.
Anchoring bias: The distribution of values in the examples biases predictions. If the examples show a uniform distribution but real data is skewed, the model continues the examples' pattern, not reality's.
Spurious pattern detection: The model picks up incidental ordering (ascending numbers, "happy path first"). Even three randomly ordered numbers have a 1-in-6 (about 17%) chance of being ascending by accident. Mitigation: shuffle examples. (Chapter 5)
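The 17% figure is just 1/6: of the 3! = 6 orderings of three distinct values, exactly one is ascending. The Python sketch below checks that count and then shows a seeded shuffle of few-shot examples; `shuffled_examples` and the example strings are hypothetical names for illustration.

```python
import itertools
import random
from fractions import Fraction
from typing import Optional

# Three distinct values have 3! = 6 possible orderings; count the ascending ones.
orderings = list(itertools.permutations([1, 2, 3]))
ascending = sum(1 for p in orderings if list(p) == sorted(p))
prob = Fraction(ascending, len(orderings))
print(prob, f"(about {float(prob):.0%})")  # prints: 1/6 (about 17%)


def shuffled_examples(examples: list[str], seed: Optional[int] = None) -> list[str]:
    """Return a shuffled copy so incidental ordering (ascending values,
    happy path first) is less likely to become a pattern the model continues."""
    rng = random.Random(seed)
    result = list(examples)
    rng.shuffle(result)
    return result


# Hypothetical examples, accidentally written happy-path first.
EXAMPLES = ["happy path A", "happy path B", "edge case", "error case"]
print(shuffled_examples(EXAMPLES, seed=42))
```

Passing a fixed seed keeps the shuffled order reproducible across runs, which makes prompt changes easier to compare.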

The Chekhov's Gun Fallacy. "Even an irrelevant piece of context will easily get interpreted by the model, which will assume the irrelevant context simply must matter. That's the fallacy." (Chapter 5)

The Rumor Problem in summarization. Hierarchical summarization (summarizing summaries) can introduce a misunderstanding at each level, and those errors propagate upward like a game of Telephone. Mitigate this with conservative compression. Specific summarization (oriented toward a task) extracts more relevant details but must be redone if the task changes. (Chapter 5)
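A minimal sketch of hierarchical summarization with a tunable compression ratio, assuming a `summarize` callable that you supply (in practice, an LLM call). Everything here is hypothetical: `truncate_summarizer` merely keeps the first words so the example runs without a model, which shows information loss rather than a genuine misreading. `keep_ratio` stands in for conservative compression, and the `task` argument is how each summary would be oriented toward a specific task.

```python
from typing import Callable

# (text, task, max_words) -> summary. In a real application this would wrap
# an LLM call; here it is a hypothetical placeholder you supply.
Summarizer = Callable[[str, str, int], str]


def hierarchical_summary(
    chunks: list[str],
    summarize: Summarizer,
    task: str = "",
    fanout: int = 4,
    keep_ratio: float = 0.5,
) -> tuple[str, int]:
    """Summarize chunks, then summaries of summaries, until one remains.

    Returns the final summary and the number of summarization levels.
    Every level is one more chance for a misreading to be passed upward.
    A higher keep_ratio compresses more conservatively. A non-empty task
    makes every summary task-specific, so a new task means a full rebuild.
    """
    def shrink(text: str) -> str:
        budget = max(1, int(len(text.split()) * keep_ratio))
        return summarize(text, task, budget)

    if not chunks:
        return "", 0
    level = [shrink(c) for c in chunks]
    depth = 1
    while len(level) > 1:
        # Merge up to `fanout` sibling summaries per call at the next level.
        groups = ["\n".join(level[i:i + fanout]) for i in range(0, len(level), fanout)]
        level = [shrink(g) for g in groups]
        depth += 1
    return level[0], depth


def truncate_summarizer(text: str, task: str, max_words: int) -> str:
    """Stand-in for an LLM: keeps the first max_words words and ignores task."""
    return " ".join(text.split()[:max_words])


CHUNKS = [f"chunk {i} detail alpha beta gamma" for i in range(8)]
print(hierarchical_summary(CHUNKS, truncate_summarizer, fanout=4, keep_ratio=0.5))
```

Eight chunks with a fanout of 4 need three levels; the returned depth makes it easy to see how many hand-offs a given configuration introduces.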

Key Quotes

"LLMs have a compulsion to continue patterns, so if your Q&A pairs contain any, chances are the LLM will be more likely to follow them than if you had stated them as rules outright. Implicit is often better than explicit." — Berryman & Ziegler, Chapter 5

"Even an irrelevant piece of context will easily get interpreted by the model, which will assume the irrelevant context simply must matter." — Berryman & Ziegler, Chapter 5

Rules of Thumb

Treat retrieval precision as seriously as retrieval recall — irrelevant context actively harms output quality
Use few-shot examples primarily for format demonstration; keep them minimal when context is rich
Shuffle few-shot examples to prevent spurious pattern detection
Instructions (static) always get higher priority than context (dynamic) in the token budget
For summarization, prefer specific (task-oriented) summaries over general ones, but expect to redo them if the task changes

Related References

Designing LLM Applications - The feedforward pass pipeline that structures content selection
Assembling and Structuring the Prompt - How to arrange and optimize the selected content
LLMs as Text Completion Engines - Chekhov's Gun fallacy follows from the document completion mechanism