Assembling and Structuring the Prompt - Prompt Engineering for LLMs

Key Principle

Because the model has a fixed compute budget per token and processes text unidirectionally, the arrangement of content matters as much as the content itself. The Valley of Meh is an attention dead zone in the early-middle of the prompt, caused by two interacting phenomena: in-context learning, which favors later tokens, and the lost-middle effect, which favors the beginning and end. The sandwich technique, inception, and knapsack-style assembly algorithms treat prompt construction as an engineering optimization problem rather than freeform writing.

Why This Matters

Engineers often pack context in plain document order, which can land critical information in the attention dead zone. The result is output that seems to miss obvious context, so teams debug content quality when the real problem is content placement. Systematic prompt assembly with scored elements, elastic snippets, and token budget optimization prevents both overflow (truncating critical content) and underutilization (leaving valuable context out).

Good Examples

The Valley of Meh. Two independent phenomena interact: (1) in-context learning gives later tokens disproportionate influence on the completion, (2) the "lost middle" phenomenon means models recall beginnings and endings but not middles. Their intersection creates a trough where context has minimal impact. Place high-importance elements outside this valley. Keep prompts concise — shorter prompts have shallower valleys. (Chapter 6)

The sandwich technique. Structure prompts in four parts: introduction (states the question, activates relevant knowledge early), context elements (the "meat"), refocus (restates the question with specifics after the context), and transition (shifts from problem description to solving). This anchors the task at the two high-attention positions. (Chapter 6)
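
The four-part structure can be sketched as a small assembly function. This is a minimal illustration, not the book's implementation: the function name `build_sandwich_prompt`, its parameters, and the example strings are all assumptions made for this note.

```python
def build_sandwich_prompt(introduction, context_elements, refocus, transition):
    """Join introduction, context, refocus, and transition in that order.

    Hypothetical helper: the name and signature are illustrative only.
    """
    parts = [introduction, *context_elements, refocus, transition]
    # Blank lines between parts keep each element visually and textually separate.
    return "\n\n".join(p.strip() for p in parts if p.strip())


prompt = build_sandwich_prompt(
    introduction="I need to work out why the nightly export job fails.",
    context_elements=[
        "Log excerpt: export.py raised TimeoutError after 300 s.",
        "Config: the job runs against the reporting replica.",
    ],
    refocus="Given the log and config above, what is the most likely "
            "cause of the TimeoutError?",
    transition="Here is my analysis of the failure:",
)
print(prompt)
```

The question appears twice on purpose: once before the context and once, with specifics, after it, so the task sits at both ends of the prompt.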

Inception. Write the first portion of the model's answer as part of the prompt. Because autoregressive generation cannot backtrack, the model accepts this text as its own and continues consistently with it. This removes uncertainty about how the completion opens, which makes outputs more reliable and easier to parse. (Chapter 6)
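
A minimal sketch of inception for a JSON answer, assuming a completion-style model that returns only the continuation. The helper names `with_inception` and `full_answer` are hypothetical, and `completion` below is a hand-written stand-in for model output, not a real response.

```python
import json


def with_inception(prompt, answer_prefix):
    """Return the text to send: the prompt followed by the pre-written answer opening."""
    return prompt.rstrip() + "\n" + answer_prefix


def full_answer(answer_prefix, completion):
    """Rejoin prefix and continuation, since the model returns only the continuation."""
    return answer_prefix + completion


answer_prefix = '{"sentiment": "'
request_text = with_inception(
    "Classify the sentiment of this review as JSON.\n"
    "Review: Great battery life.",
    answer_prefix,
)

# Hypothetical stand-in for what a model might return after the prefix.
completion = 'positive"}'
result = json.loads(full_answer(answer_prefix, completion))
print(result["sentiment"])
```

Because the opening brace and key are fixed by the prompt, the parser only has to cope with how the model finishes the value, not with how it starts the answer.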

Document type selection. Three archetypes: advice conversation (dialogue), analytic report (structured exposition), structured document (XML/YAML/JSON). LLMs respect scope boundaries more consistently in reports than in dialogues. GitHub Copilot used code comments with explicit side remarks to provide cross-file context naturally. (Chapter 6)

Counterpoints

Non-inert tokenization breaks budget calculations. Token counts are not additive under concatenation: "cat" and "tail" are one token each (2 in total), but "cattail" becomes 3 tokens ([c][att][ail]). An element is inert only if it tokenizes the same way inside the assembled prompt as it does in isolation. Separate elements with whitespace, and prefer elements that start with a space. (Chapter 6)
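
One way to guard a budget calculation against this effect is to count tokens on the joined string rather than summing per-element counts. The sketch below assumes a `count_tokens` callable supplied by whatever tokenizer is in use; here it is replaced by a toy lookup table that hard-codes the counts quoted above, so it illustrates the bookkeeping only and says nothing about any real tokenizer.

```python
def assembled_token_count(elements, count_tokens, separator="\n"):
    """Count tokens on the final joined string instead of summing per element."""
    return count_tokens(separator.join(elements))


def concatenation_gap(a, b, count_tokens):
    """Tokens in a+b minus tokens in a plus tokens in b (0 means additive)."""
    return count_tokens(a + b) - (count_tokens(a) + count_tokens(b))


# Toy stand-in for a tokenizer: counts copied from the example in the text.
TOY_COUNTS = {"cat": 1, "tail": 1, "cattail": 3}


def toy_count(text):
    return TOY_COUNTS[text]


gap = concatenation_gap("cat", "tail", toy_count)
print(gap)  # 1: the joined string costs one token more than the parts
```

In a real pipeline, `count_tokens` would wrap the model's own tokenizer, and a nonzero gap signals that the elements need a separator between them.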

Elastic snippets avoid binary include/exclude. Elements with multiple length versions (full chapter down to key quotes) let the assembly engine ask "what is the longest version that fits?" rather than all-or-nothing inclusion. This maximizes information density. (Chapter 6)
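
The "longest version that fits" question can be sketched in a few lines. The function name `longest_fitting_version` is hypothetical, and the whitespace word count used as `count_tokens` is a crude stand-in for a real tokenizer, chosen only to keep the example self-contained.

```python
def longest_fitting_version(versions, budget, count_tokens):
    """Return the longest version whose token count fits the budget, else None."""
    fitting = [v for v in versions if count_tokens(v) <= budget]
    return max(fitting, key=count_tokens, default=None)


def word_count(text):
    """Stand-in token counter: counts whitespace-separated words."""
    return len(text.split())


versions = [
    "full chapter text with many many words here",  # 8 words
    "short summary of chapter",                     # 4 words
    "key quote",                                    # 2 words
]

picked = longest_fitting_version(versions, budget=5, count_tokens=word_count)
print(picked)  # short summary of chapter
```

Returning `None` when nothing fits lets the caller drop the element entirely, which keeps exclusion as the last resort rather than the default.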

Without a refocus, the model drifts. After long context stretches, the model's attention shifts away from the original question, producing off-topic completions. Without a transition, completion models may continue adding context rather than answering. (Chapter 6)

Key Quotes

"The Valley of Meh is caused by the interaction of in-context learning and the lost middle phenomenon, creating an attention dead zone in the early-middle of the prompt." — Berryman & Ziegler, Chapter 6

Rules of Thumb

Place the most important content at the beginning and end of the prompt, not the middle
Always include a refocus section after long context — restate the question with specifics
Use inception to control the opening of the completion and ensure parseability
Treat prompt assembly as a knapsack problem: maximize information value within token budget
Use elastic snippets (multiple length versions) instead of binary include/exclude
Separate snippets with whitespace to maintain tokenization inertness
Choose document format deliberately: reports for scope control, dialogues for naturalness, structured formats for parsing
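
The knapsack rule above can be sketched with a standard 0/1 knapsack dynamic program. This is an illustration under stated assumptions, not the book's algorithm: the `Element` class, the `score` values, and the precomputed `tokens` costs are all hypothetical, and the sketch treats token costs as additive, which only holds if elements are separated so that tokenization stays inert.

```python
from dataclasses import dataclass


@dataclass
class Element:
    text: str
    score: float  # hypothetical importance score assigned upstream
    tokens: int   # token cost, measured beforehand


def select_for_budget(elements, budget):
    """Pick the subset with the highest total score that fits the token budget.

    Classic 0/1 knapsack over capacities 0..budget. The chosen elements come
    back in their original order, so document order is preserved.
    """
    # best[cap] = (total score, indices chosen) using at most `cap` tokens
    best = [(0.0, ())] * (budget + 1)
    for i, el in enumerate(elements):
        # Walk capacities downward so each element is used at most once.
        for cap in range(budget, el.tokens - 1, -1):
            value = best[cap - el.tokens][0] + el.score
            if value > best[cap][0]:
                best[cap] = (value, best[cap - el.tokens][1] + (i,))
    return [elements[i] for i in best[budget][1]]


elements = [
    Element("intro", score=5, tokens=10),
    Element("background", score=3, tokens=40),
    Element("error log", score=8, tokens=30),
    Element("style guide", score=2, tokens=25),
]

chosen = select_for_budget(elements, budget=70)
print([el.text for el in chosen])  # ['intro', 'error log', 'style guide']
```

Here the large, low-scoring background element is dropped in favor of three smaller ones that together score higher. In practice this selection step would be combined with elastic snippets, trying a shorter version of an element before excluding it.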

Related References

How LLMs Process Information - Unidirectional attention explains why position matters
What Goes Into the Prompt - What to include; this reference covers how to arrange it
Designing LLM Applications - The feedforward pass pipeline that feeds into assembly