Key Principle
Prompt engineering is software engineering. This playbook translates the book's techniques into a concrete implementation sequence: start with evaluation, build the feedforward pass, optimize assembly, add reasoning and tools as needed, and iterate with data. Each step builds on the previous one.
Why This Matters
The book's 11 chapters cover techniques in conceptual order, but implementation requires a different sequence. Teams that start with the model (fine-tuning, temperature tuning) before building evaluation and prompt infrastructure waste effort on changes they cannot measure. This playbook provides the correct order of operations.
Phase 1: Foundation (Before Any Prompting)
Step 1: Build evaluation first. (Chapter 10)
- Create an example suite of 5-20 representative inputs with expected outputs
- Set up automated comparison: gold-standard matching if you have reference answers, functional tests if you have verifiable properties, and an LLM-graded SOMA assessment if you have neither
- Record latency and token consumption from day one
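A minimal sketch of such a harness in Python. It assumes a hypothetical `run_prompt` callable that sends one input to the model and returns the completion text plus the number of tokens consumed. It implements only the gold-standard strategy, using a normalized exact-match comparison; functional tests or SOMA grading would replace the comparison line.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    input: str
    expected: str  # reference answer for gold-standard comparison


@dataclass
class Result:
    passed: bool
    latency_s: float
    tokens: int


def normalize(text: str) -> str:
    # Case- and whitespace-insensitive comparison; real suites often need more.
    return " ".join(text.lower().split())


def evaluate(examples: list[Example],
             run_prompt: Callable[[str], tuple[str, int]]) -> list[Result]:
    """Run every example, recording pass/fail, latency, and token usage."""
    results = []
    for ex in examples:
        start = time.perf_counter()
        completion, tokens = run_prompt(ex.input)  # hypothetical model call
        latency = time.perf_counter() - start
        passed = normalize(completion) == normalize(ex.expected)
        results.append(Result(passed, latency, tokens))
    return results


def summarize(results: list[Result]) -> dict:
    n = len(results)
    return {
        "pass_rate": sum(r.passed for r in results) / n,
        "mean_latency_s": sum(r.latency_s for r in results) / n,
        "total_tokens": sum(r.tokens for r in results),
    }


def stub(prompt: str) -> tuple[str, int]:
    # Stand-in for a real model call: returns (completion, tokens used).
    return ("4" if "2+2" in prompt else "paris", 3)


suite = [Example("2+2?", "4"), Example("Capital of France?", "Paris")]
print(summarize(evaluate(suite, stub)))
```

Because latency and tokens are recorded on every run, the same summary doubles as the baseline that later changes are compared against.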
Step 2: Choose document format. (Chapters 4, 6)
- Select an archetype: advice conversation (dialogue), analytic report (exposition), or structured document (XML/JSON)
- Reports give better scope control; structured formats give easier parsing; dialogues feel natural
- Apply the Little Red Riding Hood principle: use formats the model has seen millions of times
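To make the archetypes concrete, the sketch below frames one hypothetical task (triaging a bug report) as an analytic report and as a structured JSON document. The headings and field names are illustrative assumptions, and the dialogue archetype is omitted for brevity. Each prompt stops exactly where the model should begin writing.

```python
import json


def as_report(issue: str, context: str) -> str:
    """Analytic-report archetype: Markdown headings bound the scope of the answer."""
    return (
        "# Bug Triage Report\n\n"
        f"## Issue\n{issue}\n\n"
        f"## Relevant Context\n{context}\n\n"
        "## Assessment\n"  # the model continues writing from here
    )


def as_structured(issue: str, context: str) -> str:
    """Structured-document archetype: a JSON object left open at the answer field."""
    head = json.dumps({"issue": issue, "context": context}, indent=2)
    # json.dumps(..., indent=2) ends with "\n}"; drop it and open a new field.
    return head[:-2] + ',\n  "assessment": "'


report = as_report("Login fails on Safari.", "Release 2.3 changed cookie settings.")
structured = as_structured("Login fails on Safari.", "Release 2.3 changed cookie settings.")
# A completion that closes the string and the object makes the result parseable.
print(json.loads(structured + 'Needs investigation."}')["assessment"])
# → Needs investigation.
```

In the report version, a stop sequence on the next `## ` heading would keep the answer inside the Assessment section; in the JSON version, the finished completion can be parsed directly with `json.loads`.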
Step 3: Design the feedforward pass. (Chapter 4)
- Map the transformation: user problem → context retrieval → snippetizing → scoring → assembly → model → parsing → user result
- Assign priority tiers: instructions (highest) → task-specific context → general context → examples
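The pipeline can be sketched as a chain of plain functions. Everything here is a simplification under stated assumptions: `retrieve`, `score`, `complete`, and `parse` are hypothetical callables you would supply, snippetizing is a split on blank lines, and the priority tiers are used only to order the prompt, whereas a real assembler would also use them to decide what to drop under a token budget.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class Tier(IntEnum):
    # Lower value = higher priority, following the tier order in the playbook.
    INSTRUCTIONS = 0
    TASK_CONTEXT = 1
    GENERAL_CONTEXT = 2
    EXAMPLES = 3


@dataclass
class Snippet:
    text: str
    tier: Tier
    score: float = 0.0


def feedforward(problem: str,
                retrieve: Callable[[str], list[str]],
                score: Callable[[str, str], float],
                complete: Callable[[str], str],
                parse: Callable[[str], str]) -> str:
    """User problem -> retrieval -> snippetizing -> scoring -> assembly -> model -> parsing."""
    snippets = [Snippet("Answer the user's question concisely.", Tier.INSTRUCTIONS)]
    for doc in retrieve(problem):                  # context retrieval
        for para in doc.split("\n\n"):             # snippetizing
            snippets.append(Snippet(para, Tier.TASK_CONTEXT, score(problem, para)))
    # Simplified assembly: order by tier, then by descending relevance score.
    snippets.sort(key=lambda s: (s.tier, -s.score))
    prompt = "\n\n".join(s.text for s in snippets) + f"\n\nQuestion: {problem}\nAnswer:"
    return parse(complete(prompt))                 # model call, then parsing
```

Keeping each stage a separate function makes it possible to swap a retriever or scorer and rerun the evaluation suite on just that change.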
Phase 2: Content and Assembly
Step 4: Build content pipeline. (Chapter 5)
- Static content: write clear instructions, add few-shot examples for format demonstration
- Dynamic content: implement RAG (start with lexical retrieval for speed; add neural for semantic coverage)
- Filter aggressively for precision: irrelevant content actively hurts because of Chekhov's gun, since the model tends to assume that everything in the prompt is there to be used
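As a starting point for lexical retrieval, the sketch below scores each document by the fraction of query terms it contains and drops anything under a threshold. This is a crude stand-in for a proper lexical ranker such as BM25, and the `k` and `min_score` defaults are arbitrary assumptions to tune against your evaluation suite.

```python
import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_retrieve(query: str, docs: list[str],
                     k: int = 3, min_score: float = 0.5) -> list[str]:
    """Rank docs by the fraction of query terms they contain, then filter.

    The min_score cut-off favors precision: a snippet that does not clear it
    is dropped rather than included "just in case".
    """
    q = tokenize(query)
    if not q:
        return []
    scored = [(len(q & tokenize(d)) / len(q), d) for d in docs]
    kept = [(s, d) for s, d in scored if s >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [d for _, d in kept[:k]]


docs = [
    "Reset your password from the login page.",
    "Our office is closed on public holidays.",
    "Password rules: at least 12 characters.",
]
print(lexical_retrieve("how to reset password", docs))
# → ['Reset your password from the login page.']
```

The third document mentions passwords but matches only one of four query terms, so the threshold excludes it; loosening `min_score` trades precision for recall.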
Step 5: Implement prompt assembly. (Chapter 6)
- Structure: introduction → context → refocus → transition
- Create elastic snippets (multiple length versions) for key content
- Implement a greedy assembly algorithm respecting token budget and priority ordering
- Use inception (writing the opening words of the answer at the end of the prompt) to control how the completion begins
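A minimal sketch of greedy assembly with elastic snippets. It assumes a whitespace word count as a stand-in for a real tokenizer and uses a hypothetical `ElasticSnippet` type. Snippets are visited in priority order, each gets the longest version that still fits the budget, and the survivors are re-sorted into document order. The inception text is reserved first and placed at the very end, so the model's completion continues from it.

```python
from dataclasses import dataclass


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


@dataclass
class ElasticSnippet:
    versions: list[str]  # same content at decreasing lengths, longest first
    priority: int        # lower number = more important
    position: int        # where the snippet sits in the final prompt


def assemble(snippets: list[ElasticSnippet], budget: int, inception: str = "") -> str:
    """Greedy assembly under a token budget (separators are not counted)."""
    used = count_tokens(inception)  # reserve room for the inception first
    chosen: list[tuple[int, str]] = []
    for snip in sorted(snippets, key=lambda s: s.priority):
        for version in snip.versions:
            cost = count_tokens(version)
            if used + cost <= budget:
                chosen.append((snip.position, version))
                used += cost
                break  # longest version that fits; move on to the next snippet
    # Emit in document order, not priority order, then end on the inception.
    chosen.sort(key=lambda pair: pair[0])
    body = "\n\n".join(text for _, text in chosen)
    return f"{body}\n\n{inception}" if inception else body


snippets = [
    ElasticSnippet(["You are a helpful support assistant."], priority=0, position=0),
    ElasticSnippet([" ".join(["Long doc"] * 20), "Short doc"], priority=1, position=1),
    ElasticSnippet(["Example one two three four five"], priority=2, position=2),
]
print(assemble(snippets, budget=15, inception="Answer:"))
```

With a budget of 15 the long document version does not fit, so its short version is used; the lowest-priority example is the first thing dropped when the budget shrinks further.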
Step 6: Tune output handling. (Chapter 7)
- Add stop sequences to halt generation at the answer boundary
- Use logprobs for quality filtering (suppress low-confidence completions)
- Parse out fluff; preserve reasoning preambles rather than suppressing them, since they improve the answer that follows
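Stop sequences are normally passed to the completion API, which halts generation server-side; the helper below shows the equivalent truncation client-side, which is also useful when replaying logged completions. The confidence filter assumes the API can return one log probability per generated token (many can, but field names differ, so the sketch takes a plain list of floats), and the 0.5 threshold is a placeholder to calibrate on your evaluation data.

```python
import math


def truncate_at_stop(text: str, stops: list[str]) -> str:
    """Client-side equivalent of an API stop sequence: cut at the earliest stop."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def mean_token_probability(token_logprobs: list[float]) -> float:
    """Geometric-mean probability per token: exp of the average logprob."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def accept(token_logprobs: list[float], threshold: float = 0.5) -> bool:
    # Suppress completions whose average per-token confidence is low.
    return mean_token_probability(token_logprobs) >= threshold


print(truncate_at_stop("Answer: 42\n## Next section", ["\n##", "\n\n"]))
# → Answer: 42
```

Averaging log probabilities keeps the score independent of completion length, which a raw product of probabilities would not be.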
Phase 3: Complexity (Only When Needed)
Step 7: Add reasoning. (Chapter 8)
- For complex tasks, add chain-of-thought ("Think step by step before answering")
- Place the reasoning instruction so the model generates its reasoning tokens before the answer, not after it
- Consider ReAct for multi-step tasks requiring external data
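One way to put chain-of-thought into practice, sketched under assumptions: the prompt asks for step-by-step reasoning and ends with a `Reasoning:` cue, so the first generated tokens are reasoning rather than a verdict, and a hypothetical `Answer:` line convention lets the application pull out the final answer afterwards. ReAct-style tool use is not shown.

```python
from typing import Optional


def cot_prompt(question: str) -> str:
    # The reasoning instruction precedes the answer marker, so the model writes
    # its reasoning first and commits to an answer only afterwards.
    return (
        f"Question: {question}\n"
        "Think step by step before answering. "
        "When you are done, write the final answer on a line starting with 'Answer:'.\n\n"
        "Reasoning:"
    )


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if there is none."""
    answer = None
    for line in completion.splitlines():
        stripped = line.strip()
        if stripped.startswith("Answer:"):
            answer = stripped[len("Answer:"):].strip()
    return answer


print(extract_answer(" 3 apples plus 2 apples is 5.\nAnswer: 5"))
# → 5
```

Taking the last `Answer:` line tolerates a model that revises itself mid-reasoning; returning `None` lets the caller retry instead of showing a half-finished thought.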
Step 8: Add tools. (Chapter 8)
- Define minimal tool interfaces: few tools, few arguments, meaningful names
- Intercept dangerous operations in the application layer — never rely on prompt instructions for safety
- Remove arguments whose values the application already knows (such as the current user's ID) and fill them in yourself, so the model cannot hallucinate them
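A sketch of the application-layer side of tool use. The tool definitions use a simplified ad hoc schema, not any particular vendor's function-calling format, and the tool names, `impls` mapping, and `confirm` callback are hypothetical. The model only ever supplies `query` or `order_id`; the user ID is injected by the application, and `cancel_order` cannot run without explicit confirmation, whatever the prompt says.

```python
from typing import Any, Callable

# Few tools, few arguments, meaningful names. user_id is deliberately absent.
TOOLS = {
    "search_orders": {
        "description": "Search the current user's orders.",
        "parameters": {"query": "string"},
    },
    "cancel_order": {
        "description": "Cancel one of the current user's orders.",
        "parameters": {"order_id": "string"},
    },
}

DANGEROUS = {"cancel_order"}


def dispatch(name: str, args: dict[str, Any], user_id: str,
             impls: dict[str, Callable[..., Any]],
             confirm: Callable[[str, dict[str, Any]], bool]) -> Any:
    """Route a model-requested tool call; safety checks live here, not in the prompt."""
    if name not in TOOLS:
        return {"error": f"unknown tool {name}"}
    unexpected = set(args) - set(TOOLS[name]["parameters"])
    if unexpected:
        return {"error": f"unexpected arguments {sorted(unexpected)}"}
    if name in DANGEROUS and not confirm(name, args):
        return {"error": "operation not confirmed by user"}
    # Known values are injected by the application, never supplied by the model.
    return impls[name](user_id=user_id, **args)
```

Rejecting unexpected arguments also blocks the model from smuggling in a different `user_id`; error results are returned as data so they can be fed back to the model.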
Step 9: Consider workflows. (Chapter 9)
- If agent reliability is insufficient, decompose into a structured workflow
- Each subtask should be simple enough for a single LLM pass
- Build a deterministic supervisor for routing, error recovery, and state management
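A deterministic supervisor can be ordinary code. In this sketch, which is an assumption rather than the book's design, each step is a hypothetical callable standing in for one LLM pass plus validation; routing is a dictionary lookup on the request's `kind`, failures surface as `ValueError` and are retried a bounded number of times, and all state lives in an explicit dictionary that is copied rather than mutated.

```python
from typing import Callable

# Each step receives the current state and returns the fields it adds. In a real
# system a step would wrap one LLM call plus validation of its output.
Step = Callable[[dict], dict]


def supervise(pipelines: dict[str, list[tuple[str, Step]]], state: dict,
              max_retries: int = 2) -> dict:
    """Deterministic supervisor: code-based routing, bounded retries, explicit state."""
    # Routing is plain code keyed on the request, not a model decision.
    steps = pipelines.get(state.get("kind", ""), pipelines["default"])
    for name, step in steps:
        for attempt in range(max_retries + 1):
            try:
                state = {**state, **step(state)}
                break
            except ValueError as err:  # e.g. model output that failed to parse
                msg = f"{name} attempt {attempt}: {err}"
                state = {**state, "errors": state.get("errors", []) + [msg]}
        else:
            # Retries exhausted: stop instead of passing bad data downstream.
            return {**state, "failed_step": name}
    return state
```

Because the supervisor never asks the model what to do next, every path through the workflow can be enumerated and tested like any other code.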
Phase 4: Optimization
Step 10: Evaluate and iterate. (Chapter 10)
- Run SOMA evaluations comparing each change to the previous baseline
- Use acceptance metrics as the primary optimization target, with latency and error rates as guardrails
- Consider fine-tuning (LoRA) once prompting is optimized and you have sufficient examples
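The baseline comparison can be reduced to an explicit ship/no-ship rule. This sketch treats acceptance rate as the primary metric and p95 latency and error rate as guardrails; the metric names and all threshold defaults are illustrative assumptions, and a real comparison should also check that the difference exceeds run-to-run noise.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    acceptance_rate: float  # primary target, e.g. share of suggestions users accept
    p95_latency_ms: float   # guardrail
    error_rate: float       # guardrail


def should_ship(baseline: Metrics, candidate: Metrics,
                min_gain: float = 0.01,
                max_latency_increase: float = 0.10,
                max_error_increase: float = 0.005) -> bool:
    """Ship only if the primary metric improves and no guardrail regresses too far."""
    improves = candidate.acceptance_rate >= baseline.acceptance_rate + min_gain
    latency_ok = candidate.p95_latency_ms <= baseline.p95_latency_ms * (1 + max_latency_increase)
    errors_ok = candidate.error_rate <= baseline.error_rate + max_error_increase
    return improves and latency_ok and errors_ok


base = Metrics(acceptance_rate=0.30, p95_latency_ms=800, error_rate=0.02)
print(should_ship(base, Metrics(0.33, 850, 0.02)))   # → True
print(should_ship(base, Metrics(0.35, 1000, 0.02)))  # → False
```

The second candidate has the better acceptance rate but breaks the latency guardrail, which is exactly the kind of trade-off the guardrails exist to surface.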
Counterpoints
Don't skip to Phase 3. The five levels of sophistication (Chapter 1) predict that attempting agency without mastering content and assembly leads to compounding failures. Each level must be solid before adding the next.
Don't over-optimize early. Prototype with slightly larger models than you can afford — optimize model size and cost after the system works. (Chapter 7)
Key Quotes
"The very first bit of code we wrote was the evaluation, and it's only thanks to this that we were able to move so fast and so successfully with the rest." — Berryman & Ziegler, Chapter 10
"Fine-tuning is a continuation of prompt engineering by other means." — Berryman & Ziegler, Chapter 7
Rules of Thumb
- Evaluation → format → content → assembly → reasoning → tools → workflows — in that order
- Never skip evaluation; never add complexity before measuring what you have
- Start with the simplest architecture that works; add layers only when measurement shows the need
- Prototype with better models, ship with cheaper ones
Related References
- Evaluating LLM Applications - Phase 1 in detail
- Designing LLM Applications - The feedforward pass pipeline
- Collected Heuristics and Rules of Thumb - Quick-reference heuristics for each phase