Implementation Playbook - Prompt Engineering for LLMs

Key Principle

Prompt engineering is software engineering. This playbook translates the book's techniques into a concrete implementation sequence: start with evaluation, build the feedforward pass, optimize assembly, add reasoning and tools as needed, and iterate with data. Each step builds on the previous one.

Why This Matters

The book's 11 chapters cover techniques in conceptual order, but implementation requires a different sequence. Teams that start with the model (fine-tuning, temperature tuning) before building evaluation and prompt infrastructure waste effort on changes they cannot measure. This playbook provides the correct order of operations.

Phase 1: Foundation (Before Any Prompting)

Step 1: Build evaluation first. (Chapter 10)

Create an example suite of 5-20 representative inputs with expected outputs
Set up automated comparison (gold standard if you have reference answers, functional tests if you have verifiable properties, SOMA assessment if neither: specific questions, ordinal scaled answers, multi-aspect coverage)
Record latency and token consumption from day one
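
A minimal sketch of such a harness, assuming gold-standard comparison by case-insensitive exact match. The names `Example`, `run_suite`, and `fake_model` are illustrative, and `count_tokens` is a crude whitespace proxy rather than a real tokenizer:

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    prompt_input: str
    expected: str


@dataclass
class Result:
    passed: bool
    latency_s: float
    tokens: int


def count_tokens(text: str) -> int:
    # Crude whitespace proxy (assumption); use your model's tokenizer in practice.
    return len(text.split())


def run_suite(generate: Callable[[str], str], suite: list[Example]) -> list[Result]:
    """Run every example, recording correctness, latency, and token use."""
    results = []
    for ex in suite:
        start = time.perf_counter()
        output = generate(ex.prompt_input)
        latency = time.perf_counter() - start
        # Gold-standard comparison: case-insensitive exact match.
        passed = output.strip().lower() == ex.expected.strip().lower()
        tokens = count_tokens(ex.prompt_input) + count_tokens(output)
        results.append(Result(passed, latency, tokens))
    return results


def summarize(results: list[Result]) -> dict[str, float]:
    n = len(results)  # assumes a non-empty suite
    return {
        "pass_rate": sum(r.passed for r in results) / n,
        "mean_latency_s": sum(r.latency_s for r in results) / n,
        "total_tokens": sum(r.tokens for r in results),
    }


def fake_model(text: str) -> str:
    # Stand-in for a real LLM call, so the harness runs offline.
    return {"2+2": "4", "capital of France": "Paris"}.get(text, "unknown")


suite = [
    Example("2+2", "4"),
    Example("capital of France", "paris"),
    Example("3+3", "6"),
]
summary = summarize(run_suite(fake_model, suite))
print(summary)
```

Swapping `fake_model` for the real application entry point keeps the suite and the summary unchanged, so every later change can be compared against the same baseline.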

Step 2: Choose document format. (Chapters 4, 6)

Select an archetype: advice conversation (dialogue), analytic report (exposition), or structured document (XML/JSON)
Reports give better scope control; structured formats give easier parsing; dialogues feel natural
Apply the Little Red Riding Hood principle: use formats the model has seen millions of times

Step 3: Design the feedforward pass. (Chapter 4)

Map the transformation: user problem → context retrieval → snippetizing → scoring → assembly → model → parsing → user result
Assign priority tiers: instructions (highest) → task-specific context → general context → examples
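
The transformation above can be sketched as a chain of replaceable stages. This is an illustrative skeleton, not the book's code: `Tier`, `Snippet`, and `feedforward` are hypothetical names, and every stage in the demo is a stand-in lambda.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    # Lower value = higher priority, following the ordering in the text.
    INSTRUCTIONS = 0
    TASK_CONTEXT = 1
    GENERAL_CONTEXT = 2
    EXAMPLES = 3


@dataclass
class Snippet:
    text: str
    tier: Tier
    score: float = 0.0


def feedforward(problem, retrieve, snippetize, score, assemble, model, parse):
    """One pass: user problem -> retrieval -> snippets -> scoring -> prompt -> model -> result."""
    documents = retrieve(problem)
    snippets = [s for doc in documents for s in snippetize(doc)]
    for s in snippets:
        s.score = score(problem, s)
    prompt = assemble(problem, snippets)
    completion = model(prompt)
    return parse(completion)


# Demo with stand-in stages; each would be replaced by a real component.
result = feedforward(
    problem="reset password",
    retrieve=lambda p: ["To reset a password, open Settings. Billing is monthly."],
    snippetize=lambda doc: [
        Snippet(part.strip(), Tier.TASK_CONTEXT) for part in doc.split(".") if part.strip()
    ],
    score=lambda p, s: float(sum(word in s.text.lower() for word in p.lower().split())),
    assemble=lambda p, snips: "\n".join(
        s.text for s in sorted(snips, key=lambda s: (s.tier, -s.score))
    ) + f"\nQuestion: {p}\nAnswer:",
    model=lambda prompt: " Open Settings. ",  # stand-in for an LLM call
    parse=str.strip,
)
print(result)
```

Passing the stages in as functions keeps each one independently testable, which fits the evaluation-first approach of Step 1.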

Phase 2: Content and Assembly

Step 4: Build content pipeline. (Chapter 5)

Static content: write clear instructions, add few-shot examples for format demonstration
Dynamic content: implement RAG (start with lexical retrieval for speed; add neural for semantic coverage)
Filter aggressively for precision: irrelevant content actively harms output, because the model treats everything in the prompt as relevant (Chekhov's Gun)
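
A small sketch of lexical retrieval with a precision filter. It uses Jaccard word overlap as a simple stand-in for a production lexical ranker; the `min_score` and `top_k` values are arbitrary assumptions to be tuned against the evaluation suite.

```python
import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, snippets: list[str], min_score: float = 0.1, top_k: int = 3):
    """Score snippets by word overlap, drop weak matches, return the best few."""
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(s)), s) for s in snippets]
    # Precision filter: anything below the threshold never reaches the prompt.
    kept = [pair for pair in scored if pair[0] >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)[:top_k]


corpus = [
    "Reset your password from the account settings page",
    "Our office is closed on public holidays",
    "Password reset emails expire after one hour",
]
hits = retrieve("how do I reset my password", corpus)
for score, text in hits:
    print(f"{score:.2f}  {text}")
```

The off-topic snippet about office hours scores zero and is filtered out rather than being included as low-priority filler.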

Step 5: Implement prompt assembly. (Chapter 6)

Structure: introduction → context → refocus → transition
Create elastic snippets (multiple length versions) for key content
Implement a greedy assembly algorithm respecting token budget and priority ordering
Use inception (write the first words of the answer yourself at the end of the prompt) to control the completion's opening
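
A hedged sketch of greedy assembly with elastic snippets. `ElasticSnippet` and `assemble` are hypothetical names, token counts use a whitespace proxy, and the trailing "Answer:" stands in for the refocus and transition. Snippets are admitted in priority order, each falling back to a shorter version when the longer one does not fit, then emitted in document order.

```python
from dataclasses import dataclass


@dataclass
class ElasticSnippet:
    name: str
    priority: int        # lower number = more important
    position: int        # order in the final prompt
    versions: list[str]  # longest first, shortest last


def count_tokens(text: str) -> int:
    # Whitespace proxy (assumption); use the model's tokenizer in practice.
    return len(text.split())


def assemble(snippets: list[ElasticSnippet], budget: int) -> str:
    chosen = []
    remaining = budget
    # Greedy: consider the most important snippets first.
    for snip in sorted(snippets, key=lambda s: s.priority):
        for version in snip.versions:
            cost = count_tokens(version)
            if cost <= remaining:
                chosen.append((snip.position, version))
                remaining -= cost
                break  # take the longest version that fits, or none at all
    # Emit in document order, not priority order.
    return "\n".join(text for _, text in sorted(chosen, key=lambda c: c[0]))


snippets = [
    ElasticSnippet("intro", 0, 0, ["You are a support assistant for Acme."]),
    ElasticSnippet("policy", 1, 1, [
        "Policy: refunds are available within 30 days of purchase with a receipt.",
        "Policy: refunds within 30 days.",
    ]),
    ElasticSnippet("example", 3, 2, ["Example: Q: Can I return shoes? A: Yes, within 30 days."]),
    ElasticSnippet("question", 0, 3, ["Question: Can I get a refund after 45 days?\nAnswer:"]),
]
prompt = assemble(snippets, budget=25)
print(prompt)
```

With a budget of 25, the introduction and question are admitted first, the policy falls back to its short version, and the example is dropped.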

Step 6: Tune output handling. (Chapter 7)

Add stop sequences to halt generation at the answer boundary
Use logprobs for quality filtering (suppress low-confidence completions)
Parse out fluff around the answer when extracting it; preserve reasoning preambles, since the model needs to generate them before it answers
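
Stop sequences and logprobs are normally requested through the model API. The sketch below assumes the raw completion text and per-token logprobs have already been returned, and shows only the client-side handling. Function names and the 0.5 threshold are illustrative assumptions.

```python
import math
from typing import Optional


def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Cut the completion at the earliest stop sequence, if any appears."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def mean_token_probability(token_logprobs: list[float]) -> float:
    """Geometric mean of token probabilities, computed from logprobs."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def handle_completion(
    text: str,
    token_logprobs: list[float],
    stop_sequences: list[str],
    min_probability: float = 0.5,
) -> Optional[str]:
    # Suppress low-confidence completions instead of showing them.
    if mean_token_probability(token_logprobs) < min_probability:
        return None
    return truncate_at_stop(text, stop_sequences).strip()


stops = ["\n\nQuestion:"]
confident = handle_completion(
    " Paris\n\nQuestion: What is the capital of Spain?", [-0.05, -0.1, -0.2], stops
)
unsure = handle_completion(" Lyon", [-2.0, -1.5], stops)
print(confident, unsure)
```

Returning `None` for a suppressed completion lets the caller decide whether to retry, fall back, or show nothing.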

Phase 3: Complexity (Only When Needed)

Step 7: Add reasoning. (Chapter 8)

For complex tasks, add chain-of-thought ("Think step by step before answering")
Place reasoning instruction so it produces tokens before the answer
Consider ReAct for multi-step tasks requiring external data
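
One way to make the reasoning come before the answer, and still parse the answer reliably, is to ask for a marked final line. A minimal sketch: the template wording and the "Answer:" marker are assumptions, not a format prescribed by the book.

```python
import re
from typing import Optional

COT_TEMPLATE = (
    "{question}\n\n"
    "Think step by step before answering. Write your reasoning first, "
    "then give the final answer on its own line in the form 'Answer: <value>'."
)


def build_prompt(question: str) -> str:
    return COT_TEMPLATE.format(question=question)


def split_reasoning(completion: str) -> tuple[str, Optional[str]]:
    """Separate the reasoning preamble from the first 'Answer:' line."""
    match = re.search(r"^Answer:\s*(.+)$", completion, flags=re.MULTILINE)
    if match is None:
        # No marked answer: keep the text as reasoning and report no answer.
        return completion.strip(), None
    return completion[: match.start()].strip(), match.group(1).strip()


# Stand-in completion; a real one would come from the model.
completion = "There are 3 boxes with 4 apples each.\n3 * 4 = 12.\nAnswer: 12"
reasoning, answer = split_reasoning(completion)
print(answer)
```

The reasoning is kept rather than discarded, so it can be logged and inspected when an answer turns out to be wrong.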

Step 8: Add tools. (Chapter 8)

Define minimal tool interfaces: few tools, few arguments, meaningful names
Intercept dangerous operations in the application layer — never rely on prompt instructions for safety
Remove arguments with known values to prevent hallucination
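
A sketch of these three rules in the application layer. The tools, `MODEL_ARGS`, and the `confirm` callback are hypothetical. The model is only ever shown `path`; the application injects `user_id` itself and ignores any value the model tries to supply.

```python
from typing import Any, Callable


# Hypothetical tools. user_id is known to the application, so the model never supplies it.
def read_file(user_id: str, path: str) -> str:
    return f"contents of {path} for {user_id}"


def delete_file(user_id: str, path: str) -> str:
    return f"deleted {path} for {user_id}"


TOOLS: dict[str, Callable[..., str]] = {"read_file": read_file, "delete_file": delete_file}
MODEL_ARGS = {"read_file": {"path"}, "delete_file": {"path"}}  # arguments the model may set
DANGEROUS = {"delete_file"}


def execute_tool_call(
    name: str,
    arguments: dict[str, Any],
    user_id: str,
    confirm: Callable[[str, dict[str, Any]], bool],
) -> str:
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    missing = MODEL_ARGS[name] - set(arguments)
    if missing:
        return f"error: missing arguments {sorted(missing)}"
    # Safety check lives in code, not in the prompt.
    if name in DANGEROUS and not confirm(name, arguments):
        return "error: operation declined"
    # Keep only the arguments the model is allowed to set.
    safe_args = {key: arguments[key] for key in MODEL_ARGS[name]}
    return TOOLS[name](user_id=user_id, **safe_args)


def deny(name: str, arguments: dict[str, Any]) -> bool:
    return False  # stand-in for a real user confirmation dialog


read_result = execute_tool_call(
    "read_file", {"path": "notes.txt", "user_id": "someone-else"}, "user-123", deny
)
delete_result = execute_tool_call("delete_file", {"path": "notes.txt"}, "user-123", deny)
print(read_result)
print(delete_result)
```

Errors are returned as strings so they can be fed back to the model as tool output instead of crashing the loop.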

Step 9: Consider workflows. (Chapter 9)

If agent reliability is insufficient, decompose into a structured workflow
Each subtask should be simple enough for a single LLM pass
Build a deterministic supervisor for routing, error recovery, and state management
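
A minimal sketch of a deterministic supervisor, assuming a linear workflow in which each step is a single LLM pass with a validation check. `run_workflow` and the demo steps are illustrative; routing could be added by choosing the next step from `state`.

```python
from typing import Any, Callable

Step = tuple[str, Callable[[dict], Any], Callable[[Any], bool]]


def run_workflow(steps: list[Step], state: dict, max_retries: int = 2) -> dict:
    """Run steps in order; retry a step when its output fails validation."""
    for name, step, is_valid in steps:
        for _attempt in range(max_retries + 1):
            output = step(state)
            if is_valid(output):
                state = {**state, name: output}  # record the result under the step name
                break
        else:
            # Error recovery ends here in this sketch; a real system might escalate.
            raise RuntimeError(f"step {name!r} failed after {max_retries + 1} attempts")
    return state


# Stand-ins for LLM calls. The first classification attempt is deliberately invalid.
attempts = {"classify": 0}


def classify(state: dict) -> str:
    attempts["classify"] += 1
    return "???" if attempts["classify"] == 1 else "billing"


def draft_reply(state: dict) -> str:
    return f"Routing your {state['classify']} question to the right team."


steps: list[Step] = [
    ("classify", classify, lambda out: out in {"billing", "technical"}),
    ("reply", draft_reply, lambda out: len(out) > 0),
]
final = run_workflow(steps, {"ticket": "I was charged twice"})
print(final["reply"])
```

The supervisor contains no model calls of its own, so its control flow can be unit tested like any other code.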

Phase 4: Optimization

Step 10: Evaluate and iterate. (Chapter 10)

Run SOMA evaluations comparing each change to the previous baseline
Use acceptance metrics as primary optimization target; latency and error rates as guardrails
Consider fine-tuning (LoRA) once prompting is optimized and you have sufficient examples
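
The primary-metric-plus-guardrails rule can be written as a simple gate. This is a sketch under stated assumptions: `RunMetrics` is a hypothetical record, and the 10% latency allowance and 2% error ceiling are placeholder thresholds, not values from the book.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    acceptance_rate: float  # primary optimization target
    p50_latency_s: float    # guardrail
    error_rate: float       # guardrail


def should_adopt(
    baseline: RunMetrics,
    candidate: RunMetrics,
    max_latency_regression: float = 0.10,
    max_error_rate: float = 0.02,
) -> bool:
    """Adopt a change only if acceptance improves and no guardrail is breached."""
    improves = candidate.acceptance_rate > baseline.acceptance_rate
    latency_ok = candidate.p50_latency_s <= baseline.p50_latency_s * (1 + max_latency_regression)
    errors_ok = candidate.error_rate <= max_error_rate
    return improves and latency_ok and errors_ok


baseline = RunMetrics(acceptance_rate=0.30, p50_latency_s=1.0, error_rate=0.01)
faster_win = RunMetrics(acceptance_rate=0.34, p50_latency_s=1.05, error_rate=0.01)
slow_win = RunMetrics(acceptance_rate=0.40, p50_latency_s=1.8, error_rate=0.01)

print(should_adopt(baseline, faster_win))
print(should_adopt(baseline, slow_win))
```

The second candidate has the higher acceptance rate but is rejected because it breaches the latency guardrail.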

Counterpoints

Don't skip to Phase 3. The five levels of sophistication (Chapter 1) predict that attempting agency without mastering content and assembly leads to compounding failures. Each level must be solid before adding the next.

Don't over-optimize early. Prototype with slightly larger models than you can afford — optimize model size and cost after the system works. (Chapter 7)

Key Quotes

"The very first bit of code we wrote was the evaluation, and it's only thanks to this that we were able to move so fast and so successfully with the rest." — Berryman & Ziegler, Chapter 10

"Fine-tuning is a continuation of prompt engineering by other means." — Berryman & Ziegler, Chapter 7

Rules of Thumb

Evaluation → format → content → assembly → reasoning → tools → workflows — in that order
Never skip evaluation; never add complexity before measuring what you have
Start with the simplest architecture that works; add layers only when measurement shows the need
Prototype with better models, ship with cheaper ones

Related References

Evaluating LLM Applications - Phase 1 in detail
Designing LLM Applications - The feedforward pass pipeline
Collected Heuristics and Rules of Thumb - Quick-reference heuristics for each phase