AI Software Development
AI Engineering: Building Applications with Foundation Models
Chip Huyen, 2025 (13 references)
Apply Chip Huyen's AI Engineering frameworks when building, evaluating, or optimizing AI applications on foundation models — covering the three-axis model, evaluation-driven development, RAG, finetuning, inference optimization, and production architecture.
foundation-models llm evaluation rag finetuning production-ai prompt-engineering
Overview
The Core Framework
- Three-Axis Quality Model: Response quality = f(instructions, context, model). Optimize in that order — each step is ~10× more expensive than the previous.
- Evaluation first: Define evaluation criteria before writing any application code. Evaluation guidelines later double as finetuning annotation guidelines, so the early investment pays off twice.
- Failure-type routing: Information failures (wrong facts) → RAG. Behavior failures (wrong form/style/format) → finetuning. Misrouting adds cost without improving quality.
- Data is the moat: In a world of converging model architectures, proprietary user feedback data — not model quality — is the primary long-term competitive differentiator.
- Goodput over throughput: Optimize for goodput, the number of requests per second that meet their SLOs, rather than raw throughput or GPU utilization.
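To make the goodput point concrete, here is a minimal Python sketch that compares throughput with goodput over a window of logged requests. The `RequestLog` fields (time to first token and time per output token) and the 200 ms / 100 ms thresholds are illustrative assumptions, not values from the book; substitute whatever SLOs your application defines.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RequestLog:
    """One served request (hypothetical log schema)."""
    ttft_ms: float  # time to first token, in milliseconds
    tpot_ms: float  # time per output token, in milliseconds


def throughput(logs: Iterable[RequestLog], window_s: float) -> float:
    """All completed requests per second, regardless of latency."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return len(list(logs)) / window_s


def goodput(
    logs: Iterable[RequestLog],
    window_s: float,
    max_ttft_ms: float = 200.0,  # assumed SLO threshold
    max_tpot_ms: float = 100.0,  # assumed SLO threshold
) -> float:
    """Requests per second that satisfy every SLO threshold."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    ok = sum(
        1
        for r in logs
        if r.ttft_ms <= max_ttft_ms and r.tpot_ms <= max_tpot_ms
    )
    return ok / window_s


if __name__ == "__main__":
    logs = [
        RequestLog(ttft_ms=150.0, tpot_ms=50.0),   # meets both SLOs
        RequestLog(ttft_ms=300.0, tpot_ms=50.0),   # slow first token
        RequestLog(ttft_ms=100.0, tpot_ms=120.0),  # slow per-token latency
        RequestLog(ttft_ms=180.0, tpot_ms=90.0),   # meets both SLOs
    ]
    print(f"throughput: {throughput(logs, 2.0):.1f} req/s")  # 2.0 req/s
    print(f"goodput:    {goodput(logs, 2.0):.1f} req/s")     # 1.0 req/s
```

In this toy window the server completes two requests per second, but only one per second meets both thresholds, so a change that raises throughput while pushing more requests past the SLO would lower the number that matters.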
Quick Lookup
| Situation | Do This | Avoid This |
|---|---|---|
| Output quality is poor | Exhaust prompt engineering first | Jump to finetuning |
| Model gives wrong facts | Add RAG (information failure) | Finetune to add knowledge |
| Model uses wrong style/format | Finetune (behavior failure) | Add more RAG chunks |
| Evaluating model quality | Use functional correctness | Use BLEU/ROUGE on generative tasks |
| Comparing base vs. aligned models | Use task-specific metrics | Compare perplexity (post-training alignment typically raises it, so it stops tracking quality) |
| Selecting a model | Filter hard attributes first | Evaluate soft attributes before filtering |
| Building agent systems | Scope autonomy to measured reliability | Grant write-actions before measuring accuracy |
| Optimizing inference | Define SLOs, then optimize goodput | Maximize GPU utilization |
| Synthetic training data | Keep a minimum share of real data in the mix | Train recursively on synthetic data only |
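The first three rows of the table amount to a small decision procedure, which might be sketched in Python as follows. The `Failure` enum and `next_step` function are hypothetical names introduced for illustration; the sketch only encodes the routing stated in the table and is not an implementation from the book.

```python
from enum import Enum


class Failure(Enum):
    """Failure categories from the table (hypothetical enum)."""
    INFORMATION = "information"  # wrong or missing facts
    BEHAVIOR = "behavior"        # wrong form, style, or format


def next_step(failure: Failure, prompt_engineering_exhausted: bool) -> str:
    """Suggest the next technique to try for an observed failure.

    Prompt engineering comes first; only once it is exhausted does the
    failure type decide between RAG and finetuning.
    """
    if not prompt_engineering_exhausted:
        return "prompt engineering"
    if failure is Failure.INFORMATION:
        return "RAG"
    if failure is Failure.BEHAVIOR:
        return "finetuning"
    raise ValueError(f"unknown failure type: {failure!r}")


if __name__ == "__main__":
    print(next_step(Failure.INFORMATION, prompt_engineering_exhausted=False))
    print(next_step(Failure.INFORMATION, prompt_engineering_exhausted=True))
    print(next_step(Failure.BEHAVIOR, prompt_engineering_exhausted=True))
```

Classifying a failure as informational or behavioral still requires human judgment or an evaluation pipeline; the function only makes the ordering explicit once that classification exists.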
The Key Insight
"The three-axis model tells you not just what to do, but in what order to do it — and why skipping axes is expensive." — Chip Huyen, Preface