AI Engineering: Building Applications with Foundation Models

Chip Huyen, 2025 · 13 references

Apply Chip Huyen's AI Engineering frameworks when building, evaluating, or optimizing AI applications on foundation models — covering the three-axis model, evaluation-driven development, RAG, finetuning, inference optimization, and production architecture.

Tags: foundation-models, llm, evaluation, rag, finetuning, production-ai, prompt-engineering

Overview

The Core Framework

  • Three-Axis Quality Model: Response quality = f(instructions, context, model). Optimize in that order — each step is ~10× more expensive than the previous.
  • Evaluation first: Define evaluation criteria before writing any application code. Evaluation guidelines become finetuning annotation guidelines — early investment is doubly leveraged.
  • Failure-type routing: Information failures (wrong facts) → RAG. Behavior failures (wrong form/style/format) → finetuning. Misrouting adds cost without fixing the underlying failure.
  • Data is the moat: In a world of converging model architectures, proprietary user feedback data — not model quality — is the primary long-term competitive differentiator.
  • Goodput over throughput: Optimize for the number of requests per second that meet their SLOs (goodput), not raw throughput or GPU utilization.
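
To make the goodput bullet concrete, here is a minimal sketch of how goodput could be computed next to raw throughput. The choice of SLO metrics (time to first token and time per output token), the thresholds, and the names `RequestTiming`, `throughput`, and `goodput` are illustrative assumptions, not definitions taken from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestTiming:
    """Latency measurements for one completed request (hypothetical schema)."""
    ttft_s: float  # time to first token, in seconds
    tpot_s: float  # average time per output token, in seconds


def throughput(requests: list[RequestTiming], window_s: float) -> float:
    """All completed requests per second, whether or not they met the SLOs."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return len(requests) / window_s


def goodput(
    requests: list[RequestTiming],
    window_s: float,
    max_ttft_s: float,
    max_tpot_s: float,
) -> float:
    """Completed requests per second that satisfy every latency SLO."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    within_slo = sum(
        1
        for r in requests
        if r.ttft_s <= max_ttft_s and r.tpot_s <= max_tpot_s
    )
    return within_slo / window_s


# Four requests completed in a 2-second window (made-up numbers).
timings = [
    RequestTiming(ttft_s=0.10, tpot_s=0.05),  # meets both SLOs
    RequestTiming(ttft_s=0.30, tpot_s=0.05),  # first token too slow
    RequestTiming(ttft_s=0.15, tpot_s=0.12),  # per-token latency too slow
    RequestTiming(ttft_s=0.20, tpot_s=0.10),  # exactly at both limits, still passes
]

print(throughput(timings, window_s=2.0))  # 2.0
print(goodput(timings, window_s=2.0, max_ttft_s=0.2, max_tpot_s=0.1))  # 1.0
```

In this made-up window the server completes two requests per second, but only one per second meets the SLOs. A change that raises throughput while pushing more requests past the latency limits would show up as flat or falling goodput.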

Quick Lookup

| Situation | Do This | Avoid This |
|---|---|---|
| Output quality is poor | Exhaust prompt engineering first | Jump to finetuning |
| Model gives wrong facts | Add RAG (information failure) | Finetune to add knowledge |
| Model uses wrong style/format | Finetune (behavior failure) | Add more RAG chunks |
| Evaluating model quality | Use functional correctness | Use BLEU/ROUGE on generative tasks |
| Comparing base vs. aligned models | Use task-specific metrics | Compare perplexity (breaks on aligned models) |
| Selecting a model | Filter hard attributes first | Evaluate soft attributes before filtering |
| Building agent systems | Scope autonomy to measured reliability | Grant write-actions before measuring accuracy |
| Optimizing inference | Define SLOs, then optimize goodput | Maximize GPU utilization |
| Synthetic training data | Mix with real data as a floor | Train recursively on synthetic only |
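
The first three rows of the lookup can be read as a small decision procedure: exhaust prompt engineering, then route by failure type. The sketch below encodes that reading. The `Failure` enum, the `next_step` function, and the `prompting_exhausted` flag are hypothetical names introduced for illustration, and deciding when prompt engineering is actually "exhausted" is left to your own evaluation results.

```python
from enum import Enum


class Failure(Enum):
    """Failure types from the lookup above."""
    INFORMATION = "information"  # wrong or missing facts
    BEHAVIOR = "behavior"        # wrong form, style, or format


def next_step(failure: Failure, prompting_exhausted: bool) -> str:
    """Suggest the next technique to try for an observed failure.

    Prompt engineering comes first in every case; only once it is
    exhausted does the failure type decide between RAG and finetuning.
    """
    if not prompting_exhausted:
        return "prompt engineering"
    if failure is Failure.INFORMATION:
        return "RAG"
    if failure is Failure.BEHAVIOR:
        return "finetuning"
    raise ValueError(f"unknown failure type: {failure!r}")


print(next_step(Failure.INFORMATION, prompting_exhausted=False))  # prompt engineering
print(next_step(Failure.INFORMATION, prompting_exhausted=True))   # RAG
print(next_step(Failure.BEHAVIOR, prompting_exhausted=True))      # finetuning
```

Keeping the routing in one explicit function makes the "Avoid This" column hard to do by accident: an information failure is never sent to finetuning, and a behavior failure is never sent to RAG.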

The Key Insight

"The three-axis model tells you not just what to do, but in what order to do it — and why skipping axes is expensive." — Chip Huyen, Preface

References