Designing Large Language Model Applications
Suhas Pai, 2025 (12 references)
Suhas Pai's holistic framework for building production-grade LLM applications — covering the full stack from pre-training data through tokenization, architecture, evaluation, RAG, agents, and multi-model system design.
llm production-engineering rag agents system-design machine-learning
Overview
The Core Framework
- The central problem is the prototype-to-production gap: demos are easy to build, but production systems require a holistic understanding of every LLM ingredient
- In order of impact: data quality > model selection > prompt engineering
- LLM limitations (hallucination, reasoning failures, bias) are design constraints, not bugs — engineer around them
- Production deployment is a systems engineering problem: routers, cascades, guardrails, verifiers, orchestration
- Default to explicit interaction paradigms over autonomous agents for anything mission-critical
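As a rough illustration of one of the system components named above, a cascade can be sketched as a loop that tries cheaper models first and escalates only when confidence is low. This is a minimal sketch, not the book's implementation: the `Model` interface, the stub models, and the `threshold` value are all hypothetical, and a real system would need its own confidence signal (for example, a verifier score).

```python
from typing import Callable, List, Tuple

# Hypothetical model interface (an assumption for this sketch):
# takes a prompt, returns (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def cascade(
    prompt: str,
    models: List[Tuple[str, Model]],
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Try models from cheapest to most expensive.

    Returns (model_name, answer) from the first model whose confidence
    meets the threshold, or from the last model if none does.
    """
    if not models:
        raise ValueError("cascade needs at least one model")
    name, answer = "", ""
    for name, model in models:
        answer, confidence = model(prompt)
        if confidence >= threshold:
            break  # confident enough: stop escalating
    return name, answer


# Stub models standing in for real LLM calls (illustrative only).
def small_model(prompt: str) -> Tuple[str, float]:
    # Pretend the small model is only confident on short prompts.
    return "short answer", 0.9 if len(prompt) < 40 else 0.3


def large_model(prompt: str) -> Tuple[str, float]:
    return "detailed answer", 0.95


MODELS = [("small", small_model), ("large", large_model)]

print(cascade("What is 2 + 2?", MODELS))
print(cascade("Summarize the trade-offs between hybrid search and long context.", MODELS))
```

The first call stays on the small model, while the second escalates because the stub reports low confidence on the longer prompt.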
Quick Lookup
| Situation | Do This | Avoid This |
|---|---|---|
| Starting a new LLM project | Clean data, check tokenization, build internal benchmarks | Chasing leaderboard rankings or picking the biggest model |
| Model behaving oddly on specific inputs | Check tokenization of those inputs first | Debugging at prompt or architecture level only |
| Need structured output | Use constrained decoding (Jsonformer, LMQL) | Hoping the model complies via prompting alone |
| Building a knowledge-grounded app | Implement RAG with hybrid search (BM25 + embeddings) | Relying on parametric memory or dumping everything into long context |
| Need high reliability | Generate n > 1 completions and pick the answer by self-consistency (majority) voting | Single generation with no verification |
| Optimizing cost at scale | Use LLM cascades (smallest model first, escalate on low confidence) | Defaulting to the largest model for all requests |
| Building an agent | Use explicit (pre-programmed) tool orchestration | Autonomous agents for mission-critical applications |
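The "need high reliability" row can be made concrete with a small sketch of self-consistency voting: sample several completions, normalize them, and keep the most common answer. The `sample` callable, the `normalize` helper, and the canned answers below are assumptions for illustration; in practice `sample` would wrap a real model call with a non-zero temperature.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List, Tuple


def normalize(answer: str) -> str:
    """Light normalization so trivially different answers vote together."""
    return answer.strip().lower()


def self_consistency(
    prompt: str,
    sample: Callable[[str], str],
    n: int = 5,
) -> Tuple[str, float]:
    """Draw n completions and return (majority answer, share of votes)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    answers = [normalize(sample(prompt)) for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n


def make_sampler(canned: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM call: replays canned answers in order."""
    stream = cycle(canned)
    return lambda prompt: next(stream)


sampler = make_sampler(["42", "42 ", "41", "42", "40"])
answer, agreement = self_consistency("What is 6 * 7?", sampler, n=5)
print(answer, agreement)
```

The vote share can double as a rough confidence signal: a low share suggests the request should be verified or escalated rather than answered directly.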
The Key Insight
"Plenty of software frameworks have emerged that enable rapid prototype development of LLM applications. However, advancing from prototypes to production-grade applications is a road much less traveled, and is still a very challenging task." — Suhas Pai, Preface