Designing Large Language Model Applications

Suhas Pai, 2025 (12 references)

Suhas Pai's holistic framework for building production-grade LLM applications — covering the full stack from pre-training data through tokenization, architecture, evaluation, RAG, agents, and multi-model system design.

llm production-engineering rag agents system-design machine-learning

Overview

The Core Framework

  • The central problem is the prototype-to-production gap: demos are trivially easy to build, but production systems require a holistic understanding of every LLM ingredient
  • Data quality > model selection > prompt engineering — in order of impact
  • LLM limitations (hallucination, reasoning failures, bias) are design constraints, not bugs — engineer around them
  • Production deployment is a systems engineering problem: routers, cascades, guardrails, verifiers, orchestration
  • Default to explicit interaction paradigms over autonomous agents for anything mission-critical
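To make the "cascades" item above concrete, here is a minimal sketch of an LLM cascade: try the cheapest model first and escalate only when its confidence is too low. Everything named here (`Tier`, `run_cascade`, the stub models, and the idea that a model call returns a confidence score) is an assumption for illustration, not an API from the book or from any library.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interface: prompt in, (answer, confidence in [0, 1]) out.
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class Tier:
    name: str
    model: ModelFn
    min_confidence: float  # accept this tier's answer at or above this value


def run_cascade(prompt: str, tiers: List[Tier]) -> Tuple[str, str]:
    """Try tiers from cheapest to most expensive; return (tier_name, answer)."""
    if not tiers:
        raise ValueError("cascade needs at least one tier")
    answer = ""
    for tier in tiers:
        answer, confidence = tier.model(prompt)
        if confidence >= tier.min_confidence:
            return tier.name, answer
    # No tier was confident enough: fall back to the last tier's answer.
    return tiers[-1].name, answer


# Stub models standing in for real LLM calls (illustrative only).
def small_model(prompt: str) -> Tuple[str, float]:
    # Pretend the small model is only confident on short prompts.
    confidence = 0.9 if len(prompt) < 20 else 0.3
    return "small-answer", confidence


def large_model(prompt: str) -> Tuple[str, float]:
    return "large-answer", 0.95


TIERS = [
    Tier("small", small_model, min_confidence=0.8),
    Tier("large", large_model, min_confidence=0.0),
]
```

With these stubs, `run_cascade("2 + 2?", TIERS)` is answered by the small tier, while a long prompt escalates to the large one. In a real system the confidence signal is the hard part; it might come from token log-probabilities or a separate verifier, and the thresholds would need tuning against an internal benchmark.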

Quick Lookup

Situation | Do This | Avoid This
Starting a new LLM project | Clean data, check tokenization, build internal benchmarks | Chasing leaderboard rankings or picking the biggest model
Model behaving oddly on specific inputs | Check tokenization of those inputs first | Debugging at prompt or architecture level only
Need structured output | Use constrained decoding (Jsonformer, LMQL) | Hoping the model complies via prompting alone
Building a knowledge-grounded app | Implement RAG with hybrid search (BM25 + embeddings) | Relying on parametric memory or dumping everything into long context
Need high reliability | Generate n > 1 completions, use self-consistency voting | Single generation with no verification
Optimizing cost at scale | Use LLM cascades (smallest model first, escalate on low confidence) | Defaulting to the largest model for all requests
Building an agent | Use explicit (pre-programmed) tool orchestration | Autonomous agents for mission-critical applications
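The "Need high reliability" row can be illustrated with a short sketch of self-consistency voting: sample several completions, normalize them, and keep the majority answer along with how strongly the samples agree. The `generate` callable and the stub sampler are hypothetical stand-ins for a real LLM call made at a non-zero temperature.

```python
import itertools
from collections import Counter
from typing import Callable, Iterable, Tuple


def self_consistent_answer(
    generate: Callable[[str], str],
    prompt: str,
    n: int = 5,
) -> Tuple[str, float]:
    """Sample n completions and return (majority answer, agreement ratio)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Normalize so trivially different strings ("42" vs " 42 ") share a vote.
    answers = [generate(prompt).strip().lower() for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n


def make_stub_sampler(outputs: Iterable[str]) -> Callable[[str], str]:
    """Stub that replays fixed outputs in order; stands in for an LLM call."""
    cycle = itertools.cycle(list(outputs))
    return lambda prompt: next(cycle)


sampler = make_stub_sampler(["42", "41", " 42 ", "42", "17"])
answer, agreement = self_consistent_answer(sampler, "What is 6 * 7?", n=5)
print(answer, agreement)
```

The agreement ratio doubles as a cheap confidence signal: a low value suggests the request should be escalated or flagged rather than answered. Exact-match voting only works when answers are short and canonical; free-form outputs would need an answer-extraction step first.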
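For the "hybrid search (BM25 + embeddings)" row, the following is a self-contained sketch of one way to combine a lexical and a semantic ranking. Several pieces are assumptions rather than anything the summary specifies: the rankings are merged with reciprocal rank fusion (a common choice, with the conventional constant 60), and `toy_embed` uses character trigram counts purely as a placeholder for a real embedding model.

```python
import math
from collections import Counter
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    """Naive whitespace tokenizer; a real system would use something better."""
    return text.lower().split()


def bm25_scores(query: str, docs: Sequence[str],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_embed(text: str) -> Counter:
    """Placeholder for an embedding model: character trigram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ranks(scores: List[float]) -> List[int]:
    """Rank position (1 = best) of each item; ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    positions = [0] * len(scores)
    for pos, idx in enumerate(order, start=1):
        positions[idx] = pos
    return positions


def hybrid_search(query: str, docs: Sequence[str],
                  top_k: int = 3, rrf_k: int = 60) -> List[str]:
    """Fuse BM25 and embedding rankings with reciprocal rank fusion."""
    lexical = ranks(bm25_scores(query, docs))
    q_vec = toy_embed(query)
    semantic = ranks([cosine(q_vec, toy_embed(d)) for d in docs])
    fused = [1 / (rrf_k + l) + 1 / (rrf_k + s)
             for l, s in zip(lexical, semantic)]
    order = sorted(range(len(docs)), key=lambda i: -fused[i])
    return [docs[i] for i in order[:top_k]]
```

Fusing ranks rather than raw scores sidesteps the fact that BM25 scores and cosine similarities live on different scales. Swapping `toy_embed` for a real embedding model leaves the rest of the sketch unchanged.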

The Key Insight

"Plenty of software frameworks have emerged that enable rapid prototype development of LLM applications. However, advancing from prototypes to production-grade applications is a road much less traveled, and is still a very challenging task." — Suhas Pai, Preface

References