Designing Large Language Model Applications
ai CRITICAL

LLM Agents, Tools, and Interaction Paradigms

agents tools interaction-paradigms verification guardrails

Key Principle

Three interaction paradigms span the spectrum of LLM agency: passive (the LLM receives retrieved context without knowing its source or controlling retrieval), explicit (tool use follows pre-programmed rules), and autonomous (the LLM decomposes tasks and selects tools itself). The more responsibility is delegated to the LLM, the more its failure modes compound. Production systems should default to explicit orchestration unless there is a compelling reason to do otherwise.
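As a rough illustration of explicit orchestration, the sketch below fixes the interaction sequence in ordinary code: one LLM call to extract a keyword, one tool call, one LLM call to answer. The model never decides whether or when a tool runs. The names `search_docs`, `fake_llm`, and `answer_explicit`, the toy corpus, and the prompt wording are all assumptions made for this example; they are not taken from the book or from any library.

```python
from typing import Callable

# Hypothetical toy corpus standing in for a real document store.
CORPUS = {
    "refund": "Refunds are issued within 14 days of purchase.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
}

def search_docs(keyword: str) -> list[str]:
    """Hypothetical search tool: exact keyword lookup in the toy corpus."""
    text = CORPUS.get(keyword.strip().lower())
    return [text] if text else []

def fake_llm(prompt: str) -> str:
    """Offline placeholder for a model call so the sketch runs without an API."""
    if prompt.startswith("Extract one search keyword"):
        # Pretend the model picked a keyword from the question.
        return "refund" if "refund" in prompt.lower() else "unknown"
    # Otherwise pretend the model answered using the first context line.
    return prompt.splitlines()[1]

def answer_explicit(
    query: str,
    llm: Callable[[str], str],
    search: Callable[[str], list[str]],
) -> str:
    """Run a fixed three-step pipeline; the order is decided by code, not the LLM."""
    # Step 1 (always): ask the LLM for a search keyword.
    keyword = llm(f"Extract one search keyword from: {query}")
    # Step 2 (always): call the search tool with that keyword.
    passages = search(keyword)
    if not passages:
        # Deterministic fallback instead of letting the model improvise.
        return "I could not find relevant documentation."
    # Step 3 (always): answer from the retrieved context only.
    prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)

print(answer_explicit("How do refund requests work?", fake_llm, search_docs))
```

Because the control flow lives in `answer_explicit`, each step can be logged, tested, and replaced on its own, which is the main practical argument for defaulting to this paradigm.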

Why This Matters

The autonomous paradigm is general enough to capture almost any use case, but it is also the riskiest. The 99% Problem makes this concrete: even at 99% accuracy, roughly 1 in 100 requests fails in an unpredictable and potentially catastrophic way. This is the prototype-to-production gap in its sharpest form. Closing the last 1% requires fundamentally different engineering (human-in-the-loop review, clever product design) rather than incremental model improvement.

This is the same barrier that delayed self-driving deployment: being impressive in a demo and being reliable in production are separated by fundamentally different failure characteristics.
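The compounding effect can be made concrete with a little arithmetic. The sketch below assumes that an agent task is a chain of steps, that each step succeeds with the same probability, and that steps fail independently. These are simplifying assumptions for illustration, not claims from the book; real failures are often correlated. The function name `chain_success_rate` is hypothetical.

```python
def chain_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independent steps."""
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps

for steps in (1, 5, 10, 20):
    print(f"{steps:>2} steps: {chain_success_rate(0.99, steps):.1%}")
# Output:
#  1 steps: 99.0%
#  5 steps: 95.1%
# 10 steps: 90.4%
# 20 steps: 81.8%
```

Under these assumptions, a model that is 99% accurate per step completes a 20-step task correctly only about 82% of the time, which is why delegating longer chains of decisions to the LLM widens the gap between prototype and production.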

Good Examples

  • Passive paradigm: LLM receives retrieved context for QA/chatbots. Lowest risk. The LLM does not know the source or control retrieval. Used in standard RAG applications (Chapter 10).
  • Explicit paradigm: LLM follows pre-programmed tool-invocation rules. The interaction sequence is predetermined; the LLM exercises no agency. Recommended for applications with reliability requirements (Chapter 10).
  • Verification decomposition: Monolithic quality assessment is intractable, but decomposing it into individual criteria (factuality, specificity, relevance, completeness, repetitiveness, coherence) makes each check tractable with inexpensive techniques (Chapter 10).
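Verification decomposition can be sketched as a set of small, independent checks, one per criterion, each returning its own pass/fail result. The three heuristics below (duplicate sentences for repetitiveness, word overlap for relevance, number matching as a crude factuality proxy) and the 0.5 threshold are assumptions chosen for illustration. The book names the criteria but this note does not record which techniques it recommends for each.

```python
import re
from typing import Callable

def _sentences(text: str) -> list[str]:
    """Split on sentence-ending punctuation and normalize case."""
    return [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]

def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def _numbers(text: str) -> set[str]:
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def check_repetitiveness(summary: str, source: str) -> bool:
    """Pass if no sentence appears more than once in the summary."""
    sents = _sentences(summary)
    return len(sents) == len(set(sents))

def check_relevance(summary: str, source: str) -> bool:
    """Pass if at least half of the summary's distinct words occur in the source."""
    words = _words(summary)
    return bool(words) and len(words & _words(source)) / len(words) >= 0.5

def check_numbers_grounded(summary: str, source: str) -> bool:
    """Crude factuality proxy: every number in the summary also appears in the source."""
    return _numbers(summary) <= _numbers(source)

# One inexpensive check per criterion; more criteria can be added the same way.
CHECKS: dict[str, Callable[[str, str], bool]] = {
    "repetitiveness": check_repetitiveness,
    "relevance": check_relevance,
    "factuality_numbers": check_numbers_grounded,
}

def verify(summary: str, source: str) -> dict[str, bool]:
    """Run every check independently and report per-criterion results."""
    return {name: check(summary, source) for name, check in CHECKS.items()}

source = "Refunds are issued within 14 days of purchase. Shipping takes 3 to 5 business days."
print(verify("Refunds are issued within 14 days.", source))
print(verify("Refunds take 30 days. Refunds take 30 days.", source))
```

Reporting results per criterion, rather than a single score, shows which property failed and lets each check be tuned or replaced without touching the others.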

Counterpoints

  • Autonomous paradigm: "This paradigm is general enough to capture just about any use case. It is also a risky paradigm, as we are assigning the LLM too much responsibility and agency. At this juncture, I would not recommend using this paradigm for any mission-critical applications" (Chapter 10).
  • ReAct is brittle: The Thought-Action-Observation loop pattern is popular but brittle; simple agent loop prompts often suffice. Reflection-based methods may cause over-correction if invoked too frequently (Chapter 10).
  • Verification paradox: "Do not expect your verification process to be strictly better than your summary model. If that was the case, you could have used the verification process to generate the summary!" (Chapter 10). Each additional verifier also adds to end-to-end latency.
  • Code execution risk: LLM-generated code executed in response to user prompts is a prompt injection vector (Chapter 10).

Key Quotes

"At this juncture, I would not recommend using this paradigm for any mission-critical applications." — Suhas Pai, Chapter 10

"Do not expect your verification process to be strictly better than your summary model. If that was the case, you could have used the verification process to generate the summary!" — Suhas Pai, Chapter 10

"The keep it simple, stupid (KISS) principle applies to agents perhaps more than any other recent paradigm." — Suhas Pai, Chapter 10

Rules of Thumb

  • Default to the explicit paradigm; use autonomous only for non-critical, human-supervised tasks
  • Decompose verification into individual criteria rather than attempting holistic assessment
  • Keep agent architectures simple; added complexity increases the failure surface area
  • Every verifier added must justify its latency cost with measurable accuracy improvement
  • Treat LLM-generated code as untrusted input and execute it only in a sandboxed environment
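One way to act on the last rule is to reject suspicious generated code before it ever reaches an execution environment. The sketch below uses Python's standard `ast` module to parse the code and flag imports, double-underscore attribute access, and calls to anything outside a small allowlist. The allowlist contents and the function name `find_violations` are assumptions for illustration. A static filter like this is not a sandbox and can be bypassed; it is meant as a first gate in front of an isolated runtime, not a replacement for one.

```python
import ast

# Hypothetical allowlist: the only functions generated code may call by name.
ALLOWED_CALLS = {"len", "sum", "min", "max", "sorted", "range", "print"}

def find_violations(code: str) -> list[str]:
    """Return reasons to reject the code; an empty list means no rule was triggered."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    violations: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            violations.append("import statement")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            violations.append(f"dunder attribute access: {node.attr}")
        elif isinstance(node, ast.Call):
            # Only direct calls to allowlisted names pass; method calls are rejected.
            func = node.func
            if not (isinstance(func, ast.Name) and func.id in ALLOWED_CALLS):
                violations.append("call to non-allowlisted function")
    return violations

print(find_violations("print(sum(range(10)))"))
print(find_violations("import os\nos.system('ls')"))
```

The filter is deliberately conservative: it rejects every method call, including harmless ones, on the view that a false rejection is cheaper than executing hostile code. Code that passes should still run with restricted privileges and time limits.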
