LLM Workflows vs. Conversational Agents - Prompt Engineering for LLMs

Key Principle

Current LLMs cannot be both maximally general and maximally capable. Conversational agents accept almost any request but rarely complete complex tasks reliably. Workflows trade generality for strength by decomposing a complex task into small, well-defined subtasks coordinated by a supervisor process. The constraint they work around is the practical ceiling of the text-completion paradigm: a single completion pass has finite reasoning capacity, so a workflow chains multiple passes to accomplish what no single prompt could.

Why This Matters

Engineers often attempt to use conversational agents for complex tasks and get naive results: form-letter-quality emails with literal [your_name] placeholders, terse descriptions, and superficial analysis. The failure is architectural, so better prompting alone will not fix it. System messages are "basically a strong suggestion and nothing more"; they provide no deterministic control. Scaling an agent to handle complex tasks requires building external infrastructure anyway, at which point you have effectively built a workflow.
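
One way to picture "deterministic control" is a check that lives in ordinary code rather than in the system message. The sketch below rejects a draft that still contains a bracketed slot such as [your_name]. The regular expression and the function name find_placeholders are illustrative assumptions, not something taken from the book.

```python
import re

# Matches bracketed template slots such as [your_name] or [company].
# This pattern is an assumption about what an unfilled placeholder looks like.
PLACEHOLDER = re.compile(r"\[([A-Za-z_]+)\]")

def find_placeholders(text: str) -> list[str]:
    """Return the names of any unfilled template slots left in the text."""
    return PLACEHOLDER.findall(text)

draft = "Hi Dana,\n\nThanks for your time today.\n\nBest,\n[your_name]"
leftover = find_placeholders(draft)
if leftover:
    # The supervisor, not the model, decides the draft is unacceptable.
    print("reject draft, unfilled slots:", leftover)
```

A system message can only ask the model not to leave placeholders; a check like this refuses the output if it does.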

Good Examples

The generality-strength spectrum. Pure chat is maximally general but weak. Adding domain-specific system messages, tools, and structure increases strength but narrows scope. Workflows push furthest toward strength: each subtask is simple enough for high-fidelity execution, but the workflow as a whole accomplishes what no single prompt could. (Chapter 9)

Workflow design. A supervisor process coordinates specialized subtasks. Each subtask gets a narrowly scoped prompt, specific tools, and clear success criteria. The supervisor handles routing, error recovery, and state management — things the model cannot reliably do autonomously. (Chapter 9)
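
The supervisor pattern described above might be sketched as follows. This is a minimal illustration under stated assumptions: Subtask, run_workflow, and fake_model are hypothetical names, the model is any function from prompt text to completion text, and tool access is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable

# Stand-in type for a model call: prompt text in, completion text out.
ModelFn = Callable[[str], str]

@dataclass
class Subtask:
    name: str
    prompt_template: str               # narrowly scoped prompt, filled from state
    is_success: Callable[[str], bool]  # explicit success criterion

def run_workflow(subtasks: list[Subtask], inputs: dict[str, str],
                 model: ModelFn, max_attempts: int = 2) -> dict[str, str]:
    """Run subtasks in order; the supervisor owns state, retries, and failure."""
    state = dict(inputs)  # copy so the caller's dict is not modified
    for task in subtasks:
        prompt = task.prompt_template.format(**state)
        for _ in range(max_attempts):
            output = model(prompt)
            if task.is_success(output):
                state[task.name] = output  # later subtasks can read this
                break
        else:
            # Every attempt failed the check: stop instead of passing bad output on.
            raise RuntimeError(f"subtask {task.name!r} failed after {max_attempts} attempts")
    return state

def fake_model(prompt: str) -> str:
    # Placeholder for a real LLM call, so the sketch runs offline.
    if prompt.startswith("Summarize"):
        return "Customer reports a late delivery."
    return "Sorry for the delay; a refund is on its way."

steps = [
    Subtask("summary", "Summarize this ticket in one sentence: {ticket}",
            lambda out: 0 < len(out) < 200),
    Subtask("reply", "Write a two-sentence reply to: {summary}",
            lambda out: "[" not in out),
]
result = run_workflow(steps, {"ticket": "My order is a week late."}, fake_model)
print(result["reply"])
```

The model only ever sees one small prompt at a time; ordering, retries, and the decision to give up all stay in plain code.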

The five levels of sophistication resolve here. The taxonomy from Chapter 1 (thin wrapper → augmented input → stateful → tool usage → agency) predicts the generality-strength trade-off. Engineers who attempt agency (level 5) without mastering context management (levels 2-3) face compounding failures. (Chapters 1, 9)

Counterpoints

Agents are general but weak. "A workflow will not handle arbitrary user requests. Instead, it is designed for a specific task, and it will therefore be more capable of completing that task than a conversational agent would be." (Chapter 9) The generality of agents is a feature only when the task is genuinely open-ended.

When agents do work. For constrained goals with clear success criteria, conversational agents with tools can be effective. The problem is scaling to complex, multi-step tasks requiring reliable execution. (Chapter 9)

The automation paradox. Building a workflow that handles edge cases reliably requires as much engineering effort as building the agent infrastructure — but the workflow is deterministic where it matters. (Chapter 9)

Key Quotes

"A workflow will not handle arbitrary user requests. Instead, it is designed for a specific task, and it will therefore be more capable of completing that task than a conversational agent would be." — Berryman & Ziegler, Chapter 9

Rules of Thumb

Use conversational agents for open-ended, exploratory tasks; use workflows for complex tasks requiring reliable execution
If you find yourself adding more and more rules to a system message, you're building a workflow — make it explicit
Each workflow subtask should be simple enough that a single LLM pass can handle it reliably
Keep the supervisor deterministic — it routes, coordinates, and handles errors; the LLM handles language tasks
Start with the simplest architecture that works; add workflow complexity only when agent reliability fails
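
As an illustration of keeping the supervisor deterministic, the routing step could look something like the sketch below. The model is asked only for a label, and ordinary code decides what that label triggers. The labels, HANDLERS, route, and fake_classify are all hypothetical names chosen for this example.

```python
from typing import Callable

# Closed set of routes the supervisor knows about (hypothetical labels).
HANDLERS: dict[str, Callable[[str], str]] = {
    "billing": lambda msg: f"billing workflow handles: {msg}",
    "shipping": lambda msg: f"shipping workflow handles: {msg}",
}

def route(message: str, classify: Callable[[str], str]) -> str:
    """Let the model pick a label, but let plain code decide what happens."""
    label = classify(message).strip().lower()
    handler = HANDLERS.get(label)
    if handler is None:
        # Unknown label: fall back deterministically instead of trusting the model.
        return f"escalated to a human: {message}"
    return handler(message)

def fake_classify(message: str) -> str:
    # Placeholder for an LLM classification call; note the messy formatting.
    return " Billing\n" if "invoice" in message else "something else"

print(route("My invoice is wrong", fake_classify))
print(route("Tell me a joke", fake_classify))
```

Because the set of handlers is closed, an unexpected model output can at worst trigger the fallback, never an unplanned action.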

Related References

Reasoning Techniques and Tool Usage - The agent architecture that workflows extend
LLMs as Text Completion Engines - Single-pass completion limits explain why workflows are necessary
Evaluating LLM Applications - Evaluation-first development applies to each workflow subtask