LLMs as Text Completion Engines - Prompt Engineering for LLMs

Key Principle

LLMs are document completion engines. Given the beginning of a document, they produce the statistically most likely continuation based on patterns from training data. This is not a simplification — it is the literal mechanism. Every prompt engineering technique is a consequence of this principle: few-shot examples work because they make the prompt resemble a training document containing the desired pattern; chain-of-thought works because it forces reasoning tokens before the answer token; RAG works because it places retrievable facts into the completion context so the model copies rather than hallucinates.

"At their core, LLMs are just text completion engines that mimic the text they see during their training." (Preface)

Why This Matters

Without this mental model, every technique looks like an arbitrary trick. Engineers treat the model as an oracle that "understands" requests, leading to prompts that fight the model's actual mechanism rather than exploit it. This produces hallucination problems, reliability failures, and debugging in the wrong direction (improving content quality when the real problem is content placement or format).

With this mental model, the techniques become logical consequences. The capability heuristic follows directly: "Could a human expert who knows all the relevant general knowledge by heart complete the prompt in a single go without backtracking, editing, or note-taking?" If no, the task exceeds what a single LLM pass can do — you need chain-of-thought, tool usage, or multi-step workflows. (Chapter 2)

Good Examples

Chat is still document completion. After RLHF training, the model completes ChatML transcripts instead of plain documents — but the mechanism is identical. Understanding this prevents mystifying chat behavior: the system message works because it appears early in the "document" and conditions all subsequent completions. (Chapter 3)

Tool calling is structured text completion. A tool invocation is a hierarchical sequence of 5-6 classification decisions via token prediction — who speaks, should a tool be called, which tool, which argument, what value. "In the span of 10 to 20 tokens, the same, generic underlying neural network has effectively implemented 5 different, highly specialized inference algorithms." (Chapter 8)

Evaluation is document completion too. The SOMA framework works because it transforms a vague evaluation task into a structured rubric format the model has seen during training — grading sheets, multi-criteria assessments. (Chapter 10)

Counterpoints

The model is not an oracle. It cannot verify facts, look things up, or express genuine doubt. "The model can't google or edit, so it just guesses. Nor will the raw LLM express any doubt." (Chapter 2) Treating it as an oracle produces hallucinations and misplaced confidence.

The model is not thinking. It has no internal monologue — "no mental review of a problem statement, no consideration of how it maps to known facts, and no comparison of several competing ideas." (Chapter 8) Without explicitly manufactured reasoning tokens, the first answer is an intuitive guess.

Scaling does not escape the mechanism. Chapter 11 distills the book into two lessons, with the first being this principle. Even with multimodal inputs, knowledge distillation, and artifacts, the underlying mechanism remains document completion.

Key Quotes

"Assume you have picked a document from the training set at random. All you know about it is, it starts with the prompt. What is the statistically most likely continuation? That's the LLM output you should expect." — Berryman & Ziegler, Chapter 2

"If we had to sum up the main lessons from this book, there would be two: (1) LLMs are nothing more than text completion engines that mimic the text they see during training. (2) You should empathize with the LLM and understand how it thinks." — Berryman & Ziegler, Chapter 11

Rules of Thumb

Before prompting, ask: "What document would naturally contain my desired output as its continuation?"
If a human expert couldn't complete the prompt in one pass without notes, the model can't either — add chain-of-thought or break into steps
Every prompt is a document opening — make it resemble real documents from training data
When debugging, ask "what document does the model think this is?" before asking "what's wrong with the output?"

Related References

How LLMs Process Information - The architectural constraints that make this mechanism work the way it does
Reasoning Techniques and Tool Usage - Chain of thought as a workaround for the completion mechanism's limitations
Designing LLM Applications - The Little Red Riding Hood principle follows directly from this framework