Key Principle
LLMs have no internal monologue — "no mental review of a problem statement, no consideration of how it maps to known facts, and no comparison of several competing ideas." Chain of thought, ReAct, and all reasoning techniques share one mechanism: allocate tokens for the model to "think" before answering. Tool calling is not a new paradigm — it is fine-tuned text completion plus API syntactic sugar. Both reasoning and tools are structured text completion.
Why This Matters
Without reasoning tokens, the model's first answer is an "intuitive guess" and any subsequent explanation is a "rationalization to justify that guess." The explanation cannot improve the answer because the answer tokens are already committed: autoregressive generation has no backtracking. This is perhaps the book's most important causal chain. Autoregressive generation conditions each token only on the tokens before it, so if no reasoning tokens precede the answer, the model has had no opportunity to deliberate before committing to it.
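The ordering constraint can be illustrated with a minimal sketch. It assumes a prompt template that asks for reasoning first and a final line starting with "Answer:"; the template wording, the marker, and both function names are hypothetical choices for illustration, not taken from the book.

```python
from typing import Optional

# Hypothetical template: the reasoning slot comes BEFORE the answer slot,
# so every reasoning token is in context when the answer is generated.
REASONING_FIRST_TEMPLATE = (
    "Question: {question}\n"
    "Work through the problem step by step, then give the result on a final "
    "line of the form 'Answer: <value>'.\n"
    "Reasoning:"
)


def build_prompt(question: str) -> str:
    """Fill the template; the completion is expected to start with reasoning."""
    return REASONING_FIRST_TEMPLATE.format(question=question)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if it is absent."""
    for line in reversed(completion.splitlines()):
        if line.strip().startswith("Answer:"):
            return line.split("Answer:", 1)[1].strip()
    return None
```

Asking for the answer first and the explanation second would reverse the two slots, and the explanation could then only justify an answer that was already emitted.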
Good Examples
Chain of thought transforms accuracy. CoT raised the GSM8K math solve rate from roughly 20% to roughly 60% on PaLM 540B. Even meaningless "pause tokens" (trained via fine-tuning) improve accuracy by giving the model additional timesteps of computation. The mechanism is architectural, not magical. (Chapter 8)
Tool calling is five or six classification decisions. A tool invocation decides, in order: (1) who speaks, (2) whether a tool should be called, (3) which tool, (4) which argument, (5) what value, and (6) whether we are done. "In the span of 10 to 20 tokens, the same, generic underlying neural network has effectively implemented 5 different, highly specialized inference algorithms." (Chapter 8)
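To make the "tool call is just text" point concrete, here is a small sketch that parses a completion back into those decisions. The wire format (`role: call tool({json})`) is invented for this example and is not any vendor's actual serialization; `parse_completion` is likewise a hypothetical helper.

```python
import json
import re

# Hypothetical wire format, for illustration only:
#   "<role>: call <tool>(<json arguments>)"   for a tool call
#   "<role>: <free text>"                     for an ordinary reply
CALL_PATTERN = re.compile(
    r"^(?P<role>\w+): call (?P<tool>\w+)\((?P<args>\{.*\})\)$", re.DOTALL
)


def parse_completion(text: str) -> dict:
    """Recover the decisions encoded in one completion string."""
    text = text.strip()
    match = CALL_PATTERN.match(text)
    if match is None:
        # Decision 2 came out as "no tool": treat the rest as plain content.
        role, _, content = text.partition(": ")
        return {"role": role, "tool": None, "content": content}
    return {
        "role": match["role"],                # (1) who speaks
        "tool": match["tool"],                # (2) a tool is called, (3) which one
        "args": json.loads(match["args"]),    # (4) which argument, (5) what value
    }                                         # (6) the closing ")" ends the call
```

Each field comes from a different stretch of the same token stream, which is the sense in which one network makes several distinct decisions in a row.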
ReAct combines reasoning with action. It runs iterative Thought-Action-Observation loops. Critical finding: with in-prompt examples alone, ReAct initially performed worse than standard prompting. After fine-tuning on only 3,000 examples, however, the fine-tuned 8B ReAct model beat standard prompting on the 62B model. Reasoning patterns become powerful only when the model is conditioned to execute them reliably. (Chapter 8)
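A Thought-Action-Observation loop can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the ReAct paper's implementation: the `Action: name[input]` and `Final Answer:` markers are assumed conventions, and `scripted_model` is a stand-in that replays fixed text where a real LLM call would go.

```python
from typing import Callable, Dict, Optional


def react_loop(
    question: str,
    model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Optional[str]:
    """Alternate model steps and tool observations until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)  # expected: "Thought: ...\nAction: name[input]"
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        action = next(
            (ln for ln in step.splitlines() if ln.startswith("Action:")), None
        )
        if action is None:
            transcript += "Observation: no action found\n"
            continue
        name, _, rest = action[len("Action:"):].strip().partition("[")
        tool = tools.get(name)
        observation = tool(rest.rstrip("]")) if tool else f"unknown tool {name}"
        # The observation is appended as text, so the next step can condition on it.
        transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted without a final answer


def multiply(arg: str) -> str:
    """Toy tool: multiply two comma-separated integers."""
    a, b = (int(x) for x in arg.split(","))
    return str(a * b)


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: acts once, then answers with what it observed."""
    if "Observation:" not in transcript:
        return "Thought: I should compute this with the tool.\nAction: multiply[6,7]"
    observed = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: The tool returned {observed}.\nFinal Answer: {observed}"
```

Calling `react_loop("What is 6 times 7?", scripted_model, {"multiply": multiply})` returns `"42"` after one tool round trip.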
Counterpoints
Tool safety cannot rely on prompt instructions. "Models are inherently undependable, and with a strategy like this, we guarantee that a small portion of the time, the model will do exactly the thing you told it not to do." (Chapter 8) Intercept dangerous requests in the application layer; require explicit user authorization for destructive actions.
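One way to intercept in the application layer is a gate that sits between the model's requested call and its execution. The sketch below is a minimal illustration: the tool names, the `DESTRUCTIVE_TOOLS` set, and the `user_approved` flag are all hypothetical, and a real application would obtain approval through its own UI.

```python
from typing import Any, Callable, Dict

# Hypothetical list of tools that must never run on the model's say-so alone.
DESTRUCTIVE_TOOLS = {"delete_repo", "send_payment"}


class AuthorizationRequired(Exception):
    """Raised when a destructive tool is requested without user approval."""


def execute_tool_call(
    name: str,
    args: Dict[str, Any],
    registry: Dict[str, Callable[..., Any]],
    user_approved: bool = False,
) -> Any:
    """Run a model-requested tool, enforcing the safety check in code."""
    if name not in registry:
        raise KeyError(f"unknown tool: {name}")
    if name in DESTRUCTIVE_TOOLS and not user_approved:
        # The check does not depend on anything the prompt told the model.
        raise AuthorizationRequired(f"{name} needs explicit user approval")
    return registry[name](**args)
```

Because the check runs in ordinary code, it holds even in the small fraction of cases where the model ignores its instructions.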
Tool design follows prompt principles. Limit tool count; tools should partition the domain. Use meaningful, self-documenting names. Keep arguments few and simple. Never include superfluous output fields — Chekhov's Gun applies: the model gets distracted by unnecessary fields. (Chapter 8)
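As a rough illustration of these guidelines, the sketch below defines one tool with a self-documenting name and a single argument, and trims the raw result before it is shown to the model. The definition uses a JSON-Schema-like shape that resembles common tool-calling formats but is not tied to any specific API; the tool, its fields, and `trim_tool_output` are hypothetical.

```python
# Hypothetical tool definition: descriptive name, one simple argument.
GET_WEATHER_TOOL = {
    "name": "get_current_weather",
    "description": "Return the current temperature and conditions for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Lisbon'"},
        },
        "required": ["city"],
    },
}

# Only these fields are passed back to the model; everything else is dropped.
FIELDS_FOR_MODEL = ("temperature_c", "conditions")


def trim_tool_output(raw: dict, keep: tuple = FIELDS_FOR_MODEL) -> dict:
    """Keep only the fields the model needs, so nothing superfluous distracts it."""
    return {key: raw[key] for key in keep if key in raw}
```

A backend response that also carries fields such as a request ID or station metadata would be reduced to just the temperature and conditions before entering the prompt.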
Argument hallucination. When a parameter value has not been mentioned in the conversation, the model invents a placeholder such as "my-org." Mitigate this by removing arguments whose values are already known from the tool definition, or by providing defaults. (Chapter 8)
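The first mitigation can be sketched as two small helpers: one hides already-known arguments from the definition the model sees, and one fills them back in when the tool is actually called. The definition layout, the helper names, and the "acme" value are assumptions made for this example.

```python
from typing import Any, Callable, Dict


def hide_known_arguments(definition: Dict[str, Any], known: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the tool definition without the already-known parameters."""
    visible = {k: v for k, v in definition["parameters"].items() if k not in known}
    return {**definition, "parameters": visible}


def call_with_known(fn: Callable[..., Any], model_args: Dict[str, Any], known: Dict[str, Any]) -> Any:
    """Merge application-supplied values over the model's arguments, then call."""
    # Known values win, so a hallucinated placeholder can never override them.
    return fn(**{**model_args, **known})


# Hypothetical tool and session context.
LIST_ISSUES_TOOL = {
    "name": "list_issues",
    "parameters": {"org": {"type": "string"}, "repo": {"type": "string"}},
}
KNOWN_VALUES = {"org": "acme"}  # e.g. taken from the logged-in session
```

The model is shown `hide_known_arguments(LIST_ISSUES_TOOL, KNOWN_VALUES)`, which exposes only `repo`, and the application supplies `org` itself at call time.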
Key Quotes
"Language models have no internal monologue and therefore no way to think about something before blurting out an answer." — Berryman & Ziegler, Chapter 8
"In the span of 10 to 20 tokens, the same, generic underlying neural network has effectively implemented 5 different, highly specialized inference algorithms." — Berryman & Ziegler, Chapter 8
"Models are inherently undependable, and with a strategy like this, we guarantee that a small portion of the time, the model will do exactly the thing you told it not to do." — Berryman & Ziegler, Chapter 8
Rules of Thumb
- Always use chain-of-thought for complex reasoning — the model literally cannot think without it
- Place reasoning before the answer, never after — post-hoc explanations are rationalizations
- Design tools with minimal arguments, clear names, and no superfluous output fields
- Never rely on prompt instructions to prevent harmful tool execution — intercept in code
- Consider ReAct with fine-tuning for multi-step tool use — prompting alone may underperform
- Remove tool arguments with known values from definitions to prevent hallucination
Related References
- How LLMs Process Information - Autoregressive generation explains why reasoning tokens are necessary
- LLMs as Text Completion Engines - Tool calling is the strongest proof that everything is text completion
- LLM Workflows vs. Conversational Agents - When single-agent reasoning isn't enough, structured workflows take over