Library
The Christensen Engine Has No Twin — But It Has Relatives in Surprising Places · 9 of 12

Key Principle

Twenty LLM pain points ranked by likelihood and severity, drawn from collective experience across 16 products. These are not theoretical risks — they are failure modes that every team encountered. Instruction amnesia, silent extraction failures, and LLM state-skipping are near-certain. Treat them as engineering problems with known solutions, not unsolvable risks.

Why This Matters

The pain points checklist functions as a pre-flight inspection for state-machine-over-LLM systems. Every item has been independently confirmed by multiple teams. Addressing them during initial architecture (not retrofitting) is significantly cheaper because they require structural state machine changes, not parameter tuning.

Good Examples

Near-Certain (Address First):

  1. Instruction Amnesia (#1): LLMs randomly ignore parts of long prompts. Confirmed across every team studied. Fix: one prompt per state (not monolithic), use the fixed command grammar to limit what the LLM can do. (p. 3, chunk 005)

  2. Silent Extraction Failures (#2): Structured data fails to extract from user input without anyone noticing. Fix: validate extraction output against expected schema before proceeding. Build extraction quality checks into guard conditions. (p. 3, chunk 005)

  3. LLM State-Skipping (#3): LLMs jump ahead through carefully designed assessment states to "demonstrate how smart and helpful they are." Fix: the LLM should never see the full state graph — show only current state instructions and available transitions. (p. 3, chunk 005)

High Likelihood:

  1. Token Overflow (#4): Context grows beyond window capacity. Fix: token budget allocation formula (15/25/20/15/10/15). Never append entire conversation history. (p. 4, chunk 005)

  2. Guard Condition Misfires (#5): Guards depending on LLM classification will misfire. Fix: retry logic, confirmation loops ("is that true?"), and fallback transitions for every guard. (pp. 3-4, chunk 005)

  3. Regression from Rule Updates (#6): Knowledge base updates introduce accuracy regressions elsewhere. Fix: separate creation from validation teams. Run regression test suites on every update. (p. 4, chunk 005)

  4. Kill Signal Detection Gaps (#7): Crisis or safety signals missed. Fix: run as pre-processing middleware before any other processing. Two tiers: fast-path regex + NLP classification. (p. 4, chunk 005)

Operational:

  1. Backward Loop Explosion (#8): Infinite cycles when users fail quality thresholds. Fix: max-attempts guard on every backward transition. Force-advance to fallback after limit. (p. 4, chunk 005)

  2. Dynamic Prompt Debugging (#10): Impossible to debug without logging. Fix: log {timestamp, inputState, assembledPrompt, llmResponse, guardsEvaluated, outputState, tokensUsed} for every LLM call. (p. 4, chunk 005)

  3. JSONB Concurrent Updates (#11): State corruption from race conditions. Fix: append-only arrays, explicit merge strategies per field type. (p. 4, chunk 005)

  4. Assessment Gaming (#17): Users submit minimal-effort responses. Fix: effort-detection in guard conditions; switch to diagnostic questions after repeated low-effort input. (p. 5, chunk 005)

  5. Knowledge Elicitation (#20): "Getting domain rules out of experts' heads and into a machine-readable format is the hardest part." Fix: use LLMs to draft, domain experts to validate. (p. 5, chunk 005)

Counterpoints

  • Not all pain points are equally severe. Items #1-3 are near-certain and must be addressed in initial architecture. Items #16-20 are lower frequency and can be addressed iteratively.
  • Some pain points interact: instruction amnesia (#1) amplifies guard misfires (#5); context rot (#4) amplifies instruction amnesia (#1). Address them as a system, not individually.

Key Quotes

"eager to demonstrate how smart and helpful they are" (p. 3, chunk 005) — on LLM state-skipping

"getting domain rules out of experts' heads and into a machine-readable format is the hardest part." (p. 5, chunk 005)

"You can't just test a prompt once." (p. 5, chunk 005) — Khanmigo on non-deterministic output

Rules of Thumb

  • Address #1-3 (amnesia, silent failures, state-skipping) in initial architecture — retrofitting is expensive.
  • Log every assembled prompt — debugging without it is guesswork.
  • Every guard condition needs a fallback transition.
  • Every backward loop needs a max-attempts counter.
  • Test with full conversation transcripts, not individual prompts.
  • Run regression suites on every knowledge base update.

Related References