Key Principle
Context rot — degradation of LLM reasoning quality when overloaded with context — is a structural problem requiring systematic token management. The solution has three parts: explicit token budget allocation, structured context persistence (not token history), and dynamic prompt construction that loads only state-relevant information per turn.
Why This Matters
Context grows monotonically with conversation length. Without management, the LLM's reasoning about early-session data degrades by mid-session. The system appears to "forget" or contradict itself — not because the LLM is unreliable, but because the context window is saturated. In assessment systems with 30-45 minute sessions, context rot is inevitable without architectural mitigation.
Good Examples
Token Budget Allocation Formula (Origin Financial): 15% system prompt, 25% state context, 20% relevant history, 15% tool definitions, 10% user query, 15% output buffer. Maintain state externally in a JSONB store. On each LLM call, compile minimal working context: currentState + relevantSlice(userProfile) + relevantSlice(sessionHistory) + relevantTools → LLM call. Never append entire conversation history. (p. 4, chunk 003)
Glossary Preview + Dynamic Expansion (Origin Financial): The LLM sees a compact index of all 200+ tools but only relevant tools get fully expanded into the prompt per query. Prevents tool definitions from consuming the context window. (p. 4, chunk 003)
Structured Context Persistence (Origin Financial): Store user decisions as typed fields in state objects, not raw conversation turns. When a user reclassifies an expense category, that semantic adjustment persists as structured data and is reconstructed into minimal context on future interactions. "Memory isn't just token history; it's structured context persisted across sessions." (p. 4, chunk 003)
Dynamic Prompt Construction (Salesforce, XState): System prompts and available tools change based on machine state. Each state gets a focused, minimal prompt and only relevant tools. Pre-verification state → only identity tools. Post-verification → transaction tools become available. Fresh prompt assembled per turn. (pp. 4-5, chunk 004)
Prompt Chaining Over Monolithic Prompts (Khanmigo): Using one prompt per section rather than a single monolithic prompt prevents instruction amnesia and improves rubric scores. (p. 2, chunk 005)
Counterpoints
Context Banks (Buildpad): Persistent store accumulating project knowledge across sessions is valuable, but without the token budget discipline, the accumulated context eventually triggers the same rot. Context banks need retrieval-based loading, not full injection. (p. 1, chunk 005)
JSONB Concurrent Update Conflicts: When concurrent updates hit the same JSONB state object, deep merge conflicts arise. Fix: append-only semantics for arrays, explicit merge strategies per field type. (p. 4, chunk 005)
Microsoft Research StateFlow: The SF_Agent variant uses separate LLM agents per state, reducing context accumulation. 13-28% higher success rates than ReAct, 3-5x lower cost — partly because focused prompts waste fewer tokens. (p. 5, chunk 004)
Key Quotes
"memory isn't just token history; it's structured context persisted across sessions." (p. 4, chunk 003) — Origin Financial
Rules of Thumb
- Budget tokens explicitly: 15/25/20/15/10/15 allocation formula.
- Never append entire conversation history to the prompt.
- Store state as typed fields, not raw conversation turns.
- Use glossary preview pattern: compact index with dynamic expansion per query.
- One prompt per state, not one monolithic prompt.
- Log every assembled prompt for debugging — include timestamp, state, response, and token count.
Related References
- Build Order Protocol and Implementation Guide - Build order and debugging practices
- Financial and Legal Domain Case Studies - Origin's implementation details
- Master Pain Points Checklist - Context rot and related failure modes