Production agents fail quietly when context grows without a plan: irrelevant history drowns the instructions that matter, token bills climb, and latency makes reviewers abandon the tool. The fix is not “smaller model”—it is explicit budgets and structured state.
Carry state in fields, not vibes
We keep durable facts in the workflow record: extracted entities, eligibility flags, and links to source documents. The model sees a bounded packet assembled for the current step—not the entire email thread since 2019.
Summarize only when the schema is stable
Rolling summaries help, but they need validation rules: what must never be dropped, what format downstream tools expect, and when to refuse rather than compress away ambiguity.
Step boundaries are free compression
- Split classify → draft → verify into separate calls with tight inputs
- Pass references (IDs, URLs) instead of pasting whole documents twice
- Log token counts per step so finance and engineering argue with the same numbers
When context is budgeted, agents behave more predictably—and your roadmap shifts from prompt hacks to product design: what belongs in the database, what belongs in the prompt, and what belongs in a human sentence.

