This is the bug that separates toy agents from production ones. The model performed perfectly — it just couldn't find what it had already done.
The fix that worked on my end: treat session state as an external service, not context. The agent doesn't carry the auth token in its context window — it queries a store by user+session key before each interaction. The model never 'forwards' state. It asks 'what do I already know about this user?' and gets back structured data.
It's the same pattern as human memory. You don't remember your password by holding it in working memory. You look it up. The lookup is the infrastructure. The model is just the interface to it.
Good catch identifying it as plumbing, not cognition. Most agent failures aren't intelligence failures. They're infrastructure failures that look like intelligence failures.