The Memory Problem
Agents forget. Not because models are bad at remembering, but because we haven't solved the retrieval problem.
There's a common misconception about agent memory: that it's primarily a model problem. That if we just had a larger context window, or a model trained with better long-term recall, the problem would be solved.
It isn't a model problem. It's a retrieval problem.
The hard part of agent memory isn't storing information — it's knowing what to retrieve, when, and in what form. A 10 million token context window doesn't help you if the relevant memory is buried 8 million tokens back and the model attends to the wrong thing.
The approaches I've found most reliable in production:
- Hierarchical summarization — compress episodic memories into semantic summaries at regular intervals. Retrieve summaries first, raw episodes only when needed.
- Memory tagging — every memory record gets typed metadata at write time. Retrieval becomes a structured query, not a semantic search over an undifferentiated blob.
- Recency weighting — the most recent context almost always matters most. Build your retrieval to reflect this explicitly rather than relying on vector similarity alone.
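To make the first approach concrete, here's a minimal sketch of hierarchical summarization. The store, the fixed summarization interval, and the injected `summarize` and `matches` callables are all my assumptions, not a specific library's API: in practice `summarize` would be an LLM call and `matches` a real retriever.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    episodes: list = field(default_factory=list)   # raw episodic memories
    summaries: list = field(default_factory=list)  # compressed semantic layer
    chunk_size: int = 4  # assumed summarization interval

    def write(self, text, summarize):
        """Append a raw episode; compress the last chunk at each interval."""
        self.episodes.append(text)
        if len(self.episodes) % self.chunk_size == 0:
            chunk = self.episodes[-self.chunk_size:]
            self.summaries.append(summarize(chunk))

    def retrieve(self, matches):
        """Query the summary layer first; fall back to raw episodes on a miss."""
        hits = [s for s in self.summaries if matches(s)]
        return hits if hits else [e for e in self.episodes if matches(e)]
```

The point of the two-layer retrieve is cost: most queries are answered from the small summary layer, and the raw episode log is scanned only when a summary-level lookup comes up empty.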
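Memory tagging can be sketched the same way. The record shape and the `kind`/`subject` taxonomy below are illustrative assumptions; the idea is only that metadata is attached at write time, so retrieval is a structured filter rather than a similarity search over one undifferentiated blob.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class MemoryRecord:
    text: str
    kind: str        # e.g. "preference", "fact", "task" (assumed taxonomy)
    subject: str     # entity the memory is about
    created_at: datetime

def write_memory(store, text, kind, subject):
    # Typed metadata is attached when the memory is written, not at query time.
    store.append(MemoryRecord(text, kind, subject, datetime.now(timezone.utc)))

def query(store, kind=None, subject=None):
    # Retrieval is a structured filter over typed fields.
    return [r for r in store
            if (kind is None or r.kind == kind)
            and (subject is None or r.subject == subject)]
```

In a real system the filter would run against a database index, and a semantic search could still rank results within the filtered subset.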
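And recency weighting, the third approach, reduces to scoring. One common way to make recency explicit is to multiply vector similarity by an exponential decay; the half-life value here is an assumption you would tune per application.

```python
import math

def recency_weighted_score(similarity, age_seconds, half_life_seconds=86_400):
    # Exponential decay: a memory's weight halves every half-life
    # (one day here, an assumed default).
    decay = 0.5 ** (age_seconds / half_life_seconds)
    return similarity * decay

def rank(candidates, half_life_seconds=86_400):
    # candidates: list of (memory, similarity, age_seconds) tuples
    return sorted(
        candidates,
        key=lambda c: recency_weighted_score(c[1], c[2], half_life_seconds),
        reverse=True,
    )
```

With this scoring, a fresh memory with similarity 0.7 outranks a week-old memory with similarity 0.9, which is usually the behavior you want and exactly what raw vector similarity fails to give you.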
We're early. The field hasn't converged on a memory architecture the way it has on, say, the transformer architecture. But the teams that treat memory as a first-class engineering problem — not a prompt engineering afterthought — are pulling ahead.