Memory
An agent has two kinds of memory.
One is the context window: fast, present, and lost whenever the window is compacted or the session ends. The other is durable, and it is what lets an agent remember anything at all beyond right now.
The context window is working memory, the agent's mind for a single turn. But a window is volatile and finite. Anything an agent needs to keep, across the window's limits or across whole sessions, has to live somewhere more permanent.
The split
Working memory and long-term memory
Working memory is the context window. It is rebuilt every turn and holds only what this step needs. It is fast and it is volatile: when the turn ends, or the window is compacted, whatever was in it is gone unless it was saved.
Long-term memory is durable. It outlives the window and the session: notes written to a file, a running scratchpad, records in a store, documents in a knowledge base. The agent does not hold it in mind; it goes and gets it when it needs it.
Working memory (the context window): volatile, rebuilt every turn, holds only what this step needs.
Long-term memory (files · notes · retrieval): durable, outlives the window and the session, recalled on demand.
The forms
Long-term memory comes in a few shapes
Long-term memory is not one thing. In practice it shows up in three forms, and most capable agents use more than one.
The judgment underneath all three is the same: deciding what is worth keeping. Writing down everything is its own kind of clutter, so the skill is saving what will matter later and letting the rest go.
| Form | What it is | Good for |
|---|---|---|
| Scratchpad / files | Notes the agent writes to disk and reads back: a plan, a running log, a findings file. | Carrying state across a long task without bloating the window. |
| Retrieval (RAG) | A large body of documents made searchable ahead of time; the agent pulls back the few passages most relevant to the task at hand. | Drawing on far more knowledge than could ever fit in a window. |
| Structured store | Explicit records the agent writes and queries: facts, preferences, entities, keyed for lookup. | Remembering specific, durable facts across whole sessions. |
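The structured-store row can be made concrete with a small keyed table. This is a minimal sketch, assuming SQLite as the store. The `facts` table and the `open_store`, `remember`, and `recall` helpers are hypothetical names chosen for illustration, not part of any particular harness.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a tiny keyed fact store. The schema is illustrative."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def remember(conn, key, value):
    """Write one fact; a newer value for the same key replaces the old one."""
    conn.execute(
        "INSERT OR REPLACE INTO facts (key, value) VALUES (?, ?)",
        (key, value),
    )
    conn.commit()


def recall(conn, key):
    """Look a fact up by key; return None if nothing was ever saved."""
    row = conn.execute(
        "SELECT value FROM facts WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else None
```

Passing a file path instead of `":memory:"` is what would make the records outlive the session. Keying each fact also means a corrected value overwrites the stale one instead of piling up beside it.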
The pattern
Write it down, recall on demand
The move is the same one people use. You do not keep everything in your head; you write down what matters and look it back up later. An agent offloads detail out of the window and pulls it back when it becomes relevant again.
This is what keeps a long-running agent both lean and capable. The window stays small and focused, while nothing important is truly lost. Retrieval is the bridge: given the task at hand, find the right past notes and bring just those back into the window. Most of this needs no new machinery: the agent writes and recalls its notes with the same write and read tools it already has.
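The retrieval step described above can be sketched in a few lines. This is a toy under stated assumptions: it ranks notes by plain word overlap with the task, standing in for whatever search a real system uses (embeddings, a full-text index). The `tokens` and `retrieve` helpers are hypothetical.

```python
import re


def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(task, notes, k=2):
    """Return up to k notes that share the most words with the task."""
    query = tokens(task)
    scored = [
        (len(query & tokens(note)), index, note)
        for index, note in enumerate(notes)
    ]
    # Drop notes with no overlap; highest score first, earlier notes win ties.
    hits = sorted(
        (item for item in scored if item[0] > 0),
        key=lambda item: (-item[0], item[1]),
    )
    return [note for _, _, note in hits[:k]]
```

The cap `k` is the point of the design: however many notes exist, only a handful re-enter the window, so recall never costs more context than the task justifies.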
write now, recall later
turn 4
write { path: "plan.md", content: "Refactor auth, then add SSO" }
→ wrote plan.md
. . . 30 turns later, the window has been compacted . . .
turn 34
read { path: "plan.md" }
→ "Refactor auth, then add SSO"
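The transcript above can be reproduced with a pair of file-backed tools. This is a minimal sketch, not any specific harness's implementation: the `Workspace` class, its `write` and `read` methods, and the `compact` function are hypothetical, and compaction is reduced to "keep the most recent entries" purely for illustration.

```python
import tempfile
from pathlib import Path


class Workspace:
    """File-backed notes: a stand-in for an agent's write and read tools."""

    def __init__(self, root):
        self.root = Path(root)

    def write(self, path, content):
        # A real harness would also check that path stays inside root.
        target = self.root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        return f"wrote {path}"

    def read(self, path):
        return (self.root / path).read_text(encoding="utf-8")


def compact(window, keep_last=1):
    """Crude stand-in for compaction: keep only the most recent entries."""
    return window[-keep_last:]


def demo():
    with tempfile.TemporaryDirectory() as root:
        workspace = Workspace(root)
        # Turn 4: the plan is in the window, and is also written to disk.
        window = ["plan: Refactor auth, then add SSO"]
        window.append(workspace.write("plan.md", "Refactor auth, then add SSO"))
        # Thirty turns later the window has been compacted.
        window.extend(f"turn {n}: other work" for n in range(5, 34))
        window = compact(window)
        # Turn 34: the plan is gone from the window but not from disk.
        return window, workspace.read("plan.md")
```

Calling `demo()` returns the compacted window together with the recalled plan: the window no longer mentions the plan, while the file still holds it word for word.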
Working memory is the constraint; long-term memory is the escape from it. A harness that can write things down and recall them on demand can work far past the size of any single window.