Memory

An agent has two kinds of memory.

One is the context window: fast, immediately at hand, and lost whenever the window is compacted or the session ends. The other is durable storage outside the window, and it is what lets an agent remember anything past the current window.


The context window is working memory, the agent's mind for a single turn. But a window is volatile and finite. Anything an agent needs to keep, across the window's limits or across whole sessions, has to live somewhere more permanent.

The split

Working memory and long-term memory

Working memory is the context window. It is rebuilt every turn and holds only what this step needs. It is fast and it is volatile: when the turn ends, or the window is compacted, whatever was in it is gone unless it was saved.

Long-term memory is durable. It outlives the window and the session: notes written to a file, a running scratchpad, records in a store, documents in a knowledge base. The agent does not hold it in mind; it goes and gets it when it needs it.

working memory (the context window): volatile, rebuilt every turn, holds only what this step needs

long-term memory (files · notes · retrieval): durable, outlives the window and the session, recalled on demand

What the window does not need right now gets written out, and pulled back when it becomes relevant again.

The forms

Long-term memory comes in a few shapes

Long-term memory is not one thing. In practice it shows up in three forms, and most capable agents use more than one.

The judgment underneath all three is the same: deciding what is worth keeping. Writing down everything is its own kind of clutter, so the skill is saving what will matter later and letting the rest go.

The forms long-term memory takes
Form | What it is | Good for
Scratchpad / files | Notes the agent writes to disk and reads back: a plan, a running log, a findings file. | Carrying state across a long task without bloating the window.
Retrieval (RAG) | A large body of documents made searchable ahead of time; the agent pulls back the few passages most relevant to the task at hand. | Drawing on far more knowledge than could ever fit in a window.
Structured store | Explicit records the agent writes and queries: facts, preferences, entities, keyed for lookup. | Remembering specific, durable facts across whole sessions.
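
To make the structured-store form concrete, here is a minimal sketch, assuming facts are stored as (entity, key, value) rows in SQLite. The table name `facts` and the helpers `open_store`, `remember`, and `recall` are hypothetical names chosen for illustration, not part of any particular agent framework.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a fact store. ':memory:' is used here only for the demo;
    a real agent would pass a file path so the facts outlive the session."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts ("
        "entity TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL, "
        "PRIMARY KEY (entity, key))"
    )
    return conn


def remember(conn, entity, key, value):
    """Write one fact. A later write to the same (entity, key) replaces the old value."""
    conn.execute(
        "INSERT OR REPLACE INTO facts (entity, key, value) VALUES (?, ?, ?)",
        (entity, key, value),
    )
    conn.commit()


def recall(conn, entity, key):
    """Look one fact up by its key; returns None if nothing was stored."""
    row = conn.execute(
        "SELECT value FROM facts WHERE entity = ? AND key = ?",
        (entity, key),
    ).fetchone()
    return row[0] if row else None


store = open_store()
remember(store, "user", "preferred_language", "Python")
remember(store, "user", "preferred_language", "Rust")  # the preference changed
current_preference = recall(store, "user", "preferred_language")
missing = recall(store, "user", "timezone")
```

Keying each record by entity and attribute is what makes this form suited to specific facts: the agent asks for exactly one value instead of searching through notes.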

The pattern

Write it down, recall on demand

The move is the same one people use. You do not keep everything in your head; you write down what matters and look it up later. An agent offloads detail out of the window and pulls it back when it becomes relevant again.

This is what keeps a long-running agent both lean and capable. The window stays small and focused, while nothing important is truly lost. Retrieval is the bridge: given the task at hand, find the right past notes and bring just those back into the window. And no new machinery is needed for most of it: the agent writes and recalls its notes with the same write and read tools it already has.
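
The retrieval step described above can be sketched in a few lines. This is a deliberately simple stand-in, assuming notes are `.md` files in one directory and that relevance is scored by plain word overlap with the task; a production system would more likely use embeddings or a search index. The function names `tokens` and `retrieve` and the sample notes are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(notes_dir, task, k=2):
    """Return up to k (filename, text) pairs whose words overlap most with the task."""
    task_words = tokens(task)
    scored = []
    for path in sorted(Path(notes_dir).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        score = len(task_words & tokens(text))
        if score > 0:  # notes with no overlap never enter the window
            scored.append((score, path.name, text))
    # Highest score first; ties broken by filename so the order is stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, text) for _, name, text in scored[:k]]


with tempfile.TemporaryDirectory() as notes_dir:
    Path(notes_dir, "plan.md").write_text(
        "Refactor auth, then add SSO", encoding="utf-8"
    )
    Path(notes_dir, "deploy.md").write_text(
        "Deploy steps for staging cluster", encoding="utf-8"
    )
    hits = retrieve(notes_dir, "add SSO to the auth service")
```

Only the notes that share words with the task are returned, so the window receives the relevant note and nothing else.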

write now, recall later

turn 4

write { path: "plan.md", content: "Refactor auth, then add SSO" }

→ wrote plan.md

. . . 30 turns later, the window has been compacted . . .

turn 34

read { path: "plan.md" }

→ "Refactor auth, then add SSO"

The plan left working memory long ago, but it was on disk, so a single read brings it straight back.
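
The same exchange can be simulated end to end. The sketch below assumes the `write` and `read` tools are thin wrappers over files in a workspace directory, and models compaction crudely as keeping only the most recent window entries. The `Workspace` class and the `compact` helper are hypothetical, written only to illustrate the pattern.

```python
import tempfile
from pathlib import Path


class Workspace:
    """File-backed write/read tools, rooted at one directory."""

    def __init__(self, root):
        self.root = Path(root)

    def write(self, path, content):
        target = self.root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        return f"wrote {path}"

    def read(self, path):
        return (self.root / path).read_text(encoding="utf-8")


def compact(window, keep_last=2):
    """Crude stand-in for compaction: drop everything but the newest entries."""
    return window[-keep_last:]


with tempfile.TemporaryDirectory() as tmp:
    ws = Workspace(tmp)

    # Turn 4: the plan is in the window and is also written to disk.
    window = ["plan: Refactor auth, then add SSO"]
    write_result = ws.write("plan.md", "Refactor auth, then add SSO")

    # Turns 5 to 33: other work fills the window, which is compacted as it grows.
    for turn in range(5, 34):
        window.append(f"turn {turn}: unrelated work")
        window = compact(window)

    # Turn 34: the plan is no longer in the window, but one read restores it.
    plan_in_window = any("Refactor auth" in entry for entry in window)
    recalled = ws.read("plan.md")
```

After the loop the window holds only the two most recent entries, yet the plan is recovered intact because it was saved before compaction removed it.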

Working memory is the constraint; long-term memory is the escape from it. A harness that can write things down and recall them on demand can work far past the size of any single window.
