The context window

Everything an agent knows fits in one window.

The context window is the finite span of text a model can take in at once. For an agent it is working memory, running cost, and hard ceiling, all at the same time.


A model has no memory between calls. Whatever the agent knows right now, it knows because it is sitting in the context window: the block of text handed to the model for this one turn. Understand the window and you understand most of what makes agents hard.

The basics

A fixed budget, and the only memory there is

The window is measured in tokens, the small chunks a model reads text in; one token is roughly three-quarters of a word, so this page is about two thousand of them. The budget is fixed: a model might take a few hundred thousand tokens at once, some a million, and not one more. Everything the agent is aware of has to fit inside it: who it is, what has happened, what it can do, and whatever it has looked up. Nothing outside the window exists, as far as the model is concerned.

Because the model is stateless, the window is the only memory the agent has out of the box; anything more durable, the harness has to provide (the subject of the memory page). The harness rebuilds the window every turn, and that rebuilt block is the agent's entire mind for that step.

Anatomy

What fills the window

context limit fixed budget of tokens
remaining headroom

tool results

file reads, command output (the space-eater)

conversation history

every message so far, grows each turn

tool schemas

what the agent can do

system prompt

who the agent is (fixed)

Every turn appends to the history and the tool results (in practice the results sit interleaved inside the history, not in one block). The window is fixed, so it fills, and every call carries all of it.

The core tension

It fills up, and that is the whole problem

A session has no natural length, but the window does. Every turn adds more: the user's message, the agent's reasoning, each tool call, and every tool result. And tool results are the quiet giant. One read of a long file, or one noisy command, can dump thousands of tokens into the window at once.

So the window fills, often faster than you would expect. And here is the part that stings: every call carries the entire window. Providers soften this with prompt caching, which discounts the unchanged prefix of the conversation (and is exactly why harnesses append to history rather than rearrange it), but a context half full of stale tool output is still money and latency spent on every turn until something clears it.

When it is full

Three things break, in order of how much they hurt

A full window is not a clean error you can catch. It degrades the agent in three distinct ways, and the worst one is the one you cannot see.

What goes wrong when the context window fills
Failure mode What happens
Forgetting An over-limit call simply fails, so the harness must drop or compact old turns first. Whatever they held, the agent forgets.
Cost and latency Every call carries the whole window, and a larger one is slower and pricier to process. A bloated context is expensive and sluggish.
Degradation Even within the limit, models attend less reliably to the middle of a very large context. More is not always better; signal gets lost in noise.

The job

Keeping the window lean

The job, then, is to keep the live window small and relevant. Drop the oldest turns when they stop mattering. Summarize a long history into a short recap. Offload bulky detail to files and pull it back only when it is needed.

The art is deciding what to cut without losing the one thing that mattered. Done well it is invisible and the agent works for hours. Done badly it quietly forgets the instruction you gave it twenty turns ago.

history grows

over the limit

compacted

summary of earlier turns

fits again

Old turns are summarized into a recap so the live history stays inside the window.

The context window is the central constraint of agent design. The model is rented and fixed; the window is the one lever you fully control. Most of building a good agent is deciding, turn after turn, what earns a place in it.

Next: memory, the other half How the harness assembles and manages it Back to the overview