Anatomy of a harness
The five primitives every harness is built from.
Strip a harness down to the metal and you find a loop wrapped around five moving parts. None is complicated on its own. The engineering, and the difference between a toy and a production agent, is in how you handle each one.
The loop is the spine. The primitives are the organs hanging off it, and one turn touches all five in order: context assembly gathers the prompt, history, and tool schemas into one block; tool calling is how the model's reply becomes an action; the loop runs what it asked for and goes around again; the execution environment is where those actions actually land; and context management prunes the growing history so the next turn still fits. Five parts, one pass. Pick any one to take it apart.
Primitive 01
Context assembly
The model is stateless. It remembers nothing between calls. So on every single turn, the harness rebuilds the entire text the model sees, from scratch. That assembled bundle is the model's whole world for that step.
Four things go into it: the system prompt (who the agent is and the rules it follows), the conversation history (what has happened so far), the tool schemas (what it is allowed to do), and any retrieved data pulled in for the task.
This is where most agent quality lives. Too little and the model is blind; too much and it drowns, loses the thread, and costs more. The hard part is relevance: deciding, every turn, what deserves the model's limited attention.
- system prompt who the agent is, the rules
- conversation history everything so far
- tool schemas what it may do
- retrieved data files, search, docs
the assembled context
the model's entire world for this turn
call the model
Primitive 02
Tool calling
A model can only emit text. Tool calling is the protocol that turns some of that text into action. The harness advertises each tool as a schema: a name, a description, and typed parameters. Those schemas go into the context, so the model knows a tool only by how you describe it.
When the model wants to act, it does not reply with prose. It emits a structured call: which tool, and what arguments. The harness validates it, runs it, and hands back a structured result, which becomes part of the context on the next turn.
Because the model sees tools only through their schemas, tool design is prompt design. A vague description is a bug the model will trip over. The other half of the work is defensive: validating arguments and returning errors the model can actually recover from.
-
1 · advertise
The tool's schema goes into the context
read(path): returns the file text
-
2 · call
The model emits a structured call
read { path: "src/app.ts" }
-
3 · result
The harness runs it and returns
"export function app() { ... }"
Primitive 03
The loop
The loop is the control flow that turns one response into many steps. Assemble the context, call the model, and check the reply: if it asked to use tools (it can ask for several at once, alongside its own commentary), run them, append the results, and go around again. A reply with no tool calls means it is finished.
A single model call is one step of thinking. The loop is what lets an agent take dozens of steps toward a goal without a human in the middle: read a file, run a test, see the failure, edit, re-run.
The hard part is knowing when to stop, and making sure it does. Stop conditions, step and cost ceilings, and guards against a model that loops forever are what separate a robust loop from a runaway one.
-
Build a prompt
system + history + available tools
-
Call the model
send the context, get tokens back
-
Model replies
text, tool calls, or both at once
no tool calls: done asked for tools? continue ↓
-
Run the tools
capture the results, append them to history
Primitive 04
The execution environment
Tools have to run somewhere. The execution environment is that somewhere: the shell, the filesystem, the network, the APIs the tools reach, and the boundary drawn around them. It defines both what the agent can do and how much damage it can do.
This is the primitive with the highest stakes. A coding agent with real shell access can delete files or push code; an environment without limits is an agent without limits. Isolation, permissioning, and careful secret handling are what keep capability from becoming catastrophe.
| Concern | What it means |
|---|---|
| Isolation | Run tools in a sandbox or container so a mistake cannot reach the host. |
| Permissions | Gate consequential actions (writes, shell, network) behind a policy or human approval. |
| Secrets | Keep credentials out of the model's context; inject them only at the point of execution. |
| Reproducibility | A known, consistent environment so the same action behaves the same way every run. |
Primitive 05
Context management
The context window is finite. Agent sessions are not. Every turn appends more, the history only grows, and eventually it will not fit. Context management is how the harness keeps the conversation inside the window without losing the plot.
There are three moves. Truncate: drop the oldest turns and hope they did not matter. Summarize: compact a long history into a shorter recap that preserves the gist. Offload: push detail out to files or external memory and pull it back only when it is needed.
The hard part is choosing what to compress without throwing away the one detail that mattered. Done well, it is invisible and the agent works for hours. Done badly, the agent quietly forgets the thing you told it twenty turns ago.
history grows
over the limit
compacted
fits again
That is the whole anatomy: a loop, and five primitives it drives on every pass. None of them is exotic. What separates a toy from a production harness is judgment, exercised turn after turn: what deserves the window, when the loop should stop, which actions need a human, what can safely be forgotten.
Next: the coding toolset The context window it manages Back to the overview