Anatomy of a harness

The five primitives every harness is built from.

Strip a harness down to the metal and you find a loop wrapped around five moving parts. None is complicated on its own. The engineering, and the difference between a toy and a production agent, is in how you handle each one.


The loop is the spine. The primitives are the organs hanging off it, and one turn touches all five in order: context assembly gathers the prompt, history, and tool schemas into one block; tool calling is how the model's reply becomes an action; the loop runs what it asked for and goes around again; the execution environment is where those actions actually land; and context management prunes the growing history so the next turn still fits. Five parts, one pass. Pick any one to take it apart.

Primitive 01

Context assembly

The model is stateless. It remembers nothing between calls. So on every single turn, the harness rebuilds the entire text the model sees, from scratch. That assembled bundle is the model's whole world for that step.

Four things go into it: the system prompt (who the agent is and the rules it follows), the conversation history (what has happened so far), the tool schemas (what it is allowed to do), and any retrieved data pulled in for the task.

This is where most agent quality lives. Too little and the model is blind; too much and it drowns, loses the thread, and costs more. The hard part is relevance: deciding, every turn, what deserves the model's limited attention.

Rebuilt from scratch on every turn
  • system prompt who the agent is, the rules
  • conversation history everything so far
  • tool schemas what it may do
  • retrieved data files, search, docs

the assembled context

the model's entire world for this turn

call the model

Primitive 02

Tool calling

A model can only emit text. Tool calling is the protocol that turns some of that text into action. The harness advertises each tool as a schema: a name, a description, and typed parameters. Those schemas go into the context, so the model knows a tool only by how you describe it.

When the model wants to act, it does not reply with prose. It emits a structured call: which tool, and what arguments. The harness validates it, runs it, and hands back a structured result, which becomes part of the context on the next turn.

Because the model sees tools only through their schemas, tool design is prompt design. A vague description is a bug the model will trip over. The other half of the work is defensive: validating arguments and returning errors the model can actually recover from.

  1. 1 · advertise

    The tool's schema goes into the context

    read(path): returns the file text

  2. 2 · call

    The model emits a structured call

    read { path: "src/app.ts" }

  3. 3 · result

    The harness runs it and returns

    "export function app() { ... }"

The model sees the tool only through its schema. The call and the result are both structured, not prose.

Primitive 03

The loop

The loop is the control flow that turns one response into many steps. Assemble the context, call the model, and check the reply: if it asked to use tools (it can ask for several at once, alongside its own commentary), run them, append the results, and go around again. A reply with no tool calls means it is finished.

A single model call is one step of thinking. The loop is what lets an agent take dozens of steps toward a goal without a human in the middle: read a file, run a test, see the failure, edit, re-run.

The hard part is knowing when to stop, and making sure it does. Stop conditions, step and cost ceilings, and guards against a model that loops forever are what separate a robust loop from a runaway one.

  1. Build a prompt

    system + history + available tools

  2. Call the model

    send the context, get tokens back

  3. Model replies

    text, tool calls, or both at once

    no tool calls: done asked for tools? continue ↓

  4. Run the tools

    capture the results, append them to history

Primitive 04

The execution environment

Tools have to run somewhere. The execution environment is that somewhere: the shell, the filesystem, the network, the APIs the tools reach, and the boundary drawn around them. It defines both what the agent can do and how much damage it can do.

This is the primitive with the highest stakes. A coding agent with real shell access can delete files or push code; an environment without limits is an agent without limits. Isolation, permissioning, and careful secret handling are what keep capability from becoming catastrophe.

What the execution environment has to get right
Concern What it means
Isolation Run tools in a sandbox or container so a mistake cannot reach the host.
Permissions Gate consequential actions (writes, shell, network) behind a policy or human approval.
Secrets Keep credentials out of the model's context; inject them only at the point of execution.
Reproducibility A known, consistent environment so the same action behaves the same way every run.

Primitive 05

Context management

The context window is finite. Agent sessions are not. Every turn appends more, the history only grows, and eventually it will not fit. Context management is how the harness keeps the conversation inside the window without losing the plot.

There are three moves. Truncate: drop the oldest turns and hope they did not matter. Summarize: compact a long history into a shorter recap that preserves the gist. Offload: push detail out to files or external memory and pull it back only when it is needed.

The hard part is choosing what to compress without throwing away the one detail that mattered. Done well, it is invisible and the agent works for hours. Done badly, the agent quietly forgets the thing you told it twenty turns ago.

history grows

over the limit

compacted

summary of earlier turns

fits again

Old turns are summarized into a recap so the live history stays inside the window.

That is the whole anatomy: a loop, and five primitives it drives on every pass. None of them is exotic. What separates a toy from a production harness is judgment, exercised turn after turn: what deserves the window, when the loop should stop, which actions need a human, what can safely be forgotten.

Next: the coding toolset The context window it manages Back to the overview