Composition, not inheritance
A coding agent is just a harness with coding tools wired in.
An agent harness is the reusable engine: a loop plus a handful of primitives. A coding agent is that same engine wired to the read, search, write, edit, and bash tools, running in a codebase with tests as its ground truth.
Part 1
What is an agent harness? At its core, it's a loop.
Strip away the marketing and an agent harness is one thing: a loop that turns a language model (which only does text-in / text-out) into something that can act in the world.
The model itself is stateless and inert. Given tokens, it predicts tokens. That's all it does. The harness is the surrounding program that makes the model useful as an agent: it assembles the context, runs the tools the model asks for, and feeds the results back.
Mental model: the model is the brain; the harness is the body and nervous system. The brain is fixed and rented. The harness is where you express intent, which is why two agents on the same model can feel completely different.
The minimal loop
1. Build a prompt: the system prompt, the history, and the available tools.
2. Call the model: send the context, get tokens back.
3. Model replies: text, tool calls, or both at once. If there are no tool calls, the run is done; if the model asked for tools, continue.
4. Run the tools: capture the results, append them to history, and go back to step 1.
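The loop above fits in one function. This is a minimal sketch, not any product's implementation: the `ChatMessage`, `ToolCall`, `ModelReply`, and `CallModel` shapes are hypothetical, and a scripted function stands in for the model (a real model call would be async and go over the network).

```typescript
// Minimal agent loop. All shapes here are hypothetical and illustrative,
// not any vendor's API. The model call is synchronous for brevity.

type ToolCall = { name: string; args: Record<string, string> };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type ChatMessage =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string; toolCalls: ToolCall[] }
  | { role: "tool"; name: string; content: string };
type Tool = (args: Record<string, string>) => string;
type CallModel = (context: ChatMessage[], toolNames: string[]) => ModelReply;

function runAgent(
  systemPrompt: string,
  userPrompt: string,
  tools: Record<string, Tool>,
  callModel: CallModel,
): { answer: string; transcript: ChatMessage[] } {
  const transcript: ChatMessage[] = [
    { role: "system", content: systemPrompt },
    { role: "user", content: userPrompt },
  ];
  for (;;) {
    // Build the context (history + tool list) and call the model.
    const reply = callModel(transcript, Object.keys(tools));
    // Record the reply; text and tool calls can arrive together.
    transcript.push({ role: "assistant", content: reply.text, toolCalls: reply.toolCalls });
    if (reply.toolCalls.length === 0) {
      return { answer: reply.text, transcript }; // no tool calls: done
    }
    // Run each requested tool and append its result to history.
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      const result = tool ? tool(call.args) : `error: unknown tool ${call.name}`;
      transcript.push({ role: "tool", name: call.name, content: result });
    }
  }
}

// Usage with a scripted stand-in model: it asks for one tool, then answers.
const script: ModelReply[] = [
  { text: "Let me read the file.", toolCalls: [{ name: "read", args: { path: "auth.ts" } }] },
  { text: "auth.ts exports a login function.", toolCalls: [] },
];
let turn = 0;
const scriptedModel: CallModel = () => script[turn++];
const readOnlyTools: Record<string, Tool> = {
  read: (args) => `contents of ${args.path}`,
};

const run = runAgent("You are a coding agent.", "What does auth.ts export?", readOnlyTools, scriptedModel);
console.log(run.answer); // auth.ts exports a login function.
console.log(run.transcript.length); // 5
```

Everything that makes a harness distinctive lives outside this function: which tools are passed in, what the system prompt says, and what `callModel` does with the context.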
The five primitives
| Primitive | What it actually is |
|---|---|
| Context assembly | Deciding what text goes into the model each turn (history, system prompt, tool schemas, retrieved data). |
| Tool / function calling | A protocol for the model to request actions and receive structured results. |
| The agent loop | The control flow that lets the model take multiple steps autonomously. |
| Execution environment | Where tools actually run (shell, filesystem, API calls) and how that's sandboxed / permissioned. |
| Context management | What to do when history exceeds the window (truncate, summarize / compact, offload to files). |
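To make the context-management row concrete, here is a sketch of the simplest policy, truncation. It assumes roughly 4 characters per token (a real harness would use the model's tokenizer) and that the first message is the system prompt; `fitToWindow` and `estimateTokens` are hypothetical names, and summarizing or offloading to files are the alternatives the table mentions.

```typescript
// Context-management sketch: keep the system prompt, drop the oldest turns
// until the estimated size fits a budget. Assumes ~4 characters per token;
// a real harness would count with the model's tokenizer.

type Msg = { role: "system" | "user" | "assistant" | "tool"; content: string };

function estimateTokens(m: Msg): number {
  return Math.ceil(m.content.length / 4);
}

function fitToWindow(messages: Msg[], budget: number): Msg[] {
  if (messages.length === 0) return [];
  const [system, ...rest] = messages; // index 0 is assumed to be the system prompt
  let total = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  // Drop oldest first, but never drop the most recent message.
  while (total > budget && rest.length > 1) {
    const dropped = rest.shift()!;
    total -= estimateTokens(dropped);
  }
  return [system, ...rest];
}

const longRun: Msg[] = [
  { role: "system", content: "You are a coding agent." }, // 23 chars, ~6 tokens
  { role: "user", content: "a".repeat(40) }, // ~10 tokens each from here on
  { role: "assistant", content: "b".repeat(40) },
  { role: "tool", content: "c".repeat(40) },
  { role: "assistant", content: "d".repeat(40) },
];
const trimmed = fitToWindow(longRun, 30);
console.log(trimmed.map((m) => m.role).join(", ")); // system, tool, assistant
```

Even this toy shows why the policy is the hard part: the surviving tool result has lost the assistant turn that requested it, and a production harness has to decide how to keep such pairs consistent.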
The primitives are simple. The hard, valuable part is the policy decisions around them: what goes in context, when to stop looping, how to handle a tool that errors or returns garbage, which actions need human approval. A toy harness is ~100 lines; a production one is mostly that messy robustness.
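Two of those policy decisions, sketched with hypothetical names (`runToolSafely`, `runWithStepCap`, `Step`): a tool that throws is turned into an error result the model can read, instead of crashing the run, and a hard step cap stops a model that never decides it is done. The cap value and error strings are illustrative choices, not anyone's defaults.

```typescript
// Two policy decisions, sketched with hypothetical names: a tool that throws
// becomes an error *result* the model can read (instead of crashing the run),
// and a hard step cap stops a model that never finishes on its own.

type ToolFn = (args: Record<string, string>) => string;
type Step = { toolName?: string; args?: Record<string, string> };

function runToolSafely(tool: ToolFn | undefined, args: Record<string, string>): string {
  if (!tool) return "error: unknown tool";
  try {
    return tool(args);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return `error: ${message}`; // fed back to the model like any other result
  }
}

// `nextStep` stands in for a model call; a step without toolName means "done".
function runWithStepCap(
  nextStep: (lastResult: string) => Step,
  tools: Record<string, ToolFn>,
  maxSteps: number,
): { outcome: "done" | "step_limit"; steps: number } {
  let lastResult = "";
  for (let step = 1; step <= maxSteps; step++) {
    const { toolName, args = {} } = nextStep(lastResult);
    if (toolName === undefined) return { outcome: "done", steps: step };
    lastResult = runToolSafely(tools[toolName], args);
  }
  return { outcome: "step_limit", steps: maxSteps };
}

const fileTools: Record<string, ToolFn> = {
  read: (args) => {
    if (args.path === "missing.ts") throw new Error(`no such file: ${args.path}`);
    return `contents of ${args.path}`;
  },
};

console.log(runToolSafely(fileTools.read, { path: "missing.ts" })); // error: no such file: missing.ts

// A stand-in model that requests the same read forever is cut off at 5 steps.
const stuck = runWithStepCap(() => ({ toolName: "read", args: { path: "a.ts" } }), fileTools, 5);
console.log(stuck.outcome, stuck.steps); // step_limit 5
```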
The first wrapper
Same core, swap the tools
A coding agent doesn't inherit from a harness; it contains one and wires tools into it.
Harness core: a model (the brain) plus the loop that makes it act, built from context, tool-calling, the loop, the execution environment, and context management.
- Coding agent. Environment: a real codebase + tests. Tools: read, search, write, edit, bash. Wraps the harness core.
- Research agent. Environment: the open web. Tools: search, fetch, synthesize. Wraps the harness core.
- Trading agent. Environment: market data + a portfolio. Tools: fetch_prices, rank, rebalance. Wraps the harness core.
Swap the tools, the prompt, and the environment and you get a different agent. The harness core is identical in all three.
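In code, composition looks like configuration rather than subclassing. In this sketch the `HarnessCore` class, the `Agent` shape, the system prompts, and the stub tool bodies are all hypothetical; only the tool names come from the text above. Each agent holds a core; none of them extends it.

```typescript
// Composition sketch: one domain-agnostic core, three agents that differ only
// in configuration. Prompts and stub tool bodies are hypothetical.

type Tool = (input: string) => string;

interface AgentConfig {
  systemPrompt: string;
  tools: Record<string, Tool>;
}

// The core knows nothing about code, the web, or markets.
class HarnessCore {
  readonly config: AgentConfig;

  constructor(config: AgentConfig) {
    this.config = config;
  }

  toolNames(): string[] {
    return Object.keys(this.config.tools);
  }

  // Dispatch one tool request; a full loop would call this on every turn.
  runTool(name: string, input: string): string {
    const tool = this.config.tools[name];
    return tool ? tool(input) : `error: ${name} is not available to this agent`;
  }
}

// An agent *contains* a core plus a description of its environment.
type Agent = { env: string; core: HarnessCore };

// Hypothetical stub so the example runs without real side effects.
const stub = (label: string): Tool => (input) => `${label}(${input})`;

const codingAgent: Agent = {
  env: "a real codebase + tests",
  core: new HarnessCore({
    systemPrompt: "You are an expert engineer; match existing patterns.",
    tools: { read: stub("read"), search: stub("search"), write: stub("write"), edit: stub("edit"), bash: stub("bash") },
  }),
};
const researchAgent: Agent = {
  env: "the open web",
  core: new HarnessCore({
    systemPrompt: "You are a careful researcher.",
    tools: { search: stub("search"), fetch: stub("fetch"), synthesize: stub("synthesize") },
  }),
};
const tradingAgent: Agent = {
  env: "market data + a portfolio",
  core: new HarnessCore({
    systemPrompt: "You manage a portfolio.",
    tools: { fetch_prices: stub("fetch_prices"), rank: stub("rank"), rebalance: stub("rebalance") },
  }),
};

console.log(codingAgent.core.toolNames().join(", ")); // read, search, write, edit, bash
console.log(researchAgent.core.runTool("bash", "ls")); // error: bash is not available to this agent
console.log(tradingAgent.core.runTool("rank", "AAPL,MSFT")); // rank(AAPL,MSFT)
```

Because the tool set is data, an agent physically cannot call a tool it was not given, which is also the simplest form of scoping what each agent may do.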
Part 2
What is a coding agent, and how does it differ?
A harness is the generic machine; a coding agent is one specific thing you build with it. One is a category, the other an instance. A coding agent is a harness whose tools, system prompt, and environment are specialized for writing software. Swap them and the same harness becomes a research agent or a trading agent. The loop underneath is identical.
| Layer | Generic harness | Coding agent |
|---|---|---|
| Tools | “some functions” | read / search / edit / write files, run shell, run tests, git |
| Environment | “where tools run” | a real codebase + filesystem + working compiler / test runner |
| System prompt | “you are an agent” | “you are an expert engineer; match existing patterns; verify changes compile” |
| Feedback loop | observe tool result | the code either compiles and passes tests, or it doesn't |
That last row is the important one. What makes coding agents work unusually well is that the environment gives ground-truth feedback for free. The agent writes code, runs the tests, sees red, fixes it, sees green. The loop closes against an objective signal it didn't have to invent. Most domains lack this; a research agent has no compiler to tell it it's wrong.
the loop, closing against tests
read { path: "auth.ts" }
edit { path: "auth.ts", old: "==", new: "===" }
bash { command: "npm test" }
→ FAIL · 11 of 12 passed · "rejects expired tokens"
edit { path: "auth.ts", old: "expiry >", new: "expiry >=" }
bash { command: "npm test" }
→ PASS · 12 of 12 passed · no tool calls in reply · done
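The `edit` and `bash` calls in that trace can be sketched as plain functions. These are hypothetical implementations, not any product's actual tools: files live in an in-memory map so the example is self-contained, `edit` refuses an `old` string that matches zero or several times (an assumed policy, since a bare `"=="` could easily be ambiguous in real code), and `bash` uses Node's `child_process.spawnSync` and reports the exit code, which is where the ground-truth signal comes from.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical sketches of two tools from the trace above. Files live in an
// in-memory map (a real harness would touch the filesystem); `edit` demands
// that `old` match exactly once; `bash` reports the exit code.

const files = new Map<string, string>();

function edit(args: { path: string; old: string; new: string }): string {
  const text = files.get(args.path);
  if (text === undefined) return `error: ${args.path} not found`;
  if (args.old === "") return "error: old must not be empty";
  const matches = text.split(args.old).length - 1;
  if (matches === 0) return `error: "${args.old}" not found in ${args.path}`;
  // Ambiguous edits are refused rather than guessed; the model must add context.
  if (matches > 1) return `error: "${args.old}" matches ${matches} times in ${args.path}`;
  // A replacer function avoids String.replace's special "$" patterns.
  files.set(args.path, text.replace(args.old, () => args.new));
  return `edited ${args.path}`;
}

function bash(args: { command: string }): string {
  const result = spawnSync(args.command, { shell: true, encoding: "utf8" });
  // Exit code 0 is the "pass" signal; anything else is reported as a failure.
  const verdict = result.status === 0 ? "PASS" : `FAIL (exit ${result.status})`;
  return `${verdict}\n${result.stdout}${result.stderr}`;
}

files.set("auth.ts", 'const ok = role == "admin" && expiry > now;');
console.log(edit({ path: "auth.ts", old: "==", new: "===" })); // edited auth.ts
console.log(edit({ path: "auth.ts", old: "expiry >", new: "expiry >=" })); // edited auth.ts
console.log(edit({ path: "auth.ts", old: "=", new: "!" })); // error: "=" matches 5 times in auth.ts
console.log(files.get("auth.ts")); // const ok = role === "admin" && expiry >= now;
console.log(bash({ command: "exit 3" }).split("\n")[0]); // FAIL (exit 3)
```

Every outcome, including a refused edit or a failing command, comes back as a string the model reads on its next turn; that is what lets the loop close against the tests.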
The relationship, in one line
Harness = the reusable engine. A coding agent is that same engine wired to coding tools, with tests as its ground truth.
Every coding agent contains a harness; not every harness is a coding agent. That is the first wrapper: change the tools, the prompt, and the environment, and the same core serves any domain. But tools are not the only wrapper. A second, independent one decides whether you watch the agent work or never see it at all.
Part 3
Headless vs. interactive: the same loop, wrapped two ways
This wrapper has nothing to do with tools. It is about how you run the same agent. A headless run executes the loop to completion and returns one answer; an interactive run keeps the loop alive so you can converse with it and watch it work, the way a chat-style coding assistant does.
The second wrapper
Same loop, wrapped two ways
Neither version touches the harness core. The wrapper around it decides whether the work is visible and steerable.
- Headless: one call, one result. Runs to completion, with no UI.
- Interactive: you can see it and steer it. Adds turn-taking, streaming, permissions, and a live renderer.
Both wrap the same harness core: a model (the brain) plus the loop that makes it act, built from context, tool-calling, the loop, the execution environment, and context management.
Headless / batch
One prompt in, one answer out. The loop runs to completion, silently and without interruption. Ideal for cron jobs, CI, and automation pipelines. In code, this is a single blocking call that returns the agent's final result.
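A sketch of that single call, under stated assumptions: `runHeadless`, `Model`, `Reply`, and `Tools` are hypothetical names, the transcript is kept as plain strings for brevity, and a scripted async function stands in for the model API. The caller awaits one result and nothing else.

```typescript
// Headless sketch: one prompt in, one answer out, no UI. `Model`, `Reply`,
// and `Tools` are hypothetical shapes; a scripted model stands in for an API.

type Reply = { text: string; toolCall?: { name: string; input: string } };
type Model = (transcript: string[]) => Promise<Reply>;
type Tools = Record<string, (input: string) => Promise<string>>;

// Runs the whole loop silently and resolves once, with the final answer.
async function runHeadless(prompt: string, model: Model, tools: Tools): Promise<string> {
  const transcript = [`user: ${prompt}`];
  for (;;) {
    const reply = await model(transcript);
    transcript.push(`assistant: ${reply.text}`);
    if (!reply.toolCall) return reply.text; // no tool calls: this is the result
    const { name, input } = reply.toolCall;
    const tool = tools[name];
    const output = tool ? await tool(input) : `error: unknown tool ${name}`;
    transcript.push(`tool ${name}: ${output}`);
  }
}

// Usage, as a CI step might call it: await the single result and act on it.
const replies: Reply[] = [
  { text: "Running the tests.", toolCall: { name: "bash", input: "npm test" } },
  { text: "Tests pass; nothing to fix." },
];
let next = 0;
const scriptedModel: Model = async () => replies[next++];
const ciTools: Tools = { bash: async (cmd) => `PASS (${cmd})` };

const finalAnswer = runHeadless("Fix the failing tests.", scriptedModel, ciTools);
finalAnswer.then((answer) => console.log(answer)); // Tests pass; nothing to fix.
```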
Interactive / conversational
A living session you talk to and watch unfold: turn-taking, streamed output, visible tool calls, approval prompts, and interruption. Ideal for working alongside the agent. In code, this is a long-lived session object you send messages into and read a stream of events back from.
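One way to sketch that session object is an async generator: each `send()` appends to a history that outlives the call, and events stream out as the loop runs. The `Session` class, the event names, and the `ModelChunk`/`StreamingModel` shapes are hypothetical, chosen for illustration; a scripted stream stands in for a streaming model API.

```typescript
// Interactive sketch: a long-lived session that keeps history across turns
// and streams events out while the loop runs. Event names and model shapes
// are hypothetical, chosen for illustration.

type AgentEvent =
  | { type: "text"; delta: string }
  | { type: "tool_call"; name: string; input: string }
  | { type: "tool_result"; name: string; output: string }
  | { type: "turn_end" };
type ModelChunk = { delta: string } | { tool: { name: string; input: string } };
type StreamingModel = (history: string[]) => AsyncIterable<ModelChunk>;

class Session {
  readonly history: string[] = []; // persists between send() calls
  private readonly model: StreamingModel;
  private readonly tools: Record<string, (input: string) => string>;

  constructor(model: StreamingModel, tools: Record<string, (input: string) => string>) {
    this.model = model;
    this.tools = tools;
  }

  // Append the user's message, then stream events until a reply has no tool call.
  async *send(message: string): AsyncGenerator<AgentEvent> {
    this.history.push(`user: ${message}`);
    for (;;) {
      let text = "";
      let request: { name: string; input: string } | undefined;
      for await (const chunk of this.model(this.history)) {
        if ("delta" in chunk) {
          text += chunk.delta;
          yield { type: "text", delta: chunk.delta }; // render as it arrives
        } else {
          request = chunk.tool;
          yield { type: "tool_call", name: chunk.tool.name, input: chunk.tool.input };
        }
      }
      this.history.push(`assistant: ${text}`);
      if (!request) break;
      const tool = this.tools[request.name];
      const output = tool ? tool(request.input) : `error: unknown tool ${request.name}`;
      this.history.push(`tool ${request.name}: ${output}`);
      yield { type: "tool_result", name: request.name, output };
    }
    yield { type: "turn_end" };
  }
}

// Usage with a scripted stream; a real UI would render each event as it lands.
const turns: ModelChunk[][] = [
  [{ delta: "Checking " }, { delta: "the tests." }, { tool: { name: "bash", input: "npm test" } }],
  [{ delta: "All green." }],
];
let turnIndex = 0;
async function* scriptedStream(): AsyncGenerator<ModelChunk> {
  for (const chunk of turns[turnIndex++]) yield chunk;
}
const session = new Session(scriptedStream, { bash: (cmd) => `PASS (${cmd})` });

async function collect(): Promise<string[]> {
  const seen: string[] = [];
  for await (const event of session.send("Are the tests passing?")) seen.push(event.type);
  return seen;
}
const eventTypes = collect();
eventTypes.then((types) => console.log(types.join(" "))); // text text tool_call tool_result text turn_end
```

Keeping the loop behind an event stream is what lets the renderer stay separate: a terminal UI, a web page, or a test harness can all consume the same events.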
| Wrapper layer | What it adds to the loop |
|---|---|
| Persistent session | Keeps the loop alive between turns. Your next message appends to the same history, so context and state carry across the whole conversation. |
| Streaming | Consumes the model's output token by token, so text and tool calls appear as they happen instead of arriving in one final block. |
| Rendered loop events | Surfaces every step the loop takes: each tool call, its arguments, and its result. This is what lets you watch the work. |
| Human-in-the-loop gates | Pauses before consequential tools to ask permission, and lets you interrupt mid-run to redirect. |
| Duplex transport | A two-way channel: your input flows in, a stream of events flows out, with the loop (core) kept separate from the renderer (the UI). |
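The human-in-the-loop gate is a small check in front of tool dispatch. In this sketch the set of "consequential" tools and the `Approver` callback are hypothetical policy choices: interactively, the approver would prompt the user; headlessly, nobody is watching, so a fixed policy has to answer instead.

```typescript
// Human-in-the-loop gate sketch. Which tools count as "consequential" is a
// per-agent policy (this set is hypothetical), and `Approver` stands in for
// however the UI asks the user.

type ToolRequest = { name: string; input: string };
type Approver = (request: ToolRequest) => boolean;

const CONSEQUENTIAL = new Set(["write", "edit", "bash"]);

function gatedRun(
  request: ToolRequest,
  tools: Record<string, (input: string) => string>,
  approve: Approver,
): string {
  // Read-only tools run freely; consequential ones pause for a decision.
  if (CONSEQUENTIAL.has(request.name) && !approve(request)) {
    // A denial goes back to the model as a result, so it can change course.
    return `denied: ${request.name} was not approved`;
  }
  const tool = tools[request.name];
  return tool ? tool(request.input) : `error: unknown tool ${request.name}`;
}

const toolbox = {
  read: (path: string) => `contents of ${path}`,
  bash: (cmd: string) => `ran ${cmd}`,
};
const asked: string[] = [];
// Interactive: a stand-in for an approval prompt that approves only the test run.
const askUser: Approver = (req) => {
  asked.push(`${req.name}: ${req.input}`);
  return req.input === "npm test";
};
// Headless: nobody is watching, so a fixed policy answers instead.
const denyAll: Approver = () => false;

console.log(gatedRun({ name: "read", input: "auth.ts" }, toolbox, askUser)); // contents of auth.ts
console.log(gatedRun({ name: "bash", input: "npm test" }, toolbox, askUser)); // ran npm test
console.log(gatedRun({ name: "bash", input: "rm -rf build" }, toolbox, askUser)); // denied: bash was not approved
console.log(gatedRun({ name: "bash", input: "npm test" }, toolbox, denyAll)); // denied: bash was not approved
```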
Two wrappers, one core
Every agent is one core, wrapped along two axes
Down the side, which tools (the domain). Across the top, how you run it (the mode). Every cell is the same harness core.
| Domain | Headless | Interactive |
|---|---|---|
| Coding | CI fix-it bot | pair programmer (this session) |
| Research | nightly report | research chat |
| Trading | cron rebalancer | trading copilot |
That is the whole idea. Build the loop once and keep it domain-agnostic; everything else, the tools, the environment, the session, the way you watch it, is a wrapper you choose. Same core, many products.
Go deeper
Take the harness apart, page by page
This page is the map. Each piece of the harness, and the policies that wrap around it, has its own page.
- Primitives: the five parts every harness is built from.
- Tools: the coding toolset (read, search, write, edit, bash).
- Context: the context window, the agent's working memory.
- Memory: working memory versus durable, long-term memory.
- Steering: the system prompt, the lever that shapes behavior.
- Oversight: autonomy, permissions, and where the gates go.
- Subagents: one agent running a team of agents.