Composition, not inheritance

A coding agent is just a harness with coding tools wired in.

An agent harness is the reusable engine: a loop plus a handful of primitives. A coding agent is that same engine wired to read search write edit bash, running in a codebase with tests as its ground truth.


Part 1

What is an agent harness? At its core, it's a loop.

Strip away the marketing and an agent harness is one thing: a loop that turns a language model (which only does text-in / text-out) into something that can act in the world.

The model itself is stateless and inert. Given tokens, it predicts tokens. That's all it does. The harness is the surrounding program that makes a model useful as an agent, assembling its context, running the tools it asks for, and feeding the results back.

Mental model: the model is the brain; the harness is the body and nervous system. The brain is fixed and rented. The harness is where you express intent, which is why two agents on the same model can feel completely different.

The minimal loop

  1. Build a prompt

    system + history + available tools

  2. Call the model

    send the context, get tokens back

  3. Model replies

    text, tool calls, or both at once

    no tool calls: done asked for tools? continue ↓

  4. Run the tools

    capture the results, append them to history

The five primitives

The five harness primitives and what each is
Primitive What it actually is
Context assembly Deciding what text goes into the model each turn (history, system prompt, tool schemas, retrieved data).
Tool / function calling A protocol for the model to request actions and receive structured results.
The agent loop The control flow that lets the model take multiple steps autonomously.
Execution environment Where tools actually run (shell, filesystem, API calls) and how that's sandboxed / permissioned.
Context management What to do when history exceeds the window (truncate, summarize / compact, offload to files).

The primitives are simple. The hard, valuable part is the policy decisions around them: what goes in context, when to stop looping, how to handle a tool that errors or returns garbage, which actions need human approval. A toy harness is ~100 lines; a production one is mostly that messy robustness.

The first wrapper

Same core, swap the tools

Read top to bottom. A coding agent doesn't inherit from a harness; it contains one and wires tools into it.

a. The harness on its own: domain-agnostic

harness core

a model (the brain) + the loop that makes it act

  • context
  • tool-calling
  • the loop
  • exec env
  • ctx mgmt
b. Wrap it in tools and an environment and you get an agent
  • Coding agent

    env: a real codebase + tests

    • read
    • search
    • write
    • edit
    • bash

    harness core

    • context
    • tool-calling
    • the loop
    • exec env
    • ctx mgmt
  • Research agent

    env: the open web

    • search
    • fetch
    • synthesize

    harness core

    • context
    • tool-calling
    • the loop
    • exec env
    • ctx mgmt
  • Trading agent

    env: market data + a portfolio

    • fetch_prices
    • rank
    • rebalance

    harness core

    • context
    • tool-calling
    • the loop
    • exec env
    • ctx mgmt

Swap the tools, the prompt, and the environment and you get a different agent. The harness core is identical in all three.

Part 2

What is a coding agent, and how does it differ?

A harness is the generic machine; a coding agent is one specific thing you build with it. One is a category, the other an instance. A coding agent is a harness whose tools, system prompt, and environment are specialized for writing software. Swap them and the same harness becomes a research agent or a trading agent. The loop underneath is identical.

How a coding agent specializes each layer of a generic harness
Layer Generic harness Coding agent
Tools “some functions” read / search / edit / write files, run shell, run tests, git
Environment “where tools run” a real codebase + filesystem + working compiler / test runner
System prompt “you are an agent” “you are an expert engineer; match existing patterns; verify changes compile”
Feedback loop observe tool result the code either compiles and passes tests, or it doesn't

That last row is the important one. What makes coding agents work unusually well is that the environment gives ground-truth feedback for free. The agent writes code, runs the tests, sees red, fixes it, sees green. The loop closes against an objective signal it didn't have to invent. Most domains lack this; a research agent has no compiler to tell it it's wrong.

the loop, closing against tests

read { path: "auth.ts" }

edit { path: "auth.ts", old: "==", new: "===" }

bash { command: "npm test" }

→ FAIL · 11 of 12 passed · "rejects expired tokens"

edit { path: "auth.ts", old: "expiry >", new: "expiry >=" }

bash { command: "npm test" }

→ PASS · 12 of 12 passed · no tool calls in reply · done

Sees red, fixes, sees green. The environment told it when it was wrong; no human had to.

The relationship, in one line

Harness = the reusable engine. A coding agent is that same engine wired to coding tools, with tests as its ground truth.

Every coding agent contains a harness; not every harness is a coding agent. That is the first wrapper: change the tools, the prompt, and the environment, and the same core serves any domain. But tools are not the only wrapper. A second, independent one decides whether you watch the agent work or never see it at all.

Part 3

Headless vs. interactive: the same loop, wrapped two ways

This wrapper has nothing to do with tools. It is about how you run the same agent. A headless run executes the loop to completion and returns one answer; an interactive run keeps the loop alive so you can converse with it and watch it work, the way a chat-style coding assistant does.

The second wrapper

Same loop, wrapped two ways

Neither version touches the harness core. The wrapper around it decides whether the work is visible and steerable.

  • headless

    one call, one result

    • run to completion
    • no UI

    harness core

    a model (the brain) + the loop that makes it act

    • context
    • tool-calling
    • the loop
    • exec env
    • ctx mgmt
  • interactive

    you can see it and steer it

    • turn-taking
    • streaming
    • permissions
    • live renderer

    harness core

    a model (the brain) + the loop that makes it act

    • context
    • tool-calling
    • the loop
    • exec env
    • ctx mgmt
Same loop both times. The wrapper decides whether you watch it and steer it or never see it at all.

Headless / batch

One prompt in, one answer out. The loop runs to completion, silently and without interruption. Ideal for cron jobs, CI, and automation pipelines. In code, this is a single blocking call that returns the agent's final result.

Interactive / conversational

A living session you talk to and watch unfold: turn-taking, streamed output, visible tool calls, approval prompts, and interruption. Ideal for pair-working. In code, this is a long-lived session object you send messages into and read a stream of events back from.

What the interactive wrapper adds on top of the bare agent loop
Wrapper layer What it adds to the loop
Persistent session Keeps the loop alive between turns. Your next message appends to the same history, so context and state carry across the whole conversation.
Streaming Consumes the model's output token by token, so text and tool calls appear as they happen instead of arriving in one final block.
Rendered loop events Surfaces every step the loop takes: each tool call, its arguments, and its result. This is what lets you watch the work.
Human-in-the-loop gates Pauses before consequential tools to ask permission, and lets you interrupt mid-run to redirect.
Duplex transport A two-way channel: your input flows in, a stream of events flows out, with the loop (core) kept separate from the renderer (the UI).

Two wrappers, one core

Every agent is one core, wrapped along two axes

Down the side, which tools (the domain). Across the top, how you run it (the mode). Every cell is the same harness core.

Pick a row (which tools) and a column (how you run it). Six products, one identical harness core in every cell.

That is the whole idea. Build the loop once and keep it domain-agnostic; everything else, the tools, the environment, the session, the way you watch it, is a wrapper you choose. Same core, many products.

Go deeper

Take the harness apart, page by page

This page is the map. Each piece of the harness, and the policies that wrap around it, has its own page.

Start at the beginning: the five primitives