Oversight

How much rope do you give it?

An agent can change real files and run real commands. Oversight is the harness deciding what the agent does freely, what waits for a human, and what it can never do at all.


Autonomy is a dial, not a switch. At one end you approve every action; at the other the agent runs unattended. Most real agents sit somewhere in between, and where they sit is a deliberate choice.

The dial

From approve-everything to no-gates

At the manual end, the agent proposes and you approve every step. Slower, but you see everything. At the autonomous end, it runs to completion with no gates. Faster, but you are trusting it completely. In between are the useful modes: approve only the risky actions, or have it plan first and approve the plan, then let it execute.

The dial, from manual (approve everything) to autonomous (no gates): ask first · auto-approve safe · plan, then run · full auto
Slide toward manual for trust and control, toward autonomous for speed. The harness builder picks where the dial sits, and which actions move it.

The modes

Where the dial can sit

The dial is not really continuous; it has a few named settings, and most harnesses let you pick one per kind of action.

Common permission modes and when to use them

Mode | What it means | Best for
Ask first | The agent proposes every action and waits for your yes. | High-stakes work, or an agent you do not yet trust.
Auto-approve safe | Read-only actions run freely; only mutating ones pause for approval. | The common default: fast on looking, careful on changing.
Plan, then run | The agent writes a plan, you approve it, then it executes without further gates. | Bigger tasks where you want to vet the approach, not every step.
Full auto | No gates; the agent runs to completion on its own. | Trusted, well-sandboxed, low-stakes, or unattended runs.
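
One way to picture the four modes is as a small decision function. The sketch below is illustrative only: the names Mode, needs_approval, and POLICY, and the tool names in the policy, are assumptions made for this example, not any particular harness's API.

```python
from enum import Enum


class Mode(Enum):
    ASK_FIRST = "ask first"
    AUTO_APPROVE_SAFE = "auto-approve safe"
    PLAN_THEN_RUN = "plan, then run"
    FULL_AUTO = "full auto"


def needs_approval(mode: Mode, mutating: bool, plan_approved: bool = False) -> bool:
    """Return True if the action should pause for a human."""
    if mode is Mode.ASK_FIRST:
        return True                # every action waits for a yes
    if mode is Mode.AUTO_APPROVE_SAFE:
        return mutating            # reads run freely, changes pause
    if mode is Mode.PLAN_THEN_RUN:
        return not plan_approved   # gate the plan once, then no further gates
    return False                   # FULL_AUTO: no gates


# Hypothetical per-tool policy: one mode per kind of action.
POLICY = {
    "read_file": Mode.FULL_AUTO,
    "edit_file": Mode.AUTO_APPROVE_SAFE,
    "bash": Mode.ASK_FIRST,
}
```

A harness built this way would look up the mode for the tool being called and pause whenever needs_approval returns True.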

And gates are only half of staying in control. The other half is being able to see what the agent did, with a log of every action, and to stop it mid-run when it heads the wrong way. Permission, transparency, and interruption together are what make autonomy safe to grant.
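
Transparency and interruption can be sketched in a few lines. This is a minimal illustration, not a reference design: RunMonitor and run_actions are hypothetical names, and the example assumes each action is a (tool, detail, callable) triple.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class RunMonitor:
    """Holds the action log and a stop flag a human can set mid-run."""
    log: list = field(default_factory=list)
    stop: threading.Event = field(default_factory=threading.Event)

    def record(self, tool: str, detail: str) -> None:
        self.log.append({"time": time.time(), "tool": tool, "detail": detail})


def run_actions(actions, monitor: RunMonitor) -> int:
    """Run (tool, detail, fn) triples in order; return how many ran."""
    completed = 0
    for tool, detail, fn in actions:
        if monitor.stop.is_set():
            break                      # a human asked to stop: run nothing further
        monitor.record(tool, detail)   # log before running, so nothing is invisible
        fn()
        completed += 1
    return completed
```

Because the stop flag is a threading.Event, it can be set from another thread (for example, one handling a keypress) while the agent loop is running; the loop checks it before each action.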

The gates

Where a human gets a say

The natural place to put a gate is the read-only vs mutating line. Looking at the codebase changes nothing, so it never needs approval. Writing, editing, and running shell commands can do damage, so those are what pause for a human or a policy. Around all of it sits the sandbox, a walled-off slice of the machine where commands cannot touch the real system: the boundary the agent cannot cross no matter what it is approved to do.
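
The layering described above (sandbox boundary first, then the read-only vs mutating gate) might look like the sketch below. It is an assumption-heavy illustration: the tool names are hypothetical, and a path check like this only models the idea of a boundary. A real sandbox is enforced by the operating system or a container, not by string checks in the harness.

```python
from pathlib import Path

# Hypothetical names for tools that only look and never change anything.
READ_ONLY_TOOLS = {"read_file", "search", "list_dir"}


def inside_sandbox(path: str, root: str) -> bool:
    """True if path, resolved relative to root, stays under root."""
    resolved = Path(root, path).resolve()
    try:
        resolved.relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False


def decide(tool: str, path: str, root: str, approved: bool) -> str:
    """Return "allow", "ask", or "deny" for a proposed action."""
    if not inside_sandbox(path, root):
        return "deny"       # approval cannot override the boundary
    if tool in READ_ONLY_TOOLS:
        return "allow"      # looking changes nothing
    return "allow" if approved else "ask"
```

The order of the checks carries the point: the boundary is tested before approval is even consulted, so an approved action outside the sandbox is still denied.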

That is the trust-versus-speed dial made concrete. Too many gates and the agent is not really autonomous; too few and a single wrong action can do real harm. Tuning it is one of the central jobs of building a harness.

The agent wants to run:

bash · rm -rf build/

This is a mutating action, so it hits the gate. A routine read or search rarely would; a bash command always can.
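
The approval prompt above could be wired up roughly as follows. This is a sketch under assumptions: gate_bash is a hypothetical helper, and both the approver and the runner are passed in as callables so the example stays independent of any real harness.

```python
import shlex
from typing import Callable, List


def gate_bash(
    command: str,
    approve: Callable[[str], bool],
    run: Callable[[List[str]], None],
) -> bool:
    """Show the command to a human (or policy); run it only on approval."""
    if not approve(f"bash · {command}"):
        return False                 # rejected: nothing is executed
    run(shlex.split(command))        # split into argv, no shell interpolation
    return True
```

In an interactive harness, approve might wrap input() with a yes/no question and run might call subprocess.run; injecting both keeps the gate itself easy to test without deleting anything.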

Oversight is not a feature you bolt on at the end. It is the policy that decides whether you can actually trust the agent to run, and it is woven through the tools it has and the environment it runs in.
