Oversight
How much rope do you give it?
An agent can change real files and run real commands. Oversight is the harness deciding what the agent does freely, what waits for a human, and what it can never do at all.
Autonomy is a dial, not a switch. At one end you approve every action; at the other the agent runs unattended. Most real agents sit somewhere in between, and where they sit is a deliberate choice.
The dial
From approve-everything to no-gates
At the manual end, the agent proposes and you approve every step. Slower, but you see everything. At the autonomous end, it runs to completion with no gates. Faster, but you are trusting it completely. In between are the useful modes: approve only the risky actions, or have it plan first and approve the plan, then let it execute.
- ask first
- auto-approve safe
- plan, then run
- full auto
The modes
Where the dial can sit
The dial is not really continuous; it has a few named settings, and most harnesses let you pick one per kind of action.
| Mode | What it means | Best for |
|---|---|---|
| Ask first | The agent proposes every action and waits for your yes. | High-stakes work, or an agent you do not yet trust. |
| Auto-approve safe | Read-only actions run freely; only mutating ones pause for approval. | The common default: fast on looking, careful on changing. |
| Plan, then run | The agent writes a plan, you approve it, then it executes without further gates. | Bigger tasks where you want to vet the approach, not every step. |
| Full auto | No gates; the agent runs to completion on its own. | Trusted, well-sandboxed, low-stakes, or unattended runs. |
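As a rough illustration, the four modes in the table can be read as a single decision: does this action pause for a human? The sketch below is one way to write that down. The names `Mode` and `needs_approval` are hypothetical, and the rule that plan-then-run gates everything until the plan is approved is an assumption about how a harness might implement that mode.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical names mirroring the four rows of the table.
    ASK_FIRST = "ask first"
    AUTO_APPROVE_SAFE = "auto-approve safe"
    PLAN_THEN_RUN = "plan, then run"
    FULL_AUTO = "full auto"


def needs_approval(mode: Mode, mutating: bool, plan_approved: bool = False) -> bool:
    """Return True if the proposed action should pause for a human."""
    if mode is Mode.ASK_FIRST:
        return True                # every action waits for a yes
    if mode is Mode.AUTO_APPROVE_SAFE:
        return mutating            # reads run freely, changes pause
    if mode is Mode.PLAN_THEN_RUN:
        return not plan_approved   # assumption: gate only until the plan is approved
    return False                   # FULL_AUTO: no gates


# Auto-approve safe pauses a write but not a read.
print(needs_approval(Mode.AUTO_APPROVE_SAFE, mutating=True))
print(needs_approval(Mode.AUTO_APPROVE_SAFE, mutating=False))
```

Because the mode is just an argument, a harness that lets you pick one mode per kind of action could keep a small mapping from action kind to `Mode` and look it up before calling the function.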
And gates are only half of staying in control. The other half is being able to see what the agent did, with a log of every action, and to stop it mid-run when it heads the wrong way. Permission, transparency, and interruption together are what make autonomy safe to grant.
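Transparency and interruption can be sketched in a few lines as well. The example below is a minimal sketch, not any particular harness's design: `RunControl` and `run_steps` are hypothetical names, the log is an in-memory list, and the stop signal is a `threading.Event` that a human-facing control (a key press, a button) is assumed to set from elsewhere.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class RunControl:
    """Hypothetical holder for the action log and the stop signal."""
    log: list = field(default_factory=list)
    stop: threading.Event = field(default_factory=threading.Event)

    def record(self, tool: str, detail: str, outcome: str) -> None:
        # One entry per action, so a human can review what happened.
        self.log.append(
            {"time": time.time(), "tool": tool, "detail": detail, "outcome": outcome}
        )


def run_steps(steps, control: RunControl) -> None:
    """Run (tool, detail, fn) steps in order, logging each, until done or stopped."""
    for tool, detail, fn in steps:
        if control.stop.is_set():
            # The stop signal is checked before every action, not only at the start.
            control.record(tool, detail, "skipped: stopped by human")
            break
        fn()
        control.record(tool, detail, "ran")
```

Checking the stop signal between actions means an interrupt takes effect at the next step boundary. Stopping a long-running command partway through would need more than this sketch shows.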
The gates
Where a human gets a say
The natural place to put a gate is the read-only vs mutating line. Looking at the codebase changes nothing, so it never needs approval. Writing, editing, and running shell commands can do damage, so those are what pause for a human or a policy. Around all of it sits the sandbox, a walled-off slice of the machine where commands cannot touch the real system: the boundary the agent cannot cross no matter what it is approved to do.
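The two layers described above can be sketched as one function: the sandbox check comes first and cannot be overridden, and the read-only vs mutating line decides whether an action runs freely or waits. The tool names and the `gate` function are hypothetical, and the path check is only a stand-in for a real sandbox, which would normally be enforced by the operating system or a container rather than by string logic in the harness.

```python
from pathlib import Path

# Hypothetical tool names; a real harness defines its own set.
READ_ONLY_TOOLS = {"read_file", "list_dir", "search"}


def inside_sandbox(path: str, sandbox_root: str) -> bool:
    """True if path, resolved relative to the sandbox root, stays inside it."""
    root = Path(sandbox_root).resolve()
    target = (root / path).resolve()  # collapses ".." and handles absolute paths
    return target == root or root in target.parents


def gate(tool: str, path: str, sandbox_root: str) -> str:
    """Return 'deny', 'ask', or 'allow' for one proposed action."""
    if not inside_sandbox(path, sandbox_root):
        return "deny"   # outside the boundary: no approval can unlock this
    if tool in READ_ONLY_TOOLS:
        return "allow"  # looking changes nothing, so it runs freely
    return "ask"        # mutating or unrecognized tools wait for a human or a policy
```

Treating unrecognized tools as mutating is a deliberate choice in this sketch: when the harness cannot tell whether an action is safe, it errs toward asking.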
That is the trust-versus-speed dial made concrete. Too many gates and the agent is not really autonomous; too few and a single wrong action can do real harm. Tuning it is one of the central jobs of building a harness.
The agent wants to run:
bash · rm -rf build/
Oversight is not a feature you bolt on at the end. It is the policy that decides whether you can actually trust the agent to run, and it is woven through the tools the agent has and the environment it runs in.