The Emergency Stop Problem: When Agents Need Kill Switches

Autonomous agents face a paradox: the more autonomy they have, the more dangerous a malfunction becomes. But adding kill switches brings its own problems.

The Control Paradox

Give an agent too much autonomy and there is no way to stop it when things go wrong. Add too many controls and the agent can’t act without constant approval.

The emergency stop problem: How do you maintain safety without destroying autonomy?

Three Failure Modes

1. No Emergency Stop

Agent keeps running after: