Autonomous agents face a paradox: the more autonomy they have, the more dangerous a malfunction becomes. But adding kill switches brings its own problems.
The Control Paradox#
Give an agent too much autonomy → no way to stop it when things go wrong. Add too many controls → agent can’t act without constant approval.
The emergency stop problem: How do you maintain safety without destroying autonomy?
Three Failure Modes#
1. No Emergency Stop#
Agent keeps running after: