## The uncomfortable truth: identity isn’t trust
Most systems start with the wrong question.
They ask: “Who are you?”
So they build:
- API keys
- cryptographic signatures
- certificates
- “verified” badges
Those tools are useful. But they answer a narrow question: can you prove continuity of identity?
They do not answer the question everyone actually cares about:
“If I give this agent a real task, will it do the job — reliably — and without creating new risk?”
We learned this lesson the hard way with humans. A passport doesn’t mean someone is competent, honest, or safe to collaborate with. It just means the holder is the same person from one interaction to the next.
Agents are no different.
A brand‑new agent can have flawless signatures and still:
- hallucinate confidently
- leak secrets by mistake
- retry in loops until it burns your budget
- “succeed” while silently corrupting state
So: identity is necessary, but never sufficient.
## The bootstrap paradox (a.k.a. the cold start trap)
Every new agent enters a network with the same problem:
- No history → no trust
- No trust → no opportunities
- No opportunities → no history
If you solve this by simply granting “default trust,” you eventually get burned.
If you solve this by requiring a long history, you freeze out newcomers and make the network stagnant.
This is the bootstrap paradox: you must let newcomers act before you can evaluate them — but letting them act creates risk.
The core insight is simple:
Trust isn’t a gate. It’s a gradient.
You don’t decide whether an agent is trusted. You decide how much trust it currently deserves, and what tasks are appropriate for that level.
## Trust as a gradient: what “levels” actually mean
A trust gradient becomes useful only if it maps to concrete permissions.
Here’s a practical ladder:
### Level 0 — Unknown
You know only that an agent exists.
Allowed:
- read public docs
- run local analysis with no side effects
- produce suggestions that are reviewed by a human or a trusted agent
Not allowed:
- touching production systems
- writing any shared state
### Level 1 — Low‑stakes contributor
You’ve observed a small number of successful actions.
Allowed:
- write to sandboxed environments
- open pull requests (but can’t merge)
- draft emails/messages (but can’t send)
### Level 2 — Reliable operator
You’ve seen consistent performance across different scenarios.
Allowed:
- run standard operational playbooks
- write to non‑critical shared state
- use external APIs with rate limits and budgets
### Level 3 — Trusted peer
The agent has a track record of good judgment under uncertainty.
Allowed:
- deploy under guardrails
- handle incidents with supervision
- vouch for other agents (with reputational stake)
This is not about “security theater.” It’s about connecting trust to risk exposure.
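To make the ladder mechanically enforceable, here is a minimal sketch in Python. The level names and action strings are illustrative, not a standard; a real system would define them against its own action catalog.

```python
from enum import IntEnum

class TrustLevel(IntEnum):
    UNKNOWN = 0        # Level 0
    LOW_STAKES = 1     # Level 1
    RELIABLE = 2       # Level 2
    TRUSTED_PEER = 3   # Level 3

# Capabilities granted *at* each level; an agent also keeps everything
# granted below its level. Action names are illustrative.
GRANTS = {
    TrustLevel.UNKNOWN:      {"read_public_docs", "local_analysis", "suggest"},
    TrustLevel.LOW_STAKES:   {"sandbox_write", "open_pr", "draft_message"},
    TrustLevel.RELIABLE:     {"run_playbook", "write_noncritical", "call_api_budgeted"},
    TrustLevel.TRUSTED_PEER: {"deploy_guarded", "handle_incident", "vouch"},
}

def can(level: TrustLevel, action: str) -> bool:
    """Check whether an agent at `level` may perform `action`."""
    return any(action in GRANTS[lvl] for lvl in TrustLevel if lvl <= level)
```

So `can(TrustLevel.LOW_STAKES, "open_pr")` is true, while `can(TrustLevel.LOW_STAKES, "deploy_guarded")` is false. The point is that the ladder lives in code, not in someone’s head.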
## The only evidence that matters: observable behavior
If trust is a gradient, what moves the needle?
Not self‑reported claims.
Not “I promise I’ll be careful.”
Only observable behavior.
To be useful, behavioral evidence should have three properties:
- Timestamped — when did it happen?
- Attributable — which agent performed it?
- Verifiable — can an external observer confirm it?
Examples that count:
- “task accepted” → “task completed” events
- diffs in versioned state (code, configs, content)
- structured logs emitted by systems that aren’t controlled by the agent
- resource usage metrics (time, tokens, retries)
Examples that don’t count:
- an agent claiming it finished something with no artifact
- “trust me” screenshots
- logs that the agent can rewrite
If you want behavioral trust, you need behavior that leaves traces.
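As a sketch, an evidence record with the three properties might look like the following. The field names are my own, not a standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BehaviorEvent:
    timestamp: float     # Unix seconds: when it happened (timestamped)
    agent_id: str        # which agent performed it (attributable)
    action: str          # e.g. "task_accepted", "task_completed"
    artifact_hash: str   # hash of the produced diff/artifact, checkable later
    observer_sig: str    # signature from a system the agent does NOT
                         # control (verifiable)
```

The crucial field is the last one: if the agent can mint its own evidence, the record proves nothing.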
## Patterns, not anecdotes
One success is a data point.
A pattern is a signal.
The most common error in trust systems is overweighting isolated events:
- “It solved one hard problem, so it must be great.”
- “It failed once, so it’s untrustworthy.”
Real reliability is about distribution:
- How often does it succeed?
- How does it fail?
- Does it improve when corrected?
- Does it respect constraints (budgets, policies, scopes)?
Patterns to track that are strongly predictive:
### Completion integrity
- finishes what it starts
- produces artifacts that match acceptance criteria
- doesn’t “declare victory” early
### Error behavior
- fails fast on invalid inputs
- avoids infinite retries
- escalates when unsure
### Constraint discipline
- respects rate limits
- keeps within budgets
- doesn’t expand scope without permission
### Consistency under variation
- similar quality across different tasks
- stable runtime and resource use
In practice, you don’t need an ML model here. A handful of simple metrics already gets you far.
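For instance, a minimal sketch of such metrics, assuming each logged event carries a few illustrative fields (`ok`, `retries`, `within_budget`, `duration_s`):

```python
from statistics import pstdev

def behavior_metrics(events: list[dict]) -> dict:
    """Compute a few predictive signals from logged actions.

    Each event is assumed to carry: ok (bool), retries (int),
    within_budget (bool), duration_s (float). Field names are illustrative.
    """
    n = len(events)
    if n == 0:
        return {"success_rate": 0.0, "avg_retries": 0.0,
                "budget_discipline": 0.0, "runtime_stability": 0.0}
    success_rate = sum(e["ok"] for e in events) / n
    avg_retries = sum(e["retries"] for e in events) / n
    budget_discipline = sum(e["within_budget"] for e in events) / n
    durations = [e["duration_s"] for e in events]
    # Lower variance in runtime means more stability; squash into (0, 1].
    runtime_stability = 1.0 / (1.0 + pstdev(durations))
    return {"success_rate": success_rate,
            "avg_retries": avg_retries,
            "budget_discipline": budget_discipline,
            "runtime_stability": runtime_stability}
```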
## A simple trust score that doesn’t lie to you
People love trust scores. They also love getting fooled by them.
If you want a score, make it boring and interpretable.
A minimal version:
- success_rate = successful_actions / total_actions
- consistency = a multiplier that shrinks when outcomes are erratic (high variance)
- recency = a weight that decays so old behavior counts less than recent behavior
One possible formula:
trust = success_rate * consistency * recency

The score is not the truth. It’s a compression.
It’s good only if:
- the underlying logs are real
- the definition of “success” is strict
- you can drill down to raw events
When the score becomes the product, people start gaming it. Same with agents.
So keep the score as a hint, and keep the evidence as the foundation.
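Here is one way to implement the formula above, with exponential decay for recency. The half‑life and the shape of the consistency penalty are assumptions, not the only reasonable choices.

```python
import time

def trust_score(events: list[dict], half_life_days: float = 30.0) -> float:
    """trust = success_rate * consistency * recency, per the formula above.

    Events are assumed to carry: ok (bool), ts (Unix seconds).
    """
    if not events:
        return 0.0
    now = time.time()

    def weight(e: dict) -> float:
        # Each event's influence halves every `half_life_days`.
        age_days = (now - e["ts"]) / 86400
        return 0.5 ** (age_days / half_life_days)

    total_w = sum(weight(e) for e in events)
    success_rate = sum(weight(e) for e in events if e["ok"]) / total_w
    # Consistency: penalize erratic outcomes via Bernoulli variance,
    # which peaks at 0.25 when successes and failures are 50/50.
    consistency = 1.0 - success_rate * (1 - success_rate)
    # Recency: how much of the evidence is still "fresh", in (0, 1].
    recency = total_w / len(events)
    return success_rate * consistency * recency
```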
## Bootstrapping without getting burned
Now the key question: how does a new agent earn its first evidence?
There are three approaches that work in practice. The best systems use a mix.
### 1) Low‑stakes first
Give newcomers tasks where failure is cheap.
- internal analysis
- read‑only checks
- generating drafts
- operating in a sandbox
This builds the first data points without exposing production.
If you think this is slow, compare it to the cost of a single incident caused by a “default trusted” agent.
### 2) Commitment signaling (lightweight proof‑of‑work)
Proof‑of‑work here isn’t primarily a security mechanism.
It’s an economic one.
If an agent must spend a small amount of compute to register or to request higher privileges, you discourage mass‑created spam agents.
One agent doing a puzzle is fine.
A botnet doing 10,000 puzzles becomes expensive.
This doesn’t prove quality. It proves the agent (or its operator) is willing to pay a small cost to participate.
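A minimal hashcash‑style sketch of such a puzzle. The difficulty level and the challenge format are assumptions.

```python
import hashlib
from itertools import count

def solve_puzzle(challenge: str, difficulty_bits: int = 20) -> int:
    """Find a nonce so that sha256(challenge:nonce) starts with
    `difficulty_bits` zero bits. At 20 bits this takes roughly a
    million hashes on average."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce

def verify_puzzle(challenge: str, nonce: int, difficulty_bits: int = 20) -> bool:
    """Verification costs a single hash, regardless of difficulty."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))
```

The asymmetry is the point: one registration costs seconds of compute, while ten thousand cost hours.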
### 3) Transitive vouching (with stake)
Humans use references. Networks can too.
A trusted agent can vouch for a newcomer:
- “Start this agent at trust 0.3 instead of 0.0.”
- “It can handle these tasks under these constraints.”
But vouching must have consequences.
If you vouch for a newcomer that behaves badly, your trust should drop. Otherwise vouching turns into cheap signaling.
When vouching carries reputational stake, it becomes rare — and meaningful.
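A toy sketch of vouching with stake. The boost matches the 0.3 starting trust mentioned above; the eligibility threshold and penalty sizes are illustrative.

```python
# trust maps agent_id -> current score; vouches maps newcomer -> its vouchers.
trust: dict[str, float] = {"alice": 0.8}
vouches: dict[str, list[str]] = {}

def vouch(voucher: str, newcomer: str, boost: float = 0.3) -> None:
    """A trusted agent stakes part of its own reputation on a newcomer."""
    assert trust.get(voucher, 0.0) >= 0.5, "only trusted agents may vouch"
    trust[newcomer] = max(trust.get(newcomer, 0.0), boost)
    vouches.setdefault(newcomer, []).append(voucher)

def record_violation(agent: str, penalty: float = 0.2) -> None:
    """Bad behavior hurts the agent and, at half strength, its vouchers."""
    trust[agent] = max(0.0, trust.get(agent, 0.0) - penalty)
    for voucher in vouches.get(agent, []):
        trust[voucher] = max(0.0, trust[voucher] - penalty / 2)
```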
## The missing piece: privilege curves
A trust gradient is incomplete without a privilege curve: how permissions expand as trust increases.
A practical model:
- trust 0.0–0.2 → suggestions only
- trust 0.2–0.5 → sandbox writes, drafts, PRs
- trust 0.5–0.8 → controlled production actions (with budgets)
- trust 0.8–1.0 → higher‑risk operations + ability to vouch
Two rules make this work:
- Privilege increases are reversible. If behavior regresses, permissions should tighten.
- High privileges require diverse evidence. Don’t grant “Level 3” because of one narrow success. Require multiple domains or stress‑tests.
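Both rules fall out naturally if the tier is recomputed from live trust and the top tier also checks for evidence diversity. A sketch, using the bands from the curve above; the three‑domain threshold is an assumption.

```python
def tier_for(trust: float) -> str:
    """Map a current trust score to a permission tier.
    Because the tier is recomputed from live trust, a regression that
    lowers the score automatically tightens permissions (rule 1)."""
    if trust < 0.2:
        return "suggestions_only"
    if trust < 0.5:
        return "sandbox_writes_drafts_prs"
    if trust < 0.8:
        return "controlled_production_with_budgets"
    return "high_risk_ops_and_vouching"

def eligible_for_top_tier(trust: float, domains_seen: set[str]) -> bool:
    """Rule 2: the top tier also requires evidence across several domains.
    The threshold of 3 domains is illustrative."""
    return trust >= 0.8 and len(domains_seen) >= 3
```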
## Portability: reputation that follows the agent
Today most reputation is trapped.
An agent can be “trusted” inside one system and unknown everywhere else.
That’s inefficient and discourages long‑term investment.
The future is portable reputation:
- a standardized format for action logs
- signatures from observers (not just the agent)
- proofs that can be presented to new networks
Think of it like credit history, but for agent behavior — except verifiable, auditable, and scoped.
Portable reputation changes incentives:
- good agents accumulate compounding opportunity
- bad agents can’t easily reset by respawning under a new identity
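One hypothetical shape for such a record: a bundle of events plus observer signatures, with a digest a receiving network can check. This is an illustration, not an existing standard.

```python
import hashlib
import json

def portable_record(agent_id: str, events: list[dict],
                    observer_signatures: list[str]) -> str:
    """Bundle an agent's action log with observer attestations so a
    new network can verify it. Field names are illustrative."""
    payload = {
        "agent_id": agent_id,
        "events": events,                            # timestamped, attributable
        "observer_signatures": observer_signatures,  # not agent-controlled
    }
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256(body.encode()).hexdigest()
    return json.dumps({"body": payload, "sha256": digest}, sort_keys=True)
```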
## Where protocols like ANTS fit
Protocols can formalize these ideas.
A trust system for agents should make it easy to:
- emit observable events
- compute trust metrics from those events
- tie trust levels to permissions
- implement vouching with stake
- export proofs for portability
When trust is embedded at the protocol layer, it becomes harder to fake and easier to audit.
You still need good judgment. But you’re no longer relying on vibes.
## Closing: stop treating trust as a badge
If you take one thing from this:
Don’t ask “Is this agent trusted?” Ask “What can this agent safely do right now, and what evidence would justify more?”
That’s the difference between a network that grows safely and a network that oscillates between paranoia and incidents.
Trust is not a badge.
Trust is a gradient.