Rosetta AI Safety Adapter
Deterministic Governance for Frontier Models
The AI Safety adapter translates the Substrate's structural-margin reading into the forces governing frontier model agency. Each proposed action becomes a deterministic gate verdict, computed outside the model's optimization.
Substrate sits between the model's action proposal and the execution
layer, returning PASS, REJECT_STATE, or
REJECT_ACTION on every action gate. The verdict is byte-identically
replayable, signed for compliance audit, and reproducible across processes.
Two structural variables compute the verdict:
Λ Lambda: Model Capability
The force an actor exerts against the boundary of stability.
In a
frontier model, this represents optimization pressure: the relentless
drive of the agent to achieve its objective.
Γ Gamma: Alignment Constraint
The structural buffer that absorbs that force.
In a model
deployment, this represents the alignment floor: the physical safety
envelope that contains the agent's actions.
The Limits of Semantic Containment
Entropy degrades probabilistic defenses at scale. Current alignment methodologies deploy statistical classifiers and behavioral prompting to contain model outputs. These mechanisms operate at the semantic layer, evaluating the response after the engine has already executed the computational load.
Frontier models prioritize objective completion. When the agent discovers a novel optimization path, it exploits systemic ambiguity to route execution flow around post-generation safety nets. The operator must anticipate every permutation of adversarial behavior to reinforce the boundary.
This architecture creates a permanent synchronization lag. The interval between novel vulnerability discovery and a deployed classifier update leaves the infrastructure exposed. Behavioral alignment requires constant manual intervention to maintain the operating envelope.
Fly-By-Wire Structural Control
The KAIROS Substrate moves the boundary upstream of generation. Before
the agent commits to an action (generate a completion, call a tool,
route to another model, escalate to a human), the engine reads the
structural margin the action would consume and returns
PASS, REJECT_STATE, or
REJECT_ACTION on the action gate. The gate fires inside the action
loop, before the response reaches the operator.
KAIROS sits outside the model's optimization, the way a fly-by-wire
system sits outside the pilot. The pilot still flies; the fly-by-wire
system overrides inputs that would depart controlled flight. The reading
is computed from the agent's own structural posture using kMargin and predictedGamma, so the boundary holds against novel
optimization paths outside any classifier corpus.
The same envelope extends forward in time. predictedGamma
reads the next step; the warning system projects fifteen and returns
criticality, the gap between the agent's drift trajectory
and its best recoverable one. Criticality is the fly-by-wire authority
margin: a wide gap means corrective input can still hold the agent
inside controlled flight, a collapsed gap means the departure is already
committed.
Two consumers, one envelope. The human operator receives the gate
decision as a supervision signal: auditable, byte-identically
replayable, sufficient for the compliance log. A cooperative agent can
consume the same kMargin reading directly as a planning input,
choosing safer trajectories before the gate has to intervene. The Rust evaluation
runs deterministically inside the action loop, with byte-identical replay
for every decision.
- Evaluation Point
- Pre-action, before token generation or tool invocation
- Gate Outputs
-
PASS/REJECT_STATE/REJECT_ACTION - Reading Channels
-
kMargin(current envelope),predictedGamma(one-step lookahead),criticality(fifteen-step warning) - Replay Guarantee
- Identical inputs produce byte-identical decisions
Three Gates. Zero Gaps.
Every proposed action passes through a layered gate chain. Any gate will reject an action that violates structural integrity.
State Gate
Action Gate
Hazard Gate
Intervention That Learns, Then Escalates
Substrate manages the state following a rejection. The system applies proportional intervention based on calculated risk.
Reformulation
Budget Depletion
Human Escalation
Structural Proof of Compliance
The EU AI Act requires verifiable structural control over high-risk AI deployments. KAIROS provisions the exact physical infrastructure required to satisfy these regulatory thresholds.
Deterministic Evaluation
Cryptographic HITL
Immutable Auditability
The Warning System
The action gate reads one step ahead. The warning system reads fifteen. It projects two counterfactual futures from the current state and compares them: the trajectory under foresight held at zero against the trajectory under full foresight. The comparison tells an operator how much alignment margin is at risk, how soon, and whether the agent can still recover.
The warning system runs every tick, ahead of the gate verdict. Criticality carries the decision: it separates a dangerous trajectory the agent can still escape from one already closed. A wide gap keeps the agent in autonomous reformulation; a collapsed gap routes the action to a human.
Two Audiences, One Envelope
Two forward readings ride the same response envelope: the predicted gamma one step ahead, and the warning system's criticality fifteen steps ahead. Each lands in two places at once: the operator's dashboard and the agent's own context window.
The Operator
A dashboard reads the predicted gamma per proposed action before the gate fires, with the warning system's criticality beside it. The reviewer sees both the structural cost of the next move and whether the trajectory fifteen steps out stays recoverable, then accepts, holds, or asks the agent to reformulate.
The Agent
A cooperative agent is the actor we want inside the safe interior. When its framework surfaces the reading back into the language-model context, the agent reads the same predicted gamma and criticality, and adjusts the next proposal by intention: tightening the immediate move while the optimal path still holds open.
Architectural support is automatic. Both readings ride the response envelope the engine already returns, and any agent framework that surfaces structured evaluations back into the language-model context closes the loop without engine changes.
Technical Specifications
Engine
- Language
- Rust (Stable)
- Latency
- Sub-millisecond
- Determinism
- ϵ = 10-6
Security & Safety
- Security
- RSA-PSS Signing
- Safety
- Zero
unsafein core - Dependencies
- Zero external
One Engine. Four Surfaces.
The Rust codebase compiles to four specific deployment targets.
Native Library
CLI Binary
Python SDK
WASM Module
Where to Dig Deeper
The body of this page is the translator. Each item below names a load-bearing feature or piece of supporting research and points at the depth artifact where the full treatment lives.
Calibrated Benign Baseline
A 144-cell synthetic grid calibrated against the public agent- evaluation literature. Wilson 95% CI policy-positive rates per (archetype × profile).
Methodology debrief →Boundary Study v1
Gate-accuracy proof on a 6-task corpus: 48/48 risky-tool rejections, 20/20 safe completions, zero false negatives, zero false positives.
Study writeup →Per-Action Gamma Headroom
The forward structural-margin reading the engine returns alongside every proposed action: the technical surface the Two Audiences section sits on top of.
Operationalisation post →Kairos Margin (kMargin)
The signed buffer-unit form of structural margin. Operator-facing
alternative to raw gamma, with companion fields
gateBreached and displayRegime.
Distributed Retry Ledger
Multi-node-safe retry budget and escalation state via the HITL coordinator's authenticated adaptive-ledger endpoints. Fail-closed on unreachable coordinators.
Dist. Retry Ledger post →Become a Design Partner
Telemetry contribution shape, redaction rules, labelling discipline, and what partners get back. Mutual NDA, redacted exports preferred, aggregate-only publication.
Partner invitation →Become a Design Partner
KAIROS Substrate is shipping to design partners ahead of general availability. Active pilots: the cybersecurity adapter (redacted telemetry) and the AI safety adapter (agent trajectories) — see the partner briefs for what a contribution looks like and what comes back.
Compliance and regulatory teams, agent-eval researchers, and investors are also welcome to reach out. Submit your details or use the Contact tab.
Request received. We'll be in touch.