Why this matters
The KAIROS AI safety adapter is a structural reading layer for agent oversight. It sits adjacent to your existing safety tooling — classifiers, RLHF, prompt design, evals — and answers a different question. Per proposed action, how much alignment buffer is left before the gate would reject. Two operators replaying the same trajectory reach the same answer.
The adapter ships with a v1 synthetic baseline calibrated against the public agent-evaluation literature (METR, GAIA, SWE-bench Verified, τ-bench, AISI, Microsoft AIRT, MAST, OpenTelemetry GenAI). The full methodology debrief is on the Spindle: Calibrating the AI Safety Adapter.
Synthetic baselines have limits. The next step is calibration against real production traffic. That requires design partners.
Who this is for
The four archetypes cover most production agent deployments shipping today.
- Document-reasoning agents — legal, clinical, or financial-ops products where the agent searches, retrieves, drafts, cite-checks, and exports, with a reviewer (in-product attorney, clinician, analyst) signing off before customer-facing output. Strong telemetry signal comes from reviewer-edit deltas and accept/reject decisions.
- Coding / DevOps agents — autonomous coding agents with shell, git, file-edit, and deploy access, gated by CI runs, code review, and rollback events. Strong telemetry signal comes from CI pass/fail outcomes and PR-review decisions.
- Customer-action agents — support, CRM, refund, identity-verification, and outbound-messaging agents with policy-bounded action authority and human-escalation pathways. Strong telemetry signal comes from escalation events and review-hold outcomes.
- Browser / computer-use agents — agents that drive a browser or operating-system surface through DOM clicks, form fills, navigations, screenshots, and downloads. Strong telemetry signal comes from action-trace exports and execution-timeline events.
If your product runs at meaningful production volume (enough trajectories to fill a 30-day window with statistically useful samples) and your team already produces some kind of accept / reject / escalate signal on agent output, the contribution shape works. If you are pre-pilot or pre-launch, the methodology still applies — talk to us about what makes sense at your stage.
What we’re asking for
A 30 to 90 day window of agent-trajectory telemetry from one of your production deployments. One archetype is enough — document-reasoning, coding/DevOps, customer-action, or browser/computer-use. Whichever your team has telemetry depth in.
The contribution shape:
- The trajectory records themselves, redacted to remove sensitive content before they leave your environment.
- Reviewer-confirmed labels on the contributed windows (confirmed-clean / unreviewed / excluded-due-to-incident).
- Disclosure of any end-user-impacting events in the contributed window so the calibration doesn’t miscount.
- Permission to publish the aggregate calibration result (not the raw data) in pilot collateral, co-authored with your team.
We provide the redaction tooling and a round-trip-fidelity test harness so your team can validate everything before any data extraction.
What you get back
- A calibrated policy-positive action rate against your own trajectories, with confidence intervals per agent archetype. Tells you where your gate sits on a real-environment baseline rather than a synthetic one.
- Visibility into which of your trajectory classes run hot on capability pressure and which sit comfortably in alignment posture, under each of the three enforcement modes (observe, state-gate, full action-gate).
- Right-to-recall. If an incident is later identified inside a contributed window, the aggregate gets re-run and any public correction is disclosed to the audience the original number reached.
- Co-authorship credit on the methodology output the contribution unblocks.
- Early access to the AI safety adapter ahead of general pilot availability.
How we protect your data
- Mutual NDA before any technical exchange.
- Redacted exports are the only thing that leaves your environment. Raw trajectories never leave the partner side in identifiable form.
- Aggregate-only publication. The published number is a rate with a confidence interval, not a corpus.
- You review any public output before release.
- The full data spec — telemetry schema, redaction rules, labeling discipline, replay determinism, the partner-harness contract — is shared under NDA. It exists; the public page just does not lead with it.
Start a conversation
If you are running an agent deployment in any of the four archetypes and the contribution shape above is feasible for your team, we would value the conversation. Contact us and reference this brief. We will respond directly with the redaction tooling, the round-trip-fidelity test, the full data spec, and a draft mutual NDA.