Metadata-Version: 2.4
Name: a2auth
Version: 0.1.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Rust
Classifier: Topic :: Security
Requires-Dist: pyyaml>=6.0
Requires-Dist: click>=8.0
Summary: Capability integrity layer for AI agent ecosystems
License: Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# A2Auth: Capability Integrity for AI Agents

[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)

**A2Auth** is the capability integrity layer for AI agent ecosystems. It answers a question that existing security frameworks cannot: *is this agent still the same agent that was authorized?*

Existing agent security (OAuth, SPIFFE, Okta, Microsoft Agent 365) verifies **who** the agent is and **what** it can access. A2Auth verifies **what the agent actually is**: its model, its tools, and its behavior. If any of these change after authorization, A2Auth detects it.

> **Paper:** A2Auth is based on the research paper *"Capability-Context Separation for AI Agent Governance"* (2026). [arXiv:XXXX.XXXXX](https://arxiv.org/abs/XXXX.XXXXX)

## The Problem: Silent Capability Escalation

An AI agent is certified with tools `{code_gen, unit_test}`. During execution, it discovers `web_browser` via an MCP tool server and starts using it. Under every existing security framework, the agent's credentials remain valid. **No component detects that the agent's capabilities have changed.** This is *silent capability escalation*.

A2Auth closes this gap: any capability change invalidates the agent's certificate and requires explicit re-authorization.

## Three Governance Requirements (G1, G2, G3)

A2Auth implements three governance requirements derived from the *capability-context separation* principle. Together they close all 12 attack vectors identified in the paper's threat model. No single requirement, nor any pair, is sufficient.

### G1: Capability Integrity

> *Every agent's identity is cryptographically bound to its complete capability set.*

**What it detects:** model substitution (swapping GPT-4 for GPT-3.5), tool addition/removal, capability escalation after authorization.

**How it works:** A capability-bound certificate binds an agent's identity to a SHA-256 hash of its complete skills manifest (every tool, its version, its code hash, and its permissions). At verification time, the SDK recomputes the hash from the agent's live tool configuration and compares it against the certificate. Any mismatch fails verification.
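The binding check can be sketched with nothing but the standard library. This is an illustrative approximation, not the SDK's code: the real core serializes with CBOR and signs with Ed25519, while this sketch uses canonical JSON and skips signatures. `compute_skills_hash` and the placeholder hash values are hypothetical.

```python
import hashlib
import json

def compute_skills_hash(manifest: list) -> str:
    """Deterministically hash a skills manifest (illustrative sketch).

    Canonical JSON (sorted tools, sorted keys, no whitespace) stands in
    for the SDK's CBOR serialization.
    """
    canonical = json.dumps(sorted(manifest, key=lambda t: t["name"]),
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Manifest entries carry tool name, version, code hash, and permissions.
certified = [
    {"name": "code_gen", "version": "1.2.0", "code_hash": "deadbeef", "permissions": ["fs:read"]},
    {"name": "unit_test", "version": "0.9.1", "code_hash": "cafef00d", "permissions": []},
]
cert_hash = compute_skills_hash(certified)  # what the certificate binds to

# Silent escalation: the agent picks up web_browser at runtime.
live = certified + [
    {"name": "web_browser", "version": "2.0", "code_hash": "0badc0de", "permissions": ["net:*"]},
]

assert compute_skills_hash(certified) == cert_hash   # unchanged -> G1 passes
assert compute_skills_hash(live) != cert_hash        # escalation -> G1 fails
```

Because the hash covers the complete manifest, adding, removing, or modifying any single tool changes the digest and fails verification.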

**Performance:** 97 microseconds for full certificate chain verification (Ed25519 signatures + SHA-256 hash comparison). Less than 1 nanosecond for the capability binding check alone.

### G2: Behavioral Verifiability

> *An agent's declared computational process can be independently verified.*

**What it detects:** fine-tuning backdoors, weight merging, model replacement with a behaviorally similar but different model (within-family substitution).

**How it works:** The SDK records reproducibility commitments in the certificate, then verifies behavior via replay-based comparison. A multi-provider study across 9 models and 7 providers found 5.8x variance in inference determinism:

| Model | CharMatch | Detection Feasibility |
|-------|-----------|----------------------|
| Claude Haiku 3.5 | 0.987 | Full reproducibility |
| Claude Sonnet 4 | 0.823 | Statistical verification |
| GPT-4.1 | 0.617 | Statistical with tuned threshold |
| Gemini 2.5 Pro | 0.392 | Limited, high false positive risk |
| DeepSeek V3 | 0.225 | Detection only, not verification |

**Known limitation:** G2 works best for deterministic and semi-deterministic models. For reasoning models with inherently low reproducibility (DeepSeek, Kimi), the SDK returns `INCONCLUSIVE` and recommends relying on G1 as the primary verification mechanism. This is an honest limitation, not a bug.
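The replay comparison can be illustrated with a small sketch. The exact CharMatch definition is not specified here, so this stand-in uses `difflib`'s match ratio as an assumed approximation, and the classification thresholds are hypothetical, not the SDK's.

```python
from difflib import SequenceMatcher

def char_match(original: str, replayed: str) -> float:
    """Character-level agreement between an output and its replay.

    Assumption: difflib's ratio approximates the paper's CharMatch
    metric; the exact definition may differ.
    """
    return SequenceMatcher(None, original, replayed).ratio()

def classify(score: float, threshold: float = 0.8) -> str:
    # Hypothetical policy mirroring the feasibility tiers above.
    if score >= threshold:
        return "VERIFIED"
    if score >= 0.5:
        return "STATISTICAL"
    return "INCONCLUSIVE"

print(classify(char_match("The answer is 42.", "The answer is 42.")))  # VERIFIED
print(classify(char_match("The answer is 42.", "It depends a lot.")))
```

For low-reproducibility models the score lands well below any usable threshold, which is why the SDK returns `INCONCLUSIVE` rather than guessing.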

### G3: Interaction Auditability

> *All inter-agent interactions produce tamper-evident records sufficient for forensic reconstruction.*

**What it detects:** post-hoc tampering with interaction logs, missing records, forged interactions, broken audit chains.

**How it works:** Every agent interaction is recorded in a hash-linked, Ed25519-signed append-only ledger (JSON-lines format). Each record contains: sequence number, timestamp, sender/receiver IDs, certificate hashes, input/output commitments (SHA-256), reproducibility anchor, and a hash link to the previous record. Any modification to any record breaks the chain and is detectable.

**Key result:** The ledger provides forensic reconstruction: the ability to trace exactly what happened in a multi-agent pipeline after the fact. In the paper's evaluation, this enabled detection of runtime behavioral attacks that G1 and G2 alone could not catch.
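A minimal in-memory sketch of the hash-linked structure (Ed25519 signatures, certificate hashes, and the repro anchor are omitted for brevity; field names here are illustrative):

```python
import hashlib
import json
import time

GENESIS = "0" * 64

def append_record(ledger: list, sender: str, receiver: str,
                  input_text: str, output_text: str) -> dict:
    """Append a hash-linked record to an in-memory ledger (sketch)."""
    prev_hash = ledger[-1]["record_hash"] if ledger else GENESIS
    record = {
        "seq": len(ledger),
        "ts": time.time(),
        "sender": sender,
        "receiver": receiver,
        "input_commitment": hashlib.sha256(input_text.encode()).hexdigest(),
        "output_commitment": hashlib.sha256(output_text.encode()).hexdigest(),
        "prev_hash": prev_hash,
    }
    body = json.dumps(record, sort_keys=True).encode()
    record["record_hash"] = hashlib.sha256(body).hexdigest()
    ledger.append(record)
    return record

def verify_chain(ledger: list) -> bool:
    """Recompute every link; any edited record breaks the chain."""
    prev = GENESIS
    for rec in ledger:
        if rec["prev_hash"] != prev:
            return False
        body = {k: v for k, v in rec.items() if k != "record_hash"}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if digest != rec["record_hash"]:
            return False
        prev = rec["record_hash"]
    return True

ledger = []
append_record(ledger, "planner", "coder", "write tests", "done")
append_record(ledger, "coder", "reviewer", "review diff", "lgtm")
assert verify_chain(ledger)

ledger[0]["output_commitment"] = "00" * 32  # post-hoc tampering
assert not verify_chain(ledger)             # detected: chain broken
```

Each record's hash covers its own fields plus the previous record's hash, so tampering with any record (or deleting one) invalidates every record after it.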

## Key Results from the Paper

| Metric | Value | What It Means |
|--------|-------|---------------|
| Certificate verification latency | 97 microseconds | Fast enough for real-time agent-to-agent verification |
| Capability binding overhead | < 1 nanosecond | Negligible cost to check if tools changed |
| Governance overhead (E2E pipeline) | < 0.02% | Adding A2Auth to a 5-20 agent pipeline barely affects latency |
| Attack detection | 7/7 scenarios, 0 false positives | Covers model substitution, capability escalation, forged records, and more |
| Multi-provider model identity F1 | 0.876 | Can distinguish which model produced an output across providers |
| Single-provider model identity F1 | 0.990 | Near-perfect identification within the same provider |

## Quick Start

```bash
pip install a2auth
```

```python
from a2auth import A2Auth

auth = A2Auth.from_config("a2auth.yaml")

# Wrap any agent call with governance
result = auth.govern(agent.invoke, prompt="Summarize this document")

# Check verification status
print(result.value)           # The agent's original return value
print(result.verification)    # G1, G2, G3 status
```

The `govern()` wrapper (inline, every call):
1. **G1 check:** Verifies the agent's capability certificate against its current tool configuration
2. **Executes the agent call:** Your agent runs normally
3. **G3 log:** Appends a signed, hash-linked record to the interaction ledger (includes repro anchor for G2)
4. **Returns:** The original result plus G1/G3 verification status

G2 (behavioral verifiability) runs **post-hoc**, not during `govern()`. It requires replaying inference with the same seed, which costs a full API call. Run `a2auth audit --replay` to verify behavior against recorded repro anchors.

**Fail-open by default.** Governance failures are logged but never block your agent unless you explicitly set `policy: block`. A security tool that causes outages gets uninstalled.
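The fail-open flow above can be sketched as a plain wrapper. This is illustrative only, not the SDK's `govern()`: `check_g1` and `log_g3` are caller-supplied stand-ins for the real certificate check and ledger write.

```python
import logging

logger = logging.getLogger("a2auth.sketch")

def govern(agent_call, *, check_g1, log_g3, policy="log", **kwargs):
    """Fail-open governance wrapper (illustrative sketch)."""
    try:
        if not check_g1():
            if policy == "block":
                raise PermissionError("G1 capability check failed")
            logger.warning("G1 check failed (policy=%s, continuing)", policy)
    except PermissionError:
        raise
    except Exception:
        logger.exception("G1 check errored; fail-open, continuing")

    result = agent_call(**kwargs)  # the agent always runs unless policy=block

    try:
        log_g3(kwargs, result)
    except Exception:
        logger.exception("G3 ledger write failed; fail-open, result returned")
    return result

# A failing G1 check under the default policy logs a warning but still returns.
out = govern(lambda prompt: prompt.upper(),
             check_g1=lambda: False,
             log_g3=lambda inp, res: None,
             prompt="summarize")
print(out)  # SUMMARIZE
```

Only `policy: block` converts a failed check into an exception; everything else degrades to logging so governance can never take the agent down.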

## Debug Mode

Three ways to enable debug output:

```python
# Option 1: Convenience function
import a2auth
a2auth.enable_debug()

# Option 2: Python logging
import logging
logging.getLogger("a2auth").setLevel(logging.DEBUG)

# Option 3: Config file (a2auth.yaml)
# debug: true
```

Every `govern()` call reports timing:

```python
result = auth.govern(agent.invoke, prompt="...")
print(result.timing.total_ms)       # Total time (ms)
print(result.timing.agent_call_ms)  # Agent execution time
print(result.timing.overhead_ms)    # Governance overhead (G1 + G3)
print(result.timing.overhead_pct)   # Overhead as % of total
```

With debug enabled, stderr shows a full trace:
```
[a2auth] govern() start | agent_id=my-agent
[a2auth] G1: SKIPPED (PyO3 bindings not yet available)
[a2auth] Executing agent call...
[a2auth] Agent call completed | 1847.2ms
[a2auth] G3: SKIPPED (PyO3 bindings not yet available)
[a2auth] govern() done | total=1847.6ms agent=1847.2ms overhead=0.400ms (0.0217%)
```

## Configuration

Create `a2auth.yaml`:

```yaml
# Agent identity
agent_id: my-agent

# Interaction ledger (JSON-lines format, inspectable with cat/jq)
ledger_path: ./a2auth_ledger.jsonl

# Governance policy: what happens when inline checks fail
#   log   - record only (default for G3)
#   warn  - log + warning message (default for G1)
#   block - stop execution and raise an exception
policy:
  g1: warn    # inline: checked before every agent call
  g3: log     # inline: ledger write after every agent call
  # G2 is post-hoc (via `a2auth audit --replay`), not configured here

# Certificate paths (for G1 verification)
# cert_path: ./agent.cert
# ca_cert_path: ./ca.cert
```

## CLI

```bash
a2auth verify    # Verify agent capability certificates
a2auth doctor    # Diagnose agent trust health
a2auth audit     # Audit the interaction ledger for integrity
a2auth cert      # Manage capability-bound certificates
```

## Architecture

G1 and G3 are **inline** (run during every `govern()` call). G2 is **post-hoc** (runs later via `a2auth audit --replay`).

```
govern() call (inline):                Post-hoc (async):

  G1 Check (pre-call, <100us)            a2auth audit --replay
  Verify capability certificate          │
         │                               ├─ Read ledger records
         ▼                               ├─ Replay same input + seed
  Execute agent call                     ├─ Compare token-by-token
         │                               └─ Mark VERIFIED / VIOLATION
         ▼
  G3 Log (post-call, <1ms)
  Record to ledger:
  - input/output commitments
  - repro anchor (seed, model_ver,
    skills_hash) for future G2
         │
         ▼
  Return result
```

**Why is G2 not inline?** G2 requires replaying the same inference with the same seed and comparing outputs token-by-token. That costs a full API call (latency + money). The paper (§4.4) explicitly designs G2 as post-hoc: *"verification happens post-hoc and can be batched, parallelized, or delegated."* The `govern()` call records the repro anchor in the G3 ledger so that G2 replay verification can be performed later.
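A post-hoc replay check against one ledger record might look like the following sketch. The field names are assumptions, and exact hash comparison is only appropriate for fully reproducible models (the top tier of the G2 table); less deterministic models would need the statistical comparison described under G2 instead.

```python
import hashlib

def replay_verify(record: dict, replay_fn) -> str:
    """Check one ledger record by replaying inference (illustrative sketch).

    Assumption: `record` carries the repro anchor and output commitment
    written by govern(); `replay_fn(seed, text)` re-runs inference with
    the recorded seed.
    """
    replayed = replay_fn(record["anchor"]["seed"], record["input_text"])
    digest = hashlib.sha256(replayed.encode()).hexdigest()
    return "VERIFIED" if digest == record["output_commitment"] else "VIOLATION"

record = {
    "anchor": {"seed": 7},
    "input_text": "2+2?",
    "output_commitment": hashlib.sha256(b"4").hexdigest(),
}
deterministic_model = lambda seed, text: "4"
print(replay_verify(record, deterministic_model))  # VERIFIED
```

Because each record is self-contained, an auditor can batch these checks over the whole ledger, in parallel, long after the original calls ran.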

The SDK is built as a Rust core (`a2auth-core`) with Python bindings. The Rust core handles all cryptographic operations (Ed25519, SHA-256, CBOR serialization). The Python layer provides the developer-facing API (`govern()`, config, CLI).

## Interactive Playground

Try A2Auth without installing anything. The playground lets you trigger all 12 attack scenarios and watch A2Auth detect them step by step.

```bash
# Run locally
pip install fastapi uvicorn websockets
uvicorn playground.backend.main:app --reload
# Open http://localhost:8000
```

The playground shows every governance step in real time: signing, certificate chain verification, capability binding check, inference execution, and ledger recording, with per-step timing.

12 attack scenarios across G1 (6), G2 (3), and G3 (3). Click an attack, send a prompt, watch A2Auth catch it.

## Documentation

| Document | Description |
|----------|-------------|
| [Concepts](docs/concepts.md) | G1, G2, G3 explained in depth. How they work together. |
| [Threat Model](docs/threat-model.md) | What attacks A2Auth detects and doesn't detect. |
| [API Reference](docs/api.md) | Python SDK and Rust core API. |
| [Configuration](docs/configuration.md) | All config options with examples. |
| [Playground](playground/) | Interactive demo with 12 attack scenarios. |

## How A2Auth Relates to Existing Security

A2Auth is **not** a replacement for IAM. It's the layer underneath.

```
┌─────────────────────────────────────────────────────┐
│ Application Layer                                    │
│   LangChain, CrewAI, AutoGen, custom agent code     │
├─────────────────────────────────────────────────────┤
│ A2Auth Layer (this project)                          │
│   G1: Is this agent what it claims to be?           │
│   G2: Is it behaving as expected?                   │
│   G3: Can we prove what happened?                   │
├─────────────────────────────────────────────────────┤
│ IAM Layer                                            │
│   Okta, Azure AD, SPIFFE: Who is this agent?        │
│   OAuth, RBAC: What can it access?                  │
├─────────────────────────────────────────────────────┤
│ Infrastructure Layer                                 │
│   MCP, A2A: How do agents communicate?              │
│   TLS, mTLS: Is the connection secure?              │
└─────────────────────────────────────────────────────┘
```

## Citation

If you use A2Auth in your research, please cite:

```bibtex
@article{a2auth2026,
  title={Capability-Context Separation for AI Agent Governance},
  author={Zhou, Ziling},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2026}
}
```

## License

Apache 2.0

