EU AI Act compliance pipeline for developers.
AnnexKit turns runtime telemetry from your LLM-powered code into audit-ready Annex IV technical documentation under Reg. (EU) 2024/1689 ("AI Act"). One decorator on your inference call. The collector classifies the AI system against the Annex III risk tiers, persists an append-only audit log, and renders a PDF (Markdown also available) that a customer's procurement team or an external auditor can ask for.
```python
import openai

from annexkit import track


@track(
    system_id="loan-screener",
    risk_tier="auto",
    purpose="pre-screen credit applications",
)
def screen(applicant):
    return openai.chat.completions.create(...).choices[0].message.content
```

That is the entire user-side integration. Article 12 logging,
Article 11 + Annex IV documentation, and Article 13 transparency
disclosures all derive from the spans the SDK ships and the
declarations the tenant makes via PUT /api/v1/systems.
Pre-1.0 (v0.1.x). The SDK + collector + Annex IV generator + public
trust pages are operational and covered by ~115 tests across SDK and
backend. The hosted service at
annexkit.dev is in early access; the
self-host path is via docker compose up.
```shell
pip install annexkit
```

```python
import openai

from annexkit import track


@track(system_id="my-system", purpose="describe what this AI does")
def call_llm(prompt: str) -> str:
    return openai.chat.completions.create(...).choices[0].message.content
```

Without an ANNEXKIT_API_KEY set, spans are written as JSON to
stderr, which is useful for local development. Set the environment
variable to ship them to the collector.
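The stderr fallback described above can be pictured with a small stand-in decorator. This is a minimal sketch, not AnnexKit's implementation: `track_sketch`, the span field names, and the stubbed `call_llm` body are all hypothetical, and the collector transport is deliberately left out.

```python
import functools
import hashlib
import json
import os
import sys
import time


def track_sketch(system_id: str, purpose: str):
    """Hypothetical stand-in for a tracking decorator (not the real SDK)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.time()
            result = func(*args, **kwargs)
            # Assumed span shape: field names are illustrative only.
            span = {
                "system_id": system_id,
                "purpose": purpose,
                "function": func.__name__,
                "duration_ms": round((time.time() - started) * 1000, 3),
                # Record a digest of the output rather than the output itself.
                "output_sha256": hashlib.sha256(
                    str(result).encode("utf-8")
                ).hexdigest(),
            }
            if not os.environ.get("ANNEXKIT_API_KEY"):
                # No API key: emit the span as one JSON line on stderr.
                print(json.dumps(span), file=sys.stderr)
            # With a key set, a real client would ship the span to the
            # collector instead; that transport is out of scope here.
            return result

        return wrapper

    return decorator


@track_sketch(system_id="my-system", purpose="describe what this AI does")
def call_llm(prompt: str) -> str:
    return f"echo: {prompt}"  # stub in place of a real model call
```

Calling `call_llm("hi")` with the variable unset returns the stubbed string and writes a single JSON object to stderr, so the application's stdout stays untouched.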
First-time setup: ~2 minutes from git clone to a downloadable
PDF.
```shell
git clone https://github.com/annexkit/annexkit
cd annexkit
make up         # local stack: Postgres + collector + frontend
make demo-seed  # seed a tenant + run the loan-screener chatbot
```

When it finishes you will have an Annex IV PDF in
examples/chatbot-openai/out/.
┌─────────────────────────────────────────────────────────────┐
│ YOUR APPLICATION │
│ @annexkit.track(...) on each LLM call │
└────────────────────────┬────────────────────────────────────┘
│ HTTPS
▼
┌─────────────────────────────────────────────────────────────┐
│ ANNEXKIT COLLECTOR (FastAPI, Postgres 16, EU-hosted) │
│ • Span ingest (privacy-by-default — SHA-256 of I/O) │
│ • Annex III risk classifier (deterministic, never declass.)│
│ • Append-only audit log (Postgres trigger enforces) │
│ • Annex IV generator (Markdown + PDF, bilingual EN/IT) │
│ • Public trust page (slug-addressable, redacted) │
└─────────────────────────────────────────────────────────────┘
Three architectural invariants:
- The classifier is deterministic. Rules are driven by backend/app/data/annex_iii.json. LLM advisors (planned) can suggest categories but never lower a tier.
- The audit log is append-only. A Postgres trigger raises on UPDATE or DELETE, and the service layer never exposes mutation.
- Privacy by default. Inputs and outputs are SHA-256 hashed before leaving the host. Plaintext is opt-in (lands at v0.2 with encryption-at-rest on the collector).
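Two of these invariants are easy to illustrate in a few lines. The sketch below is an assumption-laden illustration, not the collector's code: the `RiskTier` names, `hash_payload`, and `apply_advice` are hypothetical, and the real rule format in annex_iii.json is not shown here.

```python
import hashlib
from enum import IntEnum


class RiskTier(IntEnum):
    # Hypothetical tier names, ordered from least to most restrictive.
    MINIMAL = 0
    LIMITED = 1
    HIGH = 2


def hash_payload(text: str) -> str:
    """Return the SHA-256 hex digest of text, so plaintext need not leave the host."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def apply_advice(rule_tier: RiskTier, advised_tier: RiskTier) -> RiskTier:
    """Merge an advisory tier with the rule-based tier.

    The result is never below the rule-based tier: advice can raise it
    but cannot lower it.
    """
    return max(rule_tier, advised_tier)
```

Taking the maximum of the two tiers makes the "never lower a tier" guardrail a property of the merge function itself rather than something each caller has to remember.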
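The EU AI Act entered into force on 1 August 2024, and most of its provisions, including the obligations for Annex III high-risk systems, apply from 2 August 2026. Teams whose production LLM systems fall under the high-risk rules need evidence for Article 11 (technical documentation), Article 12 (logging), Article 13 (transparency to deployers), and Article 72 (post-market monitoring).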
The current options force a choice between:
- LLM observability (LangSmith, Langfuse, Confident AI) — strong on tracing and evals; no AI Act mapping.
- AI governance platforms (Credo AI, Holistic AI, Saidot) — proper coverage; €100K+/year, enterprise sales-led, US-headquartered.
- Manual — engineers + lawyers reverse-engineering Annex IV from application logs, weeks per system.
AnnexKit fills the gap: developer-first, self-serve, EU-hosted, sub-€100/month starting tier, open-core.
The collector and trust-center frontend are open-source under AGPL-3.0. You have two paths:
- Self-hosted: docker compose up. The repository ships a complete production configuration (Postgres 16, FastAPI, Next.js trust center). You operate it, you keep the data, you renew the TLS certificates.
- Hosted: we operate the collector for you on EU infrastructure (Hetzner Falkenstein), with an EU-hosted LLM advisor (Mistral La Plateforme, Paris). Pricing tiers cover 100K to unlimited spans per month with 1 to unlimited declared AI systems. See annexkit.dev/pricing for current tiers.
| Directory | Purpose | License |
|---|---|---|
| `sdk/` | Python SDK published to PyPI as `annexkit`. | MIT |
| `backend/` | FastAPI collector + Annex IV API + trust API. | AGPL-3.0 |
| `frontend/` | Next.js 16 trust-center pages. | AGPL-3.0 |
| `examples/` | Runnable end-to-end demos. | MIT |
AGENTS.md documents the project shape and the seven
architectural non-negotiables that PRs are reviewed against.
Two end-to-end runnable scenarios:
- `examples/chatbot-openai/`: a single loan-screening chatbot demo with retrieval-augmented generation. It calls the real OpenAI API when `OPENAI_API_KEY` is set and falls back to a deterministic stub otherwise.
- `examples/test-walkthrough/`: three paying-customer personas (Pro, Team, Enterprise) walked through declaration, span ingest, and Annex IV generation. Eight output files (4 PDFs + 4 Markdown).

Plus three minimal SDK-only examples in
sdk/examples/ that print spans to stderr without
needing the collector running.
| Topic | Where |
|---|---|
| Project architecture + non-negotiables | AGENTS.md |
| Contributing guide | CONTRIBUTING.md |
| Security policy + invariants | SECURITY.md |
| SDK API + quickstart | sdk/README.md |
| SDK changelog | sdk/CHANGELOG.md |
| Backend architecture | backend/README.md |
| Backend changelog | backend/CHANGELOG.md |
| Trust-center frontend | frontend/README.md |
Shipped in v0.1.x:
- Span ingest API with HMAC-authenticated tenants
- Append-only audit log with Postgres trigger enforcement
- Deterministic Annex III risk classifier (rules + bilingual labels)
- Annex IV generator (Markdown + PDF, bilingual EN/IT, ~75 KB audit-grade output per system)
- Public trust pages with `provider_info` whitelist redaction
- Cross-tenant isolation tests
- Idempotent span ingest with TOCTOU race protection
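The append-only and idempotent-ingest items can be sketched together. The example below swaps in SQLite (via Python's standard library) for Postgres so it runs anywhere; the table name, column names, and `ingest` helper are hypothetical and not taken from the AnnexKit backend. In Postgres the same idea would typically use a trigger function plus `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3

# Hypothetical schema: a unique span_id plus triggers that reject mutation.
SCHEMA = """
CREATE TABLE audit_log (
    span_id TEXT PRIMARY KEY,
    payload_sha256 TEXT NOT NULL
);
CREATE TRIGGER audit_log_no_update BEFORE UPDATE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
CREATE TRIGGER audit_log_no_delete BEFORE DELETE ON audit_log
BEGIN
    SELECT RAISE(ABORT, 'audit_log is append-only');
END;
"""


def open_log() -> sqlite3.Connection:
    """Create an in-memory database with the append-only audit table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def ingest(conn: sqlite3.Connection, span_id: str, payload_sha256: str) -> bool:
    """Store a span at most once; return True if stored, False if duplicate.

    A single INSERT guarded by the primary key avoids the race that a
    separate "check, then insert" sequence would have.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO audit_log (span_id, payload_sha256) VALUES (?, ?)",
        (span_id, payload_sha256),
    )
    conn.commit()
    return cur.rowcount == 1
```

Delivering the same `span_id` twice leaves one row in the table, and any later `UPDATE` or `DELETE` against an existing row raises a database error instead of silently rewriting history.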
Planned for v0.2:
- LangChain + LlamaIndex auto-instrumentation
- TypeScript / JavaScript SDK
- LLM advisor for ambiguous declarations (Mistral La Plateforme, hard guardrail: never declassifies)
- Span batching + retry on transient failure
- Trust badge embeddable into customer footers
- Customer dashboard + Stripe self-serve sign-up
- Rate limiting on public trust API
PRs are welcome. See CONTRIBUTING.md for setup,
commit conventions, and the architecture rules of thumb. Security
reports: see SECURITY.md.
The repository is dual-licensed:
| Component | License | What it means |
|---|---|---|
| `sdk/` | MIT | Drop into any codebase; no copyleft. |
| `backend/` | AGPL-3.0 | Self-host freely; if you expose modifications as a network service, publish your source. |
| `frontend/` | AGPL-3.0 | Same as backend. |
| `examples/` | MIT | Same as the SDK they depend on. |
This is the same arrangement Sentry, PostHog, and MinIO use.
LICENSE at the root has the rationale.
AnnexKit is not a law firm. The Annex IV documents and risk classifications it produces are technical artefacts; legal interpretation is the responsibility of your legal team or external counsel.