The deterministic compiler for AI-native data pipelines. You describe outcomes in plain English; Osiris compiles them into reproducible, production-ready manifests that run with the same behavior everywhere (local or cloud).
# Setup
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Initialize configuration
osiris init
# Start conversation - describe what you want in plain English
osiris chat
- Compiler, not orchestrator - Others schedule what you hand-craft. Osiris generates, validates, and compiles pipelines from plain English.
- Determinism as a contract - Fingerprinted manifests guarantee reproducibility across environments.
- Conversational β executable - Describe intent; Osiris interrogates real systems and proposes a feasible plan.
- Run anywhere, same results - Transparent adapters deliver execution parity (local and E2B today).
- Boring by design - Predictable, explainable, portable β industrial-grade AI, not magical fragility.
Interactive HTML dashboard showing pipeline execution metrics and performance
Comprehensive run overview showing E2B cloud execution with <1% overhead
Detailed view of pipeline steps with row counts and execution times
$ osiris chat
You: "Identify customers inactive for 90 days and export to CSV for re-activation campaign"
Osiris: I'll help you identify inactive customers. Let me explore your database...
Found tables: customers, orders, sessions. I'll create a pipeline that:
1. Identifies customers with no activity in the last 90 days
2. Exports their details for your re-activation campaign
Here's the generated pipeline:
[Shows OML pipeline with SQL queries and transformations]
Shall I compile this for execution?
You: "Yes, compile and run it locally"
Osiris: β Pipeline compiled (manifest hash: a3f2b1c4)
β Execution complete: 847 inactive customers exported to output/inactive_customers.csv
View report: osiris logs html --open
- AI-native pipeline generation from plain English descriptions
- Deterministic compilation with fingerprinted, reproducible manifests
- Run anywhere with identical behavior (local or E2B cloud)
- Interactive HTML reports with comprehensive observability
- AI Operation Package (AIOP) for LLM-friendly debugging and analysis
- LLM-friendly with machine-readable documentation for AI assistants
Osiris provides machine-readable documentation for AI assistants:
- For Users: Share
docs/user-guide/llms.txt
with ChatGPT/Claude to generate pipelines - For Developers: Use
docs/developer-guide/llms.txt
for AI-assisted development - Pro Mode: Customize AI behavior with
osiris dump-prompts --export
andosiris chat --pro-mode
Run pipelines in isolated E2B sandboxes with <1% overhead:
# Run in cloud sandbox
osiris run pipeline.yaml --e2b
# With custom resources
osiris run pipeline.yaml --e2b --e2b-cpu 4 --e2b-mem 8
See the User Guide for complete E2B documentation.
Every pipeline run automatically generates a comprehensive AI Operation Package for LLM analysis:
# View AIOP export after any run
osiris logs aiop --last
# Generate human-readable summary
osiris logs aiop --last --format md
# Configure in osiris.yaml
aiop:
enabled: true # Auto-export after each run
policy: core # β€300KB for LLM consumption
AIOP provides four semantic layers for AI understanding:
- Evidence Layer: Timestamped events, metrics, and artifacts
- Semantic Layer: DAG structure and component relationships
- Narrative Layer: Natural language descriptions with citations
- Metadata Layer: LLM primer and configuration
See AIOP Architecture for details.
For comprehensive documentation, visit the Documentation Hub:
- Quickstart - 10-minute setup guide
- User Guide - Complete usage documentation
- Architecture - Technical deep-dive with diagrams
- Developer Guide - Module patterns and LLM contracts
- Examples - Ready-to-use pipelines
- v0.2.0 β - Conversational agent, deterministic compiler, E2B parity
- v0.3.0 β - AI Operation Package (AIOP) for LLM-friendly debugging
- v0.3.1 β - Fixed validation warnings for ADR-0020 compliant configs
- v0.3.5 β - GraphQL extractor, DuckDB processor, test infrastructure improvements
- v0.4.0 (Current) β - Filesystem Contract v1 - deterministic, reproducible directory structure
- M2 - Production workflows, approvals, orchestrator integration
- M3 - Streaming, parallelism, enterprise scale
- M4 - Iceberg tables, intelligent DWH agent
See docs/roadmap/ for details.
See CONTRIBUTING.md for development workflow, code quality standards, and commit guidelines.
Apache-2.0