Transform pipeline graphs into production-ready SageMaker pipelines automatically.
Cursus is a pipeline generation system that builds complete SageMaker pipelines from user-provided pipeline graphs. Define your ML workflow as a graph, and Cursus handles the SageMaker implementation details, dependency resolution, and configuration management.
# Core installation
pip install cursus
# With ML frameworks
pip install cursus[pytorch,xgboost]
# Full installation with all features
pip install cursus[all]
from cursus.core import compile_dag_to_pipeline
from cursus.api import PipelineDAG
from sagemaker.workflow.pipeline_context import PipelineSession
# Create a simple DAG
dag = PipelineDAG()
dag.add_node("CradleDataLoading_training")
dag.add_node("TabularPreprocessing_training")
dag.add_node("XGBoostTraining")
dag.add_edge("CradleDataLoading_training", "TabularPreprocessing_training")
dag.add_edge("TabularPreprocessing_training", "XGBoostTraining")
# Set up SageMaker session
pipeline_session = PipelineSession()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
# Compile to SageMaker pipeline automatically
pipeline = compile_dag_to_pipeline(
    dag=dag,
    config_path="config.json",
    sagemaker_session=pipeline_session,
    role=role,
    pipeline_name="fraud-detection"
)
pipeline.upsert()  # Create or update the pipeline definition in SageMaker; call pipeline.start() to run it
# Generate a new project
cursus init --template xgboost --name fraud-detection
# Validate your DAG
cursus validate my_dag.py
# Compile to SageMaker pipeline
cursus compile my_dag.py --name my-pipeline --output pipeline.json
- Input: Simple pipeline graph with step types and connections
- Output: Complete SageMaker pipeline with all dependencies resolved
- Magic: Intelligent analysis of graph structure with automatic step builder selection
- Before: 2-4 weeks of manual SageMaker configuration
- After: 10-30 minutes from graph to working pipeline
- Result: 95% reduction in development time
- Automatic step connections and data flow
- Smart configuration matching and validation
- Type-safe specifications with compile-time checks
- Semantic compatibility analysis
- Built-in quality gates and validation
- Enterprise governance and compliance
- Comprehensive error handling and debugging
- 98% complete with 1,650+ lines of complex code eliminated
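To make the idea of configuration matching with a confidence score concrete, here is a minimal sketch in plain Python. It is not Cursus's resolver: the `resolve` helper, the `Config`-suffixed class names, and the string-similarity scoring via the standard library's `difflib` are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical configuration class names; the real Cursus names may differ.
CONFIG_NAMES = [
    "CradleDataLoadingConfig",
    "TabularPreprocessingConfig",
    "XGBoostTrainingConfig",
]


def base_name(node: str) -> str:
    """Drop a job-type suffix such as '_training' and lowercase the rest."""
    return node.split("_", 1)[0].lower()


def strip_suffix(name: str, suffix: str = "config") -> str:
    """Remove a trailing 'config' from an already lowercased name."""
    return name[: -len(suffix)] if name.endswith(suffix) else name


def resolve(node: str, config_names: list[str]) -> tuple[str, float]:
    """Return the best-matching config name and a similarity score in [0, 1]."""
    scored = [
        (SequenceMatcher(None, base_name(node), strip_suffix(c.lower())).ratio(), c)
        for c in config_names
    ]
    score, best = max(scored)
    return best, score


for node in ["CradleDataLoading_training", "XGBoostTraining"]:
    config, confidence = resolve(node, CONFIG_NAMES)
    print(f"{node} -> {config} (confidence: {confidence:.2f})")
```

A real resolver would likely combine several signals (explicit mappings, job-type metadata, semantic rules) rather than rely on name similarity alone.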
Based on production deployments across enterprise environments:
| Component | Code Reduction | Lines Eliminated | Key Benefit |
|---|---|---|---|
| Processing Steps | 60% | 400+ lines | Automatic input/output resolution |
| Training Steps | 60% | 300+ lines | Intelligent hyperparameter handling |
| Model Steps | 47% | 380+ lines | Streamlined model creation |
| Registration Steps | 66% | 330+ lines | Simplified deployment workflows |
| Overall System | ~55% | 1,650+ lines | Intelligent automation |
Cursus follows a sophisticated layered architecture:
- User Interface: Fluent API and Pipeline DAG for intuitive construction
- Intelligence Layer: Smart proxies with automatic dependency resolution
- Orchestration: Pipeline assembler and compiler for DAG-to-template conversion
- Registry Management: Multi-context coordination with lifecycle management
- Dependency Resolution: Intelligent matching with semantic compatibility
- Specification Layer: Comprehensive step definitions with quality gates
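As a rough illustration of what the orchestration layer has to work out from a graph, the sketch below derives a valid step order from DAG edges using Python's standard-library `graphlib` (Python 3.9+). This is an assumption-laden sketch, not how Cursus is implemented: the `execution_order` helper is hypothetical, and the node names are borrowed from the examples in this README.

```python
from graphlib import CycleError, TopologicalSorter

# (upstream, downstream) pairs, mirroring dag.add_edge(src, dst) calls.
EDGES = [
    ("CradleDataLoading_training", "TabularPreprocessing_training"),
    ("TabularPreprocessing_training", "XGBoostTraining"),
    ("CradleDataLoading_calibration", "TabularPreprocessing_calibration"),
    ("XGBoostTraining", "XGBoostModelEval_calibration"),
    ("TabularPreprocessing_calibration", "XGBoostModelEval_calibration"),
]


def execution_order(edges):
    """Return node names ordered so every step follows its upstream steps."""
    sorter = TopologicalSorter()
    for upstream, downstream in edges:
        # TopologicalSorter.add(node, *predecessors): downstream depends on upstream.
        sorter.add(downstream, upstream)
    return list(sorter.static_order())


try:
    order = execution_order(EDGES)
    for position, step in enumerate(order, start=1):
        print(position, step)
except CycleError as exc:
    # A cycle means the graph is not a valid DAG and cannot be scheduled.
    print("Graph contains a cycle:", exc.args[1])
```

Several orderings can be valid for the same graph; the only guarantee is that each step appears after all of its upstream steps.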
from cursus.core import compile_dag_to_pipeline
from cursus.api import PipelineDAG
from sagemaker.workflow.pipeline_context import PipelineSession
# Create DAG
dag = PipelineDAG()
dag.add_node("CradleDataLoading_training")
dag.add_node("XGBoostTraining")
dag.add_edge("CradleDataLoading_training", "XGBoostTraining")
# Set up SageMaker session
pipeline_session = PipelineSession()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
# Compile to SageMaker pipeline
pipeline = compile_dag_to_pipeline(
    dag=dag,
    config_path="config.json",
    sagemaker_session=pipeline_session,
    role=role,
    pipeline_name="my-ml-pipeline"
)
from cursus.core import compile_dag_to_pipeline, PipelineDAGCompiler
from cursus.api import PipelineDAG
from sagemaker.workflow.pipeline_context import PipelineSession
# Create DAG with more complex workflow
dag = PipelineDAG()
dag.add_node("CradleDataLoading_training")
dag.add_node("TabularPreprocessing_training")
dag.add_node("XGBoostTraining")
dag.add_node("CradleDataLoading_calibration")
dag.add_node("TabularPreprocessing_calibration")
dag.add_node("XGBoostModelEval_calibration")
# Add edges for training flow
dag.add_edge("CradleDataLoading_training", "TabularPreprocessing_training")
dag.add_edge("TabularPreprocessing_training", "XGBoostTraining")
# Add edges for calibration flow
dag.add_edge("CradleDataLoading_calibration", "TabularPreprocessing_calibration")
dag.add_edge("XGBoostTraining", "XGBoostModelEval_calibration")
dag.add_edge("TabularPreprocessing_calibration", "XGBoostModelEval_calibration")
# Set up SageMaker session
pipeline_session = PipelineSession()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
# Compile with validation and reporting
compiler = PipelineDAGCompiler(
    config_path="config.json",
    sagemaker_session=pipeline_session,
    role=role
)
# Validate DAG before compilation
validation = compiler.validate_dag_compatibility(dag)
if validation.is_valid:
    print(f"DAG validation passed! Confidence: {validation.avg_confidence:.2f}")
    # Compile with detailed report
    pipeline, report = compiler.compile_with_report(
        dag=dag,
        pipeline_name="advanced-ml-pipeline"
    )
    print(f"Pipeline compiled: {report.summary()}")
else:
    print("DAG validation failed:", validation.config_errors)
from cursus.pipeline_catalog.pipelines.xgb_training_simple import XGBoostTrainingSimplePipeline
from sagemaker.workflow.pipeline_context import PipelineSession
# Set up SageMaker session
pipeline_session = PipelineSession()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
# Use pre-built pipeline template
pipeline_instance = XGBoostTrainingSimplePipeline(
    config_path="config.json",
    sagemaker_session=pipeline_session,
    execution_role=role,
    enable_mods=False,  # Regular pipeline
    validate=True
)
# Generate the pipeline
pipeline = pipeline_instance.generate_pipeline()
# Deploy to SageMaker
pipeline.upsert()
print(f"Pipeline '{pipeline.name}' deployed successfully!")
from cursus.core import PipelineDAGCompiler
from cursus.api import PipelineDAG
from cursus.pipeline_catalog.shared_dags.xgboost import create_xgboost_simple_dag
from sagemaker.workflow.pipeline_context import PipelineSession
# Create DAG using shared DAG definitions
dag = create_xgboost_simple_dag()
# Set up SageMaker session
pipeline_session = PipelineSession()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
# Use compiler for more control
compiler = PipelineDAGCompiler(
    config_path="config.json",
    sagemaker_session=pipeline_session,
    role=role
)
# Preview resolution before compilation
preview = compiler.preview_resolution(dag)
for node, config_type in preview.node_config_map.items():
    confidence = preview.resolution_confidence.get(node, 0.0)
    print(f"  {node} → {config_type} (confidence: {confidence:.2f})")
# Compile the pipeline
pipeline = compiler.compile(dag, pipeline_name="my-pipeline")
pip install cursus
Includes basic DAG compilation and SageMaker integration.
pip install cursus[pytorch] # PyTorch Lightning models
pip install cursus[xgboost] # XGBoost training pipelines
pip install cursus[nlp] # NLP models and processing
pip install cursus[processing] # Advanced data processing
pip install cursus[dev] # Development tools
pip install cursus[docs] # Documentation tools
pip install cursus[all] # Everything included
- Focus on model development, not infrastructure complexity
- Rapid experimentation with 10x faster iteration
- Business-focused interface eliminates SageMaker expertise requirements
- 60% less code to maintain and debug
- Specification-driven architecture prevents common errors
- Universal patterns enable faster team onboarding
- Accelerated innovation with faster pipeline development
- Reduced technical debt through clean architecture
- Built-in governance and compliance frameworks
Your gateway to all Cursus documentation - start here for comprehensive navigation
- Zettelkasten Principles - The knowledge management principles behind our slipbox documentation system, explaining how we organize and connect information for maximum discoverability and organic growth
- Developer Guide - Comprehensive guide for developing new pipeline steps and extending Cursus
- Design Documentation - Detailed architectural documentation and design principles
- Pipeline Catalog - Comprehensive collection of prebuilt pipeline templates organized by framework and task
- API Reference - Detailed API documentation including core, api, steps, and other components
- Examples - Ready-to-use pipeline blueprints and examples
- Getting Started - Start here for adding new pipeline steps
- Design Principles - Core architectural principles
- Best Practices - Recommended development practices
- Component Guide - Overview of key components
- Validation System - Comprehensive validation framework for pipeline alignment and quality assurance
We welcome contributions! See our Developer Guide for comprehensive details on:
- Prerequisites - What you need before starting development
- Creation Process - Step-by-step process for adding new pipeline steps
- Validation Checklist - Comprehensive checklist for validating implementations
- Common Pitfalls - Common mistakes to avoid
For architectural insights and design decisions, see the Design Documentation.
This project is licensed under the MIT License - see the LICENSE file for details.
- GitHub: https://github.com/TianpeiLuke/cursus
- Issues: https://github.com/TianpeiLuke/cursus/issues
- PyPI: https://pypi.org/project/cursus/
Cursus: Making SageMaker pipeline development 10x faster through intelligent automation.