Welcome to the Dynamo Workshop! This hands-on workshop guides you through deploying and managing NVIDIA's Dynamo, a datacenter-scale distributed inference serving framework.
Dynamo is a high-performance distributed LLM inference serving framework that supports multiple backends (vLLM, SGLang, and TensorRT-LLM). This workshop focuses on Kubernetes-based deployments and advanced features.
- Understanding Dynamo architecture
- Setting up local development environment
- Docker-based aggregated deployment
- Backend engine selection (vLLM/SGLang/TensorRT-LLM)
- Benchmarking with AI-Perf
- Deploying Dynamo on Kubernetes
- Installing Dynamo operator
- Aggregated and disaggregated serving topologies
- Multi-model deployment with shared frontend
- AI Configurator for optimal configurations
- Lab 3.1: Expert Parallelism Foundations
  - Understanding parallelism strategies (DP, TP, PP, SP, EP)
  - MoE architecture and expert routing mechanisms
  - Wide EP, Deep EP, and Dynamic EP (EPLB)
  - Large-scale deployment insights (GB200 NVL72)
- Lab 3.2: Wide EP Production Deployment
  - Kubernetes deployment with Dynamo Operator
  - Multi-node SGLang and TensorRT-LLM configurations
  - EPLB load balancing strategies
  - Monitoring and troubleshooting
Before starting this workshop, ensure you have:
- Local Setup:
  - Python 3.10 or higher
  - Docker and Docker Compose
  - `kubectl` command-line tool
  - Access to a GPU (for local testing)
- Kubernetes Cluster:
  - Access to a Kubernetes cluster with GPU nodes
  - `kubectl` configured to access your cluster
  - Sufficient GPU resources (recommended: 2+ NVIDIA GPUs)
- Knowledge:
  - Basic understanding of Kubernetes concepts
  - Familiarity with Python
  - Basic understanding of LLMs and inference
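The local prerequisites above can be checked from a terminal before you start. The following is a minimal sketch, not a script shipped with the workshop: `check_tool` and `python_ok` are hypothetical helper names, and the sketch only confirms that the tools are on your `PATH` and that `python3` reports version 3.10 or higher. It does not verify GPU access or cluster connectivity.

```shell
# Hypothetical helpers for illustration; not part of the workshop repository.

# check_tool NAME: report whether NAME is available on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok       $1"
  else
    echo "MISSING  $1"
    return 1
  fi
}

# python_ok: succeed only if python3 exists and is version 3.10 or higher.
python_ok() {
  command -v python3 >/dev/null 2>&1 &&
    python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'
}

missing=0
for tool in python3 docker kubectl; do
  check_tool "$tool" || missing=$(( missing + 1 ))
done

python_ok || echo "Python 3.10 or higher was not found"
docker compose version >/dev/null 2>&1 || echo "Docker Compose was not detected"
echo "$missing required tool(s) missing"
```

A result of `0 required tool(s) missing` means only that the commands exist; the Kubernetes and GPU requirements still need to be confirmed separately.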
For Workshop Participants: If you're attending a workshop using a shared workstation, follow these instructions instead of the "Getting Started" options below.
The workshop uses a shared workstation where:
- ~30 users run concurrently with individual user accounts
- Each user has unique port assignments based on UID (1000-1040)
- JupyterLab and services run natively (bare metal, not Docker)
- Python environments managed with `uv` for fast dependency installation
- Kubernetes namespaces are pre-created per user
Your ports are automatically calculated based on your UID:
| Service | Port Formula | Example (UID=1005) |
|---|---|---|
| JupyterLab | 8888 + (UID - 1000) | 8893 |
| Frontend (Lab 1) | 10000 + (UID - 1000) | 10005 |
| Frontend (Lab 2) | 11000 + (UID - 1000) | 11005 |
| Prometheus | 19090 + (UID - 1000) | 19095 |
| Grafana | 13000 + (UID - 1000) | 13005 |
Note: Your environment variables are set automatically by workshop-env.sh, which is sourced from your .bashrc.
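The port formulas in the table can be reproduced with a few lines of shell arithmetic. This is a sketch for illustration only: `workshop_port` is a hypothetical function name, and the code is not the contents of workshop-env.sh. The example UID of 1005 is the one used in the table.

```shell
# Hypothetical helper mirroring the port table; not taken from workshop-env.sh.

# workshop_port BASE UID: print BASE + (UID - 1000).
workshop_port() {
  local base=$1 uid=$2
  # Workshop UIDs fall in the range 1000-1040.
  if [ "$uid" -lt 1000 ] || [ "$uid" -gt 1040 ]; then
    echo "UID $uid is outside the workshop range 1000-1040" >&2
    return 1
  fi
  echo $(( base + uid - 1000 ))
}

example_uid=1005   # replace with the output of `id -u` on the workstation
echo "JupyterLab:       $(workshop_port 8888 "$example_uid")"    # 8893
echo "Frontend (Lab 1): $(workshop_port 10000 "$example_uid")"   # 10005
echo "Frontend (Lab 2): $(workshop_port 11000 "$example_uid")"   # 11005
echo "Prometheus:       $(workshop_port 19090 "$example_uid")"   # 19095
echo "Grafana:          $(workshop_port 13000 "$example_uid")"   # 13005
```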
ssh your-username@workshop-hostname
cd ~/dynamo-workshop
./start-workshop.sh
This will:
- Load your user-specific port configuration
- Activate your Python virtual environment (managed by uv)
- Start JupyterLab on your assigned port
- Display connection information
From your local machine (not the workstation), run:
# Automatic setup (recommended)
./setup-tunnels.sh your-username@workshop-hostname
# Or manually (see TUNNELING.md for details)
The tunnel script will:
- Detect your remote UID
- Calculate your assigned ports
- Set up all necessary SSH tunnels
- Display connection URLs
Open your browser to: http://localhost:8888
(The local port is always 8888, regardless of your remote port)
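To make the port mapping concrete, the sketch below builds (but does not run) an `ssh` command with one `-L` forward per service. It is an illustration under stated assumptions, not the contents of setup-tunnels.sh: `build_tunnel_cmd` is a hypothetical name, the choice of forwarded services is a guess, and only the JupyterLab mapping (local 8888 to your remote port) comes from the text above. Grafana and Prometheus are assumed to reuse the remote port number locally.

```shell
# Hypothetical sketch; setup-tunnels.sh may work differently.

# build_tunnel_cmd USER_HOST REMOTE_UID: print an ssh command without running it.
build_tunnel_cmd() {
  local target=$1 uid=$2
  local offset=$(( $uid - 1000 ))
  local port
  # JupyterLab: the local port is always 8888; the remote port depends on the UID.
  local cmd="ssh -N -L 8888:localhost:$(( 8888 + offset ))"
  # Assumption: Grafana and Prometheus use the same port number on both ends.
  for port in $(( 13000 + offset )) $(( 19090 + offset )); do
    cmd="$cmd -L ${port}:localhost:${port}"
  done
  echo "$cmd $target"
}

build_tunnel_cmd your-username@workshop-hostname 1005
# ssh -N -L 8888:localhost:8893 -L 13005:localhost:13005 -L 19095:localhost:19095 your-username@workshop-hostname
```

Here `-L local:host:remote` forwards a local port to a port on the remote side, and `-N` keeps the connection open without running a remote command.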
Navigate to lab1/lab1-introduction-setup.ipynb in JupyterLab and begin!
At any time, you can check your port assignments:
# On the workstation
source ~/dynamo-workshop/workshop-env.sh
# Or run the verification script
~/dynamo-workshop/check-ports.sh
For detailed SSH tunneling instructions, including:
- Manual setup
- VS Code Remote SSH configuration
- Troubleshooting port conflicts
- Keeping tunnels alive
See: TUNNELING.md
The workshop uses uv for Python package management:
- Fast dependency installation (much faster than pip)
- Virtual environments pre-created via Ansible
- All requirements pre-installed
- To add packages: `uv pip install package-name`
Port conflicts:
# Check what's using your ports
./check-ports.sh
# Kill stuck processes
pkill -f 'jupyter.*YOUR_PORT'
pkill -f 'port-forward'
Environment issues:
# Reload environment
source ~/dynamo-workshop/workshop-env.sh
# Reinstall dependencies
cd ~/dynamo-workshop
uv pip install -r requirements.txt
For Administrators: See ANSIBLE_PROVISIONING.md for setup instructions.
Note: These options are for single-user local development. Workshop participants should use the "Multi-User Shared Workstation Setup" above.
Option 1 (Docker Compose):
- Clone this repository:
  git clone <repository-url>
  cd dynamo-workshop
- Start the workshop environment:
  ./start-workshop.sh
- Open your browser to http://localhost:8888
- Navigate to the lab notebooks and start with Lab 1
- When finished, stop the environment:
  docker compose down

Option 2 (local Python environment):
- Clone this repository:
  git clone <repository-url>
  cd dynamo-workshop
- Install dependencies:
  pip install -r requirements.txt
- Start JupyterLab:
  jupyter lab
- Navigate to Lab 1 in the JupyterLab interface and start with lab1/lab1-introduction-setup.ipynb
| Lab | Topic | Duration | Notebook |
|---|---|---|---|
| Lab 1 | Introduction and Docker-Based Deployment | 90 min | lab1-introduction-setup.ipynb |
| Lab 2 | Advanced Kubernetes Deployment | 120 min | lab2-kubernetes-deployment.ipynb |
| Lab 3.1 | Expert Parallelism Foundations | 45-60 min | lab3.1-expert-parallelism-foundations.ipynb |
| Lab 3.2 | Wide EP Production Deployment | 60-90 min | lab3.2-wide-ep-deployment.ipynb |
For issues or questions:
- Check the Dynamo GitHub Issues
- Refer to the official documentation
- Ask your workshop instructor
This workshop is provided under the Apache-2.0 license, consistent with the Dynamo project.