Stars
checkpoint-restore / kubernetes
Forked from kubernetes/kubernetes
Production-Grade Container Scheduling and Management
Mandatory Access Control for AI agent tool invocations. SELinux-inspired policy engine with OPA/Rego, Kubernetes CRD integration, and embedded binary enforcement.
A Datacenter Scale Distributed Inference Serving Framework
SNAP is a comprehensive container checkpointing and migration platform that enables you to capture, save, and restore running container states.
A tool for coordinated checkpoint/restore of distributed applications with CRIU
Static build of CRIU (Checkpoint/Restore tool)
prime is a framework for efficient, globally distributed training of AI models over the internet.
Sniff CUDA ioctls
Transform the CRIU image between different architectures for vanilla code.
NviWatch: A blazingly fast Rust-based TUI for managing and monitoring NVIDIA GPU processes
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
GSoC 2024 Project: P4-Enabled Container Migration in Kubernetes
Confidential Containers Community
Encryption libraries for Encrypted OCI Container images
This artifact accompanies the paper 'Pronghorn: Effective Checkpoint Orchestration for Serverless Hot-Starts,' which has been accepted for presentation at EuroSys'24.
A simple, modern and secure encryption tool (and Go library) with small explicit keys, no config options, and UNIX-style composability.