colossalai · Singapore · Stars
MiroThinker is a deep research agent optimized for complex research and prediction tasks. Our latest model, MiroThinker-1.7, achieves 74.0 and 75.3 on BrowseComp and BrowseComp-ZH, respectively.
fanshiqing / grouped_gemm
Forked from tgale96/grouped_gemm. PyTorch bindings for CUTLASS grouped GEMM.
Fast and memory-efficient exact attention
[CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
MiroTrain is an efficient, algorithm-first framework for training research agents.
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
A static analytical model for LLM distributed training.
Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.
Efficient Triton Kernels for LLM Training
A PyTorch native platform for training generative AI models
Development repository for the Triton language and compiler
Megatron-style tensor-parallel (TP) training.
Open-Sora: Democratizing Efficient Video Production for All
Tests the GPU bandwidth of collective operators such as all-reduce, all-gather, broadcast, and all-to-all on single-node multi-GPU (2, 4, 8 cards) and multi-node multi-GPU (16 cards) setups,…
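A benchmark like this typically reports "bus bandwidth" rather than raw bytes-per-second, rescaling each collective by how much traffic it actually puts on every link. A minimal sketch of that conversion, following the nccl-tests convention (the function name and example numbers are illustrative, not taken from the repo):

```python
# Minimal sketch (not the repo's code): convert a measured collective time into
# "bus bandwidth" using the standard nccl-tests scaling factors, so results are
# comparable across operators and GPU counts.

def bus_bandwidth_gbs(nbytes: float, seconds: float, world_size: int, op: str) -> float:
    """Bus bandwidth in GB/s for a collective over `world_size` ranks."""
    alg_bw = nbytes / seconds / 1e9  # algorithmic bandwidth, GB/s
    n = world_size
    factors = {
        "all_reduce": 2 * (n - 1) / n,   # ring all-reduce moves 2(n-1)/n of the data per link
        "all_gather": (n - 1) / n,
        "reduce_scatter": (n - 1) / n,
        "all_to_all": (n - 1) / n,
        "broadcast": 1.0,
    }
    return alg_bw * factors[op]

# e.g. 1 GB all-reduced across 8 GPUs in 10 ms:
print(round(bus_bandwidth_gbs(1e9, 0.01, 8, "all_reduce"), 2))  # → 175.0
```

Because of the scaling factor, an all-reduce at the same wall-clock speed reports nearly twice the bus bandwidth of a broadcast of the same payload.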
wangbluo / ColossalAI
Forked from hpcaitech/ColossalAI. Making large AI models cheaper, faster and more accessible
Build a LLaMA fine-tuning script from scratch using PyTorch and the transformers API. It needs to support four optional features: gradient checkpointing, mixed precision, data parallelism, tensor parallel…
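Two of those optional features can be sketched without any LLaMA weights: gradient checkpointing via `torch.utils.checkpoint` and mixed precision via `torch.autocast`. The tiny residual model below is a hypothetical stand-in, not the script itself; data and tensor parallelism would additionally wrap the model in `DistributedDataParallel` and shard the linear layers, which is omitted here.

```python
# Minimal sketch (assumed structure, not the actual fine-tuning script):
# gradient checkpointing + mixed precision on a toy model, CPU-only.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    def forward(self, x):
        return x + self.net(x)

class TinyModel(nn.Module):
    def __init__(self, dim=32, depth=2, use_checkpointing=True):
        super().__init__()
        self.blocks = nn.ModuleList(Block(dim) for _ in range(depth))
        self.use_checkpointing = use_checkpointing
    def forward(self, x):
        for blk in self.blocks:
            if self.use_checkpointing and self.training:
                # Drop intermediate activations; recompute them in backward.
                x = checkpoint(blk, x, use_reentrant=False)
            else:
                x = blk(x)
        return x

model = TinyModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
x = torch.randn(4, 32)
with torch.autocast("cpu", dtype=torch.bfloat16):  # mixed precision region
    loss = model(x).pow(2).mean()
loss.backward()
opt.step()
opt.zero_grad()
```

The same skeleton extends to a real LLaMA: `transformers` models expose `model.gradient_checkpointing_enable()` for the first feature, and the autocast context wraps the forward pass unchanged.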
Making large AI models cheaper, faster and more accessible