wangbluo
  • colossalai
  • Singapore


MiroThinker is a deep research agent optimized for complex research and prediction tasks. Our latest model, MiroThinker-1.7, achieves 74.0 and 75.3 on BrowseComp and BrowseComp Zh, respectively.

Python 8,149 629 Updated Apr 25, 2026

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 187 50 Updated Apr 8, 2026

Fast and memory-efficient exact attention

Python 23,802 2,725 Updated May 15, 2026

[CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Python 231 13 Updated Apr 10, 2026

Sampling profiler for Python programs

Rust 15,191 518 Updated May 12, 2026

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

Python 21,332 3,873 Updated May 16, 2026

MiroTrain is an efficient and algorithm-first framework research agent.

Python 142 17 Updated Aug 27, 2025
Python 1 Updated Sep 30, 2025
Python 1 Updated Sep 17, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 832 228 Updated Apr 2, 2026

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Python 155 24 Updated Aug 21, 2025

A static analytical model for LLM distributed training

Python 132 19 Updated May 11, 2026

Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialized for Physical AI applications.

Python 422 61 Updated May 14, 2026
Jupyter Notebook 1 1 Updated Jul 7, 2025

Efficient Triton Kernels for LLM Training

Python 6,354 528 Updated May 16, 2026

A PyTorch native platform for training generative AI models

Python 5,348 820 Updated May 16, 2026

Development repository for the Triton language and compiler

MLIR 19,190 2,852 Updated May 16, 2026

A fast MoE implementation for PyTorch

Python 1,850 206 Updated Feb 10, 2025

Tensor-parallel (TP) training in the Megatron style.

Python 2 Updated Oct 14, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 28,975 2,952 Updated Apr 9, 2026

Test the GPU bandwidth of collective operators such as all-reduce, all-gather, broadcast, and all-to-all on single-node multi-GPU (2, 4, 8 cards) and multi-node multi-GPU (16 cards) setups,…

Python 2 Updated Oct 21, 2024

Making large AI models cheaper, faster and more accessible

Python 1 Updated Jun 3, 2025

Build a Llama fine-tuning script from scratch using PyTorch and the transformers API. It must support 4 optional features: gradient checkpointing, mixed precision, data parallelism, tensor parallel…

Python 3 Updated Sep 20, 2024

Making large AI models cheaper, faster and more accessible

Python 41,382 4,511 Updated May 11, 2026