-
SkyWork
- ChengDu
- www.giantpandacv.com
-
How to optimize some algorithms in CUDA.
-
sgl-cookbook Public
Forked from sgl-project/sgl-cookbook
Cookbook of SGLang - Recipe
-
torchtitan Public
Forked from pytorch/torchtitan
A PyTorch native platform for training generative AI models
Python BSD 3-Clause "New" or "Revised" License Updated Mar 16, 2026
-
Panzhihua-Mi-Yi-Pipa Public
If you want to purchase Panzhihua Mi Yi Pipa, please contact me.
-
sglang Public
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
-
self-llm Public
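Forked from datawhalechina/self-llm
"A Guide to Using Open-Source Large Models": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying open-source large language models (LLMs) and multimodal large models (MLLMs), from China and abroad, in a Linux environment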
-
InfiniteTalk Public
Forked from MeiGen-AI/InfiniteTalk
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
Python Apache License 2.0 Updated Feb 12, 2026
-
cache-dit Public
Forked from vipshop/cache-dit
🤗 A PyTorch-native Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs: Z-Image, FLUX2, Qwen-Image, etc.
-
vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
-
tilelang Public
Forked from tile-ai/tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
-
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Jul 14, 2025
-
Awesome-ML-SYS-Tutorial Public
Forked from zhaochenyang20/Awesome-ML-SYS-Tutorial
My learning notes/codes for ML SYS.
-
DeepGEMM Public
Forked from deepseek-ai/DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Cuda MIT License Updated Feb 27, 2025
-
ml-engineering Public
Forked from stas00/ml-engineering
Machine Learning Engineering Open Book
-
HunyuanVideo Public
Forked from Tencent-Hunyuan/HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Python Other Updated Dec 20, 2024
-
ao Public
Forked from pytorch/ao
PyTorch native quantization and sparsity for training and inference
-
flash-attention Public
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
-
TiledCUDA Public
Forked from TiledTensor/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
C++ MIT License Updated Sep 6, 2024