ByteDance
- Shanghai, China
https://fangjiarui.github.io/
- https://www.zhihu.com/people/feifeibear
LinkedIn: in/fangjiarui
long-context-attention Public
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
verl Public
Forked from volcengine/verl. verl: Volcano Engine Reinforcement Learning for LLMs
Python · Apache License 2.0 · Updated Sep 10, 2025
VeOmni Public
Forked from ByteDance-Seed/VeOmni. VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework
Python · Apache License 2.0 · Updated Aug 14, 2025
verl-pipeline Public
Forked from agentica-project/verl-pipeline. Async pipelined version of verl
Python · Apache License 2.0 · Updated May 21, 2025
DiffSynth-Studio Public
Forked from modelscope/DiffSynth-Studio. Enjoy the magic of Diffusion models!
Python · Apache License 2.0 · Updated Mar 25, 2025
Comfy-WaveSpeed Public
Forked from chengzeyi/Comfy-WaveSpeed. [WIP] The all-in-one inference optimization solution for ComfyUI: universal, flexible, and fast.
Python · MIT License · Updated Jan 9, 2025
xDiT Public
Forked from xdit-project/xDiT. xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
comfy-pack Public
Forked from bentoml/comfy-pack. A comprehensive toolkit for reliably locking, packing, and deploying environments for ComfyUI workflows.
Python · Apache License 2.0 · Updated Dec 19, 2024
HunyuanVideo Public
Forked from Tencent-Hunyuan/HunyuanVideo. HunyuanVideo: A Systematic Framework For Large Video Generation Model Training
Python · Other · Updated Dec 5, 2024
distrifuser Public
Forked from mit-han-lab/distrifuser. [CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
ChituAttention Public
Quantized Attention on GPU
mochi1-models Public
Forked from genmoai/mochi. The best OSS video generation models
mochi-xdit Public
Forked from xdit-project/mochi-xdit. Faster parallel inference of the mochi video generation model
Python · Apache License 2.0 · Updated Nov 8, 2024
piflux Public
Forked from chengzeyi/piflux. (WIP) Parallel inference for black-forest-labs' FLUX model.
Python · Other · Updated Nov 6, 2024
ParaAttention Public
Forked from chengzeyi/ParaAttention. [WIP] Context-parallel attention that works with torch.compile
Python · Other · Updated Nov 6, 2024
SageAttention Public
Forked from thu-ml/SageAttention. Quantized Attention that achieves speedups of 2.1x and 2.7x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
CogVideo Public
Forked from zai-org/CogVideo. Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Python · Apache License 2.0 · Updated Sep 30, 2024
INT-FlashAttention Public
Forked from INT-FlashAttention2024/INT-FlashAttention
Python · Updated Sep 21, 2024
EasyContext Public
Forked from jzhang38/EasyContext. Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
LLMSpeculativeSampling Public
Fast inference from large language models via speculative decoding
LLM-Viewer Public
Forked from hahnyuan/LLM-Viewer. Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.
FastFold Public
Forked from hpcaitech/FastFold. Optimizing AlphaFold Training and Inference on GPU Clusters
SpeculativeDecodingPapers Public
Forked from hemingkx/SpeculativeDecodingPapers. 📰 Must-read papers and blogs on Speculative Decoding ⚡️