- verl Public
  Forked from volcengine/verl
  verl: Volcano Engine Reinforcement Learning for LLMs
  Python · Apache License 2.0 · Updated Aug 28, 2025
- sglang Public
  Forked from sgl-project/sglang
  SGLang is a fast serving framework for large language models and vision language models.
  Python · Apache License 2.0 · Updated Jul 18, 2025
- PP-Schedule-Visualization Public
  Forked from Victarry/PP-Schedule-Visualization
  Pipeline Parallelism Emulation and Visualization
  Python · MIT License · Updated Apr 21, 2025
- TransformerEngine Public
  Forked from NVIDIA/TransformerEngine
  A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
  Python · Apache License 2.0 · Updated Mar 12, 2025
- less_slow.cpp Public
  Forked from ashvardanian/less_slow.cpp
  Learning how to write "Less Slow" code in C++20, from numerical micro-kernels and SIMD to coroutines, ranges, and polymorphic state machines
  C++ · Updated Jan 9, 2025
- NeighborHash Public
  Forked from slow-steppers/NeighborHash
  A faster int-to-int hashmap implemented in C++.
  C++ · MIT License · Updated Jan 6, 2025
- RLCoder Public
  Forked from DeepSoftwareAnalytics/RLCoder
  Reinforcement Learning for Repository-Level Code Completion
  Python · Updated Aug 19, 2024
- lightllm Public
  Forked from ModelTC/LightLLM
  LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
  Python · Apache License 2.0 · Updated Apr 1, 2024
- gnNHWC Public
  Forked from latentCall145/channels-last-groupnorm
  A CUDA kernel for NHWC GroupNorm for PyTorch
  Cuda · MIT License · Updated Feb 21, 2024
- SGEMM_CUDA Public
  Forked from siboehm/SGEMM_CUDA
  Fast CUDA matrix multiplication from scratch
  Cuda · MIT License · Updated Jan 26, 2024
- streamlit_goodreads_app Public
  Forked from tylerjrichards/streamlit_goodreads_app
  Python · Updated Jan 11, 2024
- stable-fast Public
  Forked from chengzeyi/stable-fast
  Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
  Python · MIT License · Updated Dec 21, 2023
- LLMSys-PaperList Public
  Forked from AmberLJC/LLMSys-PaperList
  Large Language Model Systems Paper List
  Updated Dec 4, 2023
- rags Public
  Forked from run-llama/rags
  Build ChatGPT over your data, all with natural language
  Python · MIT License · Updated Nov 26, 2023
- CBRpc Public
  Forked from Gooddbird/tinyrpc
  C++ async RPC framework. 140k+ QPS.
  C++ · Apache License 2.0 · Updated Nov 5, 2023
- vllm Public
  Forked from vllm-project/vllm
  A high-throughput and memory-efficient inference and serving engine for LLMs
  Python · Apache License 2.0 · Updated Oct 20, 2023
- hardware-effects-gpu Public
  Forked from Kobzol/hardware-effects-gpu
  Demonstration of various hardware effects on CUDA GPUs.
  C++ · MIT License · Updated Oct 19, 2023
- awesome-distributed-ml Public
  Forked from Shenggan/awesome-distributed-ml
  A curated list of awesome projects and papers for distributed training or inference
  1 · Updated Oct 18, 2023
- LLaMA-Megatron Public
  Forked from alibaba/Megatron-LLaMA
  Best practice for training LLaMA models in Megatron-LM
  Python · Other · Updated Oct 10, 2023