Anyscale
United States
DeepEP Public
Forked from deepseek-ai/DeepEP
DeepEP: an efficient expert-parallel communication library
Cuda MIT License Updated Apr 29, 2025
mini-redis Public
Forked from tokio-rs/mini-redis
Incomplete Redis client and server implementation using Tokio - for learning purposes only
Rust MIT License Updated Apr 29, 2024
openmlsys-zh Public
Forked from openmlsys/openmlsys-zh
《Machine Learning Systems: Design and Implementation》- Chinese Version
TeX Updated Mar 6, 2024
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Feb 8, 2024
vllm Public
Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python Apache License 2.0 Updated Jan 17, 2024
lightllm Public
Forked from ModelTC/LightLLM
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Python Apache License 2.0 Updated Dec 27, 2023
how-to-optim-algorithm-in-cuda Public
Forked from BBuf/how-to-optim-algorithm-in-cuda
how to optimize some algorithm in cuda.
Cuda Updated Dec 24, 2023
cutlass-kernels Public
Forked from ColfaxResearch/cutlass-kernels
Cuda MIT License Updated Dec 20, 2023
ray Public
Forked from ray-project/ray
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyp…
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
C++ Apache License 2.0 Updated Dec 11, 2023
flash-attention Public
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
The-Art-of-Linear-Algebra Public
Forked from kenjihiranabe/The-Art-of-Linear-Algebra
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
PostScript Creative Commons Zero v1.0 Universal Updated Nov 30, 2023
ScaleLLM Public
Forked from vectorch-ai/ScaleLLM
A high-performance inference system for large language models, designed for production environments.
C++ Apache License 2.0 Updated Nov 23, 2023
grouped_gemm Public
Forked from tgale96/grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
Cuda Apache License 2.0 Updated Nov 17, 2023
awesome-tensor-compilers Public
Forked from merrymercy/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
1 Updated Oct 19, 2023
lmdeploy Public
Forked from InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
C++ Apache License 2.0 Updated Oct 5, 2023
FasterTransformer Public
Forked from NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
C++ Apache License 2.0 Updated Sep 8, 2023
Lightrails Public
Yet another distributed training/inferencing framework.
Apache License 2.0 Updated Apr 11, 2023
nanoGPT Public
Forked from karpathy/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Python MIT License Updated Mar 25, 2023
Megatron-LM Public
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
Python Other Updated Mar 25, 2023
og-equity-compensation Public
Forked from jlevy/og-equity-compensation
Stock options, RSUs, taxes — read the latest edition: www.holloway.com/ec
Updated Oct 15, 2021