dash-infer Public
Forked from modelscope/dash-infer. DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including CUDA, x86 and ARMv9.
C Apache License 2.0 Updated Apr 13, 2025
sglang Public
Forked from sgl-project/sglang. SGLang is yet another fast serving framework for large language models and vision language models.
Python Apache License 2.0 Updated Jan 8, 2025
vllm Public
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs.
Python Apache License 2.0 Updated Oct 13, 2024
cutlass Public
Forked from NVIDIA/cutlass. CUDA Templates for Linear Algebra Subroutines.
C++ Other Updated Oct 11, 2024
AutoAWQ Public
Forked from casper-hansen/AutoAWQ. AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
Python MIT License Updated Jul 30, 2024
Open standard for machine learning interoperability
PureBasic MIT License Updated Feb 8, 2023
YHs_Sample Public
Forked from Yinghan-Li/YHs_Sample. Yinghan's Code Sample.
Cuda GNU General Public License v3.0 Updated Feb 18, 2022
CppTemplateTutorial Public
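Forked from wuye9036/CppTemplateTutorial. A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language in its own right, aiming to give readers a thorough grasp of metaprogramming. (Work in progress.)
C++ Updated Dec 28, 2021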
optimizer Public
Forked from onnx/optimizer. Actively maintained ONNX Optimizer.
Python Apache License 2.0 Updated Feb 20, 2021