Starred repositories
vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.
QLoRA: Efficient Finetuning of Quantized LLMs
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. It is hosted as an incubation project at the LF AI & Data Foundation.
Collective communications library with various primitives for multi-machine training.
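PyTorch model training code covering single precision, half precision, mixed precision, single GPU, multi-GPU (DP / DDP), FSDP, and DeepSpeed, with a comparison of training speed and GPU memory usage across the methods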
Unicode routines (UTF8, UTF16, UTF32) and Base64: billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension, LoongArch64, POWER. Part of Node.js, WebKit/Safari, Lad…
⚡️ Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak ⚡️ performance.
A compact binary encoding for geographic data.
Parallel algorithms and data structures for tree-based adaptive mesh refinement (AMR) with arbitrary element shapes.
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Ongoing research training transformer models at scale
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
Scalable toolkit for efficient model alignment
Fast, Flexible and Portable Structured Generation
The Art of Writing Efficient Programs, published by Packt
brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertisement, and recommendation. "brpc" mea…
Solve Visual Understanding with Reinforced VLMs
Library for lifting machine code to LLVM bitcode
A TensorRT and C++ based deployment of FoundationPose, which makes integration lightweight and efficient. Supports Jetson Orin. Adapted from nvidia_isaac_pose_esitimation.