FreeLancer
Beijing, China
Starred repositories
Fast Hadamard transform in CUDA, with a PyTorch interface
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Universal LLM Deployment Engine with ML Compilation
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
A collection of AWESOME things about mixture-of-experts
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"
Accessible large language models via k-bit quantization for PyTorch (see the 4-bit loading sketch after this list).
Running large language models on a single GPU for throughput-oriented scenarios.
Hackable and optimized Transformers building blocks, supporting a composable construction.
Experiments on speculative sampling with Llama models
Large Language Model Text Generation Inference
A library for accelerating Transformer models on NVIDIA GPUs, including the use of 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
Fast and memory-efficient exact attention (see the usage sketch after this list).
🎙️🤖 Create, Customize and Talk to your AI Character/Companion in Realtime (All in One Codebase!). Have a natural, seamless conversation with AI everywhere (mobile, web and terminal) using LLM OpenAI …
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
A framework for few-shot evaluation of language models.
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Instant neural graphics primitives: lightning fast NeRF and more
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
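The SmoothQuant entry above is easy to illustrate in a few lines. Below is a minimal sketch of its per-channel smoothing step, not the paper's implementation: the helper name, the toy shapes, and alpha = 0.5 are illustrative assumptions, while the scaling rule s_j = max|X_j|^alpha / max|W_j|^(1-alpha) is the one the paper describes.

```python
# Minimal sketch of SmoothQuant's smoothing step: migrate activation outliers
# into the weights with a per-input-channel scale s, so Y = (X / s) @ (s * W)^T
# is numerically identical to X @ W^T but both factors are easier to quantize.
import torch

def smooth(x_absmax: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """x_absmax: per-input-channel max |activation| from calibration data, shape (in_features,).
    weight: linear layer weight, shape (out_features, in_features)."""
    w_absmax = weight.abs().amax(dim=0)                  # per-input-channel weight range
    s = (x_absmax ** alpha) / (w_absmax ** (1 - alpha))  # smoothing factors s_j
    s = s.clamp(min=1e-5)
    return s, weight * s                                 # fold s into the weights

# Toy check that smoothing leaves the linear layer's output unchanged:
x = torch.randn(32, 8) * torch.tensor([1., 1., 50., 1., 1., 1., 1., 1.])  # channel 2 has outliers
w = torch.randn(4, 8)
s, w_s = smooth(x.abs().amax(dim=0), w)
print((x @ w.T - (x / s) @ w_s.T).abs().max())  # ~0 up to float32 rounding
```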
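For the bitsandbytes entry ("accessible large language models via k-bit quantization for PyTorch"), here is a minimal sketch of 4-bit loading through its Hugging Face transformers integration; the model id is a placeholder and the prompt is arbitrary.

```python
# Minimal sketch: load a causal LM with 4-bit NF4 weights via bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear-layer weights to 4 bits at load time
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in bf16
)

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM on the Hub works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

inputs = tokenizer("Quantization matters because", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0], skip_special_tokens=True))
```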
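And for the flash-attn entry ("fast and memory-efficient exact attention"), a minimal sketch of calling the fused kernel directly; it assumes the flash_attn package is installed, a CUDA GPU is available, and the tensor sizes are arbitrary.

```python
# Minimal sketch: flash-attn expects fp16/bf16 CUDA tensors shaped
# (batch, seqlen, num_heads, head_dim).
import torch
from flash_attn import flash_attn_func

batch, seqlen, heads, head_dim = 2, 1024, 16, 64
q = torch.randn(batch, seqlen, heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # fused causal attention, no N x N matrix materialized
print(out.shape)  # torch.Size([2, 1024, 16, 64])
```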