xutianming
  • FreeLancer
  • Beijing, China

Starred repositories

Fast Hadamard transform in CUDA, with a PyTorch interface

C 280 50 Updated Oct 19, 2025

Go ahead and axolotl questions

Python 11,237 1,249 Updated Feb 4, 2026

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python 13,137 1,403 Updated Feb 4, 2026

[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Python 720 34 Updated Dec 2, 2024

我的电视 (My TV): a live TV streaming app, ready to use right after installation

C 32,217 3,617 Updated Jun 20, 2024

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 6,183 569 Updated Aug 22, 2025
Python 33 3 Updated Jun 6, 2023

Universal LLM Deployment Engine with ML Compilation

Python 21,993 1,927 Updated Feb 3, 2026

[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

Python 888 73 Updated Nov 26, 2025

A collection of AWESOME things about mixture-of-experts

1,258 83 Updated Dec 8, 2024

A fast MoE impl for PyTorch

Python 1,831 200 Updated Feb 10, 2025
Python 353 45 Updated Apr 2, 2024

A simple and effective LLM pruning approach.

Python 846 120 Updated Aug 9, 2024

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.

Python 1,105 130 Updated Oct 7, 2024

Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models"

Python 323 25 Updated Mar 4, 2025

Accessible large language models via k-bit quantization for PyTorch.

Python 7,935 818 Updated Jan 22, 2026

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,384 591 Updated Oct 28, 2024

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 10,324 765 Updated Jan 26, 2026

Inference Llama 2 in one file of pure C

C 19,151 2,442 Updated Aug 6, 2024

Experiments on speculative sampling with Llama models

Python 128 8 Updated Jun 8, 2023

Large Language Model Text Generation Inference

Python 10,752 1,254 Updated Jan 8, 2026

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,138 626 Updated Feb 4, 2026

Fast and memory-efficient exact attention

Python 22,092 2,348 Updated Feb 5, 2026

🎙️🤖Create, Customize and Talk to your AI Character/Companion in Realtime (All in One Codebase!). Have a natural seamless conversation with AI everywhere (mobile, web and terminal) using LLM OpenAI …

JavaScript 6,199 778 Updated Jan 20, 2026

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 6,639 736 Updated Jan 22, 2026

A framework for few-shot evaluation of language models.

Python 11,361 3,018 Updated Feb 3, 2026

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python 17,645 2,871 Updated Nov 3, 2025

Instant neural graphics primitives: lightning fast NeRF and more

Cuda 17,248 2,050 Updated Feb 2, 2026

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,600 195 Updated Jul 12, 2024