View cyx0406's full-sized avatar

Python 4 1 Updated Feb 13, 2021

[ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Python 17 1 Updated Jun 3, 2025

Common fonts compatible with both Mac and Windows

282 29 Updated May 19, 2017

QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning

C++ 148 13 Updated Nov 11, 2025

GPU documentation for humans

Python 415 51 Updated Dec 9, 2025

[ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models

Python 34 4 Updated Nov 4, 2025

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python 176 8 Updated Dec 16, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,016 584 Updated Dec 20, 2025

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 135 79 Updated May 29, 2025

A framework to compare low-bit integer and floating-point formats

Python 50 5 Updated Nov 1, 2025

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python 6,622 379 Updated Jun 2, 2025

This repository contains low-bit quantization papers from 2020 to 2025 published at top conferences.

83 2 Updated Sep 24, 2025

A selective knowledge distillation algorithm for efficient speculative decoders

31 3 Updated Nov 27, 2025

QeRL enables RL for 32B LLMs on a single H100 GPU.

Python 468 44 Updated Nov 27, 2025

[EMNLP 2025] LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization (Oral)

Python 7 Updated Oct 9, 2025

A collection of research papers on low-precision training methods

51 2 Updated May 10, 2025

Hierarchical Reasoning Model Official Release

Python 12,162 1,779 Updated Sep 9, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,442 1,991 Updated Nov 1, 2025

Jupyter Notebook 216 15 Updated Nov 25, 2025

Kimi K2 is the large language model series developed by Moonshot AI team

9,736 705 Updated Nov 7, 2025

[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention

Python 599 31 Updated Dec 9, 2025

[NeurIPS 2025] Radial Attention: O(nlogn) Sparse Attention with Energy Decay for Long Video Generation

Python 565 31 Updated Nov 11, 2025

[ICML 2025 oral] Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

Python 39 1 Updated Jun 5, 2025

The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.

Python 10,641 800 Updated Dec 20, 2025

[NeurIPS 2025 Spotlight] A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone.

Python 39 Updated Oct 29, 2025

Work in progress.

Jupyter Notebook 75 7 Updated Nov 25, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,818 12,086 Updated Dec 20, 2025

Official repository of Agent Attention (ECCV2024)

Python 654 44 Updated Nov 17, 2024

[SIGGRAPH 2025] One Model to Rig Them All: Diverse Skeleton Rigging with UniRig

Python 1,252 106 Updated Sep 20, 2025