Starred repositories

107 stars written in Python

AG2 (formerly AutoGen): The Open-Source AgentOS. Join us at: https://discord.gg/pAbnFJrkgZ

Python 3,774 490 Updated Nov 6, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 3,765 295 Updated Nov 6, 2025

Computer Networks: A Systems Approach -- Textbook

Python 3,251 249 Updated Jul 20, 2025

Official PyTorch implementation for "Large Language Diffusion Models"

Python 3,173 213 Updated Nov 4, 2025

FlameScope is a visualization tool for exploring different time ranges as Flame Graphs.

Python 3,088 176 Updated Oct 6, 2023

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python 2,165 216 Updated Oct 8, 2024
Python 2,106 177 Updated Nov 4, 2025

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 1,977 220 Updated Nov 5, 2025

MoBA: Mixture of Block Attention for Long-Context LLMs

Python 1,951 116 Updated Apr 3, 2025

Official repository for the BPF Performance Tools book

Python 1,760 299 Updated Apr 30, 2024

An Open Large Reasoning Model for Real-World Solutions

Python 1,525 80 Updated May 30, 2025

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Python 1,356 115 Updated Oct 9, 2025

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, which reduces inference latency by up to 10x for pre-filli…

Python 1,147 63 Updated Sep 30, 2025

LongBench v2 and LongBench (ACL '25 & '24)

Python 1,008 107 Updated Jan 15, 2025

[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Python 880 45 Updated Jul 10, 2025

Git Repository for Fraunces Font Family

Python 654 22 Updated Oct 21, 2025

SPy language

Python 631 36 Updated Nov 4, 2025

Simple Inkscape Scripting

Python 401 35 Updated Oct 28, 2025

[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark

Python 391 13 Updated Jul 9, 2024

The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"

Python 387 36 Updated Apr 20, 2024

A step-by-step analysis of the LevelDB source code

Python 366 73 Updated Nov 11, 2024

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python 357 21 Updated Sep 15, 2025

Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718

Python 355 29 Updated Sep 25, 2024

Extracts the historic word occurrence of a search term in academic papers

Python 327 89 Updated Feb 11, 2024

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Python 286 34 Updated Jun 10, 2025
Python 286 23 Updated Jul 10, 2025

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 269 18 Updated May 1, 2025

[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation

Python 238 16 Updated Dec 16, 2024