Starred repositories

Persist and reuse KV Cache to speed up your LLM.

Python 98 32 Updated Nov 5, 2025

The code of our paper "InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory"

Python 386 36 Updated Apr 20, 2024

Enhanced ChatGPT Clone: Features Agents, MCP, DeepSeek, Anthropic, AWS, OpenAI, Responses API, Azure, Groq, o1, GPT-5, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message…

TypeScript 31,363 6,087 Updated Nov 5, 2025

Python 2,101 178 Updated Nov 4, 2025

Try the demo of WebANNS on our GitHub page!

C++ 12 Updated Jul 14, 2025

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 16,927 1,292 Updated Nov 3, 2025

Mamba SSM architecture

Python 16,332 1,479 Updated Oct 10, 2025

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python 355 21 Updated Sep 15, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 3,745 292 Updated Nov 5, 2025

Query-Adaptive Vector Search

C++ 60 12 Updated Nov 3, 2025

SPy language

Python 605 35 Updated Nov 4, 2025

Universal LLM Deployment Engine with ML Compilation

Python 21,564 1,851 Updated Nov 4, 2025

Prompt Orchestration Markup Language

TypeScript 4,714 248 Updated Oct 21, 2025

Lightweight coding agent that runs in your terminal

Rust 49,843 6,157 Updated Nov 5, 2025

Microsoft Azure Traces

Jupyter Notebook 1,019 167 Updated Oct 20, 2025

[ASPLOS'25] Towards End-to-End Optimization of LLM-based Applications with Ayo

Python 48 5 Updated Aug 5, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,105 1,902 Updated Nov 1, 2025

An AI agent development platform with all-in-one visual tools, simplifying agent creation, debugging, and deployment like never before. Coze your way to AI Agent creation.

TypeScript 18,333 2,547 Updated Nov 5, 2025

🚀 The fast, Pythonic way to build MCP servers and clients

Python 19,984 1,459 Updated Nov 5, 2025

Developer-friendly, embedded retrieval engine for multimodal AI. Search More; Manage Less.

Rust 7,898 637 Updated Nov 5, 2025

This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding code links.

232 7 Updated Jul 29, 2025

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 775 53 Updated Mar 6, 2025

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 8,847 697 Updated Aug 18, 2024

Dynamic Memory Management for Serving LLMs without PagedAttention

C 434 33 Updated May 30, 2025

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 81,560 9,106 Updated Nov 5, 2025

Get started with building Fullstack Agents using Gemini 2.5 and LangGraph

Jupyter Notebook 17,240 2,937 Updated Oct 21, 2025

Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

TypeScript 154,260 49,270 Updated Nov 5, 2025

The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Python 7,714 631 Updated Oct 27, 2025