The Chinese University of Hong Kong, Shenzhen, China
https://markwwen.github.io
Stars
❤️ Emotional First Aid Dataset: a corpus of psychological counseling Q&A and chatbot dialogue
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
The official repo for the paper "Direct Multi-token Decoding"
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
🚀 Efficient implementations of state-of-the-art linear attention models
[EMNLP 2025 Main] SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
HArmonizedSS / HASS
Forked from SafeAILab/EAGLE. Official implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)
Codes for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings)
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Summary of awesome work on optimizing LLM inference
[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Mimosa-Lin / SpecForge
Forked from sgl-project/SpecForge. Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
📰 Must-read papers and blogs on Speculative Decoding ⚡️
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
PyTorch code for the Energy-Based Transformers paper: generalizable reasoning and scalable learning
Official Schlably Repository by the Institute for TMDT
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an LLM (with low latency overhead!)