Peking University
- Guangdong
Stars
[HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
PyTorch native quantization and sparsity for training and inference
Sparse inference for transformer-based LLMs
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
🚀 Efficient implementations for emerging model architectures
A machine learning accelerator core designed for energy-efficient AI at the edge.
Synthesisable SystemVerilog implementation of a Transformer Decoder block
Parameterised AXI4 crossbar interconnect in SystemVerilog — N masters, M slaves, round-robin arbitration, ID-based response routing
Simple, safe way to store and distribute tensors
Unified KV cache management for multi-task VLA inference.
A very simple and easy to understand RISC-V core.
Vision–Language–Action models for Autonomous Driving (VLA4AD) resources, serving as the companion repository to the survey paper “A Survey on Vision–Language–Action Models for Autonomous Driving”.
The official NaplesPU hardware code repository
A curated list of academic papers and resources on Vision-Language-Action (VLA) and World Action Models (WAM)
CUDA Templates and Python DSLs for High-Performance Linear Algebra
🏆 OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond — redefining the accuracy-efficiency Pareto front for X-LLMs KV quantization.
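⚡ Clash for Lab is a proxy tool for bypassing network restrictions, designed for lab environments; it needs no sudo privileges and installs cleanly from a one-step script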
DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving (ICLR 2026)
Graphs that teach > graphs that impress. Turn any code into an interactive knowledge graph you can explore, search, and ask questions about. Works with Claude Code, Codex, Cursor, Copilot, Gemini C…
An agentic skills framework & software development methodology that works.
A paper list of recent work on token compression for ViT and VLM
F1: A Vision Language Action Model Bridging Understanding and Generation to Actions
A curated collection of papers on Vision-Language-Action (VLA) models for autonomous driving and robotics
Ramulator 2.0 is a modern, modular, extensible, and fast cycle-accurate DRAM simulator. It provides support for agile implementation and evaluation of new memory system designs (e.g., new DRAM stan…
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025.
Index of hardware design repositories — CPUs, arithmetic units, SoC design, HDL, and power electronics