Stars
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
bcacdwk / vllmbench
Forked from vllm-project/vllm
End-to-end benchmark using Slidesparse GEMM kernels
🧠 Train a 64M-parameter LLM from scratch in just 2h!
Accessible large language models via k-bit quantization for PyTorch.
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)
An open-source AI agent that brings the power of Gemini directly into your terminal.
This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transformers library) into inference-ready formats that run efficien…
Official repository of paper titled "CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications"
Real-time face swap and one-click video deepfake with only a single image
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
An Easy-to-Use and High-Performance AI Deployment Framework
PyTorch Tutorial for Deep Learning Researchers
How to learn PyTorch and OneFlow
A good project for campus, autumn, and spring recruiting and for internships! Implement a high-performance deep learning inference library from scratch, step by step, with inference support for models such as llama2, Unet, Yolov5, and Resnet.
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
A coding-free framework built on PyTorch for reproducible deep learning studies. PyTorch Ecosystem. 🏆26 knowledge distillation methods presented at TPAMI, CVPR, ICLR, ECCV, NeurIPS, ICCV, AAAI, etc…
A collection of design patterns/idioms in Python
[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.
An official implementation of "Network Quantization with Element-wise Gradient Scaling" (CVPR 2021) in PyTorch.