Starred repositories
torchcomms: a modern PyTorch communications API
Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models"
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
SGLang is a fast serving framework for large language models and vision language models.
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Train a 1B LLM on 1T tokens from scratch, as a personal project.
Phi2-Chinese-0.2B: train your own small Chinese Phi2 chat model from scratch, with support for langchain integration to load a local knowledge base for retrieval-augmented generation (RAG).
A 0.2B-parameter Chinese chat model (ChatLM-Chinese-0.2B), open-sourcing the full pipeline: dataset sources, data cleaning, tokenizer training, model pretraining, SFT instruction fine-tuning, and RLHF optimization. Supports SFT fine-tuning for downstream tasks, with a fine-tuning example for triple-based information extraction.
Reference PyTorch implementation and models for DINOv3
A single-file educational implementation for understanding vLLM's core concepts and running LLM inference.
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
[ICCV2025] PyTorch implementation of "Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models"
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
A high-throughput and memory-efficient inference and serving engine for LLMs
[CVPR 2025] Official implementation for "Empowering LLMs to Understand and Generate Complex Vector Graphics" https://arxiv.org/abs/2412.11102
From nobody to large language model (LLM) hero ~ stay tuned for more!
🚀 Train a 26M-parameter multimodal vision-language model (VLM) from scratch in just 1 hour! 🌏
A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed at low training cost, including base models, domain-specific fine-tunes and applications, datasets, and tutorials.
Solve Visual Understanding with Reinforced VLMs
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis (ICCV, 2025)
[ICLR 2025] Animate-X: Universal Character Image Animation with Enhanced Motion Representation
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
Exploiting unlabeled data with vision and language models for object detection, ECCV 2022
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding