Stars
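A step-by-step guide to getting started with AI and large language models (LLMs), with tutorials and demo code that take you from API usage to local model deployment and fine-tuning. The code files come with online Kaggle or Colab versions, so you can follow along even without a GPU. The project also includes a small code playground 🎡 where you can experiment with some fun AI scripts. It also contains a complete Chinese mirror of the homework for Hung-yi Lee's (李宏毅) 2024 Introduction to Generative AI course.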
Recommends new arXiv papers matching your interests daily, based on your Zotero library.
[EMNLP 2025] LightThinker: Thinking Step-by-Step Compression
[ACL 2026 Findings] "Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning"
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
NEO Series: Native Vision-Language Models from First Principles
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
[CVPR 2026] Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space"
[CVPR 2026] Official code for "Monet: Reasoning in Latent Visual Space Beyond Image and Language"
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
Training Large Language Model to Reason in a Continuous Latent Space
This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.
📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning.
Implementation of "Interleaved Latent Visual Reasoning with Selective Perceptual Modeling".
[NeurIPS 2025] Official code for paper: Latent Chain-of-Thought for Visual Reasoning
Official codebase for the paper Latent Visual Reasoning
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
A very simple GRPO implementation for reproducing R1-like LLM thinking.
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
This repo contains the code for 1D tokenizer and generator
[CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
A collection of token reduction (token pruning, merging, clustering, etc.) techniques for ML/AI
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding