Stars
Just talk to your agent: it learns and EVOLVES 🧬.
SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
[ICML'26] Agent0 Series: Self-Evolving Agents from Zero Data
Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"
Salesforce AI Research's open diffusion language model
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Codebase for paper ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
[TMLR 2026] Survey: https://arxiv.org/pdf/2507.20198
[NeurIPS 2025] HoliTom: Holistic Token Merging for Fast Video Large Language Models
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
VidKV: Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
[EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
Pretraining and inference code for a large-scale depth-recurrent language model
This repository contains the code and released models for the paper Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model, accepted at TMLR.
[CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
A collection of resources on applications of multi-modal learning in medical imaging.
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
A suite of image and video neural tokenizers
Efficient DiT architecture for text2any tasks, ICLR2025
🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant (NeurIPS 2024)
PyTorch extensions for high performance and large scale training.
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
A simple bash script for switching between installed versions of CUDA.
Generic PyTorch dataset implementation to load and augment VIDEOS for deep learning training loops.