- New York, NY
Stars
The agent that grows with you
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Official implementation and experiment code for the paper "PETS: Principled and Efficient Test-Time Scaling via Optimal Trajectory Allocation".
BitDance & UniWeTok: Open-source autoregressive model with binary visual tokens. A research project for building powerful multimodal autoregressive model.
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…
Scalable toolkit for efficient model reinforcement
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
[EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
[NeurIPS'25] ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding
A comprehensive list of excellent research papers, models, datasets, and other resources on Vision-Language-Action (VLA) models in robotics.
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
Witness the aha moment of VLM with less than $3.
A simple pip-installable Python tool to generate your HTML citation world map from your Google Scholar ID.
web3.0 knowledge compilation, web3.0 knowledge, web3.0 learning materials, web3 jobs, web3 books, web3job, blockchain knowledge, blockchain
Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
(NeurIPS 2025) Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
[NeurIPS'24] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
Official implementation for "MJ-BENCH: Is Your Multimodal Reward Model Really a Good Judge?"
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step