Renmin University of China, Beijing
Stars
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
Elevate your AI research writing; no more tedious polishing ✨
[ICLR 2026] Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
GLM-Image: Auto-regressive for Dense-knowledge and High-fidelity Image Generation.
EvoToken-DLM (Beyond Hard Masks: Progressive Token Evolution for Diffusion Language)
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
dInfer: An Efficient Inference Framework for Diffusion Language Models
Official Implementation for the paper "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning"
Simple MoE - Day 17 of 365 Days of Repos
SDAR (Synergy of Diffusion and AutoRegression), a family of large diffusion language models (1.7B, 4B, 8B, 30B)
Official PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT
[ICLR'26] Official PyTorch implementation of "Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models".
Awesome Unified Multimodal Models
Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for vision understanding | LoRA & PEFT support.
Witness the aha moment of VLM with less than $3.
OLMoE: Open Mixture-of-Experts Language Models
An open-source AI agent that brings the power of Gemini directly into your terminal.
[NeurIPS 2025 Spotlight] ReasonFlux (long-CoT), ReasonFlux-PRM (process reward model) and ReasonFlux-Coder (code generation)
OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871
🚀 Lightning-fast computer vision models. Fine-tune SOTA models with just a few lines of code. Ready for cloud ☁️ and edge 📱 deployment.