Stars
[ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Lear…
[ICLR 2026] Uni-X: Mitigating Modality Conflict with a Two-End-Separated Architecture for Unified Multimodal Models
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
[ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" and "SparseVLM+: Visual Token Sparsification with Improved Text-Vis…
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
[ICLR 2026 Oral] Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment
The agent that grows with you
moojink / openvla-oft
Forked from openvla/openvla
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
Modern, minimalist LaTeX Beamer theme for scientific presentations.
A collection of DESIGN.md files based on analysis of popular brand design systems. Drop one into your project and let coding agents generate a matching UI.
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…
A single CLAUDE.md file to improve Claude Code behavior, derived from Andrej Karpathy's observations on LLM coding pitfalls.
🐧 Unify-Agent: An end-to-end unified multimodal agent for faithful, knowledge-grounded image generation.
Unified Codebase for Advanced World Models.
Data-centric LLM training with dynamic sample selection, domain mixture optimization, and example reweighting inside the LLaMA-Factory training loop.
[ICLR 2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incen…
Drawing Bayesian networks, graphical models, tensors, technical frameworks, and illustrations in LaTeX.
[ACMMM 2024] AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception
Spec-driven development (SDD) for AI coding assistants.
Interactively explore unstructured datasets from your dataframe.
Give your agents the power of the Hugging Face ecosystem
Code release for our NeurIPS 2024 Spotlight paper "GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing"
The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing"