- Seoul National University
- Seoul, Korea
- https://bellos1203.github.io
Stars
Native Multimodal Models are World Learners
Official Implementation of "MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation"
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
This is the official repository for the paper "FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark"
Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding
Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025)
Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning.
Reference PyTorch implementation and models for DINOv3
Official PyTorch Implementation of "Diffusion Autoencoders are Scalable Image Tokenizers"
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
[ICCV2025] TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/TokenBridge
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
This repo contains the code for 1D tokenizer and generator
[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
Let's make video diffusion practical!
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Toolkit for linearizing PDFs for LLM datasets/training
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
(ICCV 2025) OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
Official code for Fine-Grained Visual Prompting, NeurIPS 2023
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations. [EMNLP 2022]