KAIST
- Seoul, Republic of Korea
- https://rootyjeon.github.io/
- in/byungwoo-jeon-53224420a
- @rootyjeon
Stars
A 3B-active-parameter native unified multimodal model for image and video understanding, generation, and editing.
InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, and image editing into a single framework.
🔥 An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.
A comprehensive list of papers for the definition of World Models and using World Models for General Video Generation, Embodied AI, and Autonomous Driving, including papers, codes, and related webs…
Official codebase for the paper Latent Visual Reasoning
A paper list for spatial reasoning
A paper list of recent work on token compression for ViT and VLM
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
NVIDIA Isaac GR00T N1.7 - A Foundation Model for Generalist Robots.
Famous Vision Language Models and Their Architectures
Tips for Writing a Research Paper using LaTeX
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
A PyTorch Lightning solution to training OpenAI's CLIP from scratch.
[ECCV 2024] Improving 2D Feature Representations by 3D-Aware Fine-Tuning
The resources I use to learn computer science in my spare time
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
Main Web Site (Online Books)
EPFL Course - Optimization for Machine Learning - CS-439
Gaussian Splatting from VGGSfM and Mast3r, and their comparison
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Summer 2026 software engineering, data science, AI, quant, product management, and hardware internship postings. Updated daily by Simplify and Pitt CSC.
2024 Gaussian Splatting Paper List (arXiv)