RobertLuo1
49 stars written in Jupyter Notebook

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 18,470 2,337 Updated Dec 25, 2024

Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Jupyter Notebook 18,181 1,578 Updated Jan 30, 2026

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…

Jupyter Notebook 8,609 556 Updated Nov 10, 2025

CoreNet: A library for training deep neural networks

Jupyter Notebook 7,016 543 Updated Oct 9, 2025

Taming Transformers for High-Resolution Image Synthesis

Jupyter Notebook 6,425 1,230 Updated Jul 30, 2024

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Jupyter Notebook 5,792 545 Updated Aug 29, 2025

OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

Jupyter Notebook 4,309 370 Updated Dec 4, 2025

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Jupyter Notebook 4,294 360 Updated Nov 27, 2025

OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871

Jupyter Notebook 4,020 17 Updated Dec 2, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,914 316 Updated Jun 12, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,395 210 Updated Jan 8, 2026
Jupyter Notebook 2,752 364 Updated May 2, 2025

The hub for EleutherAI's work on interpretability and learning dynamics

Jupyter Notebook 2,728 204 Updated Nov 15, 2025

An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.

Jupyter Notebook 1,940 298 Updated Aug 9, 2025

A suite of image and video neural tokenizers

Jupyter Notebook 1,703 85 Updated Feb 11, 2025

An easy/swift-to-adapt PyTorch-Lightning template. A simple, easy-to-use wrapper template: with minor changes to your existing PyTorch code, it can be adapted to Lightning. You can translate your previous PyTorch code much more easily using this template, and keep your freedom to edit a…

Jupyter Notebook 1,539 193 Updated Aug 6, 2023

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,538 60 Updated Jun 14, 2025

This repo contains the code for 1D tokenizer and generator

Jupyter Notebook 1,109 63 Updated Mar 20, 2025

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook 1,082 86 Updated Jan 22, 2025
Jupyter Notebook 1,069 131 Updated Sep 18, 2024

The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)

Jupyter Notebook 996 112 Updated Jan 3, 2024

A linear estimator on top of CLIP to predict the aesthetic quality of pictures

Jupyter Notebook 668 27 Updated Aug 15, 2022

Referring Expression Datasets API

Jupyter Notebook 556 85 Updated Aug 27, 2024

Official Jax Implementation of MaskGIT

Jupyter Notebook 554 53 Updated Nov 18, 2022

Fine-Tuning Embedding for RAG with Synthetic Data

Jupyter Notebook 523 74 Updated Sep 11, 2023

Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"

Jupyter Notebook 307 11 Updated Sep 28, 2025

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

Jupyter Notebook 301 21 Updated Sep 21, 2025

FlexTok: Resampling Images into 1D Token Sequences of Flexible Length

Jupyter Notebook 290 14 Updated Jun 2, 2025

[NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models

Jupyter Notebook 286 11 Updated Dec 4, 2024
Jupyter Notebook 213 6 Updated Feb 11, 2025