robertluo1 (Tsinghua University, Beijing) - robertluo1.github.io
Starred repositories
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…
CoreNet: A library for training deep neural networks
Taming Transformers for High-Resolution Image Synthesis
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
OmniGen2: Exploration to Advanced Multimodal Generation. https://arxiv.org/abs/2506.18871
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
The hub for EleutherAI's work on interpretability and learning dynamics
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
A suite of image and video neural tokenizers
An easy/swift-to-adapt PyTorch Lightning template. (A wrapper template, simple and easy to use: with minor changes to your original PyTorch code, it adapts to Lightning.) You can port your previous PyTorch code much more easily using this template, and keep your freedom to edit a…
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
This repo contains the code for a 1D tokenizer and generator
Codebase for Aria - an Open Multimodal Native MoE
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
A linear estimator on top of CLIP to predict the aesthetic quality of pictures
Official Jax Implementation of MaskGIT
Fine-Tuning Embedding for RAG with Synthetic Data
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
[NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models