Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model

Python 1,714 176 Updated Oct 4, 2025

Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model

Python 2,376 198 Updated Oct 22, 2025

Code for ICCV 2025 paper: "ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation"

Python 7 Updated Aug 22, 2025

Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.

TypeScript 52,103 7,629 Updated Nov 5, 2025

Open-source, accurate, and easy-to-use video speech recognition & clipping tool with LLM-based AI clipping integrated.

Python 5,104 605 Updated Jul 11, 2025

This repo contains the code for 1D tokenizer and generator

Jupyter Notebook 1,071 56 Updated Mar 20, 2025

Open-Sora: Democratizing Efficient Video Production for All

Python 27,755 2,753 Updated Apr 30, 2025

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python 3,256 419 Updated Nov 3, 2025

Solve Visual Understanding with Reinforced VLMs

Python 5,673 366 Updated Oct 21, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 15,375 1,112 Updated Nov 5, 2025

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python 12,531 1,264 Updated Nov 4, 2025

[ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"

Python 174 16 Updated Mar 17, 2025

New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos

8,065 519 Updated Jun 9, 2025

xLAM: A Family of Large Action Models to Empower AI Agent Systems

Python 577 48 Updated Aug 21, 2025

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

Python 3,629 303 Updated Oct 20, 2025

Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.

443 15 Updated Apr 18, 2024

A suite of image and video neural tokenizers

Jupyter Notebook 1,678 83 Updated Feb 11, 2025

MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering

Python 1,133 172 Updated Nov 5, 2025

Codebase for Aria - an Open Multimodal Native MoE

Jupyter Notebook 1,078 85 Updated Jan 22, 2025

Composable building blocks to build Llama Apps

Python 8,144 1,205 Updated Nov 5, 2025

Agentic components of the Llama Stack APIs

4,274 635 Updated Aug 5, 2025

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬

Jupyter Notebook 11,659 1,710 Updated Apr 26, 2025

MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone

Python 22,185 1,665 Updated Sep 24, 2025

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 17,542 2,178 Updated Dec 25, 2024

[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Jupyter Notebook 132 10 Updated Aug 26, 2024

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curatio…

Python 2,315 226 Updated Nov 7, 2024

Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots

Python 1,049 198 Updated Jul 21, 2025

A curated list of recent diffusion models for video generation, editing, and various other applications.

5,170 319 Updated Oct 15, 2025

AI Browser

JavaScript 5,658 541 Updated Nov 5, 2025

Focus on prompting and generating

Python 46,967 7,587 Updated Sep 2, 2025