Python 27 1 Updated Dec 19, 2025

Pixio: an SSL encoder dedicated to dense CV tasks

Python 165 5 Updated Dec 22, 2025

DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning

32 Updated Dec 16, 2025

Code repository of "GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation"

33 2 Updated Dec 17, 2025

GroundCUA

Python 56 6 Updated Dec 11, 2025

[NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Python 453 21 Updated Nov 29, 2025

MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech

Python 267 17 Updated Nov 17, 2025

The official repo of VideoAgentTrek

Python 36 3 Updated Oct 24, 2025

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,643 55 Updated Nov 15, 2025

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Python 1,129 62 Updated Oct 13, 2025

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Python 2,402 352 Updated Dec 15, 2025

LongLive: Real-time Interactive Long Video Generation

Python 921 63 Updated Dec 4, 2025
Python 8,618 609 Updated Nov 12, 2025

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,316 1,449 Updated Nov 28, 2025
Python 1,037 63 Updated Nov 20, 2025

[NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts.

Python 204 6 Updated Oct 17, 2025

😎 A curated list of awesome GitHub profiles, updated in real time

28,712 4,234 Updated Aug 19, 2024

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 17,685 1,355 Updated Dec 17, 2025

Fully Open Framework for Democratized Multimodal Training

Python 660 50 Updated Dec 15, 2025

GitHub Pages template based upon HTML and Markdown for personal, portfolio-based websites.

SCSS 16,110 4,603 Updated Dec 21, 2025

The missing star history graph of GitHub repos - https://star-history.com

TypeScript 8,199 308 Updated Dec 18, 2025

Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

Python 184 3 Updated Dec 19, 2025

Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"

Python 378 15 Updated Sep 15, 2025

Recommend new arXiv papers of your interest daily according to your Zotero library.

Python 4,273 3,780 Updated Dec 17, 2025

Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.

779 88 Updated Aug 27, 2025

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

Python 468 17 Updated Sep 22, 2025

Pixel-Level Reasoning Model trained with RL [NeurIPS'25]

Python 257 9 Updated Nov 6, 2025

Pointcept: Perceive the world with sparse points, a codebase for point cloud perception research. Latest works: Concerto (NeurIPS'25), Sonata (CVPR'25 Highlight), PTv3 (CVPR'24 Oral)

Python 2,722 328 Updated Dec 3, 2025

Solve Visual Understanding with Reinforced VLMs

Python 5,769 375 Updated Oct 21, 2025