LI Junyi ProvenceStar

PhD Student, The University of Hong Kong

24 followers · 27 following

HKU | Research Intern at ByteDance
Hong Kong, China
https://provencestar.github.io/

Stars

KlingTeam / Alchemist

Python 27 1 Updated Dec 19, 2025

facebookresearch / pixio

Pixio: an SSL encoder dedicated to dense CV tasks

Python 165 5 Updated Dec 22, 2025

happinesslz / DrivePI

DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning

32 Updated Dec 16, 2025

Huster-YZY / GenieDrive

Code repository of "GenieDrive: Towards Physics-Aware Driving World Model with 4D Occupancy Guided Video Generation"

33 2 Updated Dec 17, 2025

ServiceNow / GroundCUA

GroundCUA

Python 56 6 Updated Dec 11, 2025

Pointcept / Concerto

[NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations

Python 453 21 Updated Nov 29, 2025

dvlab-research / MGM-Omni

MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech

Python 267 17 Updated Nov 17, 2025

xlang-ai / VideoAgentTrek

The official repo of VideoAgentTrek

Python 36 3 Updated Oct 24, 2025

bytetriper / RAE

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,643 55 Updated Nov 15, 2025

PRIME-RL / SimpleVLA-RL

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Python 1,129 62 Updated Oct 13, 2025

xlang-ai / OSWorld

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Python 2,402 352 Updated Dec 15, 2025

NVlabs / LongLive

LongLive: Real-time Interactive Long Video Generation

Python 921 63 Updated Dec 4, 2025

bytedance / UI-TARS

Python 8,618 609 Updated Nov 12, 2025

QwenLM / Qwen3-VL

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 17,316 1,449 Updated Nov 28, 2025

Visual-Agent / DeepEyes

Python 1,037 63 Updated Nov 20, 2025

InternRobotics / InternScenes

[NeurIPS 2025] InternScenes: A Large-scale Interactive Indoor Scene Dataset with Realistic Layouts.

Python 204 6 Updated Oct 17, 2025

abhisheknaiidu / awesome-github-profile-readme

😎 A curated list of awesome GitHub profiles that updates in real time

28,712 4,234 Updated Aug 19, 2024

Alibaba-NLP / DeepResearch

Tongyi Deep Research, the Leading Open-source Deep Research Agent

Python 17,685 1,355 Updated Dec 17, 2025

EvolvingLMMs-Lab / LLaVA-OneVision-1.5

Fully Open Framework for Democratized Multimodal Training

Python 660 50 Updated Dec 15, 2025

academicpages / academicpages.github.io

GitHub Pages template based on HTML and Markdown for personal, portfolio-based websites.

SCSS 16,110 4,603 Updated Dec 21, 2025

star-history / star-history

The missing star history graph of GitHub repos - https://star-history.com

TypeScript 8,199 308 Updated Dec 18, 2025

Fr0zenCrane / UniCoT

Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

Python 184 3 Updated Dec 19, 2025

Physical-Intelligence / openpi

Python 9,449 1,266 Updated Dec 18, 2025

Mini-o3 / Mini-o3

Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"

Python 378 15 Updated Sep 15, 2025

TideDra / zotero-arxiv-daily

Recommend new arXiv papers of interest to you daily, based on your Zotero library.

Python 4,273 3,780 Updated Dec 17, 2025

FoundationVision / Waver

Industry-level video foundation model for unified Text-to-Video (T2V) and Image-to-Video (I2V) generation.

779 88 Updated Aug 27, 2025

Qi-Zhangyang / GPT4Scene-and-VLN-R1

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

Python 468 17 Updated Sep 22, 2025

TIGER-AI-Lab / Pixel-Reasoner

Pixel-Level Reasoning Model trained with RL [NeurIPS'25]

Python 257 9 Updated Nov 6, 2025

Pointcept / Pointcept

Pointcept: Perceive the world with sparse points, a codebase for point cloud perception research. Latest works: Concerto (NeurIPS'25), Sonata (CVPR'25 Highlight), PTv3 (CVPR'24 Oral)

Python 2,722 328 Updated Dec 3, 2025

om-ai-lab / VLM-R1

Solve Visual Understanding with Reinforced VLMs

Python 5,769 375 Updated Oct 21, 2025