Stars
Visionary: The World Model Carrier Built on a WebGPU-Powered Gaussian Splatting Platform
🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. Multi-platform hot-topic aggregation plus MCP-based AI analysis tools. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, etc.), with smart filtering, automatic push, and conversational AI analysis (mine news in depth using natural language: 13 tools including trend tracking, sentiment analysis, and similarity retrieval). Supports push to WeCom / personal WeChat / Feishu / DingTalk / Telegram / email / ntfy / bark / Slack; phone notifications within 1 minute, no need…
[ArXiv 2025] Co-Training Vision Language Models for Remote Sensing Multi-task Learning
PyTorch implementation of JiT (https://arxiv.org/abs/2511.13720)
A reproduction of the DeepSeek-OCR model, including training
Official implementation of "Visual Instruction Pretraining for Domain-Specific Foundation Models"
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
[NeurIPS 2025]"Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"
Reference PyTorch implementation and models for DINOv3
Pix2Seq codebase: multi-task learning with generative modeling (autoregressive and diffusion)
AI for remote sensing: object detection, oriented object detection, computer vision (CV)
[ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding
[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
Official code for the paper "LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs"
MMaDA - Open-Source Multimodal Large Diffusion Language Models
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"
SARLANG-1M is a large-scale benchmark tailored for multimodal SAR image understanding, with a primary focus on integrating SAR imagery with the textual modality.
[NeurIPS 2025 DB Oral] Official repository of the paper "Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing"
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
[CVPR 2025] Mr. DETR: Instructive Multi-Route Training for Detection Transformers
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Fully open reproduction of DeepSeek-R1
The first large-scale multimodal dialogue dataset focusing on Synthetic Aperture Radar (SAR) imagery.
Solve Visual Understanding with Reinforced VLMs
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
[NeurIPS 2024 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning