Yuxuan Li zcablii

151 followers · 34 following

Stars

zhasion / nkuthesis

Nankai University Thesis LaTeX Template

TeX 17 1 Updated Mar 24, 2026

AAwcAA / WOW-Seg-Meta

Python 30 1 Updated Mar 29, 2026

alibaba / page-agent

JavaScript in-page GUI agent. Control web interfaces with natural language.

TypeScript 16,786 1,360 Updated Apr 11, 2026

yangzhangok / crystal

Official repository of the article "CrystaL: Spontaneous Emergence of Visual Latents in MLLMs"

Python 14 1 Updated Mar 27, 2026

HVision-NKU / ASID-Caption

ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Understanding.

Python 59 2 Updated Mar 3, 2026

browserbase / stagehand

The SDK For Browser Agents

TypeScript 22,012 1,467 Updated Apr 11, 2026

openclaw / openclaw

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 354,921 71,780 Updated Apr 11, 2026

Visionary-Laboratory / visionary

Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform

Python 477 29 Updated Apr 7, 2026

sansan0 / TrendRadar

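⭐AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts.🎯 Say goodbye to information overload: your AI assistant for public-opinion monitoring and trending-topic filtering! Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI-filtered news, AI translation, and AI analysis briefings pushed straight to your phone; also supports integration with the MCP architecture…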

Python 51,451 23,066 Updated Apr 9, 2026

VisionXLab / RSCoVLM

[Remote Sensing 2026] Co-Training Vision Language Models for Remote Sensing Multi-task Learning

Jupyter Notebook 32 Updated Feb 12, 2026

LTH14 / JiT

PyTorch implementation of JiT https://arxiv.org/abs/2511.13720

Python 2,241 156 Updated Dec 8, 2025

Princeton-AI2-Lab / DeepOCR

A reproduction of the Deepseek-OCR model including training

Python 208 21 Updated Nov 21, 2025

NK-JittorCV / ViTP

Official implementation of "Visual Instruction Pretraining for Domain-Specific Foundation Models"

Python 174 3 Updated Nov 12, 2025

IDEA-Research / Rex-Omni

[CVPR2026] Detect Anything via Next Point Prediction

Jupyter Notebook 1,293 89 Updated Feb 22, 2026

YXB-NKU / SE-GUI

[NeurIPS 2025] "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"

Python 100 6 Updated Oct 21, 2025

zhongyi51 / flagged_pointer

Rust 4 Updated Mar 8, 2026

facebookresearch / dinov3

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 10,080 808 Updated Mar 30, 2026

google-research / pix2seq

Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)

Jupyter Notebook 942 73 Updated Nov 7, 2023

wokaikaixinxin / ai4rs

AI for remote sensing, remote sense, object detection, oriented object detection, computer vision, cv

Python 107 5 Updated Apr 9, 2026

KlingAIResearch / MODA

[ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding

Python 69 2 Updated Jul 10, 2025

Martinser / REG

[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Python 257 18 Updated Oct 4, 2025

HumanMLLM / LLaVA-Scissor

The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Python 120 2 Updated Jul 1, 2025

UbiquantAI / one-shot-em

One-shot Entropy Minimization

Python 189 11 Updated Jun 13, 2025

Gen-Verse / MMaDA

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models (dLLMs with block diffusion, mixed-CoT, unified RL)

Python 1,622 86 Updated Feb 14, 2026

bytedance / Sa2VA

Official Repo For Pixel-LLM Codebase: Sa2VA (Arxiv-25), SAMTok (CVPR-26), VRT, SaSaSa2VA (1-st solution for LSVOS)

Python 1,581 115 Updated Feb 27, 2026

Jimmyxichen / SARLANG-1M

Official PyTorch Implementation of SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding [IEEE TGRS 2026].

Python 47 2 Updated Mar 19, 2026

PhoenixZ810 / RISEBench

[NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Python 144 7 Updated Apr 8, 2026

OpenGVLab / InternVL-MMDetSeg

Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed

Jupyter Notebook 109 6 Updated Oct 25, 2024

Visual-AI / Mr.DETR

[CVPR 2025] Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Python 168 11 Updated Sep 6, 2025

QwenLM / Qwen3-VL

Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 18,936 1,726 Updated Jan 30, 2026