zcablii



Visionary: The World Model Carrier Built on a WebGPU-Powered Gaussian Splatting Platform

Python 343 15 Updated Dec 15, 2025

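🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. Multi-platform trending-topic aggregation plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Wallstreetcn, Cailian Press, etc.), with smart filtering, automatic push notifications, and conversational AI analysis (dig into the news in natural language with 13 tools, including trend tracking, sentiment analysis, and similarity search). Supports push via WeCom/personal WeChat/Feishu/DingTalk/Telegram/email/ntfy/bark/slack, phone notifications within 1 minute, no need…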

Python 39,972 20,796 Updated Dec 20, 2025

[ArXiv 2025] Co-Training Vision Language Models for Remote Sensing Multi-task Learning

Python 16 Updated Nov 30, 2025

PyTorch implementation of JiT https://arxiv.org/abs/2511.13720

Python 1,832 108 Updated Dec 8, 2025

A reproduction of the Deepseek-OCR model including training

Python 200 17 Updated Nov 21, 2025

Official implementation of "Visual Instruction Pretraining for Domain-Specific Foundation Models"

Python 132 1 Updated Nov 12, 2025

Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)

Jupyter Notebook 1,014 66 Updated Dec 15, 2025

[NeurIPS 2025] "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"

Python 83 5 Updated Oct 21, 2025
Rust 4 Updated Nov 22, 2025

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 8,882 655 Updated Nov 20, 2025

Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)

Jupyter Notebook 935 72 Updated Nov 7, 2023

AI for remote sensing, remote sense, object detection, oriented object detection, computer vision, cv

Python 52 2 Updated Dec 5, 2025

[ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding

Python 63 2 Updated Jul 10, 2025

[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Python 230 17 Updated Oct 4, 2025

The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Python 114 1 Updated Jul 1, 2025

One-shot Entropy Minimization

Python 186 11 Updated Jun 13, 2025

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,533 78 Updated Nov 16, 2025

Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"

Python 1,466 102 Updated Dec 16, 2025

SARLANG-1M is a large-scale benchmark tailored for multimodal SAR image understanding, with a primary focus on integrating SAR with textual modality.

Python 38 1 Updated Jun 20, 2025

[NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Python 129 7 Updated Dec 17, 2025

Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed

Jupyter Notebook 108 6 Updated Oct 25, 2024

[CVPR 2025] Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Python 158 7 Updated Sep 6, 2025

Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Jupyter Notebook 17,308 1,448 Updated Nov 28, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 64,292 7,792 Updated Dec 21, 2025

Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook 2,286 103 Updated Oct 29, 2025

Fully open reproduction of DeepSeek-R1

Python 25,745 2,405 Updated Nov 24, 2025

The first large-scale multimodal dialogue dataset focusing on Synthetic Aperture Radar (SAR) imagery.

Shell 64 3 Updated Feb 15, 2025

Solve Visual Understanding with Reinforced VLMs

Python 5,769 375 Updated Oct 21, 2025

【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models

Python 2,283 140 Updated Jul 15, 2025

[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Python 413 21 Updated Dec 22, 2024