


Nankai University Thesis LaTeX Template

TeX 17 1 Updated Mar 24, 2026
Python 30 1 Updated Mar 29, 2026

JavaScript in-page GUI agent. Control web interfaces with natural language.

TypeScript 16,786 1,360 Updated Apr 11, 2026

Official repository of the article "CrystaL: Spontaneous Emergence of Visual Latents in MLLMs"

Python 14 1 Updated Mar 27, 2026

ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Understanding.

Python 59 2 Updated Mar 3, 2026

The SDK For Browser Agents

TypeScript 22,012 1,467 Updated Apr 11, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 354,921 71,780 Updated Apr 11, 2026

Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform

Python 477 29 Updated Apr 7, 2026

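⭐AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts.🎯 Say goodbye to information overload: your AI public-opinion monitoring assistant and trending-topic filter! Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI news filtering + AI translation + AI analysis briefings pushed straight to your phone; also supports integration with the MCP architecture…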

Python 51,451 23,066 Updated Apr 9, 2026

[Remote Sensing 2026] Co-Training Vision Language Models for Remote Sensing Multi-task Learning

Jupyter Notebook 32 Updated Feb 12, 2026

PyTorch implementation of JiT https://arxiv.org/abs/2511.13720

Python 2,241 156 Updated Dec 8, 2025

A reproduction of the DeepSeek-OCR model, including training

Python 208 21 Updated Nov 21, 2025

Official implementation of "Visual Instruction Pretraining for Domain-Specific Foundation Models"

Python 174 3 Updated Nov 12, 2025

[CVPR2026] Detect Anything via Next Point Prediction

Jupyter Notebook 1,293 89 Updated Feb 22, 2026

[NeurIPS 2025] "Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning"

Python 100 6 Updated Oct 21, 2025
Rust 4 Updated Mar 8, 2026

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 10,080 808 Updated Mar 30, 2026

Pix2Seq codebase: multi-task generative modeling (autoregressive and diffusion)

Jupyter Notebook 942 73 Updated Nov 7, 2023

AI for remote sensing, remote sense, object detection, oriented object detection, computer vision, cv

Python 107 5 Updated Apr 9, 2026

[ICML 2025 Spotlight] MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding

Python 69 2 Updated Jul 10, 2025

[NeurIPS 2025 Oral] Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Python 257 18 Updated Oct 4, 2025

The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Python 120 2 Updated Jul 1, 2025

One-shot Entropy Minimization

Python 189 11 Updated Jun 13, 2025

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models (dLLMs with block diffusion, mixed-CoT, unified RL)

Python 1,622 86 Updated Feb 14, 2026

Official repo for the Pixel-LLM codebase: Sa2VA (Arxiv-25), SAMTok (CVPR-26), VRT, SaSaSa2VA (1st-place solution for LSVOS)

Python 1,581 115 Updated Feb 27, 2026

Official PyTorch Implementation of SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding [IEEE TGRS 2026].

Python 47 2 Updated Mar 19, 2026

[NIPS 2025 DB Oral] Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Python 144 7 Updated Apr 8, 2026

Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed

Jupyter Notebook 109 6 Updated Oct 25, 2024

[CVPR 2025] Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Python 168 11 Updated Sep 6, 2025

Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Jupyter Notebook 18,936 1,726 Updated Jan 30, 2026