Starred repositories
A diffusion-based framework for document OCR that replaces autoregressive decoding with block-level parallel diffusion decoding.
🚀 Open source Claude Code CLI source code. Advanced AI Agent for developers. Includes TypeScript codebase for LLM tool-calling, agentic workflows, and terminal UI. Remember this is just the skeleto…
Ghostty-based macOS terminal with vertical tabs and notifications for AI coding agents
AI Agent Framework, the Pydantic way
A complete AI agency at your fingertips - From frontend wizards to Reddit community ninjas, from whimsy injectors to reality checkers. Each agent is a specialized expert with personality, processes…
OmX - Oh My codeX: Your codex is not alone. Add hooks, agent teams, HUDs, and so much more.
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
torchange - A Unified Change Representation Learning Benchmark Library
GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning
A comprehensive and up-to-date compilation of datasets, tools, methods, review papers, and competitions for remote sensing change detection.
This repo contains a curated list of scene change detection (SCD) resources, including papers, videos, code, and related websites.
An agentic skills framework & software development methodology that works.
A collection of 100+ specialized Claude Code subagents covering a wide range of development use cases
Intelligent automation and multi-agent orchestration for Claude Code
45 tips for getting the most out of Claude Code, from basics to advanced - includes a custom status line script, cutting the system prompt in half, using Gemini CLI as Claude Code's minion, and Cla…
A Claude Code plugin that shows what's happening - context usage, active tools, running agents, and todo progress
🚀 Beautiful highly customizable statusline for Claude Code CLI with powerline support, themes, and more.
A Claude Code plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude's agent-sdk), and injects relevant context back into future …
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
🧮 Calculator for vision tokens in VLMs.
The absolute trainer to light up AI agents.
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
This is the official repository for our recent work: PIDNet
The official implementation of "Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes"
Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.