Stars
Fully Open Framework for Democratized Multimodal Training
[ICLR 2026 Oral] FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging
A lightweight GPU node manager designed for agentic coding workflows on SLURM clusters.
Professional Antigravity Account Manager & Switcher. One-click seamless account switching for Antigravity Tools. Built with Tauri v2 + React (Rust).
A Super AI Lab with massive AI Doctors as Assistants. Best IDE for Research via AI Power.
This repository contains the official implementation of the research papers "MobileCLIP" (CVPR 2024) and "MobileCLIP2" (TMLR, August 2025).
[ICLR 2026] The official repo of "MMTok: Multimodal Coverage Maximization for Efficient Inference of VLMs"
Small toolchain to turn spoken audio into ASS subtitles using Whisper, and optionally burn them into video with FFmpeg (hardsubs).
Lightweight Image Video Action Generation Inference Framework
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
Reinforcement Learning via Self-Distillation (SDPO)
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
slime is an LLM post-training framework for RL Scaling.
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
A Massive Multi-Discipline Lecture Understanding Benchmark
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
A plugin for Mac WeChat
[CVPR'26 Highlight] Cupid: A 3D generator that links 2D image with camera
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
[CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
Agentic AI research papers, benchmarks, frameworks, and tools curated across 24 domains.
Virtual whiteboard for sketching hand-drawn like diagrams
Materials and demo code for CSE 572 tutorial sessions (environment setup, Git, Python projects).
📚 Collection of token-level model compression resources.
FlashInfer: Kernel Library for LLM Serving