twelve-labs
Seoul
Stars
MEME: Multi-Entity & Evolving Memory Evaluation — reference implementation (companion to arXiv preprint)
[ACL 2026 Main] MM-JudgeBias: A Benchmark for Evaluating Compositional Biases in MLLM-as-a-Judge
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
One CLAUDE.md file. Keeps Claude responses terse. Reduces output verbosity on heavy workflows. Drop-in, no code changes.
Hugging Face-compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf), including parallel, recurrent, and chunkwise forward passes.
🚀 Efficient implementations for emerging model architectures
A lightweight, lightning-fast, in-process vector database
A smarter cd command. Supports all major shells.
A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API
Repair malformed JSON from LLMs, APIs, logs, and user input in Python.
The agent harness performance optimization system. Skills, instincts, memory, security, and research-first development for Claude Code, Codex, Opencode, Cursor and beyond.
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
StreamingVLM: Real-Time Understanding for Infinite Video Streams
Implementations of and experiments with mHC by DeepSeek: https://arxiv.org/abs/2512.24880
A Paper List for Humanoid Robot Learning.
Official repository of the paper "Does audio matter for modern video-LLMs and their benchmarks?"
An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
Query-aware Token Selector (QTSplus), a lightweight yet powerful visual token selection module that serves as an information gate between the vision encoder and LLMs.
[ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
[ICLR2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
slime is an LLM post-training framework for RL scaling.
A framework for efficient model inference with omni-modality models
ChatDev 2.0: Dev All through LLM-powered Multi-Agent Collaboration
Ring attention implementation with flash attention
🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training