University of California, Santa Barbara
Stars
Official repository for paper: "OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs"
Official Implementation of Paper CM2
[Up-To-Date] Awesome Agent Memory Paper Resource
SafeGround: Know When to Trust GUI Grounding Models via Uncertainty Calibration
[ICLR 2026] SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space"
[ICLR26] Official codebase for the paper "Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations"
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
[NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
[NeurIPS 2025] More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models
[EMNLP 2025] Official code for the paper "SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning"
Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images"
Official implementation of the NeurIPS 2025 paper "Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space"
Agent S: an open agentic framework that uses computers like a human
[ICLR 2025] EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
[ACL 2025 Findings] "Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models"
Official repo for the paper "Mojito: Motion Trajectory and Intensity Control for Video Generation"
Qwen3-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
Large Concept Models: Language modeling in a sentence representation space
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
A simple screen-parsing tool for pure vision-based GUI agents
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
[ICLR 2025] Official codebase for the paper "Multimodal Situational Safety"