Stars
[ICML'25] The PyTorch implementation of the paper "AdaWorld: Learning Adaptable World Models with Latent Actions".
Visualization of dataset splits for surgical phase and instrument recognition
Official Implementation of Dyn-O: Building Structured World Models with Object-Centric Representations (NeurIPS 2025)
Medical 3D Vision-language alignment for abnormality zero-shot diagnosis
OpenManus is an open-source initiative to replicate the capabilities of the Manus AI agent, a state-of-the-art general-purpose AI developed by Monica, which excels in autonomously executing complex…
Vector-Quantized Vision Foundation Models for Object-Centric Learning, ACM MM 2025.
[CVPR 2025] Official PyTorch Code for Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Official implementation of: "PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning" by Villar-Corrales & Behnke. ICML 2025
A vision-language model for recognizing surgical objects in surgical images and videos.
The official code for TMI2025 work "Instrument-Tissue-Guided Surgical Action Triplet Detection via Textual-Temporal Trail Exploration".
Reference PyTorch implementation and models for DINOv3
[MICCAI 2025 Young Scientist Award] Official implementation of "Learning Concept-Driven Logical Rules for Interpretable and Generalizable Medical Image Classification"
This is an anonymous repository for submitting our work to a conference
[CVPR'2025] EntitySAM: Segment Everything in Video
RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2
[ICLR'24] Learning to Compose: Improving Object Centric Learning by Injecting Compositionality
Code for the paper "SimSMoE: Toward Efficient Training Mixture of Experts via Solving Representational Collapse".
"Object-Region Video Transformers", Herzig et al., CVPR 2022
Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".
Code for the MICCAI 2025 paper "Next slot prediction for unsupervised object discovery"
Official implementation of "Exploring Temporally-Aware Features for Point Tracking" (CVPR 2025)
Official Code for "Large-scale Self-supervised Video Foundation Model for Intelligent Surgery"
Official implementation of Pix2SG, the first location-free scene graph generation method, as well as the corresponding heuristic tree search-based evaluation implemented in C++.
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"