- University of Strasbourg & Technical University of Munich
- Strasbourg
- https://flaick.github.io/
- in/kun-yuan-b2425219b
- https://scholar.google.com/citations?user=zId4EqoAAAAJ
- @KY_Yuan1
Stars
Adding scripts and dataset related to the acceptance of the paper for WACV 2025.
Official Code for "SurgMotion: A Video-Native Foundation Model for Universal Understanding of Surgical Videos"
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
Unified framework for robot learning built on NVIDIA Isaac Sim
MICCAI 2025: SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting
PyTorch code and models for VJEPA2 self-supervised learning from video.
Command-line program to download videos from YouTube.com and other video sites
Reference PyTorch implementation and models for DINOv3
Domain-adaptive CLIP retrieval system for surgical video keyframe–text matching. Includes adapter training, INT8 acceleration, PQ compression, and reliability analysis.
Latest Advances on Agentic AI & AI Agents for Healthcare
Referring Surgical Instrument Segmentation via Motion
[MICCAI 2025] Official code implementation for paper: Instrument-Splatting: Controllable Photorealistic Reconstruction of Surgical Instruments Using Gaussian Splatting
GUI for intuitively creating 3D reconstructions of the real world
Open-H-Embodiment is a community‑driven dataset initiative building the open, shared foundation needed to train and evaluate a generalist Vision‑Language‑Action (VLA) model for healthcare robotics
[NeurIPS 2023] SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
Benchmarking physical understanding in generative video models
Code for the paper "Compositional Entailment Learning for Hyperbolic Vision-Language Models".
[NeurIPS'25] EndoBench: A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis
MedAgentSim: Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions, MICCAI 2025 (oral and early accepted)
Official code of ORQA, corresponding to the paper "Specialized Foundation Models for Intelligent Operating Rooms". This repo includes the ORQA benchmark and its construction, the evaluation with OR…
Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 2025)
Compilations of surgery-related tasks, datasets, and papers.
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
Renderer for the harmony response format to be used with gpt-oss
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Repo for our work "Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence"
Implementation for "DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations" (NeurIPS 2022)