I’m Mehdi, a Postdoctoral Research Fellow at the Australian Institute for Machine Learning (AIML). My research sits at the intersection of robotics and computer vision, with a focus on building perception and reasoning components that help embodied systems operate robustly in real-world environments.
I completed my PhD at the University of Adelaide under Prof. Ian Reid (Australian Centre for Robotic Vision), focusing on real-time, structure-aware, object-centric semantic visual SLAM that integrates geometric estimation with object- and scene-level priors for robust operation in complex environments.
Broadly, I work on three connected research streams:
- Embodied perception & mapping: multimodal scene understanding for navigation and manipulation, including geometry-aware representations for localization, mapping, and decision-making.
- 3D reconstruction & neural representations: learning-based 3D reconstruction and neural rendering with an emphasis on geometric consistency and practical generalization.
- Vision-language for robot reasoning & interaction: using VLMs together with structured visual evidence to support robot introspection—e.g., detecting and localizing failures in long-horizon tasks, producing grounded explanations, and suggesting corrective actions.
I also draw on prior industry experience building deployable vision/SLAM systems, which helps me stay close to constraints like runtime, sensing limitations, and robustness.
Recent work (selected):
- G³Splat: geometric priors for pose-free, generalizable Gaussian splatting to improve geometrically consistent reconstruction, relative pose estimation, and novel view synthesis (G3Splat: Geometrically Consistent Generalizable Gaussian Splatting).
- KITE: a training-free front-end that converts long robot-execution videos into compact, interpretable evidence to improve VLM-based failure understanding and correction (KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis).