About Me
I am a Research Scientist at Google DeepMind in London. I received my PhD from the University of Washington, where I was advised by Dieter Fox. Before that, I was a researcher at NUS, working with David Hsu. My interests are in Human-Robot Interaction, Computer Vision, Natural Language Processing, and Machine Learning. I graduated with a Bachelor's in Computer Engineering from NUS. During my undergrad, I spent a year at Stanford and interned at a Y Combinator AR startup.
See my CV for more details.
Contact: mshr └[∵┌]└[ ∵ ]┘[┐∵]┘ cs.washington.edu
Publications
See Google Scholar for all publications and recent works.
Representative Works
Generative Image as Action Models.
Mohit Shridhar*, Yat Long Lo*, Stephen James
Conference on Robot Learning (CoRL) 2024.
Website | Abstract | PDF | Video | Code | Blog | BibTeX
Perceiver-Actor: A Multi-Task Transformer
for Robotic Manipulation.
Mohit Shridhar, Lucas Manuelli, Dieter Fox
Conference on Robot Learning (CoRL) 2022.
Website | Abstract | PDF | Video | Colab | Code | Talk | BibTeX
CLIPort: What and Where Pathways for Robotic Manipulation.
Mohit Shridhar, Lucas Manuelli, Dieter Fox
Conference on Robot Learning (CoRL) 2021.
Website | Abstract | PDF | Video | Code | BibTeX
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning.
Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté,
Yonatan Bisk, Adam Trischler, Matthew Hausknecht
International Conference on Learning Representations (ICLR) 2021.
Website | Abstract | PDF | Video | Code | BibTeX
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks.
Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk,
Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox
Conference on Computer Vision and Pattern Recognition (CVPR) 2020.
Website | Abstract | PDF | Video | Code | BibTeX
Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction.
Mohit Shridhar, David Hsu
Robotics: Science & Systems (RSS) 2018.
Abstract | PDF | Video | Code | Poster | Slides | BibTeX
XPose: Reinventing User Interaction with Flying Cameras.
Ziquan Lan, Mohit Shridhar, David Hsu, Shengdong Zhao
Robotics: Science & Systems (RSS) 2017.
Best Systems Paper Award
Abstract | PDF | Video | Slides | BibTeX
Experience
Google DeepMind – London
Robotics Team, Research Scientist
Nov 2024 – Present
Dyson – London
Dyson Robot Learning Lab, Research Scientist
Jul 2023 – Oct 2024
NVIDIA – Seattle
Seattle Robotics Lab, Research Intern
Jul 2022 – Sep 2022
Microsoft – Redmond
Reinforcement Learning Group, Research Intern
Jun 2020 – Sep 2020
NVIDIA – Seattle
Seattle Robotics Lab, Research Intern
Jan 2020 – Mar 2020
Meta – Redwood City
Computer Vision & Graphics Team, Software Intern
Jan 2015 – Dec 2015
Hope Technik – Singapore
Robotics Team, Software Intern
May 2014 – Aug 2014
Projects
Research
Shield SLAM (2015)
Real-time monocular SLAM for Android devices.
Paper | Code
Dense-Semantic SLAM (2017)
Combining monocular SLAM with dense captioning for object retrieval.
Video
Higgs Boson Detection Challenge (2015)
Deep learning for classifying Higgs boson to tau-tau signal events.
Paper | Poster | Code
Others
Multi-Map Manager (2014) Video | Code
ROS package for managing multiple static maps (e.g., different floors).
Oculus-Rift Gazebo Navigator (2014) Video | Code
Joystick-based tool for first-person (FPS-style) navigation in Gazebo.
Textured Quads (2016) Code
RViz plugin for displaying images and videos.
TASCA (2013) Video | Code
Java-based to-do list application.
Arduino Oscilloscope (2013) Code
Cheap alternative to digital oscilloscopes.
Media
Some of my collaborative work has been featured in the press.
Miscellaneous
My Erdős number is at most three (Mohit Shridhar → David Hsu → Maria Klawe → Paul Erdős).
Unfortunately, my Erdős–Bacon–Sabbath number is undefined.
I am a fan of films, science fiction, and Oxford commas.
