Tasked with developing deep learning models that segment and recognize the surgical instruments shown in the videos. Each frame contains a variety of surgical tools (instruments, clamps, threads), which must be segmented by category along with the tissue appearing in the background.
-> Built a surgical video segmentation pipeline by fine-tuning Mask R-CNN in Detectron2, a modular framework from Meta AI Research, for binary classification and surgical tool recognition (see the first sketch below).
-> Developed a pipeline for automated conversion of segmented video frames into quantifiable files, and used GPT-3.5 for real-time inference (see the second sketch below).
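
A minimal fine-tuning sketch for the Detectron2 step, assuming COCO-style annotations and placeholder paths; the dataset name `surgical_train`, the annotation paths, and the solver settings are illustrative assumptions, not the exact training configuration used:

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Register the surgical dataset (paths are placeholders).
register_coco_instances(
    "surgical_train", {}, "annotations/train.json", "frames/train"
)

cfg = get_cfg()
cfg.merge_from_file(
    model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
)
cfg.DATASETS.TRAIN = ("surgical_train",)
cfg.DATASETS.TEST = ()
# Start from COCO-pretrained Mask R-CNN weights and fine-tune.
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"
)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 6  # the six classes listed below
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025
cfg.SOLVER.MAX_ITER = 3000

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```

Starting from COCO-pretrained weights is the usual choice here, since the surgical dataset is small relative to what Mask R-CNN needs to train from scratch.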
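A sketch of the conversion and inference step, under assumptions: it takes a Detectron2 `Instances` prediction for one frame, writes a quantifiable JSON record, and passes that record to GPT-3.5 through the OpenAI chat API. The label map, prompt wording, and function names are hypothetical.

```python
import json
from openai import OpenAI

# Hypothetical numerical-label-to-name map (mirrors the class list below).
LABEL_MAP = {0: "Instrument", 1: "Drop-in ultrasound probe",
             2: "Suturing needles", 3: "Suturing thread",
             4: "Clips/clamps", 5: "Background tissue"}

def instances_to_record(instances, frame_id):
    """Convert one frame's predictions into a JSON-serializable summary."""
    classes = instances.pred_classes.cpu().numpy()
    masks = instances.pred_masks.cpu().numpy()
    record = {"frame": frame_id, "objects": []}
    for cls, mask in zip(classes, masks):
        record["objects"].append({
            "class": LABEL_MAP[int(cls)],
            "pixel_area": int(mask.sum()),  # quantify each segmented region
        })
    return record

def describe_frame(record):
    """Ask GPT-3.5 for a short natural-language summary of the frame."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Summarize the tools in this surgical frame: "
                              + json.dumps(record)}],
    )
    return resp.choices[0].message.content
```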
The training dataset comprises 16 robotic procedures. The original video data was recorded at 60 Hz and, to reduce labelling cost, subsampled to 2 Hz. Sequences with little or no motion were manually removed, leaving 149 frames per procedure. Video frames are 1280x1024, and labels are provided at this resolution.
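For illustration, subsampling 60 Hz video to 2 Hz amounts to keeping every 30th frame; a rough OpenCV sketch (file paths are placeholders):

```python
import cv2

def subsample(video_path, out_dir, src_hz=60, target_hz=2):
    step = src_hz // target_hz  # keep every 30th frame
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.png", frame)
            saved += 1
        idx += 1
    cap.release()

subsample("procedure_01.mp4", "frames/procedure_01")
```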
The classes found in the training and test sets are:
- Instrument
- Drop-in ultrasound probe
- Suturing needles
- Suturing thread
- Clips/clamps
- Background tissue
Each class has a distinct numerical label in the ground truth images. A supplied JSON file contains the mapping from class name to numerical label.
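
A sketch of consuming that mapping, assuming a hypothetical filename `labels.json` and a flat name-to-label schema:

```python
import json
import cv2
import numpy as np

with open("labels.json") as f:
    name_to_label = json.load(f)  # e.g. {"Suturing thread": 3, ...}

# Load one ground-truth image (path is a placeholder).
gt = cv2.imread("ground_truth/frame_00000.png", cv2.IMREAD_GRAYSCALE)

# Build one binary mask per class from the ground-truth label image.
masks = {name: (gt == label) for name, label in name_to_label.items()}
for name, mask in masks.items():
    print(name, int(np.count_nonzero(mask)), "pixels")
```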
- Ojas Patil
- Garvita Kesarwani