If we used OCR on the video and then ran something like sumy over the extracted text to figure out what's going on, we could, for example, detect 'chapters' in the video.
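A rough sketch of the chapter-detection step follows. It assumes the OCR pass has already run (e.g. Tesseract on one sampled frame every few seconds) and produced a list of `(timestamp_seconds, ocr_text)` pairs. The sketch doesn't use sumy for the splitting itself. Instead it swaps in a simple word-overlap (Jaccard) check between consecutive frames and starts a new chapter whenever the on-screen text changes a lot. `detect_chapters`, `Chapter` and the `0.2` threshold are all hypothetical and would need tuning on real footage.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Chapter:
    start: float  # seconds
    end: float    # seconds
    text: str     # all OCR text collected for this chapter


def tokens(text: str) -> set[str]:
    # Lowercased alphanumeric words; crude, but tolerant of OCR punctuation noise.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Word-set overlap in [0, 1]; two empty frames count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def detect_chapters(
    frames: list[tuple[float, str]],
    threshold: float = 0.2,
    video_end: float | None = None,
) -> list[Chapter]:
    """Split time-sorted (timestamp, ocr_text) frames into chapters.

    Hypothetical heuristic: a new chapter starts when the word overlap
    between a frame and the previous frame drops below `threshold`.
    """
    if not frames:
        return []
    chapters: list[Chapter] = []
    start, first_text = frames[0]
    texts = [first_text]
    prev = tokens(first_text)
    for ts, text in frames[1:]:
        cur = tokens(text)
        if jaccard(prev, cur) < threshold:
            # On-screen text changed a lot: close the current chapter here.
            chapters.append(Chapter(start, ts, "\n".join(texts)))
            start, texts = ts, []
        texts.append(text)
        prev = cur
    end = video_end if video_end is not None else frames[-1][0]
    chapters.append(Chapter(start, end, "\n".join(texts)))
    return chapters


if __name__ == "__main__":
    frames = [
        (0.0, "Introduction to Python"),
        (10.0, "Introduction to Python agenda"),
        (20.0, "Installing packages with pip"),
        (30.0, "Installing packages: pip install requests"),
        (40.0, "Summary and questions"),
    ]
    for ch in detect_chapters(frames, video_end=50.0):
        print(f"{ch.start:5.1f}s - {ch.end:5.1f}s  {ch.text.splitlines()[0]}")
```

This is where something like sumy would come back in: each chapter's `text` could be passed to one of its summarizers (e.g. `LsaSummarizer`) to get a one-sentence label. A fixed overlap threshold is the simplest choice. Slides with heavy OCR noise might need smoothing over several frames before comparing.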