Actually very useful even for other things, thanks for sharing! For example, ripping DVD subtitles to SRT, or (I'm using my imagination) maybe in the future using content-aware fill to remove hard-coded subtitles and replace them with filler?
That should actually be possible with today's technology. Take an image and draw subtitles on it. The subtitled image is the input used to train a NN, while the original image is the training target. Even better, use the video stream directly... Not easy, but not impossible either.
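A rough sketch of how such a training pair could be generated, standard library only. Everything here is made up for illustration: the tiny 3x5 `GLYPHS` font, the random-noise "frame", and the function names. A real pipeline would use actual video frames and a proper font renderer.

```python
import random

# Hypothetical 3x5 bitmap font, just enough glyphs for the demo.
GLYPHS = {
    "H": ["101", "101", "111", "101", "101"],
    "I": ["111", "010", "010", "010", "111"],
    " ": ["000"] * 5,
}


def make_frame(width, height, seed=0):
    """Stand-in for a real video frame: random grayscale pixels in 0..200."""
    rng = random.Random(seed)
    return [[rng.randint(0, 200) for _ in range(width)] for _ in range(height)]


def burn_in_subtitle(frame, text, x, y, value=255):
    """Return a copy of `frame` with `text` drawn at (x, y) in white."""
    out = [row[:] for row in frame]
    for i, ch in enumerate(text):
        for dy, bits in enumerate(GLYPHS[ch]):
            for dx, bit in enumerate(bits):
                if bit == "1":
                    # Each glyph is 3 pixels wide plus 1 pixel of spacing.
                    out[y + dy][x + i * 4 + dx] = value
    return out


def make_training_pair(width=32, height=16, text="HI", seed=0):
    """Build one (network input, training target) pair."""
    clean = make_frame(width, height, seed)
    subtitled = burn_in_subtitle(clean, text, x=2, y=height - 7)
    return subtitled, clean


subtitled, clean = make_training_pair()
changed = sum(
    1
    for row_s, row_c in zip(subtitled, clean)
    for a, b in zip(row_s, row_c)
    if a != b
)
print("pixels covered by the subtitle:", changed)
```

The network would see `subtitled` and be asked to reproduce `clean`, so it only has to learn what to paint under the text.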
DVD subtitles are already a separate layer from the movie stream, but that layer is a bitmap. Because it's a separate layer, OCR-ing it should be easy.
And if you ask why it's a bitmap, that's because bitmaps support more than just plain text: color and typefaces, to name two things. Imagine if DVD players had to implement text decoding ("Is this subtitle stream in UTF-8 or maybe some Cyrillic code page?") and rendering (color, placement, font files, etc.).
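To make the text-decoding headache concrete, here is a small Python illustration. The sample string and the list of candidate encodings are my own choice, not anything from the DVD spec. The same bytes are either rejected or silently turned into different text depending on which encoding the player guesses.

```python
# Bytes as a hypothetical text-based subtitle stream might store them.
raw = "Привет".encode("cp1251")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


for enc in ("utf-8", "cp1251", "koi8_r", "latin-1"):
    print(enc, repr(try_decode(raw, enc)))
```

Only the `cp1251` guess gives back the original text; `utf-8` fails outright, and the other two decode without error into the wrong characters. A bitmap sidesteps all of this.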