It's an HLS video, which is delivered as a text manifest file (.m3u8). Besides the video segments (.ts), the manifest can reference subtitle segments in WebVTT (.vtt) or signal CEA-608/708 closed captions, which are carried inside the video segments themselves rather than as separate files. It's the video player's job to stitch these together so they play in sync. Safari & Edge have their own native HLS implementations, but every other browser relies on a third-party JavaScript library. On the server side, the transcoding/packaging step produces these manifest files from the source video & subtitle files.
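To make the manifest structure concrete, here is a minimal Python sketch that reads a master playlist and lists its subtitle/caption renditions and video variants. The playlist contents, URIs, group names, and the helper names `parse_attrs`/`parse_master` are hypothetical. The tag and attribute names (`#EXT-X-MEDIA`, `#EXT-X-STREAM-INF`, `TYPE`, `GROUP-ID`, `INSTREAM-ID`, `SUBTITLES`, `CLOSED-CAPTIONS`) come from the HLS spec (RFC 8216). The regex-based attribute parsing is simplified and is not a fully spec-compliant parser.

```python
import re

# Hypothetical master playlist: one video variant, an English WebVTT
# subtitle rendition, and a CEA-608 caption channel carried in the video.
MASTER = """#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",DEFAULT=YES,AUTOSELECT=YES,URI="subs/en.m3u8"
#EXT-X-MEDIA:TYPE=CLOSED-CAPTIONS,GROUP-ID="cc",NAME="CC1",INSTREAM-ID="CC1"
#EXT-X-STREAM-INF:BANDWIDTH=2000000,RESOLUTION=1280x720,SUBTITLES="subs",CLOSED-CAPTIONS="cc"
video/720p.m3u8
"""

# KEY=VALUE pairs where VALUE is a quoted string or a bare token up to a comma.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attrs(s: str) -> dict[str, str]:
    """Parse an HLS attribute list into a dict, stripping surrounding quotes."""
    return {k: v.strip('"') for k, v in ATTR_RE.findall(s)}


def parse_master(text: str):
    """Return (renditions, variants) found in a master playlist."""
    renditions, variants = [], []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-MEDIA:"):
            renditions.append(parse_attrs(line.split(":", 1)[1]))
        elif line.startswith("#EXT-X-STREAM-INF:"):
            attrs = parse_attrs(line.split(":", 1)[1])
            # A variant's playlist URI is the line right after its STREAM-INF tag.
            attrs["URI"] = lines[i + 1]
            variants.append(attrs)
    return renditions, variants


if __name__ == "__main__":
    renditions, variants = parse_master(MASTER)
    for r in renditions:
        # Closed-caption renditions have no URI: the captions ride in the video.
        print(r["TYPE"], r["NAME"], r.get("URI", "(in-band)"))
```

Running it prints `SUBTITLES English subs/en.m3u8` and `CLOSED-CAPTIONS CC1 (in-band)`, which mirrors the distinction above: WebVTT subtitles get their own playlist and segments, while 608/708 captions are only signaled in the manifest and decoded from the video stream.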
Interesting, thanks! I had a quick peek in the developer tools but didn't dig any deeper. I assume (since it was live TV) that all of this can also be done on the fly, with the caption stream sent alongside the video stream.
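If the live-TV guess is right, one common setup (a sketch, not necessarily what this broadcaster does) is a sliding-window media playlist. The packager keeps appending new segments, drops the oldest, bumps `#EXT-X-MEDIA-SEQUENCE`, and omits `#EXT-X-ENDLIST`, so the player keeps reloading the playlist to find new segments. A WebVTT subtitle playlist can be updated the same way, segment for segment, next to the video playlist. The function name `live_playlist` and the segment file names below are hypothetical.

```python
import math


def live_playlist(first_seq: int, durations: list[float], uri_fmt: str) -> str:
    """Render a live (sliding-window) HLS media playlist.

    first_seq is the media sequence number of the oldest segment still in the
    window; durations are segment lengths in seconds. No EXT-X-ENDLIST is
    written, which marks the stream as live so the player reloads it.
    """
    # Target duration must be >= every segment duration rounded to an integer.
    target = math.ceil(max(durations))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_seq}",
    ]
    # Each segment gets an EXTINF duration line followed by its URI.
    for n, d in enumerate(durations, start=first_seq):
        lines.append(f"#EXTINF:{d:.3f},")
        lines.append(uri_fmt.format(n))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Video and subtitle playlists share segment numbering and boundaries.
    print(live_playlist(1042, [6.0, 6.0, 6.0], "video_{}.ts"))
    print(live_playlist(1042, [6.0, 6.0, 6.0], "subs_{}.vtt"))
    # On the next refresh, segment 1042 drops out and a new one is appended.
    print(live_playlist(1043, [6.0, 6.0, 6.0], "video_{}.ts"))
```

In HLS, each WebVTT segment also carries an `X-TIMESTAMP-MAP` header that maps cue times onto the video's MPEG-2 TS timestamps, which is how the player keeps live captions aligned with the picture.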
Also, this video is in Flash.