The lossy transform is important, but I think what's actually most important in video compression is getting rid of redundancy --- H.264 actually has a lossless mode in which that transform is not used, and it still compresses rather well (especially for noiseless scenes like a screencast.) You can see the difference if you compare with something like MJPEG which is essentially every frame independently encoded as a JPEG.
The key idea is to encode differences; even in an I-frame, macroblocks can be encoded as differences from previous macroblocks, and with various filterings applied: https://www.vcodex.com/h264avc-intra-precition/ This reduces the spatial redundancies within a frame, and motion compensation reduces the temporaral redundancies between frames.
You can sometimes see this when seeking through video that doesn't contain many I-frames, as all the decoder can do is try to decode and apply differences to the last full frame; if that isn't the actual preceding frame, you will see the blocks move around and change in odd ways to create sometimes rather amusing effects, until it reaches the next I-frame. The first example I found on the Internet shows this clearly, likely resulting from jumping immediately into the middle of a file: http://i.imgur.com/G4tbmTo.png That frame contains only the differences from the previous one.
As someone who has written a JPEG decoder just for fun and learning purposes, I'm probably going to try a video decoder next; although I think starting from something simpler like H.261 and working upwards from there would be much easier than starting immediately with H.264. The principles are not all that different, but the number of modes/configurations the newer standards have --- essentially for the purpose of eliminating more redundancies from the output --- can be overwhelming. H.261 only supports two frame sizes, no B-frames, and no intra-prediction. It's certainly a fascinating area to explore if you're interested in video and compression in general.
> MJPEG which is essentially every frame independently encoded as a JPEG.
"essentially" makes it sound like it isn't precisely true. MJPEG is literally just a stream of JPEG images. The framing of the stream varies a bit, but many implementations are just literal JPEG images bundled one after the other into a MIME "multipart/x-mixed-replace" message.
This is really interesting and the imgur picture you linked (with your explanation) explains it really clearly!
But when seeking, why wouldn't any local media playback seek backwards and reconstruct the full frame? It's not like the partial frame after seeking is useful - I'd rather wait 2 seconds while it scrambles (i mean "hurries up") to show me a proper seek, wouldn't everyone?
What was your Internet search for finding that imgur frame? What is this effect called?
>why wouldn't any local media playback seek backwards and reconstruct the full frame?
Most codecs/players do. VLC used to be criticized for being different in that regard. One possible advantage is istantaneous seeking, as there's no need to decode all the needed frames (which could amount to several seconds of video) between the nearest I-frames[1] (the complete reference pictures) and the desired one.
[1]: plural, because prediction can also be bidirectional in time
The use of incomplete video frame data for artistic purposes is called "datamoshing".
I try to use VLC when I can because it offers intuitive playlist support, but for high-resolution H.264 and friends I usually have to switch to Media Player Classic.
VLC is willing to let my entire screen look like a blob of grey alien shit for 10 seconds instead of just taking a moment to reconstruct frames.
And its hardware acceleration for newer codecs is balls. Sucks because otherwise, it's right up there with f2k for me.
I stopped using VLC when I found mpv [0]. I really like it because it exposes everything from the CLI, so once you're familiarized with the flags you're interested in using, it's easy to play anything. For everyday usage it "just works" too, as expected of any video player.
Does it include all the codecs by default? I think this was a major reason VLC succeeded the way it did. With all other players (BPlayer anyone?) you needed to find and install tons of codecs while in VLC it just worked.
>VLC is willing to let my entire screen look like a blob of grey alien shit for 10 seconds instead of just taking a moment to reconstruct frames.
Yes, this is what I was talking about, and yes, specifically for VLC. Plus it's not like playback is so taxing that all cores are pegged at 100% during playback. When I seek, VLC should get off its ass and scramble to come up with the correct full frame then. I'll wait.
I recently bought a camera that has 4k video recording. VLC just gives up playing the video. Even Windows Media Player can handle it. No idea what's going on, but I was really surprised and disappointed with VLC.
See if you can cut a small segment and submit it as a sample to ffmpeg. Hell, see if ffprobe and ffmpeg can play it. Happy to help, if you've got enough upstream bandwidth.
Isn't the other advantage that VLC can play incomplete movie files? Any other players I have tried 'crash' on incomplete torrents, when VLC just fails until it finds the next I frame.
In a course I taught (2010) on music visualizations that's the term I used.
The example I used in the lecture where datamoshing came up was the music video for Charlift's "Evident Utensil"[1]; I always thought this was a neat example.
The key idea is to encode differences; even in an I-frame, macroblocks can be encoded as differences from previous macroblocks, and with various filterings applied: https://www.vcodex.com/h264avc-intra-precition/ This reduces the spatial redundancies within a frame, and motion compensation reduces the temporaral redundancies between frames.
You can sometimes see this when seeking through video that doesn't contain many I-frames, as all the decoder can do is try to decode and apply differences to the last full frame; if that isn't the actual preceding frame, you will see the blocks move around and change in odd ways to create sometimes rather amusing effects, until it reaches the next I-frame. The first example I found on the Internet shows this clearly, likely resulting from jumping immediately into the middle of a file: http://i.imgur.com/G4tbmTo.png That frame contains only the differences from the previous one.
As someone who has written a JPEG decoder just for fun and learning purposes, I'm probably going to try a video decoder next; although I think starting from something simpler like H.261 and working upwards from there would be much easier than starting immediately with H.264. The principles are not all that different, but the number of modes/configurations the newer standards have --- essentially for the purpose of eliminating more redundancies from the output --- can be overwhelming. H.261 only supports two frame sizes, no B-frames, and no intra-prediction. It's certainly a fascinating area to explore if you're interested in video and compression in general.