H.264 is Magic (sidbala.com)
1228 points by LASR on Nov 4, 2016 | 219 comments



Absolutely love this:

'Suppose you have some strange coin - you've tossed it 10 times, and every time it lands on heads. How would you describe this information to someone? You wouldn't say HHHHHHHHH. You would just say "10 tosses, all heads" - bam! You've just compressed some data! Easy. I saved you hours of mindfuck lectures.'

This is a really great, simple way to explain what is otherwise a fairly complex concept to the average bear. Great work.


That's only one half of the problem - you now have an alphabet five times the size so you have actually increased the size of the message! You also need to explain how to encode this efficiently to explain compression.


Not really - the knowledge of what the alphabet is can be universally agreed upon and doesn't need to be transmitted with the data. The metaphor here is that software and hardware-based decoding can now be much more powerful because the hardware is more powerful than it used to be.

And of course the truth is you would just transmit "H_10" with the universally agreed upon knowledge that "H" is "Heads" and "_" is number of times.


> you would just transmit "H_10" with the universally agreed upon knowledge that "H" is "Heads" and "_" is number of times.

Yes I get that the alphabet is already agreed upon.

But if I only transmit H or T (uncompressed), that's just one bit needed per symbol. So I can transmit HHHHHHHHHH in ten bits. If I introduce this simplified compression to the system and add 0-9 to the alphabet, that now needs four bits per symbol, so the message H10 is 12 bits long (which is longer than uncompressed). And HTHTHTHTHT would be forty bits, so if the message doesn't repeat simply, it's now four times larger!

See what I mean? It's not successfully compressed anything.

The solution to this is easy - Huffman coding - but it doesn't make sense to show it for a ten-bit message, as it won't work well at that size. And the trite explanation of compression as 'just the symbol and then the number of times it's repeated' doesn't mention this, so it's only half the story, and people will still be puzzled because they will see that your message contains MORE entropy after 'compression', not LESS!


You are entirely missing the point. His purpose isn't to give the reader a rigorous mathematical understanding. It is to convey a concept. It is an analogy, not a proof. And his analogy is perfectly good. Just do it: say "HHHHHHHHHH" and say "ten tosses, all heads" and get back to me which one transmits the info to another human in more compact form.

"To another human" is the key phrase, and sometimes I wonder if HN is populated with humans or androids. No offense intended to androids with feelings.


I think there's no need to be so pedantic. Replace 10 with 1000 and now the scheme "works".

Regarding

> The solution to this is easy and is Huffman

Well, no. As you said, for 10 bits it doesn't matter; and in general it will depend on the input. Sometimes run-length encoding performs better than Huffman, and there are also cases where Huffman won't capture higher-order entropy. Also, for zeroth-order entropy an arithmetic encoder is superior to Huffman. Unless you care about decompression speed...

Which brings me back to the fact that there is no such thing as "the solution" in data compression. But more importantly: it was just an example to show an idea, and actually a pretty good one (run-length encoding).


But actually, no. Because you could set up the format for HTTHTTTHHHHHHHHHH like this:

  01001000
  11001010
That's sixteen bits for 17 coinflips. With no continuous sequences longer than seven, this format takes up one extra bit every seven flips.

How does it work? The first bit is a sign bit. If it's zero, the next seven bits are raw coinflips, 0 for tails, 1 for heads. If it's one, the second bit signifies whether the next sequence consists of heads or tails (again, 0 for tails, 1 for heads), and the remaining six bits tell how long said sequence is.

This is a fairly simple encoding for the strategy described in the article, which I thought of off the top of my head in about five minutes, and I know nothing about compression. It's slightly broken (what if the sequence ends mid-byte?), but it does kind of work. Somebody who actually knows about compression could probably do this better.
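
For the curious, here's a rough Python sketch of the byte format described above - purely illustrative, and it inherits the same mid-byte termination problem mentioned in the parent comment:

    def encode(flips, run_threshold=8):
        """Encode a string like 'HTTHTTTHHHHHHHHHH' into bytes.

        Each output byte starts with a flag bit:
          0 -> the next 7 bits are raw flips (1 = heads, 0 = tails)
          1 -> the next bit is the face, the last 6 bits are the run length
        """
        out = []
        i = 0
        while i < len(flips):
            run = 1
            while i + run < len(flips) and flips[i + run] == flips[i]:
                run += 1
            if run >= run_threshold:            # long run: emit one run byte
                run = min(run, 63)              # only 6 bits for the length
                out.append(0b10000000 | ((flips[i] == 'H') << 6) | run)
                i += run
            else:                               # otherwise: emit up to 7 raw flips
                chunk = flips[i:i + 7]
                bits = 0
                for j, f in enumerate(chunk):
                    bits |= (f == 'H') << (6 - j)
                out.append(bits)                # flag bit stays 0
                i += len(chunk)                 # broken if chunk < 7, as noted above
        return bytes(out)

    print(encode('HTTHTTTHHHHHHHHHH').hex())    # '48ca' == 01001000 11001010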


I know that; the point is that this kind of stuff needs some thought, it's not as simple as "HTHTHTHTHT" = "HT five times". The article kind of glosses over that.


In the context of the analogy, it's probably better to read it as saving the time to human-parse rather than the space required to send. (And it definitely takes less time to verbally state, even if the sentence is clearly longer; caching.) The general idea is the same though: compression by describing patterns rather than explicitly stating the event.


Well, yes. Many programmers have trouble mapping ideas into bits: imagine how hard it is for people who aren't in programming?


You know it was just an example of what you might tell a friend over the phone about a coin flip, right? I suppose you'd send your friend a Huffman tree and then say "1110".


This entire thread makes me think of that Silicon Valley scene[1].

[1] https://www.youtube.com/watch?v=-5jF5jtMM_4


Wouldn't you just convert the alphabet to 0-1? Shouldn't the 0-1 encoding be the most optimal? That is, you can't find a better deal with any other alphabet or numeral system?


That's how compression was first explained to me, and it really stuck with me ever since. It was in the context of an image though, and instead of heads, it was red pixels.

What's really cool is that the simple explanation can be extended to explain things like why ciphertext doesn't compress well: because ciphertext has no patterns.


Not compression per se, but I remember when I was reverse engineering maps for Chip's Challenge. I would often see a tile (represented as a byte) that was encoded as 0xFF0502. I ended up realizing it meant "Repeat 'tile 2' 5 times." It was fun to figure that out as a kid.


RLE is a given. It's true that the average person rarely understands that this is what computers call compression, but everything after that - like optimal Huffman coding - involves a bit of thinking.


I think the LZ family are all pretty intuitive --- just replace repeated sequences with a reference to where they occurred before. Even human languages have things like contractions, abbreviations, and acronyms. Thus it is, at least to me, somewhat surprising that LZ was only documented academically a couple of decades after Huffman; perhaps it was thought to be too trivial? LZ can also be thought of as an extension of RLE.

In any case, an LZ decompressor fits in less than 100 bytes of machine instructions, and a simple compressor can be implemented in not more than several hundred, all the while providing extremely high compression for its simplicity. It will easily outperform order-0 static or dynamic Huffman on practical files like English text, and would probably make a good assignment in an undergraduate-level beginning data structures/algorithms/programming course. Yet it seems more popular to give an assignment on Huffman using trees, which is somewhat ironic since in the real world Huffman is implemented using bit operations and lookup tables, not actual tree data structures.

To give a trivial example, LZ will easily compress ABCDABCDABCDABCD while order-0 Huffman can't do much since each individual symbol has the same frequency.
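
To make the idea concrete, here's a toy back-reference substitution in Python - not a real LZ77 implementation, just the flavor of it: replace the upcoming text with a (distance, length) pointer to where it already occurred.

    def toy_lz(data, window=255):
        """Greedy toy LZ: emit literal bytes or (distance, length) matches."""
        out, i = [], 0
        while i < len(data):
            best_len, best_dist = 0, 0
            for j in range(max(0, i - window), i):   # search the already-seen window
                length = 0
                while (i + length < len(data) and
                       data[j + length] == data[i + length] and length < 255):
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - j
            if best_len >= 3:                        # only worth it for longer matches
                out.append(('match', best_dist, best_len))
                i += best_len
            else:
                out.append(('lit', data[i]))
                i += 1
        return out

    print(toy_lz(b'ABCDABCDABCDABCD'))
    # [('lit', 65), ('lit', 66), ('lit', 67), ('lit', 68), ('match', 4, 12)]

Order-0 Huffman sees four equally frequent symbols and can't go below 2 bits each; the back-reference collapses three of the four repetitions into a single token.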


Another LZ fan here.

My guess is that the "late" development of LZ is mainly due to two reasons:

i) At that moment pattern matching algorithms were not so advanced. E.g. the suffix tree was very recent, and in the following years lots of advances occurred in that area...

ii) Although LZ can appear easier or more intuitive than Huffman, I think it is much less intuitive to prove a good bound on the compression achieved by LZ. OTOH, Huffman is built in a way that shows that it achieves zeroth-order compression.


The DEFLATE[1] algorithm is actually fairly accessible, and will give a good idea of how compression works.

1: https://en.wikipedia.org/wiki/DEFLATE


I liked this explanation of DEFLATE here on HN a few months ago:

https://news.ycombinator.com/item?id=12334270


IMO Huffman is conceptually more complicated (not the implementation, but the logic) than arithmetic coding.

And Huffman isn't optimal unless you are lucky, unlike arithmetic coding.


I never learned AC. It's on my overflowing stack of thing to read about.


AC is conceptually stupidly simple. All you do is encode a string of symbols into a range of real numbers.

To start your range is [0, 1). For each symbol you want to encode you take your range and split it up according to your probabilities. E.g. if your symbols are 25% A, 50% B and 25% C, then you split up that range into [0, 0.25) for A, [0.25, 0.75) for B and [0.75, 1) for C.

Encoding multiple symbols is just applying this recursively. So to encode the two symbols Bx we split up [0.25, 0.75) proportionally just like we did [0, 1) before to encode x (where x is A, B or C).

As an example, A is the range [0, 0.25), and AC is the range [0.1875, 0.25).

Now to actually turn these ranges into a string of bits we choose the shortest binary representation that fits within the range. If we look at a decimal number:

    0.1875
We know that this means 1/10 + 8/100 + 7/1000 + 5/10000. A binary representation:

    0.0011
This means 0/2 + 0/4 + 1/8 + 1/16 = 0.1875. So we encode AC as 0011.

---

The beauty of arithmetic coding is that after encoding/decoding any symbol we can arbitrarily change how we split up the range, giving rise to adaptive coding. Arithmetic coding can perfectly represent any data that forms a discrete string of symbols, including changes to our knowledge of data as we decode.
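
Here's a minimal sketch of that interval-narrowing step (encoder only, using the same 25/50/25 model from above; real coders work with integer ranges and renormalization, which this ignores):

    MODEL = {'A': (0.0, 0.25), 'B': (0.25, 0.75), 'C': (0.75, 1.0)}

    def ac_interval(message):
        """Narrow [0, 1) once per symbol, according to the fixed model."""
        low, high = 0.0, 1.0
        for sym in message:
            s_low, s_high = MODEL[sym]
            width = high - low
            low, high = low + width * s_low, low + width * s_high
        return low, high

    def shortest_bits(low, high):
        """Greedily grow a binary fraction 0.b1b2... until it lands in [low, high)."""
        bits, value, step = '', 0.0, 0.5
        while value < low:
            if value + step < high:   # taking this bit keeps us under the top of the range
                bits += '1'
                value += step
            else:
                bits += '0'
            step /= 2
        return bits

    low, high = ac_interval('AC')
    print(low, high)                  # 0.1875 0.25
    print(shortest_bits(low, high))   # 0011, matching the worked example above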


Or on a more abstract level to compare to Huffman encoding: Huffman turns each symbol into a series of bits like "011". Arithmetic encoding lets you use fractional bits.

A Huffman tree for digits might assign 0-5 to 3 bits and 6-9 to 4 bits. Encoding three digits will use on average slightly more than 10 bits. Using AC will let you give the same amount of space to each possibility, so that encoding three digits always uses less than 10 bits.
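
Putting rough numbers on that (a back-of-the-envelope check, assuming uniformly random digits):

    from math import log2

    # Hypothetical Huffman code: six digits get 3-bit codes, four get 4-bit codes
    avg_bits_per_digit = (6 * 3 + 4 * 4) / 10   # 3.4 bits
    print(3 * avg_bits_per_digit)               # 10.2 bits on average for 3 digits

    # Arithmetic coding can get arbitrarily close to the entropy
    print(3 * log2(10))                         # ~9.97 bits for 3 digits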


Nice explanation. Can you explain how to remove ambiguity relating to string length?

"0" = 0.0b = 0 falls in the range [0,0.25) so it's a valid encoding for "A"; but isn't it also a valid encoding for "AA", "AAA", etc.?

AA = [0,0.25) * [0, 0.25) = [0, 0.0625), and so on.

It seems that adding "A"s to a string in general doesn't change its encoding.


You either reserve a symbol for "end of stream" or externally store the length.

It's the equivalent to pretending a Huffman stream never ends and is padded with infinite 0s.


Huffman seems simpler to me, but I've implemented both at various times so that might colour my perspective.


AC implementation is actually quite tricky, but conceptually IMO it's much simpler and more elegant than Huffman.


I definitely wouldn't say "HHHHHHHHH," since I tossed it 10 times, not 9.

Saying "10 tosses, all heads" reduces the chance of omitting a toss in data entry, which is all to the better.


You're making the assumption the other party knows English, rather than say the abstraction of 'coinflip' which in itself can be abstracted. Do they understand the concept of fairness - is it even odds or not? There's a reason that numbers are considered a more universal 'language' than other forms of communication.


You'd likely enjoy this read too:

http://antirez.com/news/75

On the HyperLogLog algorithm to count things.


I would probably say I found a two headed coin. I also like how "HHHHHHHHHH" is shorter than "I tossed it 10 times, all heads"


The all-heads sequence is exactly as likely as any other sequence.


right, but you can also say there are many sequences that aren't all heads.


Which is shortest? You can recover the original dataset from any of the following:

- heads, heads, heads, tails, tails, heads

- hhhtth

- h3t2h

- 3b1


FAIL. "10 tosses, all heads" is 20 characters while "HHHHHHHHHH" is 10 characters. You've conducted expansion rather than compression.


That example is about conveying the information verbally to another human, so syllables are interesting, not characters. "h h h h h h h h h h" is 10 syllables, "10 tosses, all heads" is 4 (and could be compressed further to "10 heads").


"10H" is the string you should be comparing it to (or something equivalent.)


If we're going there, H10 may be more efficient... h10t5h1... I'm thinking the heads-vs-tails symbol before the repeat count is the better format.


Depends which way you're iterating.


The lossy transform is important, but I think what's actually most important in video compression is getting rid of redundancy --- H.264 actually has a lossless mode in which that transform is not used, and it still compresses rather well (especially for noiseless scenes like a screencast.) You can see the difference if you compare with something like MJPEG which is essentially every frame independently encoded as a JPEG.

The key idea is to encode differences; even in an I-frame, macroblocks can be encoded as differences from previous macroblocks, and with various filterings applied: https://www.vcodex.com/h264avc-intra-precition/ This reduces the spatial redundancies within a frame, and motion compensation reduces the temporal redundancies between frames.
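
A toy illustration of why predict-then-encode-the-residual pays off - this isn't any actual H.264 intra mode, just the general idea, using a made-up 1-D row of pixel values:

    import numpy as np

    row = np.array([200, 201, 203, 202, 204, 205, 207, 206], dtype=np.int16)

    # "Horizontal" prediction: guess each sample equals the one to its left
    prediction = np.concatenate(([0], row[:-1]))
    residual = row - prediction
    print(residual)    # [200 1 2 -1 2 1 2 -1] -- mostly small values, cheap to entropy-code

    # The decoder reverses it exactly
    print(np.array_equal(np.cumsum(residual), row))   # True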

You can sometimes see this when seeking through video that doesn't contain many I-frames, as all the decoder can do is try to decode and apply differences to the last full frame; if that isn't the actual preceding frame, you will see the blocks move around and change in odd ways to create sometimes rather amusing effects, until it reaches the next I-frame. The first example I found on the Internet shows this clearly, likely resulting from jumping immediately into the middle of a file: http://i.imgur.com/G4tbmTo.png That frame contains only the differences from the previous one.

As someone who has written a JPEG decoder just for fun and learning purposes, I'm probably going to try a video decoder next; although I think starting from something simpler like H.261 and working upwards from there would be much easier than starting immediately with H.264. The principles are not all that different, but the number of modes/configurations the newer standards have --- essentially for the purpose of eliminating more redundancies from the output --- can be overwhelming. H.261 only supports two frame sizes, no B-frames, and no intra-prediction. It's certainly a fascinating area to explore if you're interested in video and compression in general.


> MJPEG which is essentially every frame independently encoded as a JPEG.

"essentially" makes it sound like it isn't precisely true. MJPEG is literally just a stream of JPEG images. The framing of the stream varies a bit, but many implementations are just literal JPEG images bundled one after the other into a MIME "multipart/x-mixed-replace" message.


This is really interesting and the imgur picture you linked (with your explanation) explains it really clearly!

But when seeking, why wouldn't any local media playback seek backwards and reconstruct the full frame? It's not like the partial frame after seeking is useful - I'd rather wait 2 seconds while it scrambles (i mean "hurries up") to show me a proper seek, wouldn't everyone?

What was your Internet search for finding that imgur frame? What is this effect called?


>why wouldn't any local media playback seek backwards and reconstruct the full frame?

Most codecs/players do. VLC used to be criticized for being different in that regard. One possible advantage is instantaneous seeking, as there's no need to decode all the needed frames (which could amount to several seconds of video) between the nearest I-frames[1] (the complete reference pictures) and the desired one.

[1]: plural, because prediction can also be bidirectional in time

The use of incomplete video frame data for artistic purposes is called "datamoshing".


I try to use VLC when I can because it offers intuitive playlist support, but for high-resolution H.264 and friends I usually have to switch to Media Player Classic.

VLC is willing to let my entire screen look like a blob of grey alien shit for 10 seconds instead of just taking a moment to reconstruct frames.

And its hardware acceleration for newer codecs is balls. Sucks because otherwise, it's right up there with f2k for me.


I stopped using VLC when I found mpv [0]. I really like it because it exposes everything from the CLI, so once you're familiarized with the flags you're interested in using, it's easy to play anything. For everyday usage it "just works" too, as expected of any video player.

[0] https://mpv.io/


How does it compare to mplayer? My biggest complaint about mplayer is it still doesn't play VFR videos well.


I've tried it.

* Sane defaults (encodings and fonts, scaletempo for audio)

* instantaneous play of next and previous videos

* navigation in random playlist actually works

* Easy always on top key binding

* Most mplayer key bindings work

I'll definitely keep on trying it for a while.


Does it include all the codecs by default? I think this was a major reason VLC succeeded the way it did. With all other players (BPlayer anyone?) you needed to find and install tons of codecs while in VLC it just worked.


It has played everything I've thrown at it so far...


I'll check it out and let you know what I think. Thanks~


man... that's a big manual :)

I think I can find some use for this in certain situations. Still lacks a good playlist building schema.


>VLC is willing to let my entire screen look like a blob of grey alien shit for 10 seconds instead of just taking a moment to reconstruct frames.

Yes, this is what I was talking about, and yes, specifically for VLC. Plus it's not like playback is so taxing that all cores are pegged at 100% during playback. When I seek, VLC should get off its ass and scramble to come up with the correct full frame then. I'll wait.


I recently bought a camera that has 4k video recording. VLC just gives up playing the video. Even Windows Media Player can handle it. No idea what's going on, but I was really surprised and disappointed with VLC.


See if you can cut a small segment and submit it as a sample to ffmpeg. Hell, see if ffprobe and ffmpeg can play it. Happy to help, if you've got enough upstream bandwidth.


Sure. I'll give those a try tonight. I assume if it works in `ffplay` directly, there's no need to submit it?


Isn't the other advantage that VLC can play incomplete movie files? Any other players I have tried 'crash' on incomplete torrents, when VLC just fails until it finds the next I frame.


"datamoshing" is a term I've heard for people deliberately removing I-frames, so P-frames are applied to the wrong base image.


In a course I taught (2010) on music visualizations that's the term I used.

The example I used in the lecture where datamoshing came up was the music video for Chairlift's "Evident Utensil"[1]; I always thought this was a neat example.

[1]: https://www.youtube.com/watch?v=mvqakws0CeU


Two more examples of datamoshing in music videos:

Kanye West's "Welcome to Heartbreak" (https://www.youtube.com/watch?v=wMH0e8kIZtE)

A$AP Mob's "Yamborghini High" (https://www.youtube.com/watch?v=tt7gP_IW-1w)


I thought I'd learn something special about H.264, but all the information here is high level and generic.

For example if you replace H.264 with a much older technology like mpeg-1 (from 1993) every sentence stays correct, except this:

"It is the result of 30+ years of work" :)


I was a bit disappointed in this article for the same reason: this is a great primer for people new to MPEG video compression, but it doesn't have anything to do with H.264.

I was hoping the author would write about H.264 specifically, for instance, how it was basically the "dumping ground" of all the little tweaks and improvements that were pulled out of MPEG-4 for one reason or another (usually because they were too computationally expensive), and why, as a result, it has thousands of different combinations of features that are extremely complicated to support, which is why it had to be grouped into "profiles" (e.g., Baseline, Main, High): http://blog.mediacoderhq.com/h264-profiles-and-levels/

I was also hoping that he would at least touch on the features that make H.264 unique from previous MPEG standards, like in-loop deblocking, CABAC Entropy Coding, etc..

Again, it's fine as an introduction to video encoding, but there's nothing in here specific to H.264.


Sure, but also keep in mind that the technology hasn't changed much over time. Even HEVC, which achieves large gains in compression on high-res video with minimal loss in quality, is still mostly the same algorithm as H.264, but with larger blocks, slightly more flexible coding units rather than frame-wide interpolation changes, and 35 rather than 9 directions recognized for predictions.


Also the fleeting mention of b-frames, which mpeg-1 doesn't have. And I believe mpeg-1 doesn't use 16×16 macroblocks.

Still, it's a good overview of generic video compression.



"This post will give insight into some of the details at a high level - I hope to not bore you too much with the intricacies."

Did you miss the third paragraph?

As someone who knew nothing about it before, I found it lived up to its goal.


I think they just mean it shouldn't be "H.264 is magic", it should just be "video compression is magic" or some such. That irked me a little bit too.


Nice article! The motion compensation bit could be improved, though:

> The only thing moving really is the ball. What if you could just have one static image of everything on the background, and then one moving image of just the ball. Wouldn't that save a lot of space? You see where I am going with this? Get it? See where I am going? Motion estimation?

Reusing the background isn't motion compensation -- you get that by encoding the differences between frames so unchanging parts are encoded very efficiently.

Motion compensation is when you have the camera follow the ball and the background moves. Rather than encoding the difference between frames itself, you figure out that most of the frame moved, and you encode the difference between one frame and a shifted version of the blocks from a previous frame.

Motion compensation won't work particularly well for a tennis ball because it's spinning rapidly (so the ball looks distinctly different in consecutive frames) but more importantly because the ball occupies a tiny fraction of the total space so it doesn't help that much.

Motion compensation should work much better for things like moving cars and moving people.


Your example seems to assume translation only. I wonder how difficult/useful it would be to identify other kinds of time-varying characteristics (translation, rotation, scale, hue, saturation, brightness, etc) of partial scene elements in an automated way.

Along the same lines, it would be interesting to figure out an automated time-varying-feature detection algorithm to determine which kinds of transforms are the right ones to encode.

Do video encoders already do something like this? It seems like a pretty difficult problem since there are so many permutations of applicable transformations.


> I wonder how difficult/useful it would be to identify other kinds of time-varying characteristics (translation, rotation, scale, hue, saturation, brightness, etc) of partial scene elements in an automated way.

That's how Framefree worked. It segments the image into layers, computes a full morph, including movement of the boundary, between successive frames for each layer, and transmits the before and after for each morph. Any number of frames can be interpolated between keyframes, which allows for infinite slow motion without jerk.[1] You can also upgrade existing content to higher frame rates.

This was developed back in 2006 by the Kerner Optical spinoff of Lucasfilm.[2] It didn't catch on, partly because decompression and playback requires a reasonably good GPU, and partly because Kerner Optical went bust. The segment-into-layers technology was repurposed for making 3D movies out of 2D movies, and the compression product was dropped. There was a Windows application and a browser plug-in. The marketing was misdirected - somehow, it was targeted to digital signs with limited memory, a tiny niche.

It's an idea worth revisiting. Segmentation algorithms have improved since 2006. Everything down to midrange phones now has a GPU capable of warping a texture. And it provides a way to drive a 120FPS display from 24/30 FPS content.

[1] http://creativepro.com/framefree-technologies-launches-world... [2] https://web.archive.org/web/20081216024454/http://www.framef...


John do you know where all the patents on Framefree ended up?


Ask Tom Randoph, who was CEO of FrameFree. He's now at Quicksilver Scientific in Denver.


Some venture IP company in Tokyo called "Monolith Co." also had rights in the technology.[1] "As of today (Sept. 5, 2007), the company has achieved a compression rate equivalent to that of H.264 and intends to further improve the compression rate and technology, Monolith said."[2] (This is not Monolith Studios, a game development company in Osaka.) Monolith appears to be defunct.

The parties involved with Framefree were involved in fraud litigation around 2010.[3] The case record shows various business units in the Cayman Islands and the Isle of Jersey, along with Monolith in Japan and Framefree in Delaware. No idea what the issues were. It looks like the aftermath of failed business deals.

The inventors listed on the patents are Nobuo Akiyoshi and Kozo Akiyoshi.[4]

[1] https://www.youtube.com/watch?v=VBfss0AaNaU [2] http://techon.nikkeibp.co.jp/english/NEWS_EN/20070907/138905... [3] http://www.plainsite.org/dockets/x8gi572m/superior-court-of-... [4] http://patents.justia.com/inventor/nobuo-akiyoshi


Great detective work. I suspect the IP is now a total mess - with luck nobody has been paying the patent renewal fees and everything is now free.


Most codecs split the image into prediction blocks (for example, 16x16 for MPEG-2, or from 4x4 to 64x64 for VP9). Each of these blocks has its own motion vector. All of the transformations you mentioned look like a translation if you look at them locally, so they can all be fairly well represented by this. Codecs have, in the past, attempted global motion compensation, which tries to fully model a camera (rotating, translating, lens distortion, zooming) but all of those extra parameters are very difficult to search for.

Daala and AV1's PVQ is an example of a predictor for contrast and brightness (in a very broad sense).


Yes, H.264 has brightness/fade compensation for past frames. It's called "weighted prediction".

The previous codec MPEG4 part 2 ASP (aka DivX&XviD) had "global motion compensation" which could encode scales and rotation, but like most things in that codec it was broken in practice. Most very clever ideas in compression either take too many bits to describe or can't be done in hardware.


> It seems like a pretty difficult problem since there are so many permutations of applicable transformations.

That's part of why video encoding can be very slow --- with motion compensation, to produce the best results the encoder should search through all the possible motion vectors and pick the one that gives the best match. To speed things up, at a slight cost in compression ratio, not all of them are searched, and there are heuristics on choosing a close-to-optimal one instead: https://en.wikipedia.org/wiki/Block-matching_algorithm
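
For a flavor of what that search looks like, here's a brute-force sketch (exhaustive full search with a sum-of-absolute-differences cost over a small window; real encoders use the smarter search patterns from the linked article):

    import numpy as np

    def full_search(prev, cur, by, bx, block=16, radius=8):
        """Find the motion vector (dy, dx) that minimizes SAD for one block.

        prev, cur: 2-D arrays of luma samples (reference and current frame).
        by, bx:    top-left corner of the block in the current frame.
        """
        target = cur[by:by + block, bx:bx + block].astype(np.int32)
        best, best_sad = (0, 0), np.inf
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = by + dy, bx + dx
                if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                    continue                     # candidate block falls outside the frame
                cand = prev[y:y + block, x:x + block].astype(np.int32)
                sad = np.abs(target - cand).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
        return best, best_sad                    # encode the vector plus the (small) residual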


Now I'm out of my depth, but I think motion compensation does okay at rotation and scaling. The motion vector varies throughout the frame, and I think codecs interpolate it, so all kinds of warping can be represented.


As evidence of this, sometimes when an I-frame is dropped from a stream or you jump around in a stream you can see the texture of what was previously on the screen wrapped convincingly around the 3D surface of what's now supposed to be on the screen, all accomplished with 2D motion vectors.


Related, how h265 works: http://forum.doom9.org/showthread.php?t=167081

This is a great overview and the techniques are similar to those of h264.

I found it invaluable for getting up to speed when I had to do some work on the screen content coding extensions of HEVC in Argon Streams. They are a set of bit streams to verify HEVC and VP9 decoders; take a look, it is a very innovative technique:

http://www.argondesign.com/products/argon-streams-hevc/ http://www.argondesign.com/products/argon-streams-vp9/


Heh, happy to see doom9 still alive and kicking. They were the n°1 resource in the early days of mainstream video compression.


It's not really alive and kicking. The forum is still active but the rest of the site hasn't been touched since 2008.


I love how you can edit photos of people to correct some skin imperfections, without losing the sense that the image is real (and not getting that blurred, plastic look), when you decompose the image into wavelets and just edit some frequencies.

Don't know in photoshop, but in Gimp there's a plugin called "wavelet decomposer" that does that.


I guess this is the plugin you are talking about? Interesting.

http://registry.gimp.org/node/11742


Exactly that.

There was a question about retouching photos a while ago (http://photo.stackexchange.com/questions/48999/how-do-i-take...) where wavelets were put to good use.


Well, that's the most awesome thing I've seen in a long time! Thanks for sharing.


I recently experienced this as follows: https://www.sublimetext.com has an animation which is drawn via JavaScript. In essence, it loads a huge .png [1] that contains all the image parts that change during the animation, then uses <canvas> to draw them.

I wanted to recreate this for the home page of my file manager [2]. The best I could come up with was [3]. This PNG is 900KB in size. The H.264 .mp4 I now have on the home page is only 200 KB in size (though admittedly in worse quality).

It's tough to beat a technology that has seen so much optimization!

1: http://www.sublimetext.com/anim/rename2_packed.png

2: https://fman.io

3: https://www.dropbox.com/s/89inzvt161uo1m8/out.png?dl=0


You could give FLIF [1] a try. With the help of Poly-FLIF [2] you can render it in the browser. Don't forget to try the lossy mode, it gives better compression with negligible loss in quality.

1: http://flif.info

2: https://github.com/UprootLabs/poly-flif/


> Chroma Subsampling.

Sadly, this is what makes video encoders designed for photographic content unsuitable for transferring text or computer graphics. Fine edges, especially red-black contrasts, start to color-bleed due to subsampling.

While a 4:4:4 profile exists a lot of codecs either don't implement it or the software using them does not expose that option. This is especially bad when used for screencasting.

Another issue is banding, since h.264's main and high profiles only use 8bit precision, including for internal processing, and the rounding errors accumulate, resulting in banding artifacts in shallow gradients. High10 profile solves this, but again, support is lacking.
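
For reference, 4:2:0 just means each chroma plane is stored at half resolution both horizontally and vertically. A minimal sketch of the round trip (box averaging down, nearest-neighbor back up - exactly the combination that produces the fringing described above):

    import numpy as np

    def subsample_420(chroma):
        """Average each 2x2 block of a chroma plane -> a quarter of the samples."""
        h, w = chroma.shape
        return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    def upsample_nearest(small):
        """Nearest-neighbor upsampling back to full resolution."""
        return small.repeat(2, axis=0).repeat(2, axis=1)

    # A hard chroma edge that doesn't line up with the 2x2 grid gets smeared
    cr = np.zeros((4, 4))
    cr[:, 1:] = 240.0
    print(upsample_nearest(subsample_420(cr)))
    # the sharp 0/240 edge comes back as 120s bleeding across both sides of the boundary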


This is also the bane of PowerPoint presentations. Many TVs only support 4:2:0, so red-on-black text quickly becomes a smudgy mess.


It's easy to make a 4:2:0 upscaler that doesn't color bleed. Everyone just uses nearest-neighbor, which sucks, and then blames the other guy.


How would you make a 4:2:0 upscaler that doesn't color bleed?


50% solution: bicubic or bilinear. 90% solution: EEDI3. (kinda slow) 99% solution: use the full resolution Y plane for edge-direction.


I don't think that can accurately restore the details that have been created by subpixel-AA font rendering.

But if you have source/subsampled/interpolated comparisons that show 99% identical results i would be interested to see them.

Of course all that is useless if you don't have control over the output device. Just having the ability to record 4:4:4 makes the issue go away as long as the target can display it, no matter what interpolation they use.


By the way, this is an incredible example of scientific writing done well. There's a very tangible, jelly-like feeling of enthusiasm that the author clearly has for the topic, conveyed well to the readers. This whole thread is people excited about a video codec!


Thank you! It means a lot to me. Yes, I try to convey my sense of excitement about technology to other people.


> This whole thread is people excited about a video codec

That's not really a weird thing on HN though. Video codecs are exactly the kind of thing that we get excited about.


"See how the compressed one does not show the holes in the speaker grills in the MacBook Pro? If you don't zoom in, you would even notice the difference. "

Ehm, what?! The image on the right looks really bad and the missing holes was the first thing I noticed. No zooming needed.

And that's exactly my problem with the majority of online video (iTunes store, Netflix, HBO etc). Even when it's called "HD", there are compression artefacts and gradient banding everywhere.

I understand there must be compromises due to bandwidth, but I don't agree on how much that compromise currently is.


Of course while reading the article you are going to be very conscious of detail and image quality because that is the subject matter of the post.

However if that MacBook Pro image was placed on the side of an article where the primary content was the text you were reading, you'd glance at the image and your brain would fill in the details for you. You probably wouldn't notice the difference in that context.

For most use cases, there likely is very little functional difference between the two images. At least, that was how I understood it.


I find the 480p setting on my plex server at home actually looks better than most of the 1080p HD streams on the internet.

Although to be fair, I suspect that a lot of times what I'm looking at are mpeg videos that have been recompressed a half dozen or more times with different encoders, each encoder having prioritized different metrics. So, the quality gets worse until it doesn't really matter how good the compression algorithm is. Each new re-compression is basically spending 3/4 of its bits maintaining the compression artifacts from the previous two passes.


>No zooming needed

Aren't the images above the text a zoomed version?

>Here is a close-up of the original...


I took it to mean that we had to zoom to see that the holes were gone in the compressed version.


It indeed is, as long as we accept the first screenshot as the normal scale.


The first thing I noticed was the ringing, which is an artifact of low-pass filtering so it's a nice opportunity to go into problems with that kind of filtering. Other than that I think it was an ok teaser that gives an idea of how compression is done and what the trade-offs are.


Yep, me too - more like, if I was blind, I wouldn't notice the difference. Which is why the bitrate is always the first thing I look at when sourcing video.


But given other settings with even h.264 vs. h.265 and the source content, that isn't always a valid metric either.

I mean for fast action scenes, I rarely notice the difference between 720p and 1080p at 10ft away... but different encoding and sources, not just size alone can make significant differences.


There's another false claim like that a bit below. I can only assume that the author is close to legally blind, or uses a VGA-res display to watch the page.


Anyone who likes this would probably also enjoy the Daala technology demos at https://xiph.org/daala/ for a little taste of some newer, and more experimental, techniques in video compression.


Note that Daala has been discontinued in favor of AV1: https://en.wikipedia.org/wiki/AOMedia_Video_1

Previously Daala was presented as a candidate for NETVC but apparently this didn't go anywhere? https://en.wikipedia.org/wiki/NETVC


A lot of Daala tools are now being copied into AV1, the largest being PVQ: https://aomedia-review.googlesource.com/#/c/3220/


The demos are still neat, and some of the ideas are being used in AV1.


Daala continues to be a research platform for new ideas. New techniques are a lot easier to prototype in Daala than in more mature code bases.


I'm not sure if there's been an official announcement, but I had assumed that AV1 was going to be adopted/ratified as NETVC so that it's got a standards body rubber stamp, as well as the practical adoption/support from browser vendors, GPU manufacturers, streaming sites etc.



Very well explained. But I could have understood it all without the bro-approach to the reader. You see where I am going with this? Get it? See where I am going? Ok!


Maybe I'm in the minority here but I think it adds a bit of color to an otherwise dry topic to write about.


I remember loving this style when I was a novice, e.g. Beej's networking tutorial. Not a big fan anymore, either, but certainly valuable for (part of) the target audience, I think.


Today you need tricks like that to get Twitter-generation people to read even half of that amount of text.


The part about entropy encoding only seems to explain run-length encoding (RLE). Isn't the interesting aspect of making use of entropy in compression rather to represent rarer events with longer code strings?

The fair coin flip is also an example of a process that cannot be compressed well at all, because (1) the probability of the same event happening in a row is not as high as for unfair coins (RLE is minimally effective) and (2) the uniform distribution has maximal entropy, so there is no advantage in using different code lengths to represent the events. (Since the process has a binary outcome, there is also nothing to gain in terms of code lengths for unfair coins.)


[deleted]


Thank you! I hope you now share my feeling of awe over this technical wonder.


Excellent! :)

It would be really cool to further extend it showing actually how the various tiles are encoded and between frames, something along the lines of: http://jvns.ca/blog/2013/10/24/day-16-gzip-plus-poetry-equal...

H.265 can even do deltas between blocks in the same frame, IIRC, and is excellent for still image compression too.


H.265 is my next post :)


So if H.264 is magic, what is H.265 then? :)


Excuse me, i would like my account back.


Can someone explain how the frequency domain stuff works? I've never really understood that, and the article just waves it away with saying it's like converting from binary to hex.


It's a bad analogy. Binary and hex are just different formats for representing the same number. Spatial domain and frequency domain are different views of a complex data set. In the spatial domain, you are looking at the intensity of different points of the image. In the frequency domain, you are looking at the frequencies of intensity changes in patterns in the image.

A good way to develop an intuition for Fourier space is to look at simple images and their DFT transforms: http://web.cs.wpi.edu/~emmanuel/courses/cs545/S14/slides/lec... (3/4 of the way through the slide deck).

This analysis of a "bell pepper" image and its transform is also helpful: https://books.google.com/books?id=6TOUgytafmQC&pg=PA116&lpg=....

As for why you want to do this: throwing away bits in the spatial domain eliminates distinctions between similar intensities, making things look blocky. In the frequency domain, however, you can throw away high-frequency information, which tends to soften patterns like the speaker grills in the MBP image that the human eye isn't that sensitive to, to begin with.
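
A quick way to play with this yourself (a sketch using numpy's FFT rather than the DCT that the codecs actually use; the low-pass mask is the 'circle' from the article):

    import numpy as np

    def lowpass(img, keep_radius):
        """Zero out frequency components farther than keep_radius from the center."""
        f = np.fft.fftshift(np.fft.fft2(img))       # low frequencies in the middle
        h, w = img.shape
        yy, xx = np.ogrid[:h, :w]
        mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= keep_radius ** 2
        back = np.fft.ifft2(np.fft.ifftshift(f * mask))   # throw away the fine detail
        return np.real(back)

    # e.g. with a grayscale image loaded as a 2-D array:
    # blurry = lowpass(gray_image, keep_radius=30)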


> Spatial domain and frequency domain are different views of a complex data set.

Or in this case, a real data set.


The search keyword to learn about it is Fourier Transform: https://en.wikipedia.org/wiki/Fourier_transform

Along with the Wikipedia article and the obvious Internet search, there's a lot of good stuff that has been on HN: https://hn.algolia.com/?query=fourier%20transform&sort=byPop...


Basically we can represent any signal as an infinite sum of sinusoids. If you know about Taylor expansion of a function, then you know that the first order term is the most important, then the second and so on. Same principle with the sinusoids. So if we remove the sinusoids with very high frequency we remove the terms with least information.


Image and video codecs don't actually use the fourier transform as presented in the article, they use the DCT. Check out the example section on Wikipedia: https://en.wikipedia.org/wiki/Discrete_cosine_transform

The JPEG article also has a very good, step by step example of the DCT, followed by quantization and entropy coding: https://en.wikipedia.org/wiki/JPEG


In the most basic terms, not even talking about frequency, the mechanics of this is that one series of numbers (pixel values, audio samples, etc.) is replaced, according to some recipe or another, with a different series of numbers from which the original can be recovered (using a similar "inverse" recipe). The benefit of doing this comes from the discovery that this new series has more redundancy in it and can be compressed more efficiently than the original. Even if some of the data are thrown away at this point, the purpose of which is to make compression even more effective, the original can still be recovered with high fidelity.


It's the Fourier transform basically. There were even some past links on HN that explained it nicely so you might check those first

(Though for images it's in 2D, not 1D which is more commonly done)


> discard information which will contain the information with high frequency components. Now if you convert back to your regular x-y coordinates, you'll find that the resulting image looks similar to the original but has lost some of the fine details.

I would expect also the edges in the image to become more blurred, as edges correspond to high-frequency content. However, this only seems to be slightly the case in the example images.


This is probably because in the sample image you have clean vertical edges. It's pretty easy to represent these edges with a waveform.


You can see exactly that with the speaker grill and the text (this type of transformation is notoriously bad at compressing images of text, which is why you shouldn't use jpg for pictures of text).

In this context, the edges of, say, the macbook are not "high frequency" content, since they only feature one change (low to high luminosity) in a given block rather than several (high-low-high-low-high) like for the grill.


You should have a look at the Fourier transform of a step-function. It has high frequency components.


You're right! The images that I am using are zoomed cropped sections of a much larger image of the entire Apple home page.


What are directions for the future? Could neural networks become practically useful for video compression? [1]

[1] http://cs.stanford.edu/people/eroberts/courses/soco/projects...


Suppose I have a table of 8-digit numbers that I need to add and subtract for various reasons. Do I A: have a child, train them how to read numbers, add, and subtract, and then have the child do it or B: use a calculator purpose built to add and subtract numbers?

Neural nets are always expensive to train. You'd better be getting something from them that you can't get some other way.


Yes, you don't need the machinery of learning when you already have an algorithm you're happy with. Adding a table of numbers, I don't think anyone hopes to do much better than we already do with our circuits and computer architectures.

With video compression, I think most would agree that there might be better architectures/algorithms that we haven't stumbled upon yet. Whether specifically "neural networks" will be the shape of a better architecture, I don't know. But almost surely some meta-algorithm that can try out tons of different parameters/data-pipeline-topologies for something that vaguely resembles h.264 might find something better than h.264.

Neural nets are expensive to train. But so is designing h.264.


Google has been experimenting with image compression using AI for a while now: https://research.googleblog.com/2016/09/image-compression-wi...


Perhaps for intra-prediction. I wouldn't hold my breath.


H.265 gets you twice the resolution for the same bandwidth, or the same resolution for half the bandwidth.


H.265 gets you half the file size for ten times more in royalty fees, or saving 50% of bandwidth for 1000% more in royalty.


Do you have a reference for that?

I was under the impression that the first 100,000 units are free, and then 20c per unit afterwards to a max of $25m.

H264 drops to 10c per unit after 5m units, to a max of $6.5m.

You need to be shipping 125 million units annually to hit the full $25m.

Yes it's more, but it's not quite ten times. And notably if the chip maker pays the royalties, then the content creators don't need to (though that was excepted indefinitely with H264).

Parts regurgitated from a quick google for reference [1]

[1] http://www.theregister.co.uk/2014/10/03/hevc_patent_terms_th...


It is actually more than 10x. The annual cap for H.264 royalty fees is 6.5M from MPEG-LA. For H.265 it is 25M from MPEG-LA, AND 50M from HEVC-Advance. That is a total of 75M. And like others have pointed out, there are Technicolor patent fees not included.

So it depends how these royalties work in detail. If only the chip manufacturers are paying - Mediatek, Qualcomm, Samsung, Intel, AMD, Nvidia, Apple - that is at least 10 players paying the maximum. And if you consider small players, the total contribution of royalty fees to HEVC is 1 Billion / Year. ONE BILLION!! Over the lifetime of a video codec, which typically runs at least a decade, these patents are worth 10 Billions.

Do you think that is a fair price? I think everyone should decide for themselves.


HEVC got an additional licensing pool in HEVC Advance that demanded significantly greater license fees on top of MPEG LA's.

Said group's demands are basically the reason Netflix started considering VP9.


Two additional, as Technicolor later dropped out of HEVC Advance and is now licensing theirs individually: http://www.streamingmedia.com/Articles/Editorial/Featured-Ar...


Hilarious!


No, it doesn't. Though that may have been the goal, HEVC has only thus far achieved an improvement of around 25%, not 50%.


In my anecdotal experience, h265 gets me 50-60% improvements in file sizes at the same quality for fairly low quality targets and the gains drop off rather quickly as you increase the quality. For videos where you don't care about the quality all that much, it's superb.


It also uses the same techniques and principles.


Ya'll wanna get the most out of your H.264 animu rips? Check out Kawaii Codec Pack, it's based on MPC and completely changed my mind about frame interpolation. http://haruhichan.com/forum/showthread.php?7545-KCP-Kawaii-C...


a) offtopic

b) Leave codec packs in 2000 where they belong. They are a great malware vector and also good at messing with settings they shouldn't.

>KCP utilizes the following components:
>
>MPC-HC - A robust DirectShow media player.
>
>madVR - High quality gpu assisted video renderer. Included as an alternative to EVR-CP.
>
>xy-vsfilter / XySubFilter(future) - Superior subtitle renderer.
>
>LAV-Filters - A package with the fastest and most actively developed DirectShow Media Splitter and Decoders.
>
>(Optional) ReClock - Addresses the problem of audio judder by adapting media for smooth playback OR utilized for bit perfect audio.

I'm actually using MPC-HC and AC3Filter to deal with some files where I couldn't hear the centre channel on VLC (on stereo speakers). Everything else isn't really needed.


oh crap it's the topic police. I use it specifically for madVR and interpolating frames for high-quality low FPS anime. It looks really great. The best I've found for this particular purpose. Be nice.


I wonder if across a lot of videos, the frequency domain representations look similar and if instead of masking in a circle we could mask with other (pre-determined) shapes to keep more information (this would require decoders to know them, of course). Or maybe this article is too high-level and it's not possible to "shape" the frequencies.


It's certainly possible to use any arbitrary shape. The way it really works is that there is a quantization matrix - which essentially is a configurable mask for your frequency domain signal.

Yes, I've dumbed it down in the article to a simple circle to illustrate the point.


This is a really well written article. Exactly why I love HN. Sometimes you get this nice technical intros into fields you thought were black magic.


Articles like this are what makes HN great, and not all those repeated links to the visual studio 1.7.1.1.0.1.pre02-12323-beta3 changelog.


Even better: H.265, with 40-50% bit rate reduction compared with H.264 at the same visual quality!


But much higher hardware requirements for both encoding and decoding. Encoding is like 8x slower too.


The PNG size seems to be misrepresented. The actual PNG is 637273 bytes when I download it, and 597850 if I recompress it to make sure we're not getting fooled by a bad PNG writer.

So instead of the reported 916KiB we're looking at 584KiB.

This doesn't change the overall point, but details matter.

  $ wget https://sidbala.com/content/images/2016/11/FramePNG.png
  --2016-11-04 22:08:08--  https://sidbala.com/content/images/2016/11/FramePNG.png
  Resolving sidbala.com (sidbala.com)... 104.25.17.18, 104.25.16.18, 2400:cb00:2048:1::6819:1112, ...
  Connecting to sidbala.com (sidbala.com)|104.25.17.18|:443... connected.
  HTTP request sent, awaiting response... 200 OK
  Length: unspecified [image/png]
  Saving to: ‘FramePNG.png’

  FramePNG.png                      [ <=>                                             ] 622.34K  --.-KB/s   in 0.05s

  2016-11-04 22:08:08 (12.1 MB/s) - ‘FramePNG.png’ saved [637273]

  $ pngout FramePNG.png
   In:  637273 bytes               FramePNG.png /c2 /f5
  Out:  597850 bytes               FramePNG.png /c2 /f5
  Chg:  -39423 bytes ( 93% of original)


Why even compare PNG and H.264 to begin with? PNG is a lossless compression format. A better comparison would be something lossy like JPG, which could easily shrink the size to ~100 kB. The point still stands, but at least it's a more relevant comparison.


Well done. The only thing that could make this better is an interactive model/app for me to play around with. The frequency spectrum can probably be used while retouching images as well.

A video on youtube led me to Joofa Mac Photoshop FFT/Inverse FFT plugins [1] which was worth a try. I was unable to register it, as have others. Then I came across ImageJ [2], which is a really great tool (with FFT/IFFT).

Edit: if anyone checks out ImageJ, there's a bundled app called Fiji [3] that makes installation easier and has all the plugins.

If anyone has other apps/plugins to consider, please comment.

[1] http://www.djjoofa.com/download

[2] https://imagej.nih.gov/ij/download.html

[3] http://fiji.sc/


I published a set of utilities that I developed for playing and to help myself learn about frequency analysis here, you might find them interesting:

https://github.com/0x09/dspfun


I found this explanation of Xiph.org's Daala (2013) very interesting and enlightening in terms of understanding video encoding: https://xiph.org/daala/

Related:

BPG is an open source image format (with both lossy and lossless modes) that uses HEVC under the hood, and is generally better than PNG across the board: http://bellard.org/bpg/

For a runner-up lossless image format unencumbered by H265 patents (completely libre), try http://flif.info/.


A real fun read. Had an assignment a couple of weeks ago where we used the k most significant singular values of matrices (from a picture of Marilyn M.) to compress the image. H.264 is on a whole other level, though ;)
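
For anyone curious, that kind of assignment boils down to a few lines of numpy (a sketch; "gray" here is assumed to be a 2-D array of pixel intensities):

    import numpy as np

    def svd_compress(gray, k):
        """Keep only the k largest singular values of the image matrix."""
        U, s, Vt = np.linalg.svd(gray, full_matrices=False)
        # storage drops from m*n values to roughly k*(m + n + 1)
        return (U[:, :k] * s[:k]) @ Vt[:k, :]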


Now the Question is - Manson or Monroe - and which one would be easier to compress? ;)


I enjoyed this for the most part and even learned a little. But it started out in very simple terms, really appealing to the common folk, and then about halfway through the tone changed completely, which was a real turn-off to me. It's silly, but "If you paid attention in your information theory class" was the spark for me. I didn't take any information theory classes, why would I have paid attention? I don't necessarily think it was condescending, but maybe; it's just that the consistency of the writing changed dramatically.

Anyway super interesting subject.


Really cool stuff, one thing though seems a little odd:

> Even at 2%, you don't notice the difference at this zoom level. 2%!

I'm not supposed to see that major streakiness? The 2% difference is extremely visible, even 11% leaves a noticeably bad pattern on the keys (though I'd probably be okay with it in a moving video), only the 30% difference looks acceptable in a still image.


I like this video explaining the difference between H.264 and H.265 https://www.youtube.com/watch?v=hRIesyNuxkg

Simplistic as it is, it touches on all the main differences. The only problem with H.265 is the higher requirements and time needed for encoding and decoding.


Damn, lost me during the frequency part.


Sometimes it's just easier to learn the math. (I am not kidding.)


What is the latest in video compression technology after H264 and H265?

The article discusses lossy compression in broad terms, but have we reaped all the low hanging fruit? Can we expect some sort of saturation just like we have with Moore's law where it gets harder and harder to optimize videos?


If the author truly wants 'magic', how about we take a 64KiB demo that runs for 4 minutes. That's 64KiB containing 240 seconds of video, and your H.264 had to use 175 for only five seconds of video.

We can conclude that 64KiB demos are at least 48 times as magical as H.264.


This was a good and interesting read. Is h.264 an open standard?


Doesn't look like it; https://en.wikipedia.org/wiki/H.264/MPEG-4_AVC

> H.264 is protected by patents owned by various parties. A license covering most (but not all) patents essential to H.264 is administered by patent pool MPEG LA.[2] Commercial use of patented H.264 technologies requires the payment of royalties to MPEG LA and other patent owners. MPEG LA has allowed the free use of H.264 technologies for streaming internet video that is free to end users, and Cisco Systems pays royalties to MPEG LA on behalf of the users of binaries for its open source H.264 encoder.


It is an open standard. Anyone can purchase and implement it, and it was developed by ISO. The technologies are not royalty free in the US. Don't conflate the two. *

Edit: I emphasize this mainly because the terms have a specific meaning in standards jargon but also because it places the blame for software patent abuses on the wrong parties (the standards developers rather than the lawyers and legislators).


> blame for software patent abuses on the wrong parties (the standards developers)

Uh, anyone familiar with the MPEG process will assure you that the companies involved love (let me restate that: PREFER) to bring in technology on which they own the patents so they get a good cut of the resulting patent pool.

Sometimes this is even done even though it technically makes no sense. Best example: hybrid filter-bank in MP3.

The process also provides no protection or discouragement from patents from semi-involved industry partners appearing later on, etc.

This difference in approach is a stark contrast to the IETF, which is why Opus work, and future AV1 work are happening under the IETF rather than the MPEG groups.


OK, so it is open, but not free? Is it available for academic purposes free of cost?


You have to pay royalties to actually use it, but if you just want to read the thing, you can get it for free from the ITU. https://www.itu.int/rec/T-REC-H.264-201602-S/en


As the grandparent comment says, it is free for non-commercial use.

Of course, this also only applies in countries which enforce software patents.


No, not at all. There's even restrictions on distributing H264/AVC video files themselves.


The comparison does not make any sense, and no, H.264 is not magic!!

- The guy is comparing a lossless format (PNG) to H.264, which is a lossy video format; that is not fair.

- He is generating a 5-frame video and comparing it to a 1-frame image. Only the I-frame at the beginning of the video matters in that case; all the others are derived from it (P-frames).

- What is the point of that comparison? We already have image formats comparable in size to an H.264 I-frame, using the same science (entropy coding, frequency domain, intra-frame MB derivation...).


Did you read the article?

The point you are making here is PRECISELY the point that the author was making in the article: that a lossy format can be far, far smaller. He then goes into the details (from a high-level point of view) of what kinds of losses H264 incurs.


An enjoyable, short and to the point article with many examples and analogies. But my favorite part was this:

"Okay, but what the freq are freqX and freqY?"


"1080p @ 60 Hz = 1920x1080x60x3 => ~370 MB/sec of raw data."

I apologize if this is trivial. What does 1920 in the above equation represent?


1080p is 1920x1080 px

Btw the question is trivial, but don't feel apologetic about asking questions. None of us knows everything, and in a field we don't know, our questions will be trivial.


1920x1080 is the typical resolution of 1080p Full HD, is that what you mean?
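
Spelling out the arithmetic from the quoted line (the 5 Mbit/s figure below is just an assumed typical 1080p H.264 streaming bitrate, not something from the article):

    width, height, fps, bytes_per_pixel = 1920, 1080, 60, 3   # 8-bit R, G, B
    raw = width * height * fps * bytes_per_pixel
    print(raw)              # 373,248,000 bytes/s, i.e. roughly 370 MB/sec
    print(raw * 8 / 5e6)    # ~597x smaller if streamed at an assumed 5 Mbit/s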


Try scrubbing backwards. H264 seeking only works nice if you're fast-forwarding the video. Actually, that is kind of magical.


Do H.264 and WebRTC have different use cases? Or do they compete directly?


Let's say you want to video chat with someone using only web browsers: you would establish a direct peer-to-peer connection with WebRTC and then you could stream H.264 video to each other. I'd say WebRTC and H.264 complement each other. However, the shared stream or data need not be H.264.


Great Write-up, thank you for your time and effort!


Wow, now tell me how H.265 works!


Copyrighted and patented magic.


Well written article.


Too trivial, too general, too pompous. I'd downvote.


I need hacker


time to make move on: h.265


Nice


time to move on: h.265


time to move on H.265


H.265/HEVC vs H.264/AVC: 50% bit rate savings verified

http://www.bbc.co.uk/rd/blog/2016/01/h-dot-265-slash-hevc-vs...


So what's the final car weight? It looks like you stopped at the Chroma subsampling section..


6.5 Ounces! or 0.4 lbs. Thanks for the feedback! I added the final weight into the conclusion.


This is great as a high-level overview... except that it's way too high-level. These are all extremely well-known techniques. Is there any modern video compression scheme that doesn't employ them?

In other words, why is H.264 in particular magical?


> "If you don't zoom in, you would even notice the difference."

First of all, I think he meant "you would NOT even notice".

Second of all, that's the first thing I noticed. That PNG looks crystal clear. The video looks like overcompressed garbage.


Well explained. I was thinking of reading about h264 and this is an amazing starter. Thanks Sid!


s/magic/lossy


"This concept of throwing away bits you don't need to save space is called lossy compression."

What a terrible introduction to lossy compression. This would mean that if I empty the trash bin on my desktop, it's lossy compression.

The concept of going through all compression ideas that are used is pretty neat though.


> This would mean that if I empty the trash bin on my desktop, it's lossy compression.

It is.


MB is 1024 * 1024 bytes, not 1000 * 1000 bytes. Unless you're a HDD/SSD manufacturer.


MiB (mebibyte) is 1024 * 1024 bytes.


Ugh. Comparing the file size difference between a lossless PNG and a LOSSY H.264 video of a STATIC PAGE is absurd. Calling it "300 times the amount of data," when it's a STATIC IMAGE is insulting in the extreme. It really doesn't matter if the rest of the article has insights, because you lost me already.


He clarifies right after that he got to those numbers because he used a lossless vs lossy encoder. Really should've kept reading


"right after that".

No he didn't explain "right after that." He rambled on and on, and even after all of that, he STILL doesn't bring up JPG.

It's an inherently stupid comparison to make. You can't polish a turd.


Thanks for the feedback! Sorry if I was unclear. The comparison with PNG is very intentional to illustrate the vast difference in the compression efficiencies involved. I do state the difference clearly here though:

> This concept of throwing away bits you don't need to save space is called lossy compression. H.264 is a lossy codec - it throws away less important bits and only keeps the important bits.

> PNG is a lossless codec. It means that nothing is thrown away. Bit for bit, the original source image can be recovered from a PNG encoded image.


It's a very high level example of the concept, which admittedly is absurd, but drives home the point for people who might not be familiar with the scope of the numbers involved.



