H.264 is Magic (sidbala.com)
1228 points by LASR on Nov 4, 2016 | 219 comments



Absolutely love this:

'Suppose you have some strange coin - you've tossed it 10 times, and every time it lands on heads. How would you describe this information to someone? You wouldn't say HHHHHHHHH. You would just say "10 tosses, all heads" - bam! You've just compressed some data! Easy. I saved you hours of mindfuck lectures.'

This is a really great, simple way to explain what is otherwise a fairly complex concept to the average bear. Great work.


That's only one half of the problem - you now have an alphabet five times the size so you have actually increased the size of the message! You also need to explain how to encode this efficiently to explain compression.


Not really - the knowledge of what the alphabet is can be universally agreed upon and doesn't need to be transmitted with the data. The metaphor here is that software and hardware-based decoding can now be much more powerful because the hardware is more powerful than it used to be.

And of course the truth is you would just transmit "H_10" with the universally agreed upon knowledge that "H" is "Heads" and "_" is number of times.


> you would just transmit "H_10" with the universally agreed upon knowledge that "H" is "Heads" and "_" is number of times.

Yes I get that the alphabet is already agreed upon.

But if I only transmit H or T (uncompressed), that's just one bit needed per symbol. So I can transmit HHHHHHHHHH in ten bits. If I introduce this simplified compression to the system and add 0-9 to the alphabet, that now needs four bits per symbol, so the message H10 is 12 bits long (which is longer than uncompressed). And HTHTHTHTHT would be forty bits, so if the message doesn't repeat simply, it's now four times larger!

See what I mean? It's not successfully compressed anything.

The solution to this is easy - Huffman coding - but it doesn't make sense to show it for a ten-bit message, as it won't work well at that size. And the trite explanation of compression as 'just the symbol and then the number of times it's repeated' doesn't mention this, so it's only half the story, and people will still be puzzled because they will see that your message contains MORE entropy after 'compression', not LESS!


You are entirely missing the point. His purpose isn't to give the reader a rigorous mathematical understanding. It is to convey a concept. It is an analogy, not a proof. And his analogy is perfectly good. Just do it: say "HHHHHHHHHH" and say "ten tosses, all heads" and get back to me which one transmits the info to another human in more compact form.

"To another human" is the key phrase, and sometimes I wonder if HN is populated with humans or androids. No offense intended to androids with feelings.


I think there's no need to be so pedantic. Replace 10 with 1000 and now the scheme "works".

Regarding

> The solution to this is easy and is Huffman

Well, no. As you said, for 10 bits it doesn't matter; and in general it will depend on the input. Sometimes run-length encoding performs better than Huffman, and there are also cases where Huffman won't capture higher-order entropy. Also, for zeroth-order entropy an arithmetic encoder is superior to Huffman. Unless you care about decompression speed...

Which brings me back to the fact that there is no such thing as "the solution" in data compression. But more importantly: it was just an example to show an idea, and actually a pretty good one (run-length encoding).


But actually, no. Because you could set up the format for HTTHTTTHHHHHHHHHH like this:

  01001000
  11001010
That's sixteen bits for 17 coinflips. With no continuous sequences longer than seven, this format takes up one extra bit every seven flips.

How does it work? The first bit is a sign bit. If it's zero, the next seven bits are raw coinflips, 0 for tails, 1 for heads. If it's one, the second bit signifies whether the next sequence consists of heads or tails (again, 0 for tails, 1 for heads), and the remaining six bits tell how long said sequence is.

This is a fairly simple encoding for the strategy described in the article, which I thought of off the top of my head in about five minutes, and I know nothing about compression. It's slightly broken (what if the sequence ends mid-byte?), but it does kind of work. Somebody who actually knows about compression could probably do this better.
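
For the curious, here's a rough Python sketch of the byte format described above - purely illustrative, and it inherits the same mid-byte termination problem mentioned in the parent comment:

    def encode(flips, run_threshold=8):
        """Encode a string like 'HTTHTTTHHHHHHHHHH' into bytes.

        Each output byte starts with a flag bit:
          0 -> the next 7 bits are raw flips (1 = heads, 0 = tails)
          1 -> the next bit is the face, the last 6 bits are the run length
        """
        out = []
        i = 0
        while i < len(flips):
            run = 1
            while i + run < len(flips) and flips[i + run] == flips[i]:
                run += 1
            if run >= run_threshold:            # long run: emit one run byte
                run = min(run, 63)              # only 6 bits for the length
                out.append(0b10000000 | ((flips[i] == 'H') << 6) | run)
                i += run
            else:                               # otherwise: emit up to 7 raw flips
                chunk = flips[i:i + 7]
                bits = 0
                for j, f in enumerate(chunk):
                    bits |= (f == 'H') << (6 - j)
                out.append(bits)                # flag bit stays 0
                i += len(chunk)                 # broken if chunk < 7, as noted above
        return bytes(out)

    print(encode('HTTHTTTHHHHHHHHHH').hex())    # '48ca' == 01001000 11001010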


I know that; the point is that this kind of stuff needs some thought, it's not as simple as "HTHTHTHTHT" = "HT five times". The article kind of glosses over that.


In the context of the analogy, it's probably better to read it as saving the time to human-parse rather than the space required to send. (And it definitely takes less time to verbally state, even if the sentence is clearly longer; caching.) The general idea is the same though: compression by describing patterns rather than explicitly stating the event.


Well, yes. Many programmers have trouble mapping ideas into bits: imagine how hard it is for people who aren't in programming?


You know it was just an example of what you might tell a friend over the phone about a coin flip, right? I suppose you'd send your friend a Huffman tree and then say "1110".


This entire thread makes me think of that Silicon Valley scene[1].

[1] https://www.youtube.com/watch?v=-5jF5jtMM_4


Wouldn't you just convert the alphabet to 0-1? Shouldn't the 0-1 encoding be the most optimal? That is, you can't find a better deal with any other alphabet or numeral system?


That's how compression was first explained to me, and it really stuck with me ever since. It was in the context of an image though, and instead of heads, it was red pixels.

What's really cool is that the simple explanation can be extended to explain things like why ciphertext doesn't compress well: because ciphertext has no patterns.


Not compression per se, but I remember when I was reverse engineering maps for Chip's Challenge. I would often see a tile (represented as a byte) that was encoded as 0xFF0502. I ended up realizing it meant "Repeat 'tile 2' 5 times." It was fun to figure that out as a kid.


RLE is a given. It's true that the average person rarely understands that this is what computers call compression, but everything after that - like optimal Huffman coding - involves a bit of thinking.


I think the LZ family are all pretty intuitive --- just replace repeated sequences with a reference to where they occurred before. Even human languages have things like contractions, abbreviations, and acronyms. Thus it is, at least to me, somewhat surprising that LZ was only documented academically a couple of decades after Huffman; perhaps it was thought to be too trivial? LZ can also be thought of as an extension of RLE.

In any case, an LZ decompressor fits in less than 100 bytes of machine instructions, and a simple compressor can be implemented in not more than several hundred, all the while providing extremely high compression for its simplicity. It will easily outperform order-0 static or dynamic Huffman on practical files like English text, and would probably make a good assignment in an undergraduate-level beginning data structures/algorithms/programming course. Yet it seems more popular to give an assignment on Huffman using trees, which is somewhat ironic since in the real world Huffman is implemented using bit operations and lookup tables, not actual tree data structures.

To give a trivial example, LZ will easily compress ABCDABCDABCDABCD while order-0 Huffman can't do much since each individual symbol has the same frequency.
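
To make the idea concrete, here's a toy back-reference substitution in Python - not a real LZ77 implementation, just the flavor of it: replace the upcoming text with a (distance, length) pointer to where it already occurred.

    def toy_lz(data, window=255):
        """Greedy toy LZ: emit literal bytes or (distance, length) matches."""
        out, i = [], 0
        while i < len(data):
            best_len, best_dist = 0, 0
            for j in range(max(0, i - window), i):   # search the already-seen window
                length = 0
                while (i + length < len(data) and
                       data[j + length] == data[i + length] and length < 255):
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - j
            if best_len >= 3:                        # only worth it for longer matches
                out.append(('match', best_dist, best_len))
                i += best_len
            else:
                out.append(('lit', data[i]))
                i += 1
        return out

    print(toy_lz(b'ABCDABCDABCDABCD'))
    # [('lit', 65), ('lit', 66), ('lit', 67), ('lit', 68), ('match', 4, 12)]

Order-0 Huffman sees four equally frequent symbols and can't go below 2 bits each; the back-reference collapses three of the four repetitions into a single token.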


Another LZ fan here.

My guess is that the "late" development of LZ is mainly due to two reasons:

i) At that moment pattern matching algorithms were not so advanced. E.g. the suffix tree was very recent, and in the following years lots of advances occurred in that area...

ii) Although LZ can appear easier or more intuitive than Huffman, I think it is much less intuitive to prove a good bound on the compression achieved by LZ. OTOH, Huffman is built in a way that shows that it achieves zeroth-order compression.


The DEFLATE[1] algorithm is actually fairly accessible, and will give a good idea of how compression works.

1: https://en.wikipedia.org/wiki/DEFLATE


I liked this explanation of DEFLATE here on HN a few months ago:

https://news.ycombinator.com/item?id=12334270


IMO Huffman is conceptually more complicated (not the implementation, but the logic) than arithmetic coding.

And Huffman isn't optimal unless you are lucky, unlike arithmetic coding.


I never learned AC. It's on my overflowing stack of thing to read about.


AC is conceptually stupidly simple. All you do is encode a string of symbols into a range of real numbers.

To start your range is [0, 1). For each symbol you want to encode you take your range and split it up according to your probabilities. E.g. if your symbols are 25% A, 50% B and 25% C, then you split up that range into [0, 0.25) for A, [0.25, 0.75) for B and [0.75, 1) for C.

Encoding multiple symbols is just applying this recursively. So to encode the two symbols Bx we split up [0.25, 0.75) proportionally just like we did [0, 1) before to encode x (where x is A, B or C).

As an example, A is the range [0, 0.25), and AC is the range [0.1875, 0.25).

Now to actually turn these ranges into a string of bits we choose the shortest binary representation that fits within the range. If we look at a decimal number:

    0.1875
We know that this means 1/10 + 8/100 + 7/1000 + 5/10000. A binary representation:

    0.0011
This means 0/2 + 0/4 + 1/8 + 1/16 = 0.1875. So we encode AC as 0011.

---

The beauty of arithmetic coding is that after encoding/decoding any symbol we can arbitrarily change how we split up the range, giving rise to adaptive coding. Arithmetic coding can perfectly represent any data that forms a discrete string of symbols, including changes to our knowledge of data as we decode.
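
Here's a minimal sketch of that interval-narrowing step (encoder only, using the same 25/50/25 model from above; real coders work with integer ranges and renormalization, which this ignores):

    MODEL = {'A': (0.0, 0.25), 'B': (0.25, 0.75), 'C': (0.75, 1.0)}

    def ac_interval(message):
        """Narrow [0, 1) once per symbol, according to the fixed model."""
        low, high = 0.0, 1.0
        for sym in message:
            s_low, s_high = MODEL[sym]
            width = high - low
            low, high = low + width * s_low, low + width * s_high
        return low, high

    def shortest_bits(low, high):
        """Greedily grow a binary fraction 0.b1b2... until it lands in [low, high)."""
        bits, value, step = '', 0.0, 0.5
        while value < low:
            if value + step < high:   # taking this bit keeps us under the top of the range
                bits += '1'
                value += step
            else:
                bits += '0'
            step /= 2
        return bits

    low, high = ac_interval('AC')
    print(low, high)                  # 0.1875 0.25
    print(shortest_bits(low, high))   # 0011, matching the worked example above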


Or on a more abstract level to compare to Huffman encoding: Huffman turns each symbol into a series of bits like "011". Arithmetic encoding lets you use fractional bits.

A Huffman tree for digits might assign 0-5 to 3 bits and 6-9 to 4 bits. Encoding three digits will use on average slightly more than 10 bits. Using AC will let you give the same amount of space to each possibility, so that encoding three digits always uses less than 10 bits.
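
Putting rough numbers on that (a back-of-the-envelope check, assuming uniformly random digits):

    from math import log2

    # Hypothetical Huffman code: six digits get 3-bit codes, four get 4-bit codes
    avg_bits_per_digit = (6 * 3 + 4 * 4) / 10   # 3.4 bits
    print(3 * avg_bits_per_digit)               # 10.2 bits on average for 3 digits

    # Arithmetic coding can get arbitrarily close to the entropy
    print(3 * log2(10))                         # ~9.97 bits for 3 digits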


Nice explanation. Can you explain how to remove ambiguity relating to string length?

"0" = 0.0b = 0 falls in the range [0,0.25) so it's a valid encoding for "A"; but isn't it also a valid encoding for "AA", "AAA", etc.?

AA = [0,0.25) * [0, 0.25) = [0, 0.0625), and so on.

It seems that adding "A"s to a string in general doesn't change its encoding.


You either reserve a symbol for "end of stream" or externally store the length.

It's the equivalent to pretending a Huffman stream never ends and is padded with infinite 0s.


Huffman seems simpler to me, but I've implemented both at various times so that might colour my perspective.


AC implementation is actually quite tricky, but conceptually IMO it's much simpler and more elegant than Huffman.


I definitely wouldn't say "HHHHHHHHH," since I tossed it 10 times, not 9.

Saying "10 tosses, all heads" reduces the chance of omitting a toss in data entry, which is all to the better.


You're making the assumption the other party knows English, rather than say the abstraction of 'coinflip' which in itself can be abstracted. Do they understand the concept of fairness - is it even odds or not? There's a reason that numbers are considered a more universal 'language' than other forms of communication.


You'd likely enjoy this read too:

http://antirez.com/news/75

On the HyperLogLog algorithm to count things.


I would probably say I found a two headed coin. I also like how "HHHHHHHHHH" is shorter than "I tossed it 10 times, all heads"


The all-heads sequence is exactly as likely as any other sequence.


right, but you can also say there are many sequences that aren't all heads.


Which is shortest? You can recover the original dataset from any of the following:

- heads, heads, heads, tails, tails, heads

- hhhtth

- h3t2h

- 3b1


FAIL. "10 tosses, all heads" is 20 characters while "HHHHHHHHHH" is 10 characters. You've conducted expansion rather than compression.


That example is about conveying the information verbally to another human, so syllables are interesting, not characters. "h h h h h h h h h h" is 10 syllables, "10 tosses, all heads" is 4 (and could be compressed further to "10 heads").


"10H" is the string you should be comparing it to (or something equivalent.)


If we're going there, H10 may be more efficient... h10t5h1... I'm thinking the heads-vs-tails symbol before the repeat count is the better format.


Depends which way you're iterating.


The lossy transform is important, but I think what's actually most important in video compression is getting rid of redundancy --- H.264 actually has a lossless mode in which that transform is not used, and it still compresses rather well (especially for noiseless scenes like a screencast.) You can see the difference if you compare with something like MJPEG which is essentially every frame independently encoded as a JPEG.

The key idea is to encode differences; even in an I-frame, macroblocks can be encoded as differences from previous macroblocks, and with various filterings applied: https://www.vcodex.com/h264avc-intra-precition/ This reduces the spatial redundancies within a frame, and motion compensation reduces the temporal redundancies between frames.
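
A toy illustration of why predict-then-encode-the-residual pays off - this isn't any actual H.264 intra mode, just the general idea, using a made-up 1-D row of pixel values:

    import numpy as np

    row = np.array([200, 201, 203, 202, 204, 205, 207, 206], dtype=np.int16)

    # "Horizontal" prediction: guess each sample equals the one to its left
    prediction = np.concatenate(([0], row[:-1]))
    residual = row - prediction
    print(residual)    # [200 1 2 -1 2 1 2 -1] -- mostly small values, cheap to entropy-code

    # The decoder reverses it exactly
    print(np.array_equal(np.cumsum(residual), row))   # True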

You can sometimes see this when seeking through video that doesn't contain many I-frames, as all the decoder can do is try to decode and apply differences to the last full frame; if that isn't the actual preceding frame, you will see the blocks move around and change in odd ways to create sometimes rather amusing effects, until it reaches the next I-frame. The first example I found on the Internet shows this clearly, likely resulting from jumping immediately into the middle of a file: http://i.imgur.com/G4tbmTo.png That frame contains only the differences from the previous one.

As someone who has written a JPEG decoder just for fun and learning purposes, I'm probably going to try a video decoder next; although I think starting from something simpler like H.261 and working upwards from there would be much easier than starting immediately with H.264. The principles are not all that different, but the number of modes/configurations the newer standards have --- essentially for the purpose of eliminating more redundancies from the output --- can be overwhelming. H.261 only supports two frame sizes, no B-frames, and no intra-prediction. It's certainly a fascinating area to explore if you're interested in video and compression in general.


> MJPEG which is essentially every frame independently encoded as a JPEG.

"essentially" makes it sound like it isn't precisely true. MJPEG is literally just a stream of JPEG images. The framing of the stream varies a bit, but many implementations are just literal JPEG images bundled one after the other into a MIME "multipart/x-mixed-replace" message.


This is really interesting and the imgur picture you linked (with your explanation) explains it really clearly!

But when seeking, why wouldn't any local media playback seek backwards and reconstruct the full frame? It's not like the partial frame after seeking is useful - I'd rather wait 2 seconds while it scrambles (i mean "hurries up") to show me a proper seek, wouldn't everyone?

What was your Internet search for finding that imgur frame? What is this effect called?


>why wouldn't any local media playback seek backwards and reconstruct the full frame?

Most codecs/players do. VLC used to be criticized for being different in that regard. One possible advantage is instantaneous seeking, as there's no need to decode all the needed frames (which could amount to several seconds of video) between the nearest I-frames[1] (the complete reference pictures) and the desired one.

[1]: plural, because prediction can also be bidirectional in time

The use of incomplete video frame data for artistic purposes is called "datamoshing".


I try to use VLC when I can because it offers intuitive playlist support, but for high-resolution H.264 and friends I usually have to switch to Media Player Classic.

VLC is willing to let my entire screen look like a blob of grey alien shit for 10 seconds instead of just taking a moment to reconstruct frames.

And its hardware acceleration for newer codecs is balls. Sucks because otherwise, it's right up there with f2k for me.


I stopped using VLC when I found mpv [0]. I really like it because it exposes everything from the CLI, so once you're familiarized with the flags you're interested in using, it's easy to play anything. For everyday usage it "just works" too, as expected of any video player.

[0] https://mpv.io/


How does it compare to mplayer? My biggest complaint about mplayer is it still doesn't play VFR videos well.


I've tried it.

* Sane defaults (encodings and fonts, scaletempo for audio)

* instantaneous play of next and previous videos

* navigation in random playlist actually works

* Easy always on top key binding

* Most mplayer key bindings work

I'll definitely keep on trying it for a while.


Does it include all the codecs by default? I think this was a major reason VLC succeeded the way it did. With all other players (BPlayer anyone?) you needed to find and install tons of codecs while in VLC it just worked.


It has played everything I've thrown at it so far...


I'll check it out and let you know what I think. Thanks~


man... that's a big manual :)

I think I can find some use for this in certain situations. Still lacks a good playlist building schema.


>VLC is willing to let my entire screen look like a blob of grey alien shit for 10 seconds instead of just taking a moment to reconstruct frames.

Yes, this is what I was talking about, and yes, specifically for VLC. Plus it's not like playback is so taxing that all cores are pegged at 100% during playback. When I seek, VLC should get off its ass and scramble to come up with the correct full frame then. I'll wait.


I recently bought a camera that has 4k video recording. VLC just gives up playing the video. Even Windows Media Player can handle it. No idea what's going on, but I was really surprised and disappointed with VLC.


See if you can cut a small segment and submit it as a sample to ffmpeg. Hell, see if ffprobe and ffmpeg can play it. Happy to help, if you've got enough upstream bandwidth.


Sure. I'll give those a try tonight. I assume if it works in `ffplay` directly, there's no need to submit it?


Isn't the other advantage that VLC can play incomplete movie files? Any other players I have tried 'crash' on incomplete torrents, when VLC just fails until it finds the next I frame.


"datamoshing" is a term I've heard for people deliberately removing I-frames, so P-frames are applied to the wrong base image.


In a course I taught (2010) on music visualizations that's the term I used.

The example I used in the lecture where datamoshing came up was the music video for Chairlift's "Evident Utensil"[1]; I always thought this was a neat example.

[1]: https://www.youtube.com/watch?v=mvqakws0CeU


Two more examples of datamoshing in music videos:

Kanye West's "Welcome to Heartbreak" (https://www.youtube.com/watch?v=wMH0e8kIZtE)

A$AP Mob's "Yamborghini High" (https://www.youtube.com/watch?v=tt7gP_IW-1w)


I thought I'd learn something special about H.264, but all the information here is high level and generic.

For example if you replace H.264 with a much older technology like mpeg-1 (from 1993) every sentence stays correct, except this:

"It is the result of 30+ years of work" :)


I was a bit disappointed in this article for the same reason: this is a great primer for people new to MPEG video compression, but it doesn't have anything to do with H.264.

I was hoping the author would write about H.264 specifically, for instance, how it was basically the "dumping ground" of all the little tweaks and improvements that were pulled out of MPEG-4 for one reason or another (usually because they were too computationally expensive), and why, as a result, it has thousands of different combinations of features that are extremely complicated to support, which is why it had to be grouped into "profiles" (e.g., Baseline, Main, High): http://blog.mediacoderhq.com/h264-profiles-and-levels/

I was also hoping that he would at least touch on the features that make H.264 unique from previous MPEG standards, like in-loop deblocking, CABAC Entropy Coding, etc..

Again, it's fine as an introduction to video encoding, but there's nothing in here specific to H.264.


Sure, but also keep in mind that the technology hasn't changed much over time. Even HEVC, which achieves large gains in compression on high-res video with minimal loss in quality, is still mostly the same algorithm as H.264, but with larger blocks, slightly more flexible coding units rather than frame-wide interpolation changes, and 35 rather than 9 directions recognized for predictions.


Also the fleeting mention of b-frames, which mpeg-1 doesn't have. And I believe mpeg-1 doesn't use 16×16 macroblocks.

Still, it's a good overview of generic video compression.



"This post will give insight into some of the details at a high level - I hope to not bore you too much with the intricacies."

Did you miss the third paragraph?

As someone who knew nothing about it before, I found it lived up to its goal.


I think they just mean it shouldn't be "H.264 is magic", it should just be "video compression is magic" or some such. That irked me a little bit too.


Nice article! The motion compensation bit could be improved, though:

> The only thing moving really is the ball. What if you could just have one static image of everything on the background, and then one moving image of just the ball. Wouldn't that save a lot of space? You see where I am going with this? Get it? See where I am going? Motion estimation?

Reusing the background isn't motion compensation -- you get that by encoding the differences between frames so unchanging parts are encoded very efficiently.

Motion compensation is when you have the camera follow the ball and the background moves. Rather than encoding the difference between frames itself, you figure out that most of the frame moved, and you encode the difference between one frame and a shifted version of the blocks from a previous frame.

Motion compensation won't work particularly well for a tennis ball because it's spinning rapidly (so the ball looks distinctly different in consecutive frames) but more importantly because the ball occupies a tiny fraction of the total space so it doesn't help that much.

Motion compensation should work much better for things like moving cars and moving people.


Your example seems to assume translation only. I wonder how difficult/useful it would be to identify other kinds of time-varying characteristics (translation, rotation, scale, hue, saturation, brightness, etc) of partial scene elements in an automated way.

Along the same lines, it would be interesting to figure out an automated time-varying-feature detection algorithm to determine which kinds of transforms are the right ones to encode.

Do video encoders already do something like this? It seems like a pretty difficult problem since there are so many permutations of applicable transformations.


> I wonder how difficult/useful it would be to identify other kinds of time-varying characteristics (translation, rotation, scale, hue, saturation, brightness, etc) of partial scene elements in an automated way.

That's how Framefree worked. It segments the image into layers, computes a full morph, including movement of the boundary, between successive frames for each layer, and transmits the before and after for each morph. Any number of frames can be interpolated between keyframes, which allows for infinite slow motion without jerk.[1] You can also upgrade existing content to higher frame rates.

This was developed back in 2006 by the Kerner Optical spinoff of Lucasfilm.[2] It didn't catch on, partly because decompression and playback requires a reasonably good GPU, and partly because Kerner Optical went bust. The segment-into-layers technology was repurposed for making 3D movies out of 2D movies, and the compression product was dropped. There was a Windows application and a browser plug-in. The marketing was misdirected - somehow, it was targeted to digital signs with limited memory, a tiny niche.

It's an idea worth revisiting. Segmentation algorithms have improved since 2006. Everything down to midrange phones now has a GPU capable of warping a texture. And it provides a way to drive a 120FPS display from 24/30 FPS content.

[1] http://creativepro.com/framefree-technologies-launches-world... [2] https://web.archive.org/web/20081216024454/http://www.framef...


John do you know where all the patents on Framefree ended up?


Ask Tom Randoph, who was CEO of FrameFree. He's now at Quicksilver Scientific in Denver.


Some venture IP company in Tokyo called "Monolith Co." also had rights in the technology.[1] "As of today (Sept. 5, 2007), the company has achieved a compression rate equivalent to that of H.264 and intends to further improve the compression rate and technology, Monolith said."[2] (This is not Monolith Studios, a game development company in Osaka.) Monolith appears to be defunct.

The parties involved with Framefree were involved in fraud litigation around 2010.[3] The case record shows various business units in the Cayman Islands and the Isle of Jersey, along with Monolith in Japan and Framefree in Delaware. No idea what the issues were. It looks like the aftermath of failed business deals.

The inventors listed on the patents are Nobuo Akiyoshi and Kozo Akiyoshi.[4]

[1] https://www.youtube.com/watch?v=VBfss0AaNaU [2] http://techon.nikkeibp.co.jp/english/NEWS_EN/20070907/138905... [3] http://www.plainsite.org/dockets/x8gi572m/superior-court-of-... [4] http://patents.justia.com/inventor/nobuo-akiyoshi


Great detective work. I suspect the IP is now a total mess - with luck nobody has been paying the patent renewal fees and everything is now free.


Most codecs split the image into prediction blocks (for example, 16x16 for MPEG-2, or from 4x4 to 64x64 for VP9). Each of these blocks has its own motion vector. All of the transformations you mentioned look like a translation if you look at them locally, so they can all be fairly well represented by this. Codecs have, in the past, attempted global motion compensation, which tries to fully model a camera (rotating, translating, lens distortion, zooming) but all of those extra parameters are very difficult to search for.

Daala and AV1's PVQ is an example of a predictor for contrast and brightness (in a very broad sense).


Yes, H.264 has brightness/fade compensation for past frames. It's called "weighted prediction".

The previous codec MPEG4 part 2 ASP (aka DivX&XviD) had "global motion compensation" which could encode scales and rotation, but like most things in that codec it was broken in practice. Most very clever ideas in compression either take too many bits to describe or can't be done in hardware.


> It seems like a pretty difficult problem since there are so many permutations of applicable transformations.

That's part of why video encoding can be very slow --- with motion compensation, to produce the best results the encoder should search through all the possible motion vectors and pick the one that gives the best match. To speed things up, at a slight cost in compression ratio, not all of them are searched, and there are heuristics on choosing a close-to-optimal one instead: https://en.wikipedia.org/wiki/Block-matching_algorithm
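
For a flavor of what that search looks like, here's a brute-force sketch (exhaustive full search with a sum-of-absolute-differences cost over a small window; real encoders use the smarter search patterns from the linked article):

    import numpy as np

    def full_search(prev, cur, by, bx, block=16, radius=8):
        """Find the motion vector (dy, dx) that minimizes SAD for one block.

        prev, cur: 2-D arrays of luma samples (reference and current frame).
        by, bx:    top-left corner of the block in the current frame.
        """
        target = cur[by:by + block, bx:bx + block].astype(np.int32)
        best, best_sad = (0, 0), np.inf
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = by + dy, bx + dx
                if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                    continue                     # candidate block falls outside the frame
                cand = prev[y:y + block, x:x + block].astype(np.int32)
                sad = np.abs(target - cand).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
        return best, best_sad                    # encode the vector plus the (small) residual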


Now I'm out of my depth, but I think motion compensation does okay at rotation and scaling. The motion vector varies throughout the frame, and I think codecs interpolate it, so all kinds of warping can be represented.


As evidence of this, sometimes when an I-frame is dropped from a stream or you jump around in a stream you can see the texture of what was previously on the screen wrapped convincingly around the 3D surface of what's now supposed to be on the screen, all accomplished with 2D motion vectors.


Related, how h265 works: http://forum.doom9.org/showthread.php?t=167081

This is a great overview and the techniques are similar to those of h264.

I found it invaluable for getting up to speed when I had to do some work on the screen content coding extensions of HEVC in Argon Streams. They are a set of bit streams to verify HEVC and VP9 decoders; take a look, it is a very innovative technique:

http://www.argondesign.com/products/argon-streams-hevc/ http://www.argondesign.com/products/argon-streams-vp9/


Heh, happy to see doom9 still alive and kicking. They were the n°1 resource in the early days of mainstream video compression.


It's not really alive and kicking. The forum is still active but the rest of the site hasn't been touched since 2008.


I love how you can edit photos of people to correct some skin imperfections, without losing the sense that the image is real (and not getting that blurred, plastic look), when you decompose the image into wavelets and just edit some frequencies.

Don't know in photoshop, but in Gimp there's a plugin called "wavelet decomposer" that does that.


I guess this is the plugin you are talking about? Interesting.

http://registry.gimp.org/node/11742


Exactly that.

There was a question about retouching photos a while ago (http://photo.stackexchange.com/questions/48999/how-do-i-take...) where wavelets were put to good use.


Well, that's the most awesome thing I've seen in a long time! Thanks for sharing.


I recently experienced this as follows: https://www.sublimetext.com has an animation which is drawn via JavaScript. In essence, it loads a huge .png [1] that contains all the image parts that change during the animation, then uses <canvas> to draw them.

I wanted to recreate this for the home page of my file manager [2]. The best I could come up with was [3]. This PNG is 900KB in size. The H.264 .mp4 I now have on the home page is only 200 KB in size (though admittedly in worse quality).

It's tough to beat a technology that has seen so much optimization!

1: http://www.sublimetext.com/anim/rename2_packed.png

2: https://fman.io

3: https://www.dropbox.com/s/89inzvt161uo1m8/out.png?dl=0


You could give FLIF [1] a try. With the help of Poly-FLIF [2] you can render it in the browser. Don't forget to try the lossy mode, it gives better compression with negligible loss in quality.

1: http://flif.info

2: https://github.com/UprootLabs/poly-flif/


> Chroma Subsampling.

Sadly, this is what makes video encoders designed for photographic content unsuitable for transferring text or computer graphics. Fine edges, especially red-black contrasts, start to color-bleed due to subsampling.

While a 4:4:4 profile exists a lot of codecs either don't implement it or the software using them does not expose that option. This is especially bad when used for screencasting.

Another issue is banding, since h.264's main and high profiles only use 8bit precision, including for internal processing, and the rounding errors accumulate, resulting in banding artifacts in shallow gradients. High10 profile solves this, but again, support is lacking.
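
For reference, 4:2:0 just means each chroma plane is stored at half resolution both horizontally and vertically. A minimal sketch of the round trip (box averaging down, nearest-neighbor back up - exactly the combination that produces the fringing described above):

    import numpy as np

    def subsample_420(chroma):
        """Average each 2x2 block of a chroma plane -> a quarter of the samples."""
        h, w = chroma.shape
        return chroma.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

    def upsample_nearest(small):
        """Nearest-neighbor upsampling back to full resolution."""
        return small.repeat(2, axis=0).repeat(2, axis=1)

    # A hard chroma edge that doesn't line up with the 2x2 grid gets smeared
    cr = np.zeros((4, 4))
    cr[:, 1:] = 240.0
    print(upsample_nearest(subsample_420(cr)))
    # the sharp 0/240 edge comes back as 120s bleeding across both sides of the boundary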


This is also the bane of PowerPoint presentations. Many TVs only support 4:2:0, so red-on-black text quickly becomes a smudgy mess.


It's easy to make a 4:2:0 upscaler that doesn't color bleed. Everyone just uses nearest-neighbor, which sucks, and then blames the other guy.


How would you make a 4:2:0 upscaler that doesn't color bleed?


50% solution: bicubic or bilinear. 90% solution: EEDI3. (kinda slow) 99% solution: use the full resolution Y plane for edge-direction.


I don't think that can accurately restore the details that have been created by subpixel-AA font rendering.

But if you have source/subsampled/interpolated comparisons that show 99% identical results i would be interested to see them.

Of course all that is useless if you don't have control over the output device. Just having the ability to record 4:4:4 makes the issue go away as long as the target can display it, no matter what interpolation they use.


By the way, this is an incredible example of scientific writing done well. There's a very tangible, jelly-like feeling of enthusiasm that the author clearly has for the topic, conveyed well to the readers. This whole thread is people excited about a video codec!


Thank you! It means a lot to me. Yes, I try to convey my sense of excitement about technology to other people.


> This whole thread is people excited about a video codec

That's not really a weird thing on HN though. Video codecs are exactly the kind of thing that we get excited about.


"See how the compressed one does not show the holes in the speaker grills in the MacBook Pro? If you don't zoom in, you would even notice the difference. "

Ehm, what?! The image on the right looks really bad and the missing holes was the first thing I noticed. No zooming needed.

And that's exactly my problem with the majority of online video (iTunes store, Netflix, HBO etc). Even when it's called "HD", there are compression artefacts and gradient banding everywhere.

I understand there must be compromises due to bandwidth, but I don't agree on how much that compromise currently is.


Of course while reading the article you are going to be very conscious of detail and image quality because that is the subject matter of the post.

However if that MacBook Pro image was placed on the side of an article where the primary content was the text you were reading, you'd glance at the image and your brain would fill in the details for you. You probably wouldn't notice the difference in that context.

For most use cases, there likely is very little functional difference between the two images. At least, that was how I understood it.


I find the 480p setting on my plex server at home actually looks better than most of the 1080p HD streams on the internet.

Although to be fair, I suspect that a lot of times what I'm looking at are mpeg videos that have been recompressed a half dozen or more times with different encoders, each encoder having prioritized different metrics. So, the quality gets worse until it doesn't really matter how good the compression algorithm is. Each new re-compression is basically spending 3/4 of its bits maintaining the compression artifacts from the previous two passes.


>No zooming needed

Aren't the images above the text a zoomed version?

>Here is a close-up of the original...


I took it to mean that we had to zoom to see that the holes were gone in the compressed version.


It indeed is, as long as we accept the first screenshot as the normal scale.


The first thing I noticed was the ringing, which is an artifact of low-pass filtering so it's a nice opportunity to go into problems with that kind of filtering. Other than that I think it was an ok teaser that gives an idea of how compression is done and what the trade-offs are.


Yep, me too - more like, if I was blind, I wouldn't notice the difference. Which is why the bitrate is always the first thing I look at when sourcing video.


But given other settings with even h.264 vs. h.265 and the source content, that isn't always a valid metric either.

I mean for fast action scenes, I rarely notice the difference between 720p and 1080p at 10ft away... but different encoding and sources, not just size alone can make significant differences.


There's another false claim like that a bit below. I can only assume that the author is close to legally blind, or uses a VGA-res display to watch the page.


Anyone who likes this would probably also enjoy the Daala technology demos at https://xiph.org/daala/ for a little taste of some newer, and more experimental, techniques in video compression.


Note that Daala has been discontinued in favor of AV1: https://en.wikipedia.org/wiki/AOMedia_Video_1

Previously Daala was presented as a candidate for NETVC but apparently this didn't go anywhere? https://en.wikipedia.org/wiki/NETVC


A lot of Daala tools are now being copied into AV1, the largest being PVQ: https://aomedia-review.googlesource.com/#/c/3220/


The demos are still neat, and some of the ideas are being used in AV1.


Daala continues to be a research platform for new ideas. New techniques are a lot easier to prototype in Daala than in more mature code bases.


I'm not sure if there's been an official announcement, but I had assumed that AV1 was going to be adopted/ratified as NETVC so that it's got a standards body rubber stamp, as well as the practical adoption/support from browser vendors, GPU manufacturers, streaming sites etc.



Very well explained. But I could have understood it all without the bro-approach to the reader. You see where I am going with this? Get it? See where I am going? Ok!


Maybe I'm in the minority here but I think it adds a bit of color to an otherwise dry topic to write about.


I remember loving this style when I was a novice, e.g. Beej's networking tutorial. Not a big fan anymore, either, but certainly valuable for (part of) the target audience, I think.


Today you need tricks like that to get Twitter-generation people to read even half of that amount of text.


The part about entropy encoding only seems to explain run-length encoding (RLE). Isn't the interesting aspect of making use of entropy in compression rather to represent rarer events with longer code strings?

The fair coin flip is also an example of a process that cannot be compressed well at all, because (1) the probability of the same event happening in a row is not as high as for unfair coins (RLE is minimally effective) and (2) the uniform distribution has maximal entropy, so there is no advantage in using different code lengths to represent the events. (Since the process has a binary outcome, there is also nothing to gain in terms of code lengths for unfair coins.)


[deleted]


Thank you! I hope you now share my feeling of awe over this technical wonder.


Excellent! :)

It would be really cool to further extend it showing actually how the various tiles are encoded and between frames, something along the lines of: http://jvns.ca/blog/2013/10/24/day-16-gzip-plus-poetry-equal...

H.265 can even do deltas between blocks in the same frame, IIRC, and is excellent for still image compression too.


H.265 is my next post :)


So if H.264 is magic, what is H.265 then? :)


Excuse me, i would like my account back.


Can someone explain how the frequency domain stuff works? I've never really understood that, and the article just waves it away with saying it's like converting from binary to hex.


It's a bad analogy. Binary and hex are just different formats for representing the same number. Spatial domain and frequency domain are different views of a complex data set. In the spatial domain, you are looking at the intensity of different points of the image. In the frequency domain, you are looking at the frequencies of intensity changes in patterns in the image.

A good way to develop an intuition for Fourier space is to look at simple images and their DFT transforms: http://web.cs.wpi.edu/~emmanuel/courses/cs545/S14/slides/lec... (3/4 of the way through the slide deck).

This analysis of a "bell pepper" image and its transform is also helpful: https://books.google.com/books?id=6TOUgytafmQC&pg=PA116&lpg=....

As for why you want to do this: throwing away bits in the spatial domain eliminates distinctions between similar intensities, making things look blocky. In the frequency domain, however, you can throw away high-frequency information, which tends to soften patterns like the speaker grills in the MBP image that the human eye isn't that sensitive to, to begin with.
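
A quick way to play with this yourself (a sketch using numpy's FFT rather than the DCT that the codecs actually use; the low-pass mask is the 'circle' from the article):

    import numpy as np

    def lowpass(img, keep_radius):
        """Zero out frequency components farther than keep_radius from the center."""
        f = np.fft.fftshift(np.fft.fft2(img))       # low frequencies in the middle
        h, w = img.shape
        yy, xx = np.ogrid[:h, :w]
        mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= keep_radius ** 2
        back = np.fft.ifft2(np.fft.ifftshift(f * mask))   # throw away the fine detail
        return np.real(back)

    # e.g. with a grayscale image loaded as a 2-D array:
    # blurry = lowpass(gray_image, keep_radius=30)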


> Spatial domain and frequency domain are different views of a complex data set.

Or in this case, a real data set.


The search keyword to learn about it is Fourier Transform: https://en.wikipedia.org/wiki/Fourier_transform

Along with the Wikipedia article and the obvious Internet search, there's a lot of good stuff that has been on HN: https://hn.algolia.com/?query=fourier%20transform&sort=byPop...


Basically we can represent any signal as an infinite sum of sinusoids. If you know about Taylor expansion of a function, then you know that the first order term is the most important, then the second and so on. Same principle with the sinusoids. So if we remove the sinusoids with very high frequency we remove the terms with least information.


Image and video codecs don't actually use the fourier transform as presented in the article, they use the DCT. Check out the example section on Wikipedia: https://en.wikipedia.org/wiki/Discrete_cosine_transform

The JPEG article also has a very good, step by step example of the DCT, followed by quantization and entropy coding: https://en.wikipedia.org/wiki/JPEG


In the most basic terms, not even talking about frequency, the mechanics of this is that one series of numbers (pixel values, audio samples, etc.) is replaced, according to some recipe or another, with a different series of numbers from which the original can be recovered (using a similar "inverse" recipe). The benefit of doing this comes from the discovery that this new series has more redundancy in it and can be compressed more efficiently than the original. Even if some of the data are thrown away at this point, the purpose of which is to make compression even more effective, the original can still be recovered with high fidelity.


It's the Fourier transform basically. There were even some past links on HN that explained it nicely so you might check those first

(Though for images it's in 2D, not 1D which is more commonly done)


> discard information which will contain the information with high frequency components. Now if you convert back to your regular x-y coordinates, you'll find that the resulting image looks similar to the original but has lost some of the fine details.

I would expect also the edges in the image to become more blurred, as edges correspond to high-frequency content. However, this only seems to be slightly the case in the example images.


This is probably because in the sample image you have clean vertical edges. It's pretty easy to represent these edges with a waveform.


You can see exactly that with the speaker grill and the text (this type of transformation is notoriously bad at compressing images of text, which is why you shouldn't use jpg for pictures of text).

In this context, the edges of, say, the macbook are not "high frequency" content, since they only feature one change (low to high luminosity) in a given block rather than several (high-low-high-low-high) like for the grill.


You should have a look at the Fourier transform of a step-function. It has high frequency components.


You're right! The images that I am using are zoomed cropped sections of a much larger image of the entire Apple home page.


What are directions for the future? Could neural networks become practically useful for video compression? [1]

[1] http://cs.stanford.edu/people/eroberts/courses/soco/projects...


Suppose I have a table of 8-digit numbers that I need to add and subtract for various reasons. Do I A: have a child, train them how to read numbers, add, and subtract, and then have the child do it or B: use a calculator purpose built to add and subtract numbers?

Neural nets are always expensive to train. You'd better be getting something from them that you can't get some other way.


Yes, you don't need the machinery of learning when you already have an algorithm you're happy with. Adding a table of numbers, I don't think anyone hopes to do much better than we already do with our circuits and computer architectures.

With video compression, I think most would agree that there might be better architectures/algorithms that we haven't stumbled upon yet. Whether specifically "neural networks" will be the shape of a better architecture, I don't know. But almost surely some meta-algorithm that can try out tons of different parameters/data-pipeline-topologies for something that vaguely resembles h.264 might find something better than h.264.

Neural nets are expensive to train. But so is designing h.264.


Google has been experimenting with image compression using AI for a while now: https://research.googleblog.com/2016/09/image-compression-wi...


Perhaps for intra-prediction. I wouldn't hold my breath.


H.265 gets you twice the resolution for the same bandwidth, or the same resolution for half the bandwidth.


H.265 gets you half the file size for ten times more in royalty fees, or saving 50% of bandwidth for 1000% more in royalty.


Do you have a reference for that?

I was under the impression that the first 100,000 units are free, and then 20c per unit afterwards to a max of $25m.

H264 drops to 10c per unit after 5m units, to a max of $6.5m.

You need to be shipping 125 million units annually to hit the full $25m.

Yes it's more, but it's not quite ten times. And notably if the chip maker pays the royalties, then the content creators don't need to (though that was excepted indefinitely with H264).

Parts regurgitated from a quick google for reference [1]

[1] http://www.theregister.co.uk/2014/10/03/hevc_patent_terms_th...


It is actually more than 10x. The annual cap for H.264 royalty fees is 6.5M from MPEG-LA. For H.265 it is 25M from MPEG-LA, AND 50M from HEVC-Advance. That is a total of 75M. And like others have pointed out, there are Technicolor patent fees not included.

So it depends how these royalties work in detail. If only the chip manufacturers are paying - Mediatek, Qualcomm, Samsung, Intel, AMD, Nvidia, Apple - that is at least 10 players paying the maximum. And if you consider small players, the total contribution of royalty fees to HEVC is 1 Billion / Year. ONE BILLION!! Over the lifetime of a video codec, which typically runs at least a decade, these patents are worth 10 Billions.

Do you think that is a fair price? I think everyone should decide for themselves.


HEVC got an additional licensing pool in HEVC Advance that demanded significantly greater license fees on top of MPEG LA's.

Said group's demands are basically the reason Netflix started considering VP9.


Two additional, as Technicolor later dropped out of HEVC Advance and is now licensing theirs individually: http://www.streamingmedia.com/Articles/Editorial/Featured-Ar...


Hilarious!


No, it doesn't. Though that may have been the goal, HEVC has only thus far achieved an improvement of around 25%, not 50%.


In my anecdotal experience, h265 gets me 50-60% improvements in file sizes at the same quality for fairly low quality targets and the gains drop off rather quickly as you increase the quality. For videos where you don't care about the quality all that much, it's superb.


It also uses the same techniques and principles.


Ya'll wanna get the most out of your H.264 animu rips? Check out Kawaii Codec Pack, it's based on MPC and completely changed my mind about frame interpolation. http://haruhichan.com/forum/showthread.php?7545-KCP-Kawaii-C...


a) offtopic

b) Leave codec packs in 2000 where they belong. They are a great malware vector and also good at messing with settings they shouldn't.

>KCP utilizes the following components:
>
>MPC-HC - A robust DirectShow media player.
>
>madVR - High quality gpu assisted video renderer. Included as an alternative to EVR-CP.
>
>xy-vsfilter / XySubFilter(future) - Superior subtitle renderer.
>
>LAV-Filters - A package with the fastest and most actively developed DirectShow Media Splitter and Decoders.
>
>(Optional) ReClock - Addresses the problem of audio judder by adapting media for smooth playback OR utilized for bit perfect audio.

I'm actually using MPC-HC and AC3Filter to deal with some files where I couldn't hear the centre channel on VLC (on stereo speakers). Everything else isn't really needed.


oh crap it's the topic police. I use it specifically for madVR and interpolating frames for high-quality low FPS anime. It looks really great. The best I've found for this particular purpose. Be nice.


I wonder if across a lot of videos, the frequency domain representations look similar and if instead of masking in a circle we could mask with other (pre-determined) shapes to keep more information (this would require decoders to know them, of course). Or maybe this article is too high-level and it's not possible to "shape" the frequencies.


It's certainly possible to use any arbitrary shape. The way it really works is that there is a quantization matrix - which essentially is a configurable mask for your frequency domain signal.

Yes, I've dumbed it down in the article to a simple circle to illustrate the point.


This is a really well written article. Exactly why I love HN. Sometimes you get this nice technical intros into fields you thought were black magic.


Articles like this are what makes HN great, and not all those repeated links to the visual studio 1.7.1.1.0.1.pre02-12323-beta3 changelog.


Even better: H.265, with 40-50% bit rate reduction compared with H.264 at the same visual quality!


But much higher hardware requirements for both encoding and decoding. Encoding is like 8x slower too.


The PNG size seems to be misrepresented. The actual PNG is 637273 bytes when I download it, and 597850 if I recompress it to make sure we're not getting fooled by a bad PNG writer.

So instead of the reported 916KiB we're looking at 584KiB.

This doesn't change the overall point, but details matter.

  $ wget https://sidbala.com/content/images/2016/11/FramePNG.png
  --2016-11-04 22:08:08--  https://sidbala.com/content/images/2016/11/FramePNG.png
  Resolving sidbala.com (sidbala.com)... 104.25.17.18, 104.25.16.18, 2400:cb00:2048:1::6819:1112, ...
  Connecting to sidbala.com (sidbala.com)|104.25.17.18|:443... connected.
  HTTP request sent, awaiting response... 200 OK
  Length: unspecified [image/png]
  Saving to: ‘FramePNG.png’

  FramePNG.png                      [ <=>                                             ] 622.34K  --.-KB/s   in 0.05s

  2016-11-04 22:08:08 (12.1 MB/s) - ‘FramePNG.png’ saved [637273]

  $ pngout FramePNG.png
   In:  637273 bytes               FramePNG.png /c2 /f5
  Out:  597850 bytes               FramePNG.png /c2 /f5
  Chg:  -39423 bytes ( 93% of original)


Why even compare PNG and H.264 to begin with? PNG is a lossless compression format. A better comparison would be something lossy like JPG, which could easily shrink the size to ~100 kB. The point still stands, but at least it's a more relevant comparison.


Well done. The only thing that could make this better is an interactive model/app for me to play around with. The frequency spectrum can probably be used while retouching images as well.

A video on youtube led me to Joofa Mac Photoshop FFT/Inverse FFT plugins [1] which was worth a try. I was unable to register it, as have others. Then I came across ImageJ [2], which is a really great tool (with FFT/IFFT).

Edit: if anyone checks out ImageJ, there's a bundled app called Fiji [3] that makes installation easier and has all the plugins.

If anyone has other apps/plugins to consider, please comment.

[1] http://www.djjoofa.com/download

[2] https://imagej.nih.gov/ij/download.html

[3] http://fiji.sc/


I published a set of utilities that I developed for playing and to help myself learn about frequency analysis here, you might find them interesting:

https://github.com/0x09/dspfun


I found this explanation of Xiph.org's Daala (2013) very interesting and enlightening in terms of understanding video encoding: https://xiph.org/daala/

Related:

BPG is an open source image format (with both lossy and lossless modes) that uses HEVC under the hood, and is generally better than PNG across the board: http://bellard.org/bpg/

For a runner-up lossless image format unencumbered by H265 patents (completely libre), try http://flif.info/.


A real fun read. Had an assignment a couple of weeks ago where we used the k most significant singular values of matrices (from a picture of Marilyn M.) to compress the image. H.264 is on a whole other level, though ;)
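
For anyone curious, that kind of assignment boils down to a few lines of numpy (a sketch; "gray" here is assumed to be a 2-D array of pixel intensities):

    import numpy as np

    def svd_compress(gray, k):
        """Keep only the k largest singular values of the image matrix."""
        U, s, Vt = np.linalg.svd(gray, full_matrices=False)
        # storage drops from m*n values to roughly k*(m + n + 1)
        return (U[:, :k] * s[:k]) @ Vt[:k, :]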


Now the Question is - Manson or Monroe - and which one would be easier to compress? ;)


I enjoyed this for the most part and even learned a little. But it started out in very simple terms, really appealing to the common folk, and then about halfway through the tone changed completely, which was a real turn-off to me. It's silly, but "If you paid attention in your information theory class" was the spark for me. I didn't take any information theory classes, why would I have paid attention? I don't necessarily think it was condescending, but maybe; it's just that the consistency of the writing changed dramatically.

Anyway super interesting subject.


Really cool stuff, one thing though seems a little odd:

> Even at 2%, you don't notice the difference at this zoom level. 2%!

I'm not supposed to see that major streakiness? The 2% difference is extremely visible, even 11% leaves a noticeably bad pattern on the keys (though I'd probably be okay with it in a moving video), only the 30% difference looks acceptable in a still image.


I like this video explaining the difference between H.264 and H.265 https://www.youtube.com/watch?v=hRIesyNuxkg

Simplistic as it is, it touches on all the main differences. The only problem with H.265 is the higher requirements and time needed for encoding and decoding.


Damn, lost me during the frequency part.


Sometimes it's just easier to learn the math. (I am not kidding.)


What is the latest in video compression technology after H264 and H265?

The article discusses lossy compression in broad terms, but have we reaped all the low hanging fruit? Can we expect some sort of saturation just like we have with Moore's law where it gets harder and harder to optimize videos?


If the author truly wants 'magic', how about we take a 64KiB demo that runs for 4 minutes. That's 64KiB containing 240 seconds of video, and your H.264 had to use 175 for only five seconds of video.

We can conclude that 64KiB demos are at least 48 times as magical as H.264.


This was a good and interesting read. Is h.264 an open standard?


Doesn't look like it; https://en.wikipedia.org/wiki/H.264/MPEG-4_AVC

> H.264 is protected by patents owned by various parties. A license covering most (but not all) patents essential to H.264 is administered by patent pool MPEG LA.[2] Commercial use of patented H.264 technologies requires the payment of royalties to MPEG LA and other patent owners. MPEG LA has allowed the free use of H.264 technologies for streaming internet video that is free to end users, and Cisco Systems pays royalties to MPEG LA on behalf of the users of binaries for its open source H.264 encoder.


It is an open standard. Anyone can purchase and implement it, and it was developed by ISO. The technologies are not royalty free in the US. Don't conflate the two. *

Edit: I emphasize this mainly because the terms have a specific meaning in standards jargon but also because it places the blame for software patent abuses on the wrong parties (the standards developers rather than the lawyers and legislators).


> blame for software patent abuses on the wrong parties (the standards developers)

Uh, anyone familiar with the MPEG process will assure you that the companies involved love (let me restate that: PREFER) to bring in technology on which they own the patents so they get a good cut of the resulting patent pool.

Sometimes this is even done even though it technically makes no sense. Best example: hybrid filter-bank in MP3.

The process also provides no protection or discouragement from patents from semi-involved industry partners appearing later on, etc.

This difference in approach is a stark contrast to the IETF, which is why Opus work, and future AV1 work are happening under the IETF rather than the MPEG groups.


OK, so it is open, but not free? Is it available for academic purposes free of cost?


You have to pay royalties to actually use it, but if you just want to read the thing, you can get it for free from the ITU. https://www.itu.int/rec/T-REC-H.264-201602-S/en


As the grandparent comment says, it is free for non-commercial use.

Of course, this also only applies in countries which enforce software patents.


No, not at all. There's even restrictions on distributing H264/AVC video files themselves.


The comparison does not make any sense, and no, H.264 is not magic!!

- The guy is comparing a lossless format (PNG) to H.264, which is a lossy video format; that is not fair.

- He is generating a 5-frame video and comparing it to a 1-frame image. Only the I-frame at the beginning of the video matters in that case; all the others are derived from it (P-frames).

- What is the point of that comparison? We already have image formats comparable in size to an H.264 I-frame, using the same science (entropy coding, frequency domain, intra-frame MB derivation...).


Did you read the article?

The point you are making here is PRECISELY the point that the author was making in the article: that a lossy format can be far, far smaller. He then goes into the details (from a high-level point of view) of what kinds of losses H264 incurs.


An enjoyable, short and to the point article with many examples and analogies. But my favorite part was this:

"Okay, but what the freq are freqX and freqY?"


"1080p @ 60 Hz = 1920x1080x60x3 => ~370 MB/sec of raw data."

I apologize if this is trivial. What does 1920 in the above equation represent?


1080p is 1920x1080 px

Btw the question is trivial, but don't feel apologetic about asking questions. None of us knows everything, and in a field we don't know, our questions will be trivial.


1920x1080 is the typical resolution of 1080p Full HD, is that what you mean?
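
Spelling out the arithmetic from the quoted line (the 5 Mbit/s figure below is just an assumed typical 1080p H.264 streaming bitrate, not something from the article):

    width, height, fps, bytes_per_pixel = 1920, 1080, 60, 3   # 8-bit R, G, B
    raw = width * height * fps * bytes_per_pixel
    print(raw)              # 373,248,000 bytes/s, i.e. roughly 370 MB/sec
    print(raw * 8 / 5e6)    # ~597x smaller if streamed at an assumed 5 Mbit/s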


Try scrubbing backwards. H264 seeking only works nice if you're fast-forwarding the video. Actually, that is kind of magical.


Do H.264 and WebRTC have different use cases? Or do they compete directly?


Let's say you want to video chat with someone using only web browsers: you would establish a direct peer-to-peer connection with WebRTC and then you could stream H.264 video to each other. I'd say WebRTC and H.264 complement each other. However, the shared stream or data need not be H.264.


Great Write-up, thank you for your time and effort!


Wow, now tell me how H.265 works!


Copyrighted and patented magic.


Well written article.


Too trivial, too general, too pompous. I'd downvote.


I need hacker


time to make move on: h.265


Nice


time to move on: h.265


time to move on H.265


H.265/HEVC vs H.264/AVC: 50% bit rate savings verified

http://www.bbc.co.uk/rd/blog/2016/01/h-dot-265-slash-hevc-vs...


So what's the final car weight? It looks like you stopped at the Chroma subsampling section..


6.5 Ounces! or 0.4 lbs. Thanks for the feedback! I added the final weight into the conclusion.


This is great as a high-level overview... except that it's way too high-level. These are all extremely well-known techniques. Is there any modern video compression scheme that doesn't employ them?

In other words, why is H.264 in particular magical?


> "If you don't zoom in, you would even notice the difference."

First of all, I think he meant "you would NOT even notice".

Second of all, that's the first thing I noticed. That PNG looks crystal clear. The video looks like overcompressed garbage.


Well explained. I was thinking of reading about h264 and this is an amazing starter. Thanks Sid!


s/magic/lossy


"This concept of throwing away bits you don't need to save space is called lossy compression."

What a terrible introduction to lossy compression. This would mean that if I empty the trash bin on my desktop, it's lossy compression.

The concept of going through all compression ideas that are used is pretty neat though.


> This would mean that if I empty the trash bin on my desktop, it's lossy compression.

It is.


MB is 1024 * 1024 bytes, not 1000 * 1000 bytes. Unless you're a HDD/SSD manufacturer.


MiB (mebibyte) is 1024 * 1024 bytes.


Ugh. Comparing the file size difference between a lossless PNG and a LOSSY H.264 video of a STATIC PAGE is absurd. Calling it "300 times the amount of data," when it's a STATIC IMAGE is insulting in the extreme. It really doesn't matter if the rest of the article has insights, because you lost me already.


He clarifies right after that he got to those numbers because he used a lossless vs lossy encoder. Really should've kept reading


"right after that".

No he didn't explain "right after that." He rambled on and on, and even after all of that, he STILL doesn't bring up JPG.

It's an inherently stupid comparison to make. You can't polish a turd.


Thanks for the feedback! Sorry if I was unclear. The comparison with PNG is very intentional to illustrate the vast difference in the compression efficiencies involved. I do state the difference clearly here though:

> This concept of throwing away bits you don't need to save space is called lossy compression. H.264 is a lossy codec - it throws away less important bits and only keeps the important bits.

> PNG is a lossless codec. It means that nothing is thrown away. Bit for bit, the original source image can be recovered from a PNG encoded image.


It's a very high level example of the concept, which admittedly is absurd, but drives home the point for people who might not be familiar with the scope of the numbers involved.



