
Nice article! The motion compensation bit could be improved, though:

> The only thing moving really is the ball. What if you could just have one static image of everything on the background, and then one moving image of just the ball. Wouldn't that save a lot of space? You see where I am going with this? Get it? See where I am going? Motion estimation?

Reusing the background isn't motion compensation -- you get that effect just by encoding the differences between successive frames, so the unchanging parts are encoded very efficiently.

Motion compensation is for when, say, the camera follows the ball and the background moves. Rather than encoding the raw difference between frames, you figure out that most of the frame moved and encode the difference between the current frame and a shifted version of the blocks from a previous frame.
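
Here's a toy sketch of the distinction in Python/NumPy (the function names and the single global vector are mine, nothing codec-specific): plain differencing subtracts co-located pixels, while motion compensation shifts the previous frame by an estimated motion vector first and subtracts that.

    import numpy as np

    def frame_difference(prev, cur):
        # Plain inter-frame differencing: pixels that didn't change become
        # zero and compress very well; anything that moved leaves residuals.
        return cur.astype(np.int16) - prev.astype(np.int16)

    def motion_compensated_difference(prev, cur, dy, dx):
        # Motion compensation: shift the previous frame by the estimated
        # vector (dy, dx) before differencing, so a panning camera also
        # yields near-zero residuals. Real codecs do this per block and
        # pad/clamp at the edges instead of wrapping around like np.roll.
        shifted = np.roll(prev, (dy, dx), axis=(0, 1))
        return cur.astype(np.int16) - shifted.astype(np.int16)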

Motion compensation won't work particularly well for a tennis ball, partly because it's spinning rapidly (so the ball looks distinctly different in consecutive frames), but more importantly because the ball occupies a tiny fraction of the frame, so it doesn't help that much.

Motion compensation should work much better for things like moving cars and moving people.




Your example seems to assume translation only. I wonder how difficult/useful it would be to identify other kinds of time-varying characteristics (translation, rotation, scale, hue, saturation, brightness, etc) of partial scene elements in an automated way.

Along the same lines, it would be interesting to figure out an automated time-varying-feature detection algorithm to determine which kinds of transforms are the right ones to encode.

Do video encoders already do something like this? It seems like a pretty difficult problem since there are so many permutations of applicable transformations.
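
Real encoders mostly don't work this way (see the sibling replies), but here's a naive sketch of the idea, with all names and the candidate set made up: fit a few parametric models between frames and keep whichever one leaves the smallest residual.

    import numpy as np

    def residual_cost(pred, cur):
        # Sum of absolute differences: a rough proxy for how expensive the
        # residual will be to encode after a given prediction.
        return np.abs(cur.astype(np.int32) - pred.astype(np.int32)).sum()

    def pick_best_model(prev, cur, candidates):
        # candidates: dict of name -> function(prev) -> predicted frame.
        # Keep whichever parametric transform explains the new frame best.
        costs = {name: residual_cost(fn(prev), cur) for name, fn in candidates.items()}
        return min(costs, key=costs.get), costs

    # Hypothetical candidate set: no change, a small pan, a brightness fade.
    candidates = {
        "identity":    lambda p: p,
        "pan_right_2": lambda p: np.roll(p, 2, axis=1),
        "fade_plus10": lambda p: np.clip(p.astype(np.int32) + 10, 0, 255),
    }

The hard part, as you say, is that the space of plausible transforms and their parameters explodes quickly, which is why real encoders restrict themselves to per-block translation plus a few cheap extras.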


> I wonder how difficult/useful it would be to identify other kinds of time-varying characteristics (translation, rotation, scale, hue, saturation, brightness, etc) of partial scene elements in an automated way.

That's how Framefree worked. It segments the image into layers, computes a full morph, including movement of the boundary, between successive frames for each layer, and transmits the before and after for each morph. Any number of frames can be interpolated between keyframes, which allows for infinite slow motion without jerk.[1] You can also upgrade existing content to higher frame rates.
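
Decode/playback for that kind of scheme would look roughly like the sketch below (my own generic warp-and-blend interpolation, not Framefree's actual algorithm): given a dense displacement field between two keyframes, warp both toward an arbitrary time t and cross-dissolve.

    import numpy as np
    from scipy.ndimage import map_coordinates

    def interpolate(key_a, key_b, flow, t):
        # key_a, key_b: consecutive keyframes (H x W, grayscale for brevity).
        # flow: (2, H, W) displacement field (dy, dx) taking key_a to key_b.
        # t in [0, 1]: where the synthesized frame sits between the keyframes.
        h, w = key_a.shape
        ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
        # Backward-warp each keyframe toward time t (crude: the flow is
        # sampled at the target position rather than at the true source).
        from_a = map_coordinates(key_a, [ys - t * flow[0], xs - t * flow[1]], order=1)
        from_b = map_coordinates(key_b, [ys + (1 - t) * flow[0], xs + (1 - t) * flow[1]], order=1)
        # Cross-dissolve the two warped keyframes.
        return (1 - t) * from_a + t * from_b

Sweep t from 0 to 1 in as many steps as you like and you get arbitrarily slow motion, or 120 FPS output from 24/30 FPS keyframes.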

This was developed back in 2006 by the Kerner Optical spinoff of Lucasfilm.[2] It didn't catch on, partly because decompression and playback require a reasonably good GPU, and partly because Kerner Optical went bust. The segment-into-layers technology was repurposed for making 3D movies out of 2D movies, and the compression product was dropped. There was a Windows application and a browser plug-in. The marketing was misdirected - somehow it was targeted at digital signs with limited memory, a tiny niche.

It's an idea worth revisiting. Segmentation algorithms have improved since 2006. Everything down to midrange phones now has a GPU capable of warping a texture. And it provides a way to drive a 120FPS display from 24/30 FPS content.

[1] http://creativepro.com/framefree-technologies-launches-world... [2] https://web.archive.org/web/20081216024454/http://www.framef...


John, do you know where all the patents on Framefree ended up?


Ask Tom Randoph, who was CEO of FrameFree. He's now at Quicksilver Scientific in Denver.


Some venture IP company in Tokyo called "Monolith Co." also had rights in the technology.[1] "As of today (Sept. 5, 2007), the company has achieved a compression rate equivalent to that of H.264 and intends to further improve the compression rate and technology, Monolith said."[2] (This is not Monolith Studios, a game development company in Osaka.) Monolith appears to be defunct.

The parties involved with Framefree were involved in fraud litigation around 2010.[3] The case record shows various business units in the Cayman Islands and the Isle of Jersey, along with Monolith in Japan and Framefree in Delaware. No idea what the issues were. It looks like the aftermath of failed business deals.

The inventors listed on the patents are Nobuo Akiyoshi and Kozo Akiyoshi.[4]

[1] https://www.youtube.com/watch?v=VBfss0AaNaU [2] http://techon.nikkeibp.co.jp/english/NEWS_EN/20070907/138905... [3] http://www.plainsite.org/dockets/x8gi572m/superior-court-of-... [4] http://patents.justia.com/inventor/nobuo-akiyoshi


Great detective work. I suspect the IP is now a total mess - with luck, nobody has been paying the patent renewal fees and everything is now free.


Most codecs split the image into prediction blocks (for example, 16x16 for MPEG-2, or from 4x4 to 64x64 for VP9). Each of these blocks has its own motion vector. All of the transformations you mentioned look like a translation if you look at them locally, so they can all be fairly well represented by this. Codecs have, in the past, attempted global motion compensation, which tries to fully model a camera (rotating, translating, lens distortion, zooming), but all of those extra parameters are very difficult to search for.
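
A brute-force version of that per-block search, as a sketch (NumPy, names mine; real encoders do this with SAD/SATD in heavily optimized SIMD or hardware, over multiple block sizes and reference frames):

    import numpy as np

    def block_motion_search(prev, cur, block=16, search=8):
        # Exhaustive (full) search: for each block of the current frame, try
        # every offset within +/- `search` pixels in the previous frame and
        # keep the one with the lowest sum of absolute differences (SAD).
        # Assumes frame dimensions are multiples of the block size.
        h, w = cur.shape
        vectors = np.zeros((h // block, w // block, 2), np.int32)
        for by in range(0, h, block):
            for bx in range(0, w, block):
                target = cur[by:by + block, bx:bx + block].astype(np.int32)
                best, best_cost = (0, 0), np.inf
                for dy in range(-search, search + 1):
                    for dx in range(-search, search + 1):
                        y, x = by + dy, bx + dx
                        if y < 0 or x < 0 or y + block > h or x + block > w:
                            continue
                        cand = prev[y:y + block, x:x + block].astype(np.int32)
                        cost = np.abs(target - cand).sum()
                        if cost < best_cost:
                            best, best_cost = (dy, dx), cost
                vectors[by // block, bx // block] = best
        return vectors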

Daala and AV1's PVQ is an example of a predictor for contrast and brightness (in a very broad sense).


Yes, H.264 has brightness/fade compensation for past frames. It's called "weighted prediction".
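
In spirit it's just a scale-and-offset applied to the motion-compensated reference before the residual is computed. A rough sketch of the idea (integer weight/offset in the style of H.264 explicit weighted prediction; the real bitstream semantics have more details):

    import numpy as np

    def weighted_prediction(ref_block, weight, log_wd, offset):
        # Scale and offset the reference block before using it as the
        # predictor, so a fade or global brightness change leaves only a
        # small residual. Rounded shift, then clipped to the 8-bit range.
        rounding = (1 << log_wd) >> 1
        pred = ((ref_block.astype(np.int32) * weight + rounding) >> log_wd) + offset
        return np.clip(pred, 0, 255).astype(np.uint8)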

The previous-generation codec, MPEG-4 Part 2 ASP (aka DivX & XviD), had "global motion compensation", which could encode scaling and rotation, but like most things in that codec it was broken in practice. Most very clever ideas in compression either take too many bits to describe or can't be done in hardware.


> It seems like a pretty difficult problem since there are so many permutations of applicable transformations.

That's part of why video encoding can be very slow --- with motion compensation, to produce the best results the encoder should search through all the possible motion vectors and pick the one that gives the best match. To speed things up, at a slight cost in compression ratio, not all of them are searched, and there are heuristics on choosing a close-to-optimal one instead: https://en.wikipedia.org/wiki/Block-matching_algorithm
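
For example, a three-step search (one of the heuristics described on that page) halves a coarse step each round and only checks the centre plus its eight neighbours, so it tests a couple dozen candidates instead of every offset in the window. A sketch, with names of my own:

    import numpy as np

    def sad(a, b):
        return np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()

    def three_step_search(prev, cur, by, bx, block=16, step=4):
        # Heuristic block matching: test the current centre and its eight
        # neighbours at the current step size, move to the best, halve the
        # step, repeat. Far fewer SAD evaluations than exhaustive search,
        # at a small cost in match quality.
        h, w = prev.shape
        target = cur[by:by + block, bx:bx + block]
        best_y, best_x = by, bx
        best_cost = sad(target, prev[by:by + block, bx:bx + block])
        while step >= 1:
            cy, cx = best_y, best_x
            for dy in (-step, 0, step):
                for dx in (-step, 0, step):
                    y, x = cy + dy, cx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    cost = sad(target, prev[y:y + block, x:x + block])
                    if cost < best_cost:
                        best_cost, best_y, best_x = cost, y, x
            step //= 2
        return best_y - by, best_x - bx   # the chosen motion vector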


Now I'm out of my depth, but I think motion compensation does okay at rotation and scaling. The motion vector varies throughout the frame, and I think codecs interpolate it, so all kinds of warping can be represented.
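
Right, and the per-block vectors alone already go a long way, because a rotation or zoom looks like a plain translation at the scale of a single block. A rough illustration (mine, assuming the frame-to-frame motion is a known affine map, which an encoder would of course have to estimate):

    import numpy as np

    def blockwise_vectors_from_affine(h, w, A, t, block=16):
        # Frame-to-frame motion p' = A @ p + t (covers rotation, zoom,
        # shear). Sampled at each block centre, it collapses to one
        # ordinary translational motion vector per block.
        vectors = {}
        for by in range(0, h, block):
            for bx in range(0, w, block):
                p = np.array([by + block / 2.0, bx + block / 2.0])
                vectors[(by, bx)] = (A @ p + t) - p   # local displacement
        return vectors

    # Example: a 1-degree rotation about the frame origin plus a small pan.
    theta = np.radians(1.0)
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    vecs = blockwise_vectors_from_affine(720, 1280, A, t=np.array([0.0, 2.0]))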


As evidence of this, sometimes when an I-frame is dropped from a stream or you jump around in a stream you can see the texture of what was previously on the screen wrapped convincingly around the 3D surface of what's now supposed to be on the screen, all accomplished with 2D motion vectors.



