Super Resolution from a Single Image (2009) (weizmann.ac.il)
146 points by bsilvereagle on Oct 18, 2015 | 42 comments



I just can't understand how it manages to make this image http://www.pics.rs/i/2cuoM look like this image http://www.pics.rs/i/IEz0N, especially the last line of letters.


I guess the key reason this works so well is that the lower-resolution image was calculated from the higher-resolution one. So you have detailed information on exactly how the downscaling was done, which enables you to "cheat".

If the image had been downscaled by a completely different method, it probably wouldn't work out as nicely.

The ultimate test would be two different photographs taken at the same time - one with high and one with low resolution. Almost certainly this wouldn't work out as nicely. There is a huge difference between (a) having a low-res image due to physical effects, and (b) having a low-res image calculated by a known algorithm where all parameters are also known.


This is plain wrong. As others have pointed out, it is independent of the scaling method - it just looks for similar patches (at different sizes, mirrored, and perhaps even rotated - I don't remember) within the same image, finds the best match, and uses that as its source of information.

In this specific case, you have all the letters there to match against, so there's no surprise.

Note that it uses the entire image to look for matches. If you tried to enlarge only the last line, it would probably NOT look as good.

There's a good reason it works: many pictures contain similar elements at different scales, which lets you infer detail at one scale from another. In fact, in the eighties there was a lot of hype about "fractal compression" based on the same principle, see e.g. http://www.cs.northwestern.edu/~agupta/_projects/image_proce... In the end it couldn't improve on JPEG and has been essentially forgotten - but it did match the JPEG coders of the time in compression rate (while doing much worse on speed and memory requirements), and it could decompress pictures to a much larger geometry than the original while still looking good - technically very similar to what is described in this paper. Everything old is new again.
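Not the paper's exact algorithm, but a minimal sketch of the cross-scale patch search idea, assuming a grayscale float image, a brute-force SSD search, and arbitrary patch/scale choices:

    # Minimal sketch of cross-scale patch matching (illustration only, not the paper's method).
    # For a query patch, search a downscaled copy of the same image for the most similar
    # patch, then borrow that patch's full-resolution "parent" region as the missing detail.
    import numpy as np
    from scipy.ndimage import zoom

    def best_cross_scale_match(img, y, x, patch=5, scale=0.5):
        # img: 2-D float array (grayscale); assumes the query patch fits inside the image.
        query = img[y:y + patch, x:x + patch]
        small = zoom(img, scale, order=1)            # coarser version of the same image
        best, best_pos = np.inf, (0, 0)
        for sy in range(small.shape[0] - patch):
            for sx in range(small.shape[1] - patch):
                d = np.sum((small[sy:sy + patch, sx:sx + patch] - query) ** 2)
                if d < best:
                    best, best_pos = d, (sy, sx)
        # Map the match back to full-resolution coordinates and take its "parent" region,
        # which supplies the high-frequency detail the query patch is missing.
        py, px = int(best_pos[0] / scale), int(best_pos[1] / scale)
        parent = img[py:py + int(patch / scale), px:px + int(patch / scale)]
        return best_pos, parent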


Fractal compression also stalled because Iterated Systems held many patents on the technology and licensed it on terms too high for hobbyists. They should have released a low-end version as freeware/open source and concentrated on commercial licenses and support (similar to SQLite).


>This is plain wrong. As others have pointed out, it is independent of the scaling method - it just looks for similar patches (at different sizes, mirrored, and perhaps even rotated - I don't remember) within the same image, finds the best match, and uses that as its source of information.

Which is much harder to do when the scaling method isn't known.


It doesn't even have to be scaling. It will work just as well copying only from similar images (e.g., if this was a frame from a movie, they could use nearby frames to the same effect; or if it's a picture taken of the same thing from a different angle).


Very true, but in principle it's possible to completely characterize the downscaling process effected by a physical camera.


Good point. In this case it is essentially the same as using a known algorithm with unknown parameters.

Either way, this increases the space of possible upscaled images dramatically, which makes it a lot harder (if not impossible) to produce results of the demonstrated quality.


There is some discussion of this from 2012, when this was last posted. It looks to be guessing the letters (sometimes incorrectly) based on the larger ones present in the image: https://news.ycombinator.com/item?id=4241978


Thanks, I was about to dig that up. Neat to see this again. (I should check out their more recent research now.)


I think the idea is that an image has a certain amount of self-similarity, i.e. skin looks like skin. So the algorithm learns what skin is supposed to look like by looking at all the skin in the image, and every wrinkle is improved by looking at every other wrinkle rather than just the pixels in its immediate vicinity.

Now replace "wrinkle" with "patch" and sprinkle in some crazy stats and machine learning to get the idea to work.


As far as I understand, the letters in the bottom row are rebuilt from patterns found in other parts of the image - a few of them are false positives; see the original chart https://joyerickson.files.wordpress.com/2010/07/eye-chart.jp...


The general principle seems to be trading "distance X from the truth with probability 1" for "distance X/10 from the truth with probability 0.99, and distance X*10 from the truth with probability 0.01".

This is consistent with another reply, which states that it guesses the true letters (sometimes incorrectly).
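Plugging the numbers from that trade-off into an expected-error calculation (the variable names here are mine):

    # Expected distance from the truth under the two regimes described above.
    X = 1.0                                             # arbitrary unit of error

    always_blurry  = 1.0 * X                            # off by X with probability 1
    example_based  = 0.99 * (X / 10) + 0.01 * (10 * X)  # usually much closer, rarely much worse

    print(always_blurry)   # 1.0
    print(example_based)   # 0.199 -> about 5x lower on average, at the cost of occasional big mistakes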


A lot has happened since then. References up to spring 2014 can be found, e.g., in our own work on super-resolving arbitrarily sized images with convolutional en/decoders: Super-Resolution with Fast Approximate Convolutional Sparse Coding, http://brml.org/uploads/tx_sibibtex/281.pdf (which still leaves plenty of room to be extended, e.g. to color, and improved upon).


Interesting paper; I'd be curious to see examples of how that performs. Are any online?


No, sorry :( Clearly a TODO for some future work.


I realize you mention there's no online example, but is your latest research available to try ourselves, using our own source images? This is neat!


Thanks for your kind words! There is some stuff floating around, PM me.


It appears there's no PM system available to use. (At least not on my account, which I created to ask you this question)


Check my 'about' section here, should lead you to my mail :).


Interesting to see what some of them have done since: https://sites.google.com/site/dglasner/

Pretty original research.


So, if I understand the abstract correctly, for every pixel they look through the image for similar "pixel neighbourhoods" at the same or varying sizes, and collect them into a kind of database of examples used to scale up that pixel. Pretty cool idea!
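On that reading, a minimal sketch of such an example database (the patch sizes, the single 2x scale, and the KD-tree lookup are choices of this sketch, not necessarily the paper's):

    # Harvest (low-res neighbourhood -> high-res patch) examples from the image itself,
    # then query the database when upscaling. Illustration only.
    import numpy as np
    from scipy.ndimage import zoom
    from scipy.spatial import cKDTree

    def build_example_db(img, lo=3, hi=6):
        # img: 2-D float array. Returns a KD-tree over low-res neighbourhoods
        # plus the list of matching high-res patches.
        small = zoom(img, 0.5, order=1)                 # coarser copy of the same image
        keys, values = [], []
        for y in range(small.shape[0] - lo):
            for x in range(small.shape[1] - lo):
                keys.append(small[y:y + lo, x:x + lo].ravel())
                values.append(img[2 * y:2 * y + hi, 2 * x:2 * x + hi])  # its full-res "parent"
        return cKDTree(np.array(keys)), values

    def lookup(tree, values, neighbourhood):
        # Return the high-res example whose low-res neighbourhood best matches the query.
        _, idx = tree.query(neighbourhood.ravel())
        return values[idx]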


This is the algorithm they use in all those movies. "Enhance!"


I wonder if the same techniques could be used for compression as well.


Some photocopiers have actually replaced numbers with different numbers in copied documents because of pattern-based compression: https://news.ycombinator.com/item?id=6156238


Photocopiers use Mixed Raster Content (MRC) with JBIG2 compression, which is indeed able to confuse symbols at the same scale, but cross-scale substitution is beyond current MRC capabilities, AFAIK.


One way around that problem is to use lossless compression.

Unfortunately, if you have merely 1 bit of static noise per pixel (so about +/- 0.4% intensity on an 8-bit channel), that random bit is incompressible, so you can't shrink the image below 12.5% of its original size.


Absolutely. For lossless compression you just need to encode the difference between the predicted pixels and the actual pixels, which hopefully has a lot of zeros and so compresses well. For lossy compression, you just throw away some or all of that difference.

The biggest issue is speed: this doesn't look very fast to encode or decode, which is what makes many new image compression algorithms nonviable for most purposes.
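To make the residual-coding idea concrete, here is a toy sketch assuming an 8-bit grayscale image, a trivial left-neighbour predictor standing in for the super-resolution prediction, and zlib as the entropy coder:

    # Lossless predictive coding: store only prediction residuals, which are mostly
    # near zero and therefore compress well. A lossy variant would quantize the residuals.
    import zlib
    import numpy as np

    def encode(img):                                   # img: 2-D uint8 array
        img = img.astype(np.int16)
        pred = np.zeros_like(img)
        pred[:, 1:] = img[:, :-1]                      # predict each pixel from its left neighbour
        return zlib.compress((img - pred).tobytes())

    def decode(blob, shape):
        residual = np.frombuffer(zlib.decompress(blob), dtype=np.int16).reshape(shape)
        img = np.zeros(shape, dtype=np.int16)
        for x in range(shape[1]):                      # rebuild column by column from the residuals
            img[:, x] = (img[:, x - 1] if x > 0 else 0) + residual[:, x]
        return img.astype(np.uint8)                    # exact round trip: decode(encode(a), a.shape) == a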


Yes, and this was all the rage in the late '80s and early '90s. Everyone was sure it would replace JPEG. See http://www.cs.northwestern.edu/~agupta/_projects/image_proce... for a description.

Spoiler: it didn't happen. Fractal compression mostly matched JPEG on rate/quality, but didn't improve on it, and it was far worse than JPEG in both run time and memory requirements. So JPEG it (still) is.

Another spoiler: if file size were the only criterion, BPG would be replacing JPEG these days. That isn't happening either.


If you mean lossy, why not? Netflix could use it to reduce the information they send out, with users' devices recreating a lot of what was lost.

I don't think it works for lossless, though, because they are inferring what a pixel should be, i.e. they cannot be 100% sure any bit is set right, only estimate the probability that it is.


I believe this is already somewhat similar to, say, zip compression, which can reference earlier portions of the data being compressed.


Is there an implementation of this available?


There is a link in the old thread: https://news.ycombinator.com/item?id=4242002


I guess I fail the IQ test. Where's the download link?


It looks like the page header/navigation got lost in time. Archive.org shows a lot of now-missing content [1]; perhaps it was lost in the move to GitHub Pages?

In any case the download page does still exist: http://mentat.za.net/supreme/download.shtml

Which leads to the repo: https://github.com/stefanv/supreme

[1] https://web.archive.org/web/20140204102520/http://mentat.za....


To clarify my understanding of this post, would it be possible using this method to:

1. start with a high-resolution image,
2. create a low-resolution version, and
3. use the low-resolution version to produce a high-resolution version that looks good?
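As a concrete version of that pipeline (`super_resolve` below is a hypothetical placeholder, here just bicubic upscaling, standing in for whatever actual SR implementation you plug in):

    # Sketch of the pipeline asked about above, using scikit-image for resizing.
    from skimage import io, transform

    def super_resolve(img, factor):
        # Hypothetical placeholder: plain bicubic upscaling. Swap in a real SR method here.
        return transform.rescale(img, factor, order=3)

    hi_res = io.imread("photo.png", as_gray=True)                  # 1. high-resolution original
    lo_res = transform.rescale(hi_res, 0.25, anti_aliasing=True)   # 2. low-resolution version
    restored = super_resolve(lo_res, factor=4)                     # 3. attempt to recover the detail
    # Compare `restored` to `hi_res` (e.g. PSNR/SSIM) to judge how well step 3 worked.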


It has a visual style that looks like a "photorealism" painting. Quite amazing. https://en.wikipedia.org/wiki/Photorealism


Hmmm? The idea of photorealism is that it should look as much like an actual photograph (or reality) as possible. And the best such painters do.


Original code @ https://github.com/stefanv/supreme (repost from downthread). Looks like it's BSD licensed and in Python.


As far as I can see from http://mentat.za.net/supreme/, this is an implementation of an entirely different method, neither single-image nor self-similarity based.


Zoom and enhance




