What’s the best thumbnail for this page? (zemanta.com)
84 points by Swizec on March 19, 2012 | 19 comments



For certain things, thumbnails make sense: online, articles about photography, etc. But for regular articles, instead of figuring out how to extract thumbnails, we should realize that in most cases the article would be greatly improved by having no thumbnail at all.

There's a trend lately to illustrate low-quality content with low-quality stock pictures (most likely acquired from a Google Image search without a proper license). For an example, just look at TechCrunch or Pando. We should strive to rid the internet of this plague.

Good articles and real journalism have standards when it comes to illustration. Open up nytimes.com and look at what's illustrated with photos versus illustrations versus nothing.


Even the best of articles need to be broken up visually; otherwise they are difficult to read.

Anecdotal proof: all written publications ever.


Sure, but that can be done with good layout and typography, or tasteful illustrations (à la New Yorker). Simply using bad filler stock images is a really bad way to go about this.


"Unlike article extraction, it doesn’t seem anyone anywhere has ever put a lot of thought into getting thumbnails out of a website."

Incorrect. Diffbot does a visual analysis of the page to determine the best thumbnail.

[edit: I also get the impression that Prismatic does intelligent grokking of the thumbnail image, especially because I know the team, but I'm not aware of anything they published about their methodology.]


Wasn't aware of that when writing the post. Thanks for the suggestion :)


I just went through this exact exercise for a project I'm working on, where I want to figure out the best image to display from a given Craigslist listing.

I used an approach most similar to Goose, where I download the image to get the metadata, then get rid of images with odd aspect ratios (I think I have it set to reject anything with an aspect ratio bigger than 3.0, but it needs to be tweaked). I also get rid of things like 1px-wide images (or anything smaller than the thumbnail I want to display).

So far it works "okay". It's far from perfect, but it's WAY better than nothing.
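
For anyone who wants to try the same kind of filter, here is a rough Python sketch of the idea, not the actual code from that project. It assumes the image dimensions have already been read from the downloaded files. The names `ImageInfo`, `usable` and `best_image` are made up for illustration, and picking the largest surviving image by area is an extra assumption that the description above does not specify.

```python
from typing import Iterable, NamedTuple, Optional


class ImageInfo(NamedTuple):
    """Hypothetical record: dimensions read after downloading the image."""
    url: str
    width: int
    height: int


def usable(img: ImageInfo, thumb_w: int, thumb_h: int, max_aspect: float = 3.0) -> bool:
    """Reject images smaller than the target thumbnail or with an odd aspect ratio."""
    # Drops 1px trackers, spacers and anything too small to fill the thumbnail.
    if img.width < thumb_w or img.height < thumb_h:
        return False
    # Long side over short side, so banners and tall skyscraper ads both count.
    aspect = max(img.width, img.height) / min(img.width, img.height)
    return aspect <= max_aspect


def best_image(images: Iterable[ImageInfo], thumb_w: int = 100, thumb_h: int = 100,
               max_aspect: float = 3.0) -> Optional[ImageInfo]:
    """Return the largest surviving image by area (an assumed tie-breaker), or None."""
    candidates = [i for i in images if usable(i, thumb_w, thumb_h, max_aspect)]
    return max(candidates, key=lambda i: i.width * i.height, default=None)
```

The 3.0 threshold is the value mentioned above and, as noted, probably needs tuning for each source of listings.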


I built fetchful.com a while ago, which is an attempt at this (as well as generating preview text). After a lot of testing and a few hundred thousand generated previews, I can say it is quite hard to get consistent results for thumbnails. It is obviously very simple if developers plan for this and use appropriate metadata tags for their content.
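
To illustrate the easy case, here is a minimal sketch of reading a thumbnail that the page author declared in metadata. It is not how fetchful.com works. It assumes the page uses the Open Graph `og:image` tag or the `twitter:image` tag, and the names `MetaImageParser` and `declared_thumbnail` are hypothetical. Only the Python standard library is used.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaImageParser(HTMLParser):
    """Remembers the first og:image / twitter:image URL found in <meta> tags."""

    # Assumed tag names; other sites may declare images differently.
    KEYS = ("og:image", "twitter:image")

    def __init__(self) -> None:
        super().__init__()
        self.image_url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.image_url is not None:
            return
        attributes = dict(attrs)
        # Open Graph uses property="...", Twitter cards usually use name="...".
        key = attributes.get("property") or attributes.get("name")
        if key in self.KEYS and attributes.get("content"):
            self.image_url = attributes["content"]


def declared_thumbnail(html: str) -> Optional[str]:
    """Return the declared thumbnail URL, or None if the page declares none."""
    parser = MetaImageParser()
    parser.feed(html)
    return parser.image_url
```

When this returns `None`, you are back to guessing from the images in the page body, which is where the inconsistent results come from.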


Offtopic: if you are posting to HN, then it's always wise to ensure more workers for your fastcgi process :).


The news aggregators use this ambiguity to their advantage -- plenty of times I've seen an innocuous headline shown with a bikini girl thumbnail because that image was a sidebar gallery preview on the source page (or sometimes even an ad!). Any guesses what effect this has on click-throughs?


I wrote a Ruby gem to do this: https://github.com/mmb/plumnailer

The current ranking is simple, but I made it pluggable with the idea that it could be improved or that there could be multiple implementations.


In what way is what he describes as the Zemanta approach not "slapping together a bunch of heuristics"?


It is a bunch of heuristics... just slightly more of them, producing (in my opinion) a slightly better result.

The core of the article however points at the main problem: no evaluation dataset on which we could compare these algorithms.


blog is down?


It seems it's down... Here is a cached page:

http://webcache.googleusercontent.com/search?q=cache:http://...


thanks.


Now make a JavaScript implementation and I'll be forever grateful.


> Goose actually writes all images to disk

What on earth for?


Is anyone interested?


Honest question? Yes.



