17 Year Old creates website that predicts future of Digg (gigaom.com)
51 points by rjvir on Aug 25, 2010 | 23 comments



# Predicts with 67% accuracy

if any(topic in DiggContent for topic in ('pretty infographic', 'sexy video or picture', 'funny video or picture', 'xkcd', 'liberal outrage', 'cute animals')): FrontPage.append(DiggContent)


Apparently Digg in the future will be spitting out HTTP 500 errors.


I bet a large majority of success would be based on that piece's past performance (Reddit, Delicious, HN, repeat submissions, etc.). An "algorithm" like that is probably pretty trivial... It would be interesting if it actually has novel concepts powering the beast.


There was this post yesterday that actually showed that there is no strong correlation between how articles do when posted to multiple sites.

In fact, if something got posted to digg and did well there that's a pretty good indicator that on HN it will get killed.


In fact, if something got posted to digg and did well there that's a pretty good indicator that on HN it will get killed.

Isn't that a strong (negative) correlation?


Hehe, good point. It's not a hard rule though.


http://news.ycombinator.com/item?id=1631199

Is this the story you are thinking of?


It would be interesting for us... but from a business point of view, many companies just want something that works now, with a decent degree of reliability.

So I would not be surprised if the guy mentioned will actually be able to sell this program for a decent amount of money.


The stories that he's wrong about are really important here, and would determine if this is news or not.

If his site has a higher quality of content, good for him! It's like the netflix prize, except no cash.

If it's worse than digg's page, then he hasn't improved anything.

Also, I'd be curious to see how "63% accuracy" is defined. In an ecosystem where 1% of stories get through, whether this number is based on false-positives or false-negatives will make a big difference. (He could be underselling himself!)
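To make that concrete, here is a rough sketch, assuming the 1% base rate above and a made-up batch of 10,000 submissions (the counts are illustrative, not taken from the site). It compares a predictor that always says "no" with a hypothetical one whose picks hit the front page 63% of the time:

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    """Fraction of 'will hit the front page' predictions that were right."""
    return tp / (tp + fp) if (tp + fp) else 0.0

# Hypothetical batch: 10,000 submissions, 1% (100) reach the front page.
# Predictor A never predicts a hit.
always_no = dict(tp=0, fp=0, fn=100, tn=9900)
# Predictor B (made-up numbers) flags 100 stories and 63 of them hit.
picky = dict(tp=63, fp=37, fn=37, tn=9863)

print(accuracy(**always_no))                  # overall accuracy of A
print(accuracy(**picky))                      # overall accuracy of B
print(precision(picky["tp"], picky["fp"]))    # precision of B
```

Under these assumptions the do-nothing predictor is already 99% "accurate" overall, so a 63% figure only means something once you know which ratio it is.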


There are usually 2 numbers used to measure the 'accuracy' of a test (be it a "This link will reach the digg front page" or "This person has HIV" etc.). Those numbers are the false positive rate (you said they'd get to the front page, and they didn't), and the false negative rate (you said they wouldn't get to the front page, and they did).

It's common for these numbers to be related. Decrease one number and the other goes up. E.g. you could get a 0% false negative rate by just saying "Yes this link will get to the front page" for all pages, however your false positive rate would be massive, about 99.999999% (since you're predicting that every link gets to the front page). This test would be useless because of the high false positive rate. A breakthrough occurs when you are able to have both a low false positive rate and a low false negative rate. The holy grail of any test is one that would have a 0% false negative and a 0% false positive rate.

Usually it's a trade-off between false positives and false negatives. Most western justice systems would rather have a high false negative rate than a high false positive rate: "Better 10 guilty men walk free, than 1 innocent man goes to jail".

His statement about "63% accuracy" is ambiguous. What is he referring to? What are the false positive and false negative rates?
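A minimal sketch of the two rates, assuming the textbook definitions (false positives divided by stories that actually missed, false negatives divided by stories that actually hit). The function name and the 100-story sample with a single hit are made up for illustration:

```python
def error_rates(predicted, actual):
    """Return (false_positive_rate, false_negative_rate).

    predicted[i] is True if we said story i would reach the front page,
    actual[i] is True if it really did.
    """
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    negatives = sum(not a for a in actual)   # stories that did not hit
    positives = sum(actual)                  # stories that did hit
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

# Hypothetical sample: 1 front-page story out of 100.
actual = [True] * 1 + [False] * 99
always_yes = [True] * 100
always_no = [False] * 100

print(error_rates(always_yes, actual))  # never misses a hit, flags every dud
print(error_rates(always_no, actual))   # never flags a dud, misses every hit
```

Both trivial predictors drive one rate to zero by maxing out the other, which is why a single "accuracy" number says little without knowing both.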


I think he means 37% false positives. This guess is based on the archive section, where he lists hits and misses. http://digginthefuture.com/archive


Based on what I've used with Digg's upcoming engine, it doesn't seem near 63% at all.


6 out of 10 stories in the new links section can be predicted by whether the top users vote on them. Really, only 6 out of 10? This seems extremely low given the claims of gaming (if I'm understanding it correctly).


How is 6/10 extremely low?


Well, from what I understand, people claim that digg is entirely run by the top users. If only 6/10 stories that make it to the front page are predicted by the top users' voting patterns, there's not much to that claim, right?

EDIT: http://www.seomoz.org/blog/top-100-digg-users-control-56-of-...


If you look at the top users most of them submit thousands of stories.

For example, mklopez[1] submitted 16k stories, of which only 10% hit the frontpage, and LtGenPanda[2] submitted 2,481 with a 42% popular ratio. You can see these stats at the bottom right of the user profile on Digg.

[1]: http://digg.com/users/mklopez

[2]: http://digg.com/users/LtGenPanda


I think nickb is still the top submitter to HN, with about 4250 submissions. The runner-up is cwan with 3750, and then edw519 with 3180. I wonder what the top 3 on digg looks like.


reddit.com?


It will be interesting to see how often stories on reddit appear on the frontpage of Digg. Clearly, several stories will make it, but it seems unlikely that it would be near 63% or even half of that.


I can't find the source ATM, but apparently it's 10-20% daily, down from 30-40% a year ago.

Scroll to the bottom of this for something tangible: http://www.raterush.com/pages/digg-reddit


http://www.raterush.com/pages/digg-reddit

particularly that section about digg or reddit first


You have to check it the next day.


No, you've got the order mixed up, reddit turns into digg.



