if any(topic in DiggContent for topic in ('pretty infographic', 'sexy video or picture', 'funny video or picture', 'xkcd', 'liberal outrage', 'cute animals')):
    FrontPage.append(DiggContent)  # move DiggContent to FrontPage
I bet a large majority of success would be based on that piece's past performance (Reddit, Delicious, HN, repeat submissions, etc.). An "algorithm" like that is probably pretty trivial... It would be interesting if it actually has novel concepts powering the beast.
It would be interesting for us... but from a business point of view, many companies just want something that works now, with a decent degree of reliability.
So I would not be surprised if the guy mentioned actually will be able to sell this program for a decent amount of money.
The stories that he's wrong about are really important here, and would determine if this is news or not.
If his site has a higher quality of content, good for him! It's like the Netflix Prize, except with no cash.
If it's worse than Digg's front page, then he hasn't improved anything.
Also, I'd be curious to see how "63% accuracy" is defined. In an ecosystem where 1% of stories get through, whether this number is based on false-positives or false-negatives will make a big difference. (He could be underselling himself!)
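To make the ambiguity concrete, here is a rough sketch in Python. The counts are assumptions for illustration only (10,000 submissions, with the roughly 1% promotion rate mentioned above); none of them come from the article. It shows that a predictor which always answers "no" already scores 99% on plain accuracy, so "63%" presumably refers to something else, such as precision or recall.

```python
# Hypothetical numbers for illustration; only the ~1% base rate comes from the discussion.
TOTAL = 10_000
PROMOTED = 100  # 1% of stories reach the front page


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that were right."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Of the stories predicted to hit the front page, the fraction that did."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Of the stories that hit the front page, the fraction that were predicted."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Predictor that always says "will not reach the front page".
always_no = dict(tp=0, fp=0, fn=PROMOTED, tn=TOTAL - PROMOTED)

print(accuracy(**always_no))                      # 0.99
print(recall(always_no["tp"], always_no["fn"]))   # 0.0
```

Under these assumed numbers, plain accuracy rewards the useless predictor, which is why it matters which ratio the 63% actually is.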
There are usually 2 numbers used to measure the 'accuracy' of a test (be it a "This link will reach the digg front page" or "This person has HIV" etc.). Those numbers are the false positive rate (you said they'd get to the front page, and they didn't), and the false negative rate (you said they wouldn't get to the front page, and they did).
It's common for these numbers to be related: decrease one and the other goes up. E.g. you could get a 0% false negative rate by just saying "Yes, this link will get to the front page" for every link; however, your false positive rate would be massive, close to 100%, since you're predicting that every link gets to the front page and almost none of them do. This test would be useless because of the high false positive rate. A breakthrough occurs when you are able to have both a low false positive and a low false negative rate. The holy grail of any test is one that would have a 0% false negative and a 0% false positive rate.
Usually it's a trade-off between false positives and false negatives. Most western justice systems would rather have a high false negative rate than a high false positive rate: "Better 10 guilty men walk free, than 1 innocent man goes to jail".
His statement about "63% accuracy" is ambiguous. What is he referring to? What are the false positive and false negative rates?
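As a minimal sketch of how those two rates could be computed, using the definitions given above (false positive: predicted front page, didn't make it; false negative: predicted no, but it did). The data is made up: 1,000 links of which 10 reach the front page. The function name `rates` is hypothetical and this is not how the program in the article works.

```python
def rates(predictions, outcomes):
    """Return (false_positive_rate, false_negative_rate) for two boolean lists."""
    pairs = list(zip(predictions, outcomes))
    fp = sum(p and not o for p, o in pairs)   # said yes, did not reach the front page
    fn = sum(o and not p for p, o in pairs)   # said no, did reach the front page
    negatives = sum(not o for o in outcomes)  # links that never made it
    positives = sum(bool(o) for o in outcomes)  # links that did make it
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


# Hypothetical data: 1,000 links, 10 of which reach the front page.
outcomes = [i < 10 for i in range(1000)]

always_yes = [True] * len(outcomes)
always_no = [False] * len(outcomes)

print(rates(always_yes, outcomes))  # (1.0, 0.0)
print(rates(always_no, outcomes))   # (0.0, 1.0)
```

Both trivial predictors drive one rate to zero by maxing out the other, so a single "accuracy" figure says little unless both rates are reported.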
6 out of 10 stories in the new links section can be predicted by whether the top users vote on them. Really, only 6 out of 10? This seems extremely low given the claims of gaming (if I'm understanding it correctly).
Well, from what I understand, people claim that Digg is entirely run by the top users, and if only 6/10 stories that make it to the front page are predicted by the top users' voting patterns, there's not much to that claim, right?
If you look at the top users, most of them submit thousands of stories.
For example, mklopez[1] submitted 16k stories, of which only 10% hit the front page, and LtGenPanda[2] submitted 2,481 with a 42% popular ratio.
You can see these stats at the bottom right of the user profile on Digg.
I think nickb is still the top submitter to HN, with about 4250 submissions; the runner-up is cwan with 3750, and then edw519 with 3180. I wonder what the top 3 on Digg look like.
It will be interesting to see how often stories on reddit appear on the frontpage of Digg. Clearly, several stories will make it, but it seems unlikely that it would be near 63% or even half of that.