Suppose you use Bayesian filtering on the text surrounding the links to determine whether the connection is good or bad. With a reasonable amount of data, it should be possible.
Note: I'm not an algorithms guy; I do business and strategy and a wee bit of programming, so maybe the example isn't good, but I think the point stands.
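The idea above could be sketched as a tiny naive Bayes classifier over the words surrounding a link. This is just an illustrative toy, not a production system; the training snippets are made-up stand-ins for real labeled data.

```python
from collections import Counter
import math

# Made-up training data: text surrounding links, labeled by sentiment.
train = [
    ("great service highly recommend this shop", "pos"),
    ("fantastic quality and fast shipping", "pos"),
    ("terrible scam avoid this site", "neg"),
    ("awful experience worst customer service", "neg"),
]

# Count words per class and count class frequencies.
word_counts = {"pos": Counter(), "neg": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = set(w for c in word_counts.values() for w in c)

def classify(text):
    """Return the most likely class under naive Bayes with add-one smoothing."""
    scores = {}
    for label in word_counts:
        # Log prior for the class.
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for word in text.split():
            # Add-one (Laplace) smoothed log likelihood of each word.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("highly recommend fantastic shop"))  # likely "pos"
print(classify("avoid this awful scam"))            # likely "neg"
```

With real data you'd want far more training text, tokenization beyond `str.split`, and an off-the-shelf implementation rather than this hand-rolled one, but the mechanics are the same.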
If you're interested, there are a whole host of fun and useful machine learning techniques that are actually not as hard to understand and apply as they sound. The best introductory book that I know of is Programming Collective Intelligence, which is surprisingly clear, if a little vague on the theory.
Naive Bayesian classifiers are just one of the more popular types; others include Support Vector Machines (SVMs), decision trees (and their relatives, random forests), and a bunch more. If you'd like to play around with some, Weka is good open source software for this.
That's how I'd approach this particular problem, though. As I said in the parent, I only have cursory experience in programming, and almost none in algorithms.
Google already analyzes backlinks in their context to determine how relevant the anchor text is to the topic of the page.
Determining sentiment (the topic of the NYT piece) is considerably harder though, because it would allow spammers to write negative articles about a site, link to it, and negatively affect its rankings. Also, determining the tone/emotions of a piece of text is probably one of the hardest things to do in textual analysis.
Determining sentiment (the topic of the NYT piece) is considerably harder though, because it would allow for spammers to write negative articles about a site and link to it and negatively affect its rankings.
This could be solved by making sentiment act as a weight (i.e. a multiplier in [0, 1]). Positive sentiment would give a particular reference more weight; negative sentiment would give it little to no weight. Then it would be impossible to negatively affect a site's rankings, only to positively affect them. Just like now.
You wouldn't have to apply the sentiment factor for all sites. You could have a rule that would only lower rankings for sites which have an overwhelming majority of negative links (say >90%). I don't think there would be many cases where a spammer could achieve that level of impact and those few cases could be dealt with through manual intervention.
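The two rules above could be combined into a short sketch: sentiment maps to a multiplier in [0, 1] on each link's contribution, and a penalty kicks in only past the negative-majority threshold. All scores, the 90% threshold, and the penalty factor are illustrative assumptions, not anything a real search engine discloses.

```python
# Assumed threshold from the comment above: penalize only when > 90%
# of a site's inbound links carry negative sentiment.
NEGATIVE_MAJORITY = 0.90

def link_weight(sentiment):
    """Map a sentiment score in [-1, 1] to a multiplier in [0, 1]."""
    return max(0.0, min(1.0, (sentiment + 1) / 2))

def site_score(link_sentiments):
    """Sum of sentiment-weighted link contributions, with the majority rule."""
    weighted = sum(link_weight(s) for s in link_sentiments)
    negative_fraction = sum(1 for s in link_sentiments if s < 0) / len(link_sentiments)
    if negative_fraction > NEGATIVE_MAJORITY:
        # Overwhelmingly negative: apply an (arbitrary, illustrative) penalty.
        weighted *= 0.5
    return weighted

# A negative link can only shrink its own contribution toward zero, so a
# spammer's negative articles can't drag an otherwise healthy site down.
print(site_score([0.8, 0.9, -0.5]))   # mostly positive: full credit
print(site_score([-0.9, -0.8, -0.7])) # all negative: penalty applies
```

The key property is that each link's weight is bounded below by zero, so the worst a hostile link can do is contribute nothing, unless the site crosses the negative-majority threshold.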
As for determining sentiment, it's not something I've ever tried to do, but is it really that hard? Intuitively, I would think positive and negative articles would have significantly different distributions of certain words.