Hacker News

Or maybe our algorithms just aren't good enough.

Suppose you use Bayesian filtering on the text surrounding the links to determine whether the connection is good or bad. With a reasonable amount of data it should be possible.
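To make the idea concrete, here's a minimal sketch of that kind of filter: a toy naive Bayes classifier over the words surrounding a link, with Laplace smoothing. The training samples and labels ("good"/"bad") are made up for illustration; a real system would need far more data and better tokenization.

```python
from collections import Counter
import math

def train(samples):
    """samples: list of (text, label) pairs. Returns per-label word counts and doc counts."""
    counts = {"good": Counter(), "bad": Counter()}
    totals = Counter()
    for text, label in samples:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals

def classify(text, counts, totals):
    """Pick the label with the highest log-probability, using Laplace smoothing."""
    vocab = set(counts["good"]) | set(counts["bad"])
    best, best_score = None, float("-inf")
    for label in counts:
        # log prior: fraction of training docs with this label
        score = math.log(totals[label] / sum(totals.values()))
        denom = sum(counts[label].values()) + len(vocab)
        for w in text.lower().split():
            # add-one smoothing so unseen words don't zero out the score
            score += math.log((counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Toy training data (hypothetical)
samples = [
    ("great resource highly recommend this link", "good"),
    ("useful and well written article", "good"),
    ("terrible scam avoid this site", "bad"),
    ("spam link do not click", "bad"),
]
counts, totals = train(samples)
print(classify("highly useful article", counts, totals))  # good
print(classify("avoid this spam", counts, totals))        # bad
```

This is exactly the spam-filter trick applied to link context: each label keeps a word-frequency table, and classification just compares smoothed log-likelihoods.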

Note: I'm not an algorithms guy; I do business and strategy and a wee bit of programming, so maybe the example isn't good, but I think the point stands.




In this case, what you specifically want is Sentiment Analysis. It's getting pretty accurate and efficient, and should be usable in just this scenario.


Interesting, do people frequently use Bayes' theorem in web programming? I've only seen it in other programming contexts.


If you're interested, there are a whole host of fun and useful machine learning techniques that are actually not as hard to understand and apply as they sound. The best introductory book that I know of is Programming Collective Intelligence, which is surprisingly clear, if a little vague on the theory:

http://oreilly.com/catalog/9780596529321

Naive Bayesian classifiers are just one of the more popular types; others include Support Vector Machines (SVMs), decision trees (and their relatives, random forests), and a bunch more. If you'd like to play around with some, Weka is good open source software for this:

http://www.cs.waikato.ac.nz/ml/weka/


I have no idea...

That's how I'd solve this particular problem though. As I said in the parent I only have cursory experience in programming, and almost none in algorithms.


Google already analyzes backlinks in their context to determine how relevant the anchor text is to the topic of the page.

Determining sentiment (the topic of the NYT piece) is considerably harder though, because it would allow spammers to write negative articles about a site, link to it, and negatively affect its rankings. Also, determining the tone/emotions of a piece of text is probably one of the hardest things to do in textual analysis.


Determining sentiment (the topic of the NYT piece) is considerably harder though, because it would allow for spammers to write negative articles about a site and link to it and negatively affect its rankings.

This could be solved by making sentiments act as a weight (i.e. a multiplier in [0, 1]). Positive sentiments would give a particular reference more weight, negative sentiments would give little to no weight. Then it would be impossible to negatively affect a site's rankings - only positively affect them. Just like now.
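A sketch of that weighting scheme (the function name and base-weight numbers are hypothetical): the sentiment score is clamped to [0, 1] and multiplied into the link's weight, so the worst a negative mention can do is contribute nothing.

```python
def link_weight(base_weight, sentiment):
    """sentiment: 0.0 (clearly negative) .. 1.0 (clearly positive).
    Clamping means a negative link can never subtract value, only add ~0."""
    return base_weight * max(0.0, min(1.0, sentiment))

print(link_weight(10.0, 0.5))  # mildly positive mention -> 5.0
print(link_weight(10.0, 0.0))  # negative mention: no credit -> 0.0
```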


You wouldn't have to apply the sentiment factor for all sites. You could have a rule that would only lower rankings for sites which have an overwhelming majority of negative links (say >90%). I don't think there would be many cases where a spammer could achieve that level of impact and those few cases could be dealt with through manual intervention.
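The ">90%" rule above could be sketched like this (the threshold value and labels are taken from the comment; the function itself is hypothetical):

```python
NEGATIVE_MAJORITY = 0.90  # "overwhelming majority" threshold from the comment

def should_demote(link_sentiments):
    """link_sentiments: list of labels, 'positive' or 'negative'.
    Demote only when negative links exceed the majority threshold."""
    if not link_sentiments:
        return False
    negative = sum(1 for s in link_sentiments if s == "negative")
    return negative / len(link_sentiments) > NEGATIVE_MAJORITY

print(should_demote(["negative"] * 95 + ["positive"] * 5))   # True
print(should_demote(["negative"] * 50 + ["positive"] * 50))  # False
```

A spammer would have to drown out nearly all of a site's organic links to trip the rule, which is much harder than planting a handful of negative mentions.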

As for determining sentiment, it's not something I've ever tried to do, but is it really that hard? Intuitively I would think positive and negative articles would have significantly different distributions of certain words.


A negative link on a page with a low reputation could be accounted for accordingly.



