Hacker News

You can get the trending topics through the Twitter API https://dev.twitter.com/docs/api/1/get/trends/%3Awoeid

But I think what you are asking is how such a method would come up with its own trends, given just a stream of tweets. This is a supervised approach (http://en.wikipedia.org/wiki/Supervised_learning), so for now, you would need to train it (possibly online) by giving it examples of what should be a trend and what shouldn't. It would be interesting to make it semi-supervised (http://en.wikipedia.org/wiki/Semi-supervised_learning) so that you would only need to provide a small number of labels.
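Here is a minimal sketch of how the semi-supervised idea could work. It assumes (hypothetically) that each topic is a short series of tweet counts per time bucket, labeled True for trend or False for non-trend. It uses self-training, one standard semi-supervised technique, on top of a 1-nearest-neighbor classifier. The names `self_train` and `max_dist`, and the distance cutoff used to decide what counts as "confident", are illustrative choices and not the method discussed in this thread. In this setup, training online just means appending newly labeled examples to the list.

```python
import math

# Hypothetical representation: a topic is a list of tweet counts per time
# bucket; a labeled example is a (series, is_trend) pair.


def distance(a, b):
    """Euclidean distance between two equal-length count series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(series, labeled):
    """1-nearest-neighbor: label of the closest labeled example and its distance."""
    closest, label = min(labeled, key=lambda ex: distance(series, ex[0]))
    return label, distance(series, closest)


def self_train(labeled, unlabeled, max_dist):
    """Self-training: keep adopting confident predictions on unlabeled series
    as new labeled examples until nothing else qualifies. "Confident" is an
    assumed rule here: the nearest labeled example is within max_dist."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    changed = True
    while changed and remaining:
        changed = False
        for series in list(remaining):
            label, d = nearest_label(series, labeled)
            if d <= max_dist:
                labeled.append((series, label))  # online update: just append
                remaining.remove(series)
                changed = True
    return labeled, remaining


# Two hand-labeled seeds and a pool of unlabeled (made-up) series.
seed = [([1, 1, 2, 8, 20], True), ([5, 5, 5, 5, 5], False)]
pool = [[1, 2, 3, 9, 18], [5, 4, 5, 6, 5], [1, 2, 4, 10, 22], [50, 0, 50, 0, 50]]
grown, unresolved = self_train(seed, pool, max_dist=5.0)
print(len(grown), unresolved)  # 5 [[50, 0, 50, 0, 50]]
```

Series that are far from every labeled example (the last one in `pool`) stay unlabeled instead of being forced into a class. This limits how far mistakes can spread from a small seed set.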




Thanks. That is what I was asking.

It sort of comes down to the question of what's really being learned here? Are they modeling some inherent process of topics becoming popular (or memes spreading in a population) that could be used in other situations, or are they just modeling some arbitrary algorithm that Twitter uses to mark some topics as "trends"? If they're just modeling Twitter's existing algorithm, then it's less interesting because that algorithm already exists. Since they're able to detect the trend before Twitter does (well, before Twitter announces it, anyway), it seems like they're probably onto something more fundamental.


> It sort of comes down to the question of what's really being learned here?

That's a great question. We are learning to recognize trends and non-trends based on previous examples. Since the Twitter trends algorithm gives us such examples, you could say we are learning to replicate the outputs of an arbitrary algorithm, and you'd be right. But learning from examples is a very general thing, so the method has applications beyond detecting trending topics.

> Are they modeling some inherent process of topics becoming popular

No, we don't model the process of something becoming popular. (To do that, one might suppose people spread popular topics in way X and unpopular topics in way Y, and then try to estimate from the data whether a topic is popular or unpopular.) The beauty of this is that we never have to build a model, because we rely directly on the data. As a corollary, this approach is applicable out of the box to any domain with time-varying data (though I suppose you might have to take care to measure the right kind of time-varying data).
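To make "rely directly on the data" concrete, here is a minimal sketch of a model-free classifier. It assumes the observed signal and every labeled reference example are equal-length numeric series. It scores a new series by how close it is to the trend examples versus the non-trend examples, weighting each reference by exp(-gamma * distance). The function names, `gamma`, and `threshold` are hypothetical, and this distance-weighted vote is only one possible data-driven scheme, not necessarily the exact one used here. Nothing in it is specific to tweets, which is the sense in which such an approach carries over to other time-varying data.

```python
import math
from typing import List, Sequence

Series = Sequence[float]


def distance(a: Series, b: Series) -> float:
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def trend_score(observed: Series, trends: List[Series],
                non_trends: List[Series], gamma: float = 1.0) -> float:
    """Log of (summed similarity to trend examples) / (summed similarity to
    non-trend examples), with similarity exp(-gamma * distance).
    No model of how topics spread is assumed; only the labeled data is used."""
    to_trend = log_sum_exp([-gamma * distance(observed, r) for r in trends])
    to_non = log_sum_exp([-gamma * distance(observed, r) for r in non_trends])
    return to_trend - to_non


def is_trend(observed, trends, non_trends, gamma=1.0, threshold=1.0) -> bool:
    """Declare a trend when the similarity ratio exceeds `threshold` (> 0)."""
    return trend_score(observed, trends, non_trends, gamma) > math.log(threshold)


# Illustrative made-up series; they could be tweet counts or any other
# time-varying measurement.
trends = [[0, 1, 3, 7, 15], [1, 2, 4, 8, 16]]
non_trends = [[5, 5, 5, 5, 5], [3, 4, 3, 4, 3]]
rising = [0, 1, 2, 6, 14]
flat = [4, 5, 4, 5, 4]
print(is_trend(rising, trends, non_trends))  # True
print(is_trend(flat, trends, non_trends))    # False
```

The sums are compared in log space. This prevents a division by zero when distances are so large that every exp term underflows to zero.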

Does that answer your question?


I think that answers it. Thanks. :)



