If you're interested in a little bit of a "roll your own" thing, we have an Open Source project called Neddick[1] that could help. It's heavily centered on RSS feeds and our real target for it is intra-enterprise use... but the RSS feeds can come from anywhere (even Google Alerts, which is actually something we use quite a bit for our "dogfood" server). Everything is indexed and searchable using Lucene, and there's a feature for adding comments, as well as a recommender to suggest similar pieces of content.
You can also do RSS aggregation, as each "channel" can consume as many RSS feeds as you want, and then the aggregate of all the entries is itself exposed as an RSS feed.
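To make the aggregation idea concrete, here is a minimal Python sketch. It is not Neddick's code; the function names (`read_items`, `aggregate`), the inline sample feeds, and the choice to de-duplicate by link are all assumptions made for illustration. It merges the entries of several RSS 2.0 feeds and writes the combined list back out as a single RSS document, newest first.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two tiny inline RSS 2.0 documents stand in for real feed downloads.
FEED_A = """<rss version="2.0"><channel><title>A</title>
<item><title>First</title><link>http://example.com/1</link>
<pubDate>Mon, 02 Jan 2012 10:00:00 GMT</pubDate></item>
<item><title>Second</title><link>http://example.com/2</link>
<pubDate>Wed, 04 Jan 2012 10:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>B</title>
<item><title>Third</title><link>http://example.com/3</link>
<pubDate>Tue, 03 Jan 2012 10:00:00 GMT</pubDate></item>
<item><title>Second again</title><link>http://example.com/2</link>
<pubDate>Wed, 04 Jan 2012 11:00:00 GMT</pubDate></item>
</channel></rss>"""


def read_items(feed_xml):
    """Return (title, link, pubDate) tuples for every <item> in one feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", ""), item.findtext("link", ""),
         item.findtext("pubDate", ""))
        for item in root.iter("item")
    ]


def aggregate(channel_title, feeds):
    """Merge several feeds into one RSS document, newest entries first."""
    seen, merged = set(), []
    for feed_xml in feeds:
        for title, link, pub in read_items(feed_xml):
            if link in seen:  # same link seen in an earlier feed: keep the first
                continue
            seen.add(link)
            merged.append((title, link, pub))
    # Assumes every entry carries a valid RFC 822 pubDate, as the samples do.
    merged.sort(key=lambda e: parsedate_to_datetime(e[2]), reverse=True)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    for title, link, pub in merged:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "pubDate").text = pub
    return ET.tostring(rss, encoding="unicode")


combined = aggregate("demo-channel", [FEED_A, FEED_B])
titles = [t for t, _, _ in read_items(combined)]
print(titles)  # ['Second', 'Third', 'First']
```

Because the output is itself RSS, the combined document can be fed straight back into `read_items`, which is the same property that lets one channel's feed be consumed by another reader.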
There's also voting, tagging, a "share" feature that lets you send links via email or XMPP, etc. It also looks a lot like Reddit for some weird reason.... *whistles innocently*
(There's a new visual theme coming that won't look so obviously inspired by Reddit, BTW)
All of that said, we're not at a 1.0 release yet, and while a LOT of stuff works, and someone could almost certainly get value from it, there are definitely bugs and things that don't yet work the way we want them to. In particular, the recommendations engine is currently just the Lucene MoreLikeThis filter, which is pretty naive. We have plans to roll in some much more sophisticated analysis using Mahout[2] eventually, but haven't gotten there yet.
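For a feel of why a purely term-based recommender is naive, here is a toy Python sketch. It is not Lucene's MoreLikeThis implementation and not Neddick's code; the weighting formula, the `more_like_this` function, and the sample documents are assumptions for illustration. It picks the highest-weighted terms of one document and ranks the other documents by how many of those terms they contain.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus: document id -> text.
DOCS = {
    "a": "lucene search index query lucene",
    "b": "lucene index tuning and query speed",
    "c": "gardening tips for tomato plants",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def more_like_this(doc_id, docs, max_terms=3):
    """Rank other docs by the top-weighted terms of doc_id they contain."""
    tokens = {d: tokenize(t) for d, t in docs.items()}
    n = len(docs)
    # Document frequency: in how many docs does each term appear?
    df = Counter(term for toks in tokens.values() for term in set(toks))
    tf = Counter(tokens[doc_id])
    # Simple tf-idf style weight; rarer terms count for more.
    weights = {t: tf[t] * math.log(1 + n / df[t]) for t in tf}
    top = sorted(weights, key=lambda t: (-weights[t], t))[:max_terms]

    scores = {}
    for other, toks in tokens.items():
        if other == doc_id:
            continue
        score = sum(weights[t] for t in top if t in toks)
        if score > 0:
            scores[other] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


print(more_like_this("a", DOCS))  # ['b']
print(more_like_this("c", DOCS))  # []
```

Nothing here knows about votes, tags, or who read what. It only sees shared words, and that is the limitation the more sophisticated analysis is meant to address.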
Eventually we'll have persistent searches with alerts via email and XMPP, alerts based on vote thresholds, and a whole raft of other features. It's coming together, just more slowly than we'd like.
If you want to take a look, there's a public demo server at
http://demo.fogbeam.org:8080/neddick1
Sample logins are testuser0-testuser19 with password "secret".
The front page right now is actually being fed the RSS feed from here at HN, but you'll notice it's a few days out of date. That's one of the bugs I mentioned. There's a bad piece of content coming in on one of the feeds that breaks the parser, and the resulting error currently kicks us out of the loop that consumes the feed. Fixing that is on my TODO list for Real Soon Now, so hopefully by this weekend.
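The usual shape of a fix for this kind of bug is to handle errors per entry rather than per feed. The Python sketch below shows that pattern only; it is not Neddick's code, and since the post doesn't say what the bad content actually is, an entry with a missing link is used as a hypothetical stand-in. The names `parse_entry` and `consume` are likewise made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the middle entry is "bad" (it has no <link>).
FEED = """<rss version="2.0"><channel>
<item><title>Good one</title><link>http://example.com/1</link></item>
<item><title>Broken</title></item>
<item><title>Also good</title><link>http://example.com/3</link></item>
</channel></rss>"""


def parse_entry(item):
    """Turn one <item> into a dict; raise ValueError if it is unusable."""
    link = item.findtext("link")
    if not link:
        raise ValueError("entry has no link")
    return {"title": item.findtext("title", ""), "link": link}


def consume(feed_xml):
    """Process every entry; a bad one is recorded and skipped, not fatal."""
    entries, errors = [], []
    for position, item in enumerate(ET.fromstring(feed_xml).iter("item")):
        try:
            entries.append(parse_entry(item))
        except ValueError as exc:
            # The try/except sits inside the loop, so the loop keeps going.
            errors.append((position, str(exc)))
    return entries, errors


entries, errors = consume(FEED)
print([e["title"] for e in entries])  # ['Good one', 'Also good']
print(errors)                         # [(1, 'entry has no link')]
```

This only covers entries that parse as XML but fail later. If the feed document itself is malformed, `ET.fromstring` raises for the whole feed, and that would need a more lenient parser or a cleanup step before parsing.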
Edit: This channel is more up to date, and it is, indeed, being populated from a Google Alert. This is getting a bit meta-circular now... :-)
http://demo.fogbeam.org:8080/neddick1/r/Microsoft
If you wanted to use this for a more general "entire web" alerting tool, you'd have to build (or acquire) a crawler to go out and get the data. I think there might be some useful stuff in Nutch[3], Droids[4], Manifold[5] and/or Heritrix[6], but I won't swear to it.
That's the bit we're not really interested in, as we aren't trying to build a public, consumer-facing app here, but a tool for use inside organizations. Still, if somebody wanted to use it that way, it could be done. Scalability would also be an issue, but I think that could be managed.
[1]: https://github.com/fogbeam/Neddick
[2]: http://mahout.apache.org
[3]: http://nutch.apache.org
[4]: http://incubator.apache.org/droids/
[5]: http://manifoldcf.apache.org/
[6]: http://en.wikipedia.org/wiki/Heritrix