I second this. And if you decide you want to go really in-depth on recommender systems, I suggest taking a look at "Recommender Systems Handbook" (http://www.springer.com/computer/ai/book/978-0-387-85819-7). It's basically a collection of scholarly articles on the topic, so the approach is academic. But it's also the best resource I'm aware of for understanding what's state-of-the-art across a range of aspects of recommender systems. (Also, although the price is a somewhat hair-raising $179, there are PDF copies of the whole thing floating around that are easy to find with a Google search.)
Great book indeed -- they talk mostly about collaborative filtering, which suffers from the cold-start problem. If you need to build a recommendation algorithm that uses expert knowledge (numerical features), you could use a simple kNN algorithm, as sketched below. [1] and [2] are two libraries I've written for this purpose.
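Here's a minimal plain-Ruby sketch of the kNN idea over numerical feature vectors. The data and method names are made up for illustration; this is not the API of the libraries referenced above.

    def euclidean(a, b)
      Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
    end

    # items: { id => [feature1, feature2, ...] }
    def nearest_neighbours(items, query, k: 3)
      items.min_by(k) { |_id, features| euclidean(features, query) }
           .map(&:first)
    end

    items = {
      "film_a" => [0.9, 0.1, 0.3],
      "film_b" => [0.8, 0.2, 0.4],
      "film_c" => [0.1, 0.9, 0.7]
    }
    p nearest_neighbours(items, [0.85, 0.15, 0.35], k: 2)
    # => ["film_a", "film_b"] -- the two closest in feature space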
I did something very similar in the past, except I used cosine similarity (http://en.wikipedia.org/wiki/Cosine_similarity). It allowed me to give each tag a "weight", and when comparing the tag clouds, I would zero out any tags that weren't found in both. It works really, really well.
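A minimal sketch of that weighted-cosine approach, assuming each tag cloud is a Hash of tag => weight (my own representation, not the commenter's code). Tags missing from either cloud simply contribute zero to the dot product:

    def cosine_similarity(a, b)
      shared = a.keys & b.keys
      dot = shared.sum { |tag| a[tag] * b[tag] }
      norm = ->(cloud) { Math.sqrt(cloud.values.sum { |w| w**2 }) }
      denom = norm.(a) * norm.(b)
      denom.zero? ? 0.0 : dot / denom
    end

    post_a = { "ruby" => 3.0, "redis" => 1.0, "nosql" => 2.0 }
    post_b = { "ruby" => 2.0, "rails" => 4.0, "nosql" => 1.0 }
    puts cosine_similarity(post_a, post_b) # => ~0.47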
To impudently hijack the thread: for a very similar approach (Jaccard similarity coefficient, in Ruby) with a nice abstracted implementation for background workers, take a look at David Celis's 'recommendable'. Here he is introducing the same system: http://davidcel.is/blog/2012/02/07/collaborative-filtering-w... and the gem itself: http://davidcel.is/recommendable/ I believe it's been discussed on HN before.
Redis is used to store the binary votes and to compute the similarity coefficients. Since Redis is very good with set operations (intersections on multi-million-member sets, and more, are crazy fast), it's quite the natural choice for the db backend. One of the cases where a NoSQL solution really is the right tool for the job, as a matter of fact!
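To make that concrete, here's a small sketch of computing a Jaccard coefficient over per-user "liked item" sets in Redis via the redis-rb gem. The key names are invented for illustration; recommendable's actual key layout differs:

    require "redis"

    redis = Redis.new
    redis.sadd("likes:alice", %w[item1 item2 item3])
    redis.sadd("likes:bob",   %w[item2 item3 item4])

    # |A ∩ B| / |A ∪ B|, both computed server-side as set operations
    intersection = redis.sinter("likes:alice", "likes:bob").size
    union        = redis.sunion("likes:alice", "likes:bob").size
    puts intersection.to_f / union # => 0.5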
I've used recommendable (incl. in production code) in the past; it works very well, is reliable, robust, and easily hackable for whatever needs. (E.g. it's meant to integrate with Rails, but it's quite simple to make it work on barebones Ruby, with e.g. Sinatra as a lightweight web app exposing vote functionality, and so on.)
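Roughly, such a Sinatra route might look like the sketch below. `User.find`, `Item.find`, and the `like` call stand in for whatever model layer you wire recommendable up to -- treat them as assumptions, not the gem's exact non-Rails setup.

    require "sinatra"

    post "/users/:user_id/likes/:item_id" do
      user = User.find(params[:user_id])  # assumed model lookup
      item = Item.find(params[:item_id])
      user.like(item)                     # recommendable-style rating call
      status 201
    end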
If you're willing to get into actual NLP, then semantic similarity would certainly be one way to go. Is there any equivalent to the Stanford NLP tools (Java) or NLTK (Python) in Ruby land? But I'm not sure that Levenshtein will necessarily get you better results than the bag-of-words approach the author is taking with Jaccard distance, if all you're doing is document classification.
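To illustrate the difference between the two measures, a plain-Ruby sketch (for exposition, not a benchmark): Levenshtein counts character edits on the raw strings, while Jaccard compares sets of words.

    require "set"

    def levenshtein(s, t)
      d = Array.new(s.length + 1) { |i| [i] + [0] * t.length }
      (0..t.length).each { |j| d[0][j] = j }
      (1..s.length).each do |i|
        (1..t.length).each do |j|
          cost = s[i - 1] == t[j - 1] ? 0 : 1
          d[i][j] = [d[i - 1][j] + 1,
                     d[i][j - 1] + 1,
                     d[i - 1][j - 1] + cost].min
        end
      end
      d[s.length][t.length]
    end

    def jaccard(a, b)
      a, b = a.split.to_set, b.split.to_set
      (a & b).size.to_f / (a | b).size
    end

    doc1 = "the cat sat on the mat"
    doc2 = "the cat sat on the rug"
    puts levenshtein(doc1, doc2) # => 3 (character edits)
    puts jaccard(doc1, doc2)     # => ~0.67 (shared vocabulary)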
I went with the author's approach of using Jaccard to rank the results; however, I like this approach better:
https://neil.fraser.name/writing/patch/
They basically take the distance to the beginning of the text into account.
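As a rough illustration of that idea (my own sketch, not Neil Fraser's actual algorithm): keep the similarity score, but discount candidates whose match sits further from the start of the text. The 0.001 decay per character is an arbitrary tuning knob.

    Candidate = Struct.new(:text, :similarity, :offset)

    def rank(candidates)
      candidates.sort_by { |c| -(c.similarity / (1.0 + 0.001 * c.offset)) }
    end

    candidates = [
      Candidate.new("strong match near the end",   0.9, 5000),
      Candidate.new("weaker match near the start", 0.8, 50)
    ]
    puts rank(candidates).first.text # => "weaker match near the start"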
Thanks for your comment, Josh. In production I actually implemented the PG version, and it's still in use today. I thought it would scale better and last longer. I didn't do benchmarking or anything like that -- I think it would really depend on the size of your dataset. For something serious: compute recommendations in the background and store them in the database. I believe that is how the "big boys" do it. :)
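A minimal sketch of that precompute-in-background pattern, assuming a Sidekiq worker and ActiveRecord-style `User` and `Recommendation` models -- all names here are hypothetical:

    require "sidekiq"

    class PrecomputeRecommendationsJob
      include Sidekiq::Worker

      def perform(user_id)
        user = User.find(user_id)
        compute_recommendations(user).each do |item_id, score| # your similarity logic
          Recommendation.upsert(
            { user_id: user.id, item_id: item_id, score: score },
            unique_by: %i[user_id item_id]
          )
        end
      end
    end

    # Enqueue periodically (e.g. nightly); reads then become a single
    # indexed lookup sorted by score.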
For an approach using Neo4j, check out cadet! (my project)
cadet is really just a JRuby wrapper around Neo4j, but one can use it to interact with Neo4j (and thus come up with recommendations) without touching a line of Java, or even Cypher.