I second this. And if you decide you want to go really in-depth on recommender systems, I suggest taking a look at "Recommender Systems Handbook" (http://www.springer.com/computer/ai/book/978-0-387-85819-7). It's basically a collection of scholarly articles on the topic, so the approach is academic. But it's also the best resource I'm aware of for understanding what's state-of-the-art across a range of aspects of recommender systems. (Also, although the price is a somewhat hair-raising $179, there are PDF copies of the whole thing floating around that are easy to find with a Google search.)
Great book indeed -- they talk mostly about collaborative filtering, which suffers from the cold-start problem. If you need to build a recommendation algorithm that uses expert knowledge (numerical features), you could use a simple kNN algorithm, as sketched below. [1] and [2] are two libraries I've written for this purpose.
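Here's a minimal plain-Ruby sketch of the kNN idea over numerical feature vectors. The data and method names are made up for illustration; this is not the API of the libraries referenced above.

    def euclidean(a, b)
      Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
    end

    # items: { id => [feature1, feature2, ...] }
    def nearest_neighbours(items, query, k: 3)
      items.min_by(k) { |_id, features| euclidean(features, query) }
           .map(&:first)
    end

    items = {
      "film_a" => [0.9, 0.1, 0.3],
      "film_b" => [0.8, 0.2, 0.4],
      "film_c" => [0.1, 0.9, 0.7]
    }
    p nearest_neighbours(items, [0.85, 0.15, 0.35], k: 2)
    # => ["film_a", "film_b"] -- the two closest in feature space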
I did something very similar in the past, except I used cosine similarity (http://en.wikipedia.org/wiki/Cosine_similarity). It allowed me to give each tag a "weight", and when comparing the tag clouds, I would zero out any tags that weren't found in both. It works really, really well.
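A minimal sketch of that weighted-cosine approach, assuming each tag cloud is a Hash of tag => weight (my own representation, not the commenter's code). Tags missing from either cloud simply contribute zero to the dot product:

    def cosine_similarity(a, b)
      shared = a.keys & b.keys
      dot = shared.sum { |tag| a[tag] * b[tag] }
      norm = ->(cloud) { Math.sqrt(cloud.values.sum { |w| w**2 }) }
      denom = norm.(a) * norm.(b)
      denom.zero? ? 0.0 : dot / denom
    end

    post_a = { "ruby" => 3.0, "redis" => 1.0, "nosql" => 2.0 }
    post_b = { "ruby" => 2.0, "rails" => 4.0, "nosql" => 1.0 }
    puts cosine_similarity(post_a, post_b) # => ~0.47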
To impudently hijack the thread: for a very similar approach (Jaccard similarity coefficient, in Ruby) with a nice abstracted implementation for background workers, take a look at David Celis's 'recommendable'. Here he is introducing the same system: http://davidcel.is/blog/2012/02/07/collaborative-filtering-w... and the gem itself: http://davidcel.is/recommendable/ I believe it's been discussed on HN before.
Redis is used to store the binary votes and to compute the similarity coefficients. Since Redis is very good with set operations (intersections on multi-million-member sets, and more, are crazy fast), it's quite the natural choice for the db backend. One of the cases where a NoSQL solution really is the right tool for the job, as a matter of fact!
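To make that concrete, here's a small sketch of computing a Jaccard coefficient over per-user "liked item" sets in Redis via the redis-rb gem. The key names are invented for illustration; recommendable's actual key layout differs:

    require "redis"

    redis = Redis.new
    redis.sadd("likes:alice", %w[item1 item2 item3])
    redis.sadd("likes:bob",   %w[item2 item3 item4])

    # |A ∩ B| / |A ∪ B|, both computed server-side as set operations
    intersection = redis.sinter("likes:alice", "likes:bob").size
    union        = redis.sunion("likes:alice", "likes:bob").size
    puts intersection.to_f / union # => 0.5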
I've used recommendable (incl. in production code) in the past; it works very well, is reliable, robust, and easily hackable for whatever needs. (E.g. it's meant to integrate with Rails, but it's quite simple to make it work on barebones Ruby, with e.g. Sinatra as a lightweight web app exposing vote functionality, and so on.)
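Roughly, such a Sinatra route might look like the sketch below. `User.find`, `Item.find`, and the `like` call stand in for whatever model layer you wire recommendable up to -- treat them as assumptions, not the gem's exact non-Rails setup.

    require "sinatra"

    post "/users/:user_id/likes/:item_id" do
      user = User.find(params[:user_id])  # assumed model lookup
      item = Item.find(params[:item_id])
      user.like(item)                     # recommendable-style rating call
      status 201
    end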
If you're willing to get into actual NLP, then semantic similarity would certainly be one way to go. Is there any equivalent to the Stanford NLP tools (Java) or NLTK (Python) in Ruby land? But I'm not sure that Levenshtein will necessarily get you better results than the bag-of-words approach the author is taking with Jaccard distance, if all you're doing is document classification.
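To illustrate the difference between the two measures, a plain-Ruby sketch (for exposition, not a benchmark): Levenshtein counts character edits on the raw strings, while Jaccard compares sets of words.

    require "set"

    def levenshtein(s, t)
      d = Array.new(s.length + 1) { |i| [i] + [0] * t.length }
      (0..t.length).each { |j| d[0][j] = j }
      (1..s.length).each do |i|
        (1..t.length).each do |j|
          cost = s[i - 1] == t[j - 1] ? 0 : 1
          d[i][j] = [d[i - 1][j] + 1,
                     d[i][j - 1] + 1,
                     d[i - 1][j - 1] + cost].min
        end
      end
      d[s.length][t.length]
    end

    def jaccard(a, b)
      a, b = a.split.to_set, b.split.to_set
      (a & b).size.to_f / (a | b).size
    end

    doc1 = "the cat sat on the mat"
    doc2 = "the cat sat on the rug"
    puts levenshtein(doc1, doc2) # => 3 (character edits)
    puts jaccard(doc1, doc2)     # => ~0.67 (shared vocabulary)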
I went with the author's approach of using Jaccard to rank the results; however, I like this approach better:
https://neil.fraser.name/writing/patch/
They basically take the distance to the beginning of the text into account.
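As a rough illustration of that idea (my own sketch, not Neil Fraser's actual algorithm): keep the similarity score, but discount candidates whose match sits further from the start of the text. The 0.001 decay per character is an arbitrary tuning knob.

    Candidate = Struct.new(:text, :similarity, :offset)

    def rank(candidates)
      candidates.sort_by { |c| -(c.similarity / (1.0 + 0.001 * c.offset)) }
    end

    candidates = [
      Candidate.new("strong match near the end",   0.9, 5000),
      Candidate.new("weaker match near the start", 0.8, 50)
    ]
    puts rank(candidates).first.text # => "weaker match near the start"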
Thanks for your comment, Josh. In production I actually implemented the PG version, and it's still in use today. I thought it would scale better and last longer. I didn't do benchmarking or anything like that -- I think it would really depend on the size of your dataset. For something serious: compute recommendations in the background and store them in the database. I believe that is how the "big boys" do it. :)
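A minimal sketch of that precompute-in-background pattern, assuming a Sidekiq worker and ActiveRecord-style `User` and `Recommendation` models -- all names here are hypothetical:

    require "sidekiq"

    class PrecomputeRecommendationsJob
      include Sidekiq::Worker

      def perform(user_id)
        user = User.find(user_id)
        compute_recommendations(user).each do |item_id, score| # your similarity logic
          Recommendation.upsert(
            { user_id: user.id, item_id: item_id, score: score },
            unique_by: %i[user_id item_id]
          )
        end
      end
    end

    # Enqueue periodically (e.g. nightly); reads then become a single
    # indexed lookup sorted by score.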
For an approach using Neo4j, check out cadet! (my project)
cadet is really just a JRuby wrapper around Neo4j, but one can use it to interact with Neo4j (and thus come up with recommendations) without touching a line of Java, or even Cypher.