Simple recommendation system written in Ruby (opalab.com)
157 points by otobrglez on March 24, 2014 | 19 comments



Programming Collective Intelligence is an excellent book for learning this sort of thing. The first chapter is a recommendation engine! :)


I second this. And if you decide you want to go really in-depth on recommender systems, I suggest taking a look at "Recommender Systems Handbook" (http://www.springer.com/computer/ai/book/978-0-387-85819-7). It's basically a collection of scholarly articles on the topic, so the approach is academic. But it's also the best resource I'm aware of for understanding what's state of the art across a range of aspects of recommender systems. (Also, although the price is a somewhat hair-raising $179, there are PDF copies of the whole thing floating around that are easy to find with a Google search.)


Great book indeed -- they mostly cover collaborative filtering, which suffers from the cold-start problem. If you need to build a recommendation algorithm that uses expert knowledge (numerical features), you could use a simple kNN algorithm. [1] and [2] are two libraries I've written for this purpose.
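This isn't the API of either library, but the underlying kNN idea can be sketched in a few lines of Ruby (the function names and data layout here are made up for illustration): given items with numerical feature vectors, recommend the k items closest to a query item.

```ruby
# Minimal k-nearest-neighbours sketch over numerical feature vectors.
# Items are a Hash of name => [feature, ...]; we recommend the k items
# whose feature vectors lie closest (Euclidean distance) to the query.
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def knn_recommend(items, query_name, k: 2)
  query = items.fetch(query_name)
  items
    .reject { |name, _| name == query_name }
    .map    { |name, feats| [name, euclidean(query, feats)] }
    .min_by(k) { |_, dist| dist }   # k smallest distances, closest first
    .map(&:first)
end
```

Because this works on hand-crafted numerical features rather than user votes, it can recommend brand-new items immediately, which is the point being made about the cold-start problem.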

[1] https://github.com/axiomzen/Alike [2] https://github.com/axiomzen/Look-Alike


Yes. And here is a recommendation system implementing the book's collaborative filtering algorithm in 9 lines of code:

http://tungwaiyip.info/2012/Collaborative%20Filtering.html


Thank you for the rec, this looks like a great book.


I did something very similar in the past, except I used cosine similarity (http://en.wikipedia.org/wiki/Cosine_similarity). It allowed me to give each tag a "weight", and when comparing the tag clouds I would zero out any tags not found in both. It works really, really well.
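The weighted-tag-cloud comparison described above can be sketched like this (a minimal version, assuming tag clouds are hashes of tag => weight; the "zero out" step falls out of only summing over shared tags):

```ruby
# Cosine similarity between two weighted tag clouds.
# Tags missing from either cloud contribute zero to the dot product.
def cosine_similarity(a, b)
  shared = a.keys & b.keys
  dot    = shared.sum { |tag| a[tag] * b[tag] }
  norm_a = Math.sqrt(a.values.sum { |w| w**2 })
  norm_b = Math.sqrt(b.values.sum { |w| w**2 })
  return 0.0 if norm_a.zero? || norm_b.zero?
  dot / (norm_a * norm_b)
end
```

Unlike plain Jaccard, this rewards agreement on heavily weighted tags more than agreement on incidental ones.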


Good stuff, and a nice writeup/explanation!

To impudently hijack the thread: for a very similar approach (Jaccard similarity coefficient, Ruby) with a nicely abstracted implementation for background workers, take a look at David Celis' 'recommendable'. Here's his post introducing the same system: http://davidcel.is/blog/2012/02/07/collaborative-filtering-w... and the gem itself: http://davidcel.is/recommendable/ I believe it's been discussed on HN before.

Redis is used to store the binary votes and to compute similarity coefficients. Since Redis is very good with set operations (intersections on multi-million-member sets, and more, are crazy fast), it's quite the natural choice for the db backend. One of the cases where a NoSQL solution really is the right tool for the job, as a matter of fact!

I've used recommendable (including in production code) in the past; it works very well and is reliable, robust, and easily hackable for whatever you need. (E.g. it's meant to integrate with Rails, but it's quite simple to make it work on barebones Ruby, with Sinatra as a lightweight web app exposing the vote functionality, and so on.)
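The set math that Redis is doing here (recommendable's actual Redis commands may differ) can be sketched with plain Ruby Sets: each item has a set of users who liked it, and the Jaccard coefficient is the intersection cardinality over the union cardinality.

```ruby
require 'set'

# Jaccard similarity between two items' liker sets, the same quantity
# Redis computes from set intersections and unions of binary votes.
def jaccard(likers_a, likers_b)
  union = (likers_a | likers_b).size
  return 0.0 if union.zero?
  (likers_a & likers_b).size.to_f / union
end
```

In Redis the intersection and union happen server-side (SINTER/SUNION), so the application never has to pull the full member lists over the wire, which is where the multi-million-member speed comes from.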


Thanks for the reference. David's post looks awesome!


You could use Levenshtein distance to get better results by taking word variations into account.

You could also enhance it with semantic similarity scores for strings.
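For reference, the classic dynamic-programming Levenshtein distance is short in Ruby: the minimum number of single-character insertions, deletions, and substitutions turning one string into another.

```ruby
# Levenshtein edit distance via the standard DP table:
# rows[i][j] = distance between a[0,i] and b[0,j].
def levenshtein(a, b)
  rows = Array.new(a.length + 1) { |i| [i] + [0] * b.length }
  (1..b.length).each { |j| rows[0][j] = j }

  (1..a.length).each do |i|
    (1..b.length).each do |j|
      cost = a[i - 1] == b[j - 1] ? 0 : 1
      rows[i][j] = [
        rows[i - 1][j] + 1,        # deletion
        rows[i][j - 1] + 1,        # insertion
        rows[i - 1][j - 1] + cost  # substitution
      ].min
    end
  end
  rows[a.length][b.length]
end
```

Word variations like "tag" vs. "tags" score a distance of 1, so treating low-distance words as matches catches variations that exact set membership misses.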


If you're willing to get into actual NLP, then semantic similarity would certainly be one way to go. Is there any equivalent to Stanford (Java) or NLTK (Python) in Ruby land? But I'm not sure that Levenshtein will necessarily get you better results than the bag-of-words approach the author is taking with Jaccard distance, if all you're doing is document classification.
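To make the comparison concrete, here is a minimal sketch of the bag-of-words Jaccard comparison being discussed (assuming a naive lowercase word tokenizer, which is an illustration, not the article's exact preprocessing):

```ruby
require 'set'

# Bag-of-words Jaccard similarity between two strings:
# tokenize into lowercase word sets, then intersection over union.
def word_jaccard(doc_a, doc_b)
  a = doc_a.downcase.scan(/\w+/).to_set
  b = doc_b.downcase.scan(/\w+/).to_set
  union = (a | b).size
  return 0.0 if union.zero?
  (a & b).size.to_f / union
end
```

The point being made is that for document classification this crude word-set overlap often does fine, and character-level edit distance only helps at the margins.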


As far as NLP libraries in Ruby land, there is both [treat](https://github.com/louismullie/treat) and [ruby bindings to the Stanford Core NLP](https://github.com/louismullie/stanford-core-nlp).


I've used OpenNLP with JRuby for my NLP experiments. Check out https://github.com/otobrglez/politiki-ner to get an idea of how to mix them.


Here is a cool approach to the subject by LinkedIn: http://engineering.linkedin.com/open-source/cleo-open-source...

Here is my HN-obligatory, self-written golang version: https://github.com/jamra/gocleo

I went with this author's approach of using Jaccard to rank the results; however, I like this approach better: https://neil.fraser.name/writing/patch/ It basically takes the distance to the beginning of the text into account.


I recommend reading http://nlp.stanford.edu/IR-book/ on this topic.


If you want something that scales better, use MinHash. You get a similarity that approximates Jaccard, but with a lower memory and CPU footprint.
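A minimal MinHash sketch in Ruby (salted CRC32 stands in for a proper hash family here; that choice is mine, not part of any particular library): each hash function keeps only its minimum value over a set, and the fraction of signature positions where two sets agree estimates their Jaccard coefficient.

```ruby
require 'set'
require 'zlib'

NUM_HASHES = 200  # more hashes = tighter estimate, bigger signature

# Fixed-size signature: for each salted hash function, the minimum
# hash value over the set's elements.
def minhash_signature(set)
  (0...NUM_HASHES).map do |seed|
    set.map { |e| Zlib.crc32("#{seed}:#{e}") }.min
  end
end

# Fraction of matching positions approximates the Jaccard coefficient.
def estimated_jaccard(sig_a, sig_b)
  sig_a.zip(sig_b).count { |x, y| x == y } / NUM_HASHES.to_f
end
```

The footprint win is that you compare two 200-integer signatures instead of intersecting the full sets, no matter how large the sets grow.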


How well does this perform compared to your PostgreSQL example?

Also, we are looking for a Ruby/Backbone.js developer; drop me an email at josh@seriousfox.co.uk :)


Thanks for your comment, Josh. In production I actually implemented the PG version, and it's still in use today. I thought it would scale better and last longer. I didn't do any benchmarking or anything like that; I think it would really depend on the size of your dataset. For anything serious, compute the recommendations in the background and store them in the database. I believe that's how the "big boys" do it. :)
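The precompute-and-store idea can be sketched in plain Ruby (the similarity function and in-memory store are stand-ins for whatever you'd use in production, e.g. a worker writing to a table): do the expensive pairwise work once in the background, so serving a recommendation is just a lookup.

```ruby
# Jaccard over tag arrays, as a stand-in similarity measure.
def similarity(tags_a, tags_b)
  union = (tags_a | tags_b).size
  return 0.0 if union.zero?
  (tags_a & tags_b).size.to_f / union
end

# Precompute each item's top_n most similar items into a Hash;
# in production this loop would run in a background job and the
# result would be persisted to the database.
def precompute_recommendations(items, top_n: 3)
  items.keys.each_with_object({}) do |name, store|
    store[name] = items.keys
      .reject { |other| other == name }
      .max_by(top_n) { |other| similarity(items[name], items[other]) }
  end
end
```

Serving then costs a single key lookup, regardless of how expensive the pairwise comparison was.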


For an approach using Neo4j, check out cadet (my project)! Cadet is mostly just a JRuby wrapper around Neo4j, but you can use it to interact with Neo4j (and thus come up with recommendations) without touching a line of Java, or even Cypher.

It's still in progress, and I'd love any input! http://github.com/karabijavad/cadet

http://github.com/karabijavad/congress-graph


I remember seeing a nice little system that involved taking the square root of something, but I can't remember what it was. Anybody know it?



