The Google PageRank Algorithm in 126 Lines of Python

pc · on Jan 30, 2008

"I was surprised by the simplicity of the math underlying the google PageRank algorithm, and the ease with which it seemed to be efficiently implementable"

It's actually not all that easy to implement this stuff efficiently when n grows to any sort of useful size. E.g. the article uses an n x n adjacency matrix, so if we take n = 1 million pages, we need 4TB just to represent it, assuming no overhead. (Adjacency lists help here.)

See http://dbpubs.stanford.edu:8090/pub/showDoc.Fulltext?lang=en....

(I hacked together a script to compute this on MediaWiki wikis a while ago, by examining the DB directly -- http://collison.ie/code/pagerank.rb)

vincentk · on Jan 30, 2008

As the author of the above article, I'd like to point out that the adjacency matrix is extremely sparse. If you use a sparse representation of the link matrix (as I have done -- your claim is incorrect), its space usage will scale as O(N), and not as O(N^2) as you suggest.

By that of course I don't mean to say you can use this script to build your own google.... but the problems lie elsewhere.

eru · on Jan 30, 2008

On of our professors used PageRank as an example to introduce matrices. He explained the algorithm in class. (Not this particular python implemention though.)

kf · on Jan 30, 2008

The original algorithm, I presume... I would think they're up to millions of lines of python/c++ by now.

tlrobinson · on Jan 30, 2008

Next: "Linux in 62 lines of code!!!"

trekker7 · on Jan 30, 2008

Bah, PG could write that in Lisp on the back of a napkin.

ph0rque · on Jan 30, 2008

With arc, he'd need just a postage stamp.

http://reddit.com/r/programming/info/6710p/comments/c0312wv

timr · on Jan 30, 2008

A bit more for the unicode support, though.