I have a strong negative reaction every time I view the InfoQ site. I can't say exactly what the problem is, though... perhaps they should have somewhat bigger margins to separate the content from the menus and ads?
I have one contact on LinkedIn whose profile takes significantly longer to load than that of anyone else I'm linked to. On days when the others are taking under a second to load, his takes anywhere from 5 seconds to half a minute.
There are two ways his profile differs from the others: (1) he has way more contacts than anyone else I know, and (2) he has a paid LinkedIn membership, whereas I and all my other contacts have free memberships.
Has anyone else noticed particular profiles that always load slowly, and if so, does it correlate with either a large number of contacts or a paid membership?
If you have top-notch engineers, it takes very few of them to manage very complex problems. At that level there is also an advantage to having few people: it reduces communication and coordination overhead. And if you accept the general rule that a great engineer is 100x as productive as a typical one, then every hour of overhead you eliminate frees up 100x the output, so reducing overhead is 100x as effective as well.
It doesn't work this way in all situations. In the case described here, the data is huge and the algorithms are complex, but the software itself is probably rather small. That makes it suitable for a small team of very good engineers. Where the software is huge but none of it is particularly complex, you're better off with a large team of average engineers. (Of course, if you put very good engineers in that position, they will probably refactor your software down to something more manageable.)
The article sounded like they just meant that 2 engineers work on the "people you may know" thing. I'm pretty sure there's more than that on the SNA team. (www.sna-projects.com)
Two guys work on PYMK, but the entire SNA team is much bigger. It is amazing what those two guys get done, though. They sit by the window and grin all day, so much ass are they kicking.
PageRank is like a Markov chain: you iterate over the same dataset again and again until you're happy with the result. If 20 iterations is good enough, it's good enough. Good explanation here: http://www.iterativemapreduce.org/samples.html#Pagerank
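To make that concrete, here's a minimal power-iteration sketch in Python. This is my own toy version, not the code from the linked sample; the damping factor and tolerance are just the conventional choices, and it assumes every linked-to page also appears as a key (no proper dangling-link handling).

    DAMPING = 0.85  # conventional PageRank damping factor

    def pagerank(links, iterations=20, tol=1e-6):
        """links: dict mapping each page to the list of pages it links to.
        Toy assumption: every linked-to page is also a key in links."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            # teleport term, then redistribute each page's rank over its outlinks
            new_rank = {p: (1.0 - DAMPING) / n for p in pages}
            for page, outlinks in links.items():
                if not outlinks:
                    continue  # toy version: dangling pages just leak rank
                share = DAMPING * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            # stop early once the rank vector has effectively converged
            if sum(abs(new_rank[p] - rank[p]) for p in pages) < tol:
                return new_rank
            rank = new_rank
        return rank

    print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))

Each pass is one sweep over the whole dataset, which is why it maps so naturally onto a chain of identical MapReduce jobs: one job per iteration, 20 jobs if 20 iterations is good enough.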
Whereas if you include signals from multiple sources, each join is its own MR job, never mind the calculations on top.
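For anyone who hasn't written one: a reduce-side join looks roughly like the toy Python below, simulated locally rather than on Hadoop, with made-up field names. Mappers tag each record with its source, the shuffle groups everything by the join key, and the reducer pairs records across sources. The point is that every extra signal source means another pass like this before any scoring happens.

    from collections import defaultdict

    def reduce_side_join(profiles, connections):
        # "map" phase: tag every record with its source table
        tagged = [(member_id, ("profile", row)) for member_id, row in profiles]
        tagged += [(member_id, ("connection", row)) for member_id, row in connections]

        # "shuffle" phase: group all tagged records by the join key
        groups = defaultdict(list)
        for key, value in tagged:
            groups[key].append(value)

        # "reduce" phase: pair up the two sides within each key
        for key, values in groups.items():
            lefts = [row for tag, row in values if tag == "profile"]
            rights = [row for tag, row in values if tag == "connection"]
            for left in lefts:
                for right in rights:
                    yield (key, left, right)

    profiles = [(1, {"name": "ann"}), (2, {"name": "bob"})]
    connections = [(1, {"friend": 2}), (1, {"friend": 3})]
    print(list(reduce_side_join(profiles, connections)))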