I have a strong negative reaction every time I view the InfoQ site. I can't say exactly what the problem is, though... perhaps they should have somewhat bigger margins to separate the content from the menus and ads?
I have one contact on LinkedIn whose profile takes significantly longer to load than that of anyone else I'm linked to. On days when the others are taking under a second to load, his takes anywhere from 5 seconds to half a minute.
There are two ways his profile differs from the others: (1) he has way more contacts than anyone else I know, and (2) he has a paid LinkedIn membership, whereas I and all my other contacts have free memberships.
Has anyone else noticed particular profiles that always load slowly, and if so, does it correlate with either a large number of contacts or a paid membership?
If you have top-notch engineers, it takes very few of them to manage very complex problems. At that level there is also an advantage to having few people: it reduces communication and coordination overhead. And if you accept the general rule that a great engineer is 100x as productive as a typical one, then every hour of overhead you eliminate frees up 100x the output, so reducing overhead is 100x as effective as well.
It doesn't work this way in all situations. In the case described here, the data is huge and the algorithms are complex, but the software itself is probably rather small. That makes it suitable for a small team of very good engineers. Where the software is huge but none of it is particularly complex, you're better off with a large team of average engineers. (Of course, if you put very good engineers in that position, they will probably refactor your software down to something more manageable.)
The article sounded like they just meant that 2 engineers work on the "people you may know" thing. I'm pretty sure there's more than that on the SNA team. (www.sna-projects.com)
Two guys work on PYMK, but the entire SNA team is much bigger. It is amazing what those two guys get done, though. They sit by the window and grin all day, so much ass are they kicking.
PageRank is like a Markov chain: you iterate over the same dataset again and again until you're happy with the result. If 20 iterations is good enough, it's good enough. Good explanation here: http://www.iterativemapreduce.org/samples.html#Pagerank
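To make that concrete, here's a minimal power-iteration sketch in Python. This is my own toy version, not the code from the linked sample; the damping factor and tolerance are just the conventional choices, and it assumes every linked-to page also appears as a key (no proper dangling-link handling).

    DAMPING = 0.85  # conventional PageRank damping factor

    def pagerank(links, iterations=20, tol=1e-6):
        """links: dict mapping each page to the list of pages it links to.
        Toy assumption: every linked-to page is also a key in links."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            # teleport term, then redistribute each page's rank over its outlinks
            new_rank = {p: (1.0 - DAMPING) / n for p in pages}
            for page, outlinks in links.items():
                if not outlinks:
                    continue  # toy version: dangling pages just leak rank
                share = DAMPING * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            # stop early once the rank vector has effectively converged
            if sum(abs(new_rank[p] - rank[p]) for p in pages) < tol:
                return new_rank
            rank = new_rank
        return rank

    print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))

Each pass is one sweep over the whole dataset, which is why it maps so naturally onto a chain of identical MapReduce jobs: one job per iteration, 20 jobs if 20 iterations is good enough.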
Whereas if you include signals from multiple sources, each join is its own MR job, never mind the calculations on top.
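For anyone who hasn't written one: a reduce-side join looks roughly like the toy Python below, simulated locally rather than on Hadoop, with made-up field names. Mappers tag each record with its source, the shuffle groups everything by the join key, and the reducer pairs records across sources. The point is that every extra signal source means another pass like this before any scoring happens.

    from collections import defaultdict

    def reduce_side_join(profiles, connections):
        # "map" phase: tag every record with its source table
        tagged = [(member_id, ("profile", row)) for member_id, row in profiles]
        tagged += [(member_id, ("connection", row)) for member_id, row in connections]

        # "shuffle" phase: group all tagged records by the join key
        groups = defaultdict(list)
        for key, value in tagged:
            groups[key].append(value)

        # "reduce" phase: pair up the two sides within each key
        for key, values in groups.items():
            lefts = [row for tag, row in values if tag == "profile"]
            rights = [row for tag, row in values if tag == "connection"]
            for left in lefts:
                for right in rights:
                    yield (key, left, right)

    profiles = [(1, {"name": "ann"}), (2, {"name": "bob"})]
    connections = [(1, {"friend": 2}), (1, {"friend": 3})]
    print(list(reduce_side_join(profiles, connections)))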