> Modularity tends to obviate the need for large, atomic refactorings.
"Tends to".
But when you're dealing with code at Facebook's scale, things that "tend not to happen" actually happen quite a lot. In fact, you must plan for them as a matter of course.
So yes, modularity is great, and because I'm a nice guy I'll assume Facebook aren't a pack of idiots and that they're writing nice modular code. But even if that's the case, in an organization of Facebook's size you still need to make widespread, atomic refactorings on a regular basis.
I know this from experience, because I work at Google (a much larger codebase than Facebook's) on a low-level piece of our software stack. We face these issues regularly, and while working in a single repo has its drawbacks, it also has real advantages.
Perhaps I don't understand the whole situation here. I hear "all of our code is in one repository" and I think "GMail and Google Maps are in the same repository, in the same repository with GoLang, in the same repository with AdWords."
The more I think about it, the more I think your post reveals a lack of maturity in our industry that lends credence to the pro-engineering-licensing argument that I've argued against many times myself. That everyone can be so cavalier about this topic.
Because the fact that your companies are so large is EXACTLY why it makes no flipping sense that you're running a gigantorepository. You have so many products, so many projects going on, that I have a really hard time believing it was disciplined software development that led to all of your code being so interdependent.
But the part that started getting under my skin was the fact that we aren't talking about Bob's Local Software Consultancy here. We're talking about two companies that touch the lives of hundreds of millions, perhaps even billions of people in the world.
If OpenStreetMap doesn't have their code in the same repository as Postgres, Linux, and DuckDuckGo, then there is no excuse for the Facebook Android App to be in the same repository as HHVM.
I think you have this picture in your mind of just one big pile of spaghetti code. The truth is far more nuanced. All the code may be in one big repository, but that doesn't mean it isn't well managed. The code is still modular: it's managed in libraries with clean APIs, and so on.
But whether you keep your code in one big repository or many small repositories, you still need to track and manage the dependencies between the various parts.
For instance, when a bug is discovered in library X, you need to know which binaries running in production were compiled against a buggy version of that library. At Google we can say "A bug was introduced at revision NNNNN and fixed at revision MMMMM. Please recompile and redeploy any binaries that were built within that range." (And we have tools to tell us exactly which binaries those are.) This is something that using One Giant Repository gives us for free.
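To make that concrete, here's a minimal sketch of that kind of query. The revision numbers, binary names, and metadata store are all hypothetical, and this is just an illustration of why a single global revision number makes the check a simple range comparison, not a description of Google's actual tooling.

```python
# Hypothetical illustration (not Google's actual tooling): in a monorepo, every
# binary's build metadata can record the single repository revision it was built
# at, so "which deployed binaries picked up the buggy library?" is a range check.

BUG_INTRODUCED = 48213   # revision "NNNNN" where the bug landed in library X
BUG_FIXED = 48950        # revision "MMMMM" where the fix landed

# binary name -> repo revision it was built at (normally from a build-metadata service)
deployed_binaries = {
    "ads/frontend_server": 48500,
    "maps/tile_renderer": 47990,
    "mail/delivery_daemon": 48730,
}

def built_with_bug(build_revision: int) -> bool:
    """Built after the bug was introduced but before the fix landed."""
    return BUG_INTRODUCED <= build_revision < BUG_FIXED

affected = sorted(name for name, rev in deployed_binaries.items() if built_with_bug(rev))
print("Recompile and redeploy:", affected)
# Recompile and redeploy: ['ads/frontend_server', 'mail/delivery_daemon']
```

With many small repos, the same question requires knowing which version of library X each binary's transitive dependencies pinned, for every repo involved, which is exactly the bookkeeping described next.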
If you were taking the many-small-repos approach, for any given binary you'd need to track the various versions of each of its dependencies. You'd also need to manage release engineering for each and every one of those projects, which slows progress a lot (although we do have a release process for really critical components).
But like I said, there are relative advantages and disadvantages to either approach. To write software at this scale requires tools, processes, and good communication. Where you keep your code, at the end of the day, is actually a pretty minor concern compared to all the other stuff you need to do to ship quality products.
No, you're giving me the right picture, and it's mostly the picture I thought it was.
These issues are the same issues the rest of us in the world have to deal with when working with your APIs. Someone in one of the sibling comments linked to an article discussing Bezos giving the command from on high that Amazon would dog-food all of its APIs.
And apparently it isn't so minor a concern if it warrants the first blog post out of Facebook in the last three weeks. Maybe it's just a coincidence that this is their first blog post of the year, but it seems like they're trying to say "this is a big enough deal that we have spent, and are going to spend, a lot of money on it."
Maybe the problem is that Facebook and Google are just too big. They might have to be as big as they are to be doing the work that they are doing, but is that really the best thing for the rest of the world?
The fact that Google is one of the largest, most successful software companies in history and you are arguing on the internet using the handle "moron4hire" just about sums up the merits of your position.
Just because a company is big doesn't mean they are working in the best way, or working in a way that is to the best benefit of the public. Might does not make right. We don't let large architectural engineering firms get away with doing whatever the hell they want just because they should have a proprietary interest in doing the best job possible, and we shouldn't be letting banks do it, either.
Yes, it's hard. Boo hoo. So is making safe cars. But you don't get the option to take the easy way out. Solve the hard problem; it's the job.
To be honest, whilst we have no way to accurately determine whether the code is a mess without a chance to see it, the most surprising line of this article (in my opinion) was that the code base is larger than the Linux kernel. I'm not seeing anything on the front end that would warrant such complexity, so I'm guessing a large chunk of the code base is server code. I'd be interested in reading a summary of the components of the Facebook code base.
I suspect that the kernel is one of the only things running on Facebook's servers that they didn't write from scratch. Alexandrescu has mentioned that a 1% speedup to HHVM saves FB about $100k per year, and at that sort of scale it's pretty easy for reinventing every wheel to make sense.
It's one of the largest open source projects, perhaps. But when you get into a large company like Facebook, which creates a hell of a lot of different things, the numbers are way higher.
That's a polite way of saying "we write shitty code without any sort of plan."
> "Splitting it up would make large, atomic refactorings more difficult"
Actually, it's the other way around. Modularity tends to obviate the need for large, atomic refactorings.
And what, exactly, is the meaning of these graphs? This is leading me to believe that being a developer at Facebook is about quantity over quality.