> Modularity tends to obviate the need for large, atomic refactorings.
"Tends to".
But when you're dealing with code at Facebook's scale, things that "tend not to happen" actually happen quite a lot. In fact, you must plan for them as a matter of course.
So yes, modularity is great, and because I'm a nice guy I'll assume Facebook aren't a pack of idiots and that they're writing nice modular code. But even if that's the case, in an organization of Facebook's size you still need to make widespread, atomic refactorings on a regular basis.
I know this from experience, because I work at Google (a much larger codebase than Facebook's) on a low-level piece of our software stack. We face these issues regularly, and while working in a single repo has its drawbacks, it also has real advantages.
Perhaps I don't understand the whole situation here. I hear "all of our code is in one repository" and I think "GMail and Google Maps are in the same repository, in the same repository with GoLang, in the same repository with AdWords."
The more I think about it, the more I think your post reveals a lack of maturity in our industry that lends credence to the pro-engineering-licensing argument that I've argued against many times myself. That everyone can be so cavalier about this topic.
Because the fact that your companies are so large is EXACTLY why it makes no flipping sense that you're running a gigantorepository. You have so many products, so many projects going on, that I have a really hard time believing it was disciplined software development that led to all of your code being so interdependent.
But the part that started getting under my skin was the fact that we aren't talking about Bob's Local Software Consultancy here. We're talking about two companies that touch the lives of hundreds of millions, perhaps even billions of people in the world.
If OpenStreetMap doesn't have their code in the same repository as Postgres, Linux, and DuckDuckGo, then there is no excuse for the Facebook Android App to be in the same repository as HHVM.
I think you have this picture in your mind of just one big pile of spaghetti code. The truth is far more nuanced. All the code may be in one big repository, but that doesn't mean it isn't well managed. The code is still modular: it's managed in libraries with clean APIs, and so on.
But whether you keep your code in one big repository or many small repositories, you still need to track and manage the dependencies between the various parts.
For instance, when a bug is discovered in library X, you need to know which binaries running in production were compiled against a buggy version of that library. At Google we can say "A bug was introduced at revision NNNNN and fixed at revision MMMMM. Please recompile and redeploy any binaries that were built within that range." (And we have tools to tell us exactly which binaries those are.) This is something that using One Giant Repository gives us for free.
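To make that concrete, here's a minimal sketch of that kind of query. The revision numbers, binary names, and metadata store are all hypothetical, and this is just an illustration of why a single global revision number makes the check a simple range comparison, not a description of Google's actual tooling.

```python
# Hypothetical illustration (not Google's actual tooling): in a monorepo, every
# binary's build metadata can record the single repository revision it was built
# at, so "which deployed binaries picked up the buggy library?" is a range check.

BUG_INTRODUCED = 48213   # revision "NNNNN" where the bug landed in library X
BUG_FIXED = 48950        # revision "MMMMM" where the fix landed

# binary name -> repo revision it was built at (normally from a build-metadata service)
deployed_binaries = {
    "ads/frontend_server": 48500,
    "maps/tile_renderer": 47990,
    "mail/delivery_daemon": 48730,
}

def built_with_bug(build_revision: int) -> bool:
    """Built after the bug was introduced but before the fix landed."""
    return BUG_INTRODUCED <= build_revision < BUG_FIXED

affected = sorted(name for name, rev in deployed_binaries.items() if built_with_bug(rev))
print("Recompile and redeploy:", affected)
# Recompile and redeploy: ['ads/frontend_server', 'mail/delivery_daemon']
```

With many small repos, the same question requires knowing which version of library X each binary's transitive dependencies pinned, for every repo involved, which is exactly the bookkeeping described next.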
If you were taking the many-small-repos approach, for any given binary you'd need to track the various versions of each of its dependencies. You'd also need to manage release engineering for each and every one of those projects, which slows progress a lot (although we do have a release process for really critical components).
But like I said, there are relative advantages and disadvantages to either approach. To write software at this scale requires tools, processes, and good communication. Where you keep your code, at the end of the day, is actually a pretty minor concern compared to all the other stuff you need to do to ship quality products.
No, you're giving me the right picture, and it's mostly the picture I thought it was.
These issues are the same issues the rest of us in the world have to deal with when working with your APIs. Someone in one of the sibling comments linked to an article discussing Bezos giving the command from on high that Amazon would dog-food all of its APIs.
And apparently it isn't so minor a concern if it warrants the first blog post out of Facebook in the last three weeks. Maybe it's just a coincidence that this is their first blog post of the year, but it seems like they're trying to say "this is a big enough deal that we have spent, and are going to spend, a lot of money on it."
Maybe the problem is that Facebook and Google are just too big. They might have to be as big as they are to be doing the work that they are doing, but is that really the best thing for the rest of the world?
The fact that Google is one of the largest, most successful software companies in history and you are arguing on the internet using the handle "moron4hire" just about sums up the merits of your position.
Just because a company is big doesn't mean they are working in the best way, or working in a way that is to the best benefit of the public. Might does not make right. We don't let large architectural engineering firms get away with doing whatever the hell they want just because they should have a proprietary interest in doing the best job possible, and we shouldn't be letting banks do it, either.
Yes, it's hard. Boo hoo. So is making safe cars. But you don't get the option to take the easy way out. Solve the hard problem; it's the job.
To be honest, whilst we have no way to accurately determine whether the code is a mess without a chance to see it, the most surprising line of this article (in my opinion) was that the code base is larger than the Linux kernel. I'm not seeing anything on the front end that would warrant such complexity, so I'm guessing a large chunk of the code base is server code. I'd be interested in reading a summary of the components of the Facebook code base.
I suspect that the kernel is one of the only things running on Facebook's servers that they didn't write from scratch. Alexandrescu has mentioned that a 1% speedup to HHVM saves FB about $100k per year, and at that sort of scale it's pretty easy for reinventing every wheel to make sense.
It's one of the largest open source projects, perhaps. But when you get into a large company like Facebook, which creates a hell of a lot of different things, the numbers are way higher.
That's a polite way of saying "we write shitty code without any sort of plan."
> "Splitting it up would make large, atomic refactorings more difficult"
Actually, it's the other way around. Modularity tends to obviate the need for large, atomic refactorings.
And what, exactly, is the meaning of these graphs? This is leading me to believe that being a developer at Facebook is about quantity over quality.