It's a little annoying that every big tech company has solved this problem in-house in its own way. Google, Amazon, Facebook, Apple, and Microsoft all have internal build systems that do basically the same things.
Microsoft's build system is a joke by any modern standard. It's a combination of a tool called 'build' (some version of which ships in the DDK) and nmake. Since 'build' doesn't enforce dependencies (and as a result can't reliably trace them), dependencies leak, and developers put manual "drains" into build files, which reduces the system's parallelism. There have been attempts to move to manifest-based builds, but nothing came of them as far as I know. Automatic incremental builds don't work either.
It took me 12 hours to build Windows in 2001, and it still takes approximately the same time to build Windows now (or so they tell me). No distributed build in sight. I think that across all of Microsoft a few thousand man-years have been wasted over the last decade because of build issues, and the cost of the lost opportunities is unmeasurable.
I think that having an in-house system was partly driven by the need to fully utilize Google's proprietary/confidential computational infrastructure. My understanding is that every big company is different, and leveraging your CPUs to their fullest extent means there isn't a one-size-fits-all model.
The only way you could move towards a single model is if all the companies open-sourced their data centers like Facebook did, so you'd at least have a good idea of which infrastructures you need to run on.
DISCLAIMER: I work as an intern at Google in engtools (the group that manages the linked article), but have no idea about the design decisions that went into the build system.
Are there any open source build systems that work similarly? I looked at basically everything out there, and none provided this functionality. So my company ended up rolling our own, based on the Google model.
The particular characteristic I saw missing was package-level build files. All the open source tools really wanted you to have one big build file that described how to build everything, rather than a bunch of small ones that could be composed.
That is the problem: no, there are not any open source systems that can build at the scale Google does. I think that is in part because there are no open source projects of the complexity and scale of what Google builds, so those teams haven't run into the problems that Google did and haven't had to come up with their own ways to work around them.
When I worked there, Google had 20,000 employees. More than half of them were engineers, and at any given moment probably 5,000 unique users might type 'build' at their desktop after having checked in a change. That means 5,000 instances get built and 5,000 regression test suites run, following at least 5,000 different 'views' into the repository (which sat on some Perforce servers that nearly glowed from all the work they were being asked to do).
There are only a few places where such scale is needed, perhaps a couple of dozen, and they all have competent engineers, so they all come up with their own build systems.
If open source were ever to address this particular challenge, it would have to be in the creation of some really solid interfaces which would allow a build system to emerge from the tools, rather than trying to explicitly create a build system from scratch.
That is true, although my comment referred to how the build was configured, rather than the scaling issues you mention.
Even at my current ~12-developer company, we have too big and interconnected a codebase to reasonably manage with ant, or buildr, or anything else I saw. And we won't run into "we can't build our code fast enough" for a long time still :)
There is just a huge gap in build systems for teams between ant at the small end and google3 at the astoundingly huge end.
We're working on that for Apters (http://apters.com/). We have a simple language used to describe an individual package build, which acts as a module that you can import from another package that needs it as a build dependency. The whole thing uses content hashes for versioning, like Git, and caches any build you've already run once.
The whole thing is Open Source; we'll also have a hosted version as a service.
Cool, it sounds nice from your description. I checked out your GitHub, but didn't find any examples of how your system works.
I think the ideal way for builds to work is to be able to define packages (that can be source, libraries, data, a genrule that runs a shell command) and the dependencies of that package (references to other packages). The build system should just know how to build various kinds of code (or maybe have one spot for configuring how your Java build runs, for example). It should also know what output files to look for, so that it can avoid re-building files over and over again. For the common languages, it can deduce where the built file should be. For genrules, the developer can specify what the output files are.
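To make that concrete, here's a rough sketch of what a package-level build file in that style might look like. The syntax is Blaze/Bazel-like (Python-flavored), and all the package, target, and file names are made up for illustration:

    # BUILD file for a hypothetical package //search/indexer
    java_library(
        name = "indexer",
        srcs = glob(["*.java"]),
        deps = [
            "//common/base:strings",   # defined in that package's own build file
            "//third_party/guava",
        ],
    )

    # A genrule: an arbitrary shell command with declared inputs and outputs,
    # so the build system can skip it when nothing has changed.
    genrule(
        name = "stopwords",
        srcs = ["stopwords.txt"],
        outs = ["stopwords.h"],
        cmd = "grep -v '^#' $(location stopwords.txt) > $@",
    )

The point is that each package declares only its own targets and their direct dependencies; the tool composes the full graph across packages.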
Issues I had with ant:
Ant requires you to specify the mechanics of how to build everything. It is difficult or impossible to express more complicated build rules (e.g. arbitrary shell commands were a huge pain). It was also very slow for incremental builds involving multiple projects, and it did not provide incremental builds for most build artifacts. For example, if an ant rule generates a file, it will do so every time unless you write custom code to skip it based on a hash or timestamp (a rough sketch of that workaround is below).
Ant is fine for one blob of code, but when you want to stop building the world, I find things get very difficult. It became painfully slow to build stuff, and difficult to write build files correctly, as our system grew.
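For what it's worth, the hash/timestamp workaround mentioned above usually boils down to something like this (a minimal Python sketch, not tied to any particular build tool; the file names and stamp-file scheme are made up):

    import hashlib, json, os, subprocess

    def input_hash(paths):
        """Hash the contents of all input files into one digest."""
        h = hashlib.sha256()
        for p in sorted(paths):
            with open(p, "rb") as f:
                h.update(f.read())
        return h.hexdigest()

    def maybe_generate(inputs, output, cmd, stamp=".genrule.stamp"):
        """Re-run cmd only if the inputs' combined hash changed since last time."""
        digest = input_hash(inputs)
        old = json.load(open(stamp)) if os.path.exists(stamp) else {}
        if old.get(output) == digest and os.path.exists(output):
            return  # inputs unchanged and output present: skip regeneration
        subprocess.check_call(cmd, shell=True)
        old[output] = digest
        with open(stamp, "w") as f:
            json.dump(old, f)

    # e.g. maybe_generate(["schema.xsd"], "Schema.java", "xjc schema.xsd")

That is exactly the kind of bookkeeping a build system should be doing for you.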
I don't have a ton of experience with other systems, but they seem to share a lot of the same problems.
Our packages work exactly the way you describe: a module which can build anything you like, and which has references to other packages for its dependencies.
Apters build steps let you run an arbitrary command, commonly a shell script. For more common cases (a C package with autotools, Java with Ant, etc), you can import a module which knows how to build them, and let it do all the work. (That module is versioned too.)
The language looks something like this:
    let env = merge [deps.libfoo, deps.libbar, deps.gcc, prefix "src" (deps.source)]
    in extract "/installpath" (build env "/buildscript")
This would merge together some dependencies, including the source code, run a build script in that environment, and extract the installation path from the result. /buildscript would likely contain something like "make DESTDIR=/installpath install".
deps.libfoo and deps.libbar get mapped to other Apters packages via a dependency list.
We handle incremental builds through caching: any build step with identical inputs (by hash) will use a cached result.
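In other words, the cache key is just a hash of everything that goes into a step. A minimal Python sketch of the idea (our real implementation differs; the function names here are made up):

    import hashlib

    def step_key(inputs, script):
        """Content-address a build step: hash of all input hashes plus the build script."""
        h = hashlib.sha256()
        for name, content_hash in sorted(inputs.items()):
            h.update(f"{name}={content_hash}\n".encode())
        h.update(script.encode())
        return h.hexdigest()

    cache = {}  # key -> built artifact (in practice this is persistent, shared storage)

    def build_step(inputs, script, run):
        key = step_key(inputs, script)
        if key in cache:
            return cache[key]          # identical inputs: reuse the cached result
        result = run(inputs, script)   # otherwise actually run the build
        cache[key] = result
        return result

Because the inputs are themselves identified by content hash, a change anywhere in the dependency graph changes the key, and everything unaffected stays cached.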
I love Tup; it has basically replaced Make for me. It tracks dependencies and files, automatically cleaning and updating everything even more precisely than Make, which makes it a lot faster and easier to use. https://github.com/gittup/tup
(Yes, you can use Tupfiles in each directory, if that's what you mean by packages. As far as I know, however (and this is by design), there is no way to control which parts get built: it will always build exactly what needs to be built to be up to date.)
Their build systems may do basically the same thing, but I bet they all do it dramatically differently and probably for good reason.
Every company you listed has different products, resources, deliverables and goals. Their build systems will have evolved over many years to take advantage of those resources and meet those needs, and that process will naturally lead to different solutions.
I asked myself the same question once. Having worked a bit on this at Microsoft, I learned that builds, just like other dev tools, need constant improvement as complexity increases: faster builds, customization and scripting support, parallel building, etc. It's one of those things taken for granted because it's always been there, like F5-to-compile, but it's a tool that can be improved just like any other.
I need to read more before I really understand, but what was the problem with something like distcc, ccache, and CMake? The two downsides I see to the distcc/CMake approach are slow dependency resolution (looking at lots of files instead of just packages) and duplicate copying of files by distcc to remote machines.
This is quite a high-level overview, so possibly underneath they're using distcc. That at least used to be true for ad-hoc builds on their corporate network in previous years.