Hermetic declarative build systems like Bazel feel exactly like sound statically typed languages to me. It's really hard to get all of your dependencies precisely specified declaratively. The urge to hack in a little imperative code or jam in a build step that you just "know" needs to happen in a certain order is always there and it can be very difficult to satisfy the taskmaster that is the build system. When you have weird cases like a script that produces a variable set of output files based on the content of some input file, it's tough to get that working in the build system.
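For what it's worth, the escape hatch Bazel offers for that last case is a "tree artifact": you declare a whole output directory instead of individual files. A minimal sketch, with made-up rule/attribute names:

```
# varout.bzl -- sketch of a rule whose tool writes an unknown number of
# files into a single declared output directory (a "tree artifact").
def _varout_impl(ctx):
    out_dir = ctx.actions.declare_directory(ctx.label.name + "_out")
    ctx.actions.run(
        inputs = [ctx.file.src],
        outputs = [out_dir],
        executable = ctx.executable.tool,
        # The tool reads the input file and writes whatever files it
        # wants under the output directory.
        arguments = [ctx.file.src.path, out_dir.path],
    )
    return [DefaultInfo(files = depset([out_dir]))]

varout = rule(
    implementation = _varout_impl,
    attrs = {
        "src": attr.label(allow_single_file = True),
        "tool": attr.label(executable = True, cfg = "exec"),  # "host" on older Bazel
    },
)
```

It works, but notice you've already left BUILD files behind and are writing a custom rule just to say "this script's outputs depend on its input".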
But... if you climb that cliff, there are many rewards at the top. Like the article notes, you can reuse intermediate build artifacts between developers (because it's not possible for those artifacts to accidentally depend on developer-specific state), you can automatically parallelize (because the build system can tell declaratively which tasks do not depend on each other), and you can ask it interesting queries about the dependency graph. That sounds an awful lot like the runtime safety, performance, and static analysis static types give you.
The challenge is that for small programs, dynamic types are just easier to get going with. And every build script starts pretty small. So you have a plethora of simple, hackable build systems like make that work fine at first but then get more and more painful as the program grows. With software, there is at least some customer-visible pain that can enable you to justify a rewrite in a statically-typed language. But build infrastructure is always a hidden cost so it's really hard to get the resources to migrate to a different build language.
Maybe the lesson here is to bite the bullet and start with a hermetic build system at first.
Yes, but there are a hundred megabytes of system behind that: tens of thousands of lines of Starlark code, and tens (maybe hundreds) of thousands of lines of Java code besides.
A lot of work goes into those 4 lines. Consider the simple question: how do you properly set up a custom compiler for that code? The answer is a couple hundred lines of Starlark, for what in other build systems is effectively `CC=clang` (to be fair, Bazel allows that approach, but does not support or encourage it, especially for anything beyond toy builds).
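To make that concrete, here is a heavily abridged sketch of the Starlark side of "use clang", following the pattern from Bazel's own C++ toolchain tutorial; a usable version also needs built-in include directories, feature/flag definitions, and the cc_toolchain/toolchain wiring in a BUILD file:

```
# toolchain_config.bzl (abridged sketch)
load("@bazel_tools//tools/cpp:cc_toolchain_config_lib.bzl", "tool_path")

def _impl(ctx):
    tool_paths = [
        tool_path(name = "gcc", path = "/usr/bin/clang"),
        tool_path(name = "ld", path = "/usr/bin/ld"),
        tool_path(name = "ar", path = "/usr/bin/ar"),
        tool_path(name = "cpp", path = "/bin/false"),
        tool_path(name = "gcov", path = "/bin/false"),
        tool_path(name = "nm", path = "/bin/false"),
        tool_path(name = "objdump", path = "/bin/false"),
        tool_path(name = "strip", path = "/bin/false"),
    ]
    return cc_common.create_cc_toolchain_config_info(
        ctx = ctx,
        toolchain_identifier = "local-clang",
        host_system_name = "local",
        target_system_name = "local",
        target_cpu = "k8",
        target_libc = "unknown",
        compiler = "clang",
        abi_version = "unknown",
        abi_libc_version = "unknown",
        tool_paths = tool_paths,
    )

clang_toolchain_config = rule(
    implementation = _impl,
    attrs = {},
    provides = [CcToolchainConfigInfo],
)
```

And that's before per-platform variants, per-mode flags, or sandbox-safe include paths.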
I like Bazel (a lot!) but you are oversimplifying it here.
Provided the abstraction doesn't leak I really don't care, in the same way that end users of Microsoft Word don't care about the algorithms that kern their text, about how DirectWrite does subpixel rendering, or anything else.
Yes, it's all fiercely large and complex. The original user is pointing out that this is all hidden for simple projects, waiting in the wings for when you need it, and therefore doesn't matter.
Yes, but it's minimizing the complexity of the cliff. The original author they were responding to was describing the static/dynamic discrepancy, and how climbing that cliff requires a lot of work.
The author I was replying to points out how simple the syntax is. But that misses the point. I provided a direct counterexample:
Adding a new compiler to Bazel is very complicated, and by "new compiler" I include something as simple as asserting "all of my C++ code is C++17" at the project level. Done the correct way (not one of the many hacky, failure-prone ways), that is a 100+ line toolchain definition, because C++17 should be treated as an entirely new compiler from Bazel's perspective; treating it as a command-line argument gets you into a bad place where you hit weird errors.
To draw an analogy, it's like saying that C++ code like:
some_matrix[i, j];
is great syntax and cross-platform! When in reality it involves (in user-space code) at least 3 complex classes, overloading the comma operator(!), implicit casts from builtin types, and probably hundreds of lines of template code (if you want any sort of extensibility). True, it is cross-platform and great syntax. But it obfuscates the amount of code and understanding required to do anything beyond what the most basic syntax allows, for example extending the system in any way.
Simple things like this are simple in all C/C++ build systems, the question is how the complex things are handled (multi-platform support, detecting and selecting different versions of the same compiler, cross-compiling, IDE support, multi-stage builds where source files are generated in the build which need to be inputs to later build stages, custom build steps, or bootstrapping tools which need to be used as compilers or code-generators for later build stages, etc etc etc...)
OP was suggesting that systems like Bazel are harder to get started with but pay off with larger, more complex builds. However, I am claiming that Bazel is also good for the simple cases.
The issue is that declarative build systems ALWAYS need an escape hatch for the exceptions--and the exceptions grow over time.
We've been here before. It was called Make. And then it was called Ant. And then Maven. And Google still built their own system. And they will build another again.
Nobody ever learns the lesson that a build system is a program--period. Sorry to burst your bubble: there will be circular dependencies--you will have to deal with it. There will be two different C compilers on two different architectures. You will have to deal with it. Libraries will be in the wrong place--you will have to deal with it. These libraries come from the system but these libraries come from an artifact--you will have to deal with it.
The only way to deal with it is to have a genuine programming language underneath.
> The only way to deal with it is to have a genuine programming language underneath.
I disagree. Having a full programming language is great for flexibility, but it causes lots of issues at scale. Restrictions in Bazel are critical for good performance on a large codebase, and for keeping the codebase maintainable.
With restrictions, we can provide strong guarantees (e.g. determinism, hermeticity), get better tooling, we can query the graph (without running the tools), we can track all accesses to files, etc. We can also make large-scale changes across the codebase.
Note also that Bazel language is not truly declarative, but it encourages a separation between the declarations (BUILD files) and the logic (bzl files).
> Restrictions in Bazel are critical for good performance on a large codebase, and for keeping the codebase maintainable.
Note the vocabulary--"restrictions". Your build system isn't solving a technical problem--it's solving a political one and trying to cloak it in technical terms.
We already have a problem. Your build system is now IN THE WAY if I'm not at scale. Any build system that makes people take notice of it is an a priori failure.
Thanks for posting this though. I've had a nagging irritation with so many of these "scalable" things, and this is the first time it has really coalesced that "scale" is almost always intertwined with "political control".
You say that like bazel hasn't already been proven out on a few billion LoC :). You're quite right that exceptions will exist - your job is to either fix them or squeeze them into the build system. Both are well trod paths. You're quite right that these edge cases aren't _easy_ - but in what system would they be?
> there will be circular dependencies--you will have to deal with it.
Add them both to the same compilation unit (cc_library, whatever). Or extract an ABI (e.g. Turbine) and compile against that.
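To illustrate the first option (file and target names made up): two headers that include each other just live in the same target, so Bazel never sees a cycle.

```
cc_library(
    name = "graph",
    # node.h and edge.h may include each other; within a single
    # cc_library Bazel doesn't care.
    srcs = ["node.cc", "edge.cc"],
    hdrs = ["node.h", "edge.h"],
)
```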
> There will be two different C compilers on two different architectures. You will have to deal with it.
Poke around https://github.com/tensorflow/tensorflow/tree/master/third_p... . I see arm and x86, Windows and many variants of Linux, multiple versions of GCC, multiple versions of clang, multiple versions of cuda, python 2 and 3, and names I don't even recognize.
> Libraries will be in the wrong place--you will have to deal with it.
Just write the path to the file, or write a build rule that makes it visible in a standard location, or make it part of the platform definition, or use something like https://docs.bazel.build/versions/master/be/workspace.html#b... to alias it into your workspace. (Not recommended, but supported.)
> These libraries come from the system but these libraries come from an artifact--you will have to deal with it.
I won't pretend bazel can express _everything_, but there's little you can't hack in _somehow_ with sufficient motivation, and moving towards the bazel-recommended patterns brings growing peace-of-mind (and faster builds).
(Disclaimer: Googler, work closely with the bazel team. I like bazel).
I love this example, but more because it shows just how obtuse make is even for simple programs. There's one self-explanatory line in this whole program (`SRCS = foo.cpp bar.cpp baz.cpp`), and everything else is magic you'd just have to cargo-cult forward.
I _think_ that it's also broken, if I copy-paste it: your comment renders with spaces in the `clean :` stanza, but I believe make requires that to be a tab character?
While certainly simplistic, the bazel example shows one obscure feature (glob) that's both still fairly obvious, and unnecessary for a direct comparison to your example. The rest reads clean, and could be replicated/modified by someone with no clue fairly straightforwardly.
Don't get me wrong, bazel BUILD files will often hide a whole lot of magic. But the benefit is, for a newcomer the files they have to interact with on a day-to-day basis are friendly. Take tensorflow, one of the most complicated bazel repos I know - browse to a random leaf BUILD file and you'll probably find it quite readable. (I randomly clicked to https://github.com/tensorflow/tensorflow/blob/master/tensorf... , seems pretty clear - especially when I consider it's compiling kernels for GPUs, which I know nothing about.)
(Disclaimer: Googler, work closely with the bazel team. I like bazel.)
The `SRCS = foo.cpp bar.cpp baz.cpp` was my fav part. I like to put every file used into a Makefile instead of using globs, but my example also showed how someone can build with globs instead without editing the file ;) Similar approach if you need the program to be named winApp.exe instead... That's another 'beauty' of make: you don't have to edit the Makefile.
There's really not much cargo cult in my example. Everything would be in the most rudimentary doc or tutorial, certainly less reading than if you needed to use git for the first time.
And yes there is supposed to be a tab there, some copy-paste thing munged that.
What does POSIX compliant even mean for make? The POSIX standard doesn't even require support for C++! That's a VERY portable Makefile. It would work on any make from the late seventies (assuming you also had a C++ compiler and system Makefiles that knew how to use it). It only uses a small subset of the rules in the standard.
Are you kidding? I have no clue what half of those lines are doing. And only any idea about the other half because I've come across make before. I also highly doubt that this will run on windows, whereas the Bazel build probably will.
Although I'd strongly suggest not doing that: Bazel works better if you have 1 library per header (or per source file for non-C languages). It helps to have tools to manage deps for you, though.
You could have a cc_binary for your main foo_main.cc file that depends on cc_library targets separately specified for each bar_lib.h/bar_lib.cc.
That makes the graph more granular, e.g. if you just update the foo_main.cc you don't need to recompile the libraries. Or you can reuse bar_lib in a different binary.
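In BUILD terms, that's roughly (using the file names above; the binary name is arbitrary):

```
cc_library(
    name = "bar_lib",
    srcs = ["bar_lib.cc"],
    hdrs = ["bar_lib.h"],
)

cc_binary(
    name = "app",
    srcs = ["foo_main.cc"],
    deps = [":bar_lib"],
)
```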
> if you just update the foo_main.cc you don't need to recompile the libraries.
Are you sure this is required? I just tried with a test repo and Bazel only performed two actions after I updated foo_main.cpp (compile foo_main.cpp and link app.)
I just tested this and I think you are incorrect. Bazel will appear as if it's rebuilding all of them (it lists a dozen files), but it's really just checking the caches for them. Try running `bazel build -s` and watch what commands it actually runs.
Note that adding and removing files (including through a glob) does always cause a rebuild though (because it's a change of the rule). This is a deficiency of bazel.
That's not correct, or at least it's not always correct. C may have some special support, but other languages don't.
The other place this is obvious is with tests. If you have a unit test per source, a change to any source will rerun all the tests; splitting reduces this.
Then that is a problem with those languages' implementations. Bazel allows for it; it is on those language implementations to provide support. Notably, I am not aware of many languages that easily allow access to their dependency graphs (especially without third-party tools, anyway). To me this seems more of an issue with the languages in question than with Bazel.
But again, bazel doesn't want you to do that. The more information bazel has, the more it can do for you. If you stick to 1-lib-per-source-file, dead code elimination across your entire codebase, however large it is, can be done by a single reverse-dependency query, across every language, even those that don't have compilers.
In other words, correctly using bazel gives you access to all the cool magic linkers do without having to do the work to implement it. And it adds value even for the ones that do (like C++, again: you'll run fewer tests).
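Concretely, the dead-code question becomes a single query (target name made up):

```
bazel query 'rdeps(//..., //some/pkg:string_utils)'
```

If nothing other than the target itself comes back, nothing depends on it and it can be deleted, regardless of language.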
I mean sure, I'm not necessarily disagreeing that many people could do with fewer source files per library. But I also think you are pathologically misusing the tool along the lines of creating an NPM library for the left-pad function.
Bazel has other tools for mitigating excessive test running (like test size/time descriptions and parallel test running); running too many tests has never been a problem I have encountered, even with my dozen-source-file Bazel libraries. Bazel also has smart test runners that can split up the tests in a test binary and run them in parallel, and I don't have to write a dozen lines for every C++ library.
> I mean sure, I'm not necessarily disagreeing that many people could do with fewer source files per library. But I also think you are pathologically misusing the tool along the lines of creating an NPM library for the left-pad function.
I'm literally quoting Google's best practices for using bazel/blaze.
> Bazel has other tools for mitigating excessive test running (like test size/time descriptions and parallel test running); running too many tests has never been a problem I have encountered, even with my dozen-source-file Bazel libraries. Bazel also has smart test runners that can split up the tests in a test binary and run them in parallel, and I don't have to write a dozen lines for every C++ library.
Right, but here's the key thing: You're still running the test. My way, you just don't, and you lose nothing. You use less CPU, and less time.
I mean the best practices on the bazel website include:
> To use fine-grained dependencies to allow parallelism and incrementality.
> To keep dependencies well-encapsulated.
Like if I need 5 source files in a library to keep it well encapsulated, I'm doing that instead of making 5 libraries that are a rat's nest of inter-dependencies. And the repeated headers, deps, specific command-line arguments, and so on would be unreadable.
Then you aren't keeping your dependencies fine grained.
> Like if I need 5 source files in a library to keep it well encapsulated, I'm doing that instead of making 5 libraries that are a rat's nest of inter-dependencies.
If you can't form your dependency graph into a DAG, you have larger design issues. This is yet another thing that bazel does a good job of uncovering. Libraries with cyclic dependencies aren't well encapsulated, and you should refactor to remove that.
I recognize that at a small scale this doesn't matter. But to be frank, there are parts of my job that I literally would be unable to accomplish if my coworkers did what you suggest.
Sure, I'll just go ahead and stick my flyweight class and its provider class in two different libraries. And have one of them expose the shared internal header that is not meant for public usage. That's not going to cause any issues at all (sarcasm).
One source file per library and one source file per object are an example of two policies that will conflict here (and I'm not trading code organization for build organization when I can just as easily use the features intended for this situation).
Meanwhile in the real world limiting the accessibility of header files not meant for public use prevents people from depending on things they shouldn't. Organizing libraries on abstraction boundaries regardless of code organization allows for more flexible organization of code (e.g. for readability and documentation). And so on.
This is why these features exist, and why Google projects like both Tensorflow and Skia don't follow the practices you are espousing here pathologically.
> But to be frank, there are parts of my job that I literally would be unable to accomplish if my coworkers did what you suggest.
Then you are incompetent and bad at your job. To be blunt, I would recommend firing an engineer who pathologically misused a build tool in ways that encourage hard-to-read, difficult-to-document code, while also making the build bloated and more complicated, all in the name of barely existent (and not at all relevant) performance improvements. And then said they couldn't do their job unless everyone conformed to that myopic build pattern.
It's like what, an extra 4 characters to get the list of code files in a library from a bazel query? What in the world could you possibly be doing that having to iterate over five files rather than one makes your job impossible?
Bazel supports visibility declarations: your private internal provider can be marked package-private, or even library-private.
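Roughly (made-up names), the internal header gets its own target that only the flyweight's package can see:

```
cc_library(
    name = "provider_internal",
    srcs = ["provider_internal.cc"],
    hdrs = ["provider_internal.h"],
    # Only targets in //myproject/flyweight may depend on this.
    visibility = ["//myproject/flyweight:__pkg__"],
)
```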
> Tensorflow
Tensorflow is a perennial special case at Google. It's a great tool, but its consistent disregard for internal development practices is costly. A cost I've had to pay personally before.
> Then you are incompetent and bad at your job
No, I just don't have the time or interest to reimplement language analysis tools when I don't need to.
But it doesn't actually have that. It has that for 1 very specific use case, sometimes. It's very much not generic or intentionally built into bazel, and I am almost certain that there are more complex cases where that caching will break down (for example when the various deps import each other, or, as mentioned: tests). Especially when it's easier to have tooling automatically manage build files for you, so you don't even have to do it by hand!
It does; this is testable in a number of ways. The funniest one is to do a `#include "foo.bar"` (and make sure the file exists and is in `srcs`!) and watch bazel freak out with "I have no clue what kind of file that is, how do I analyze that!"
Something to be aware of, though, is that Bazel does do some stupid things around this space. Like adding and removing files causes the build rule to change (if using a glob), which will trigger a full rebuild of the library.
> Like the article notes, you can reuse intermediate build artifacts between developers (because it's not possible for those artifacts to accidentally depend on developer-specific state)
Note that Bazel builds still depend on the ambient system environment, like the compiler toolchain and system libraries, and it's possible to poison a shared build cache.
Shared caches are typically used with remote execution and CI servers, where the build environment is fully controlled.
Bazel does not really hash any of the system stuff -- like system headers, and system-provided .a and .so files (unless they are explicitly declared in WORKSPACE, but this is optional, and I have never seen this done for libc stuff, for example).
So all you need to do is to find one person who likes to mess with their system (for example, by installing latest versions of the libraries from random PPA's), and you'll end up with poisoned cache for everyone.
On any large team your compiler should be checked into revision control or have a way to fetch a specific artifact & have that pointer checked in. That way you're guaranteeing everyone is using the right thing.
Are you talking in general, or bazel specifically?
Because in general yes, I agree, it would be nice. Probably not in source control though -- putting in compilers + libraries will make checkouts very slow, git is not really designed for tens of gigabytes of stuff.
For bazel specifically, this is not really well supported. The default rules and all examples just trust system-installed compilers.
There is ongoing work on building in docker (with all the corresponding security issues), and there are rumors that Google internal version has some sort of magic, but this is not a "mainstream" configuration.
I am glad there is work going towards bazel-ifying the toolchain! And maybe now that 1.0 is out, we will see more work like this.
However, I think my point still stands -- this work looks pretty experimental. And most tutorials, including bazel's own [0], recommend using default configuration, which uses system compiler. And most of the examples I have seen also rely on system compiler.
For the record, in the beginning Bazel did not support using an auto-configured system compiler at all. The only things it supported were using a controlled build environment with all paths hardcoded (including compiler paths), or using a checked-in toolchain. At a later stage a dumb autoconf-y "detect the system toolchain and try to use it" thing was added, and it became the default.
This is because configuring a checked-in toolchain in Bazel requires tremendous effort and very specific knowledge about something 99.9% of developers do not know and DO NOT WANT to know or spend their time on.
Hermetic build systems make a lot of sense for companies, where you often want to control your whole stack and might even use a monorepo with all external dependencies vendored.
However, there are at least two cases where you do not want real hermetic builds:
- in the case of open source software that needs to be packaged for distributions, vendoring all the dependencies is bad. As the name already says, distributions remix software components and thus might want to compile your software against a different version of a library than the particular version you pick.
- for libraries, consumers use your library just as a component and might want to combine it with a specific version of other libraries, so vendoring also sucks.
Unfortunately, many declarative build systems make it hard to separate the "building a component" part from the "composing a product" part. I could find no way in Bazel to depend on system software (using for example pkgconfig).
In my opinion, build systems for open source software (as I mentioned, for companies Bazel makes sense since you often want to vendor all your deps anyway) need to have three aspects:
- build rules for just for this component
- dependencies on other components/libraries
- a method to lock down components/libraries to a specific version, creating an official "product" with an official supported combination of all dependencies.
This still makes it possible for distributions to create their own "product", without having to rewrite most parts of the build system. At the same time, developers can work with the official library versions (for compat, maybe you should even have multiple "sets" of dependencies), so getting started is easy and you don't need to install lots of dependencies first before you can hack on the project.
Often, build systems do things that are nice for onboarding (like "build systems" that pull git repos of dependencies during build, in order to have a single command to build the product from scratch after checkout). But those things directly conflict with the needs of distributions. We need to separate the "instructions to build, supplying all deps externally and explicitly" (in the context of building a package for a distribution) from the "just build it from the command line on a dev machine" (here, dependencies should be implicitly fetched automatically, to give a nice user experience).
You could probably write a rule similar to go_repository[1] that would invoke pkgconfig to find the location of local libs instead of downloading a lib.
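A rough sketch of the idea (my own sketch, not an existing ruleset): a repository rule that shells out to pkg-config and wraps the result in a cc_library.

```
# pkg_config.bzl (sketch; no error handling)
def _pkg_config_repo_impl(repository_ctx):
    pkg = repository_ctx.attr.pkg
    copts = repository_ctx.execute(["pkg-config", "--cflags", pkg]).stdout.strip().split(" ")
    linkopts = repository_ctx.execute(["pkg-config", "--libs", pkg]).stdout.strip().split(" ")
    # Generate a BUILD file exposing the system library via its flags.
    repository_ctx.file("BUILD", """
cc_library(
    name = "{pkg}",
    copts = {copts},
    linkopts = {linkopts},
    visibility = ["//visibility:public"],
)
""".format(pkg = pkg, copts = repr(copts), linkopts = repr(linkopts)))

pkg_config_repo = repository_rule(
    implementation = _pkg_config_repo_impl,
    attrs = {"pkg": attr.string(mandatory = True)},
)
```

Called from WORKSPACE as something like `pkg_config_repo(name = "zlib_sys", pkg = "zlib")`, this gives you a `@zlib_sys//:zlib` target -- with the obvious caveat that it punches a hole in hermeticity, which is exactly the trade-off distributions want.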
rules_foreign_cc appears to be something different: building projects with other build systems as part of a bazel build. The README does not talk about using prebuilt system libraries at all.
Additionally, adding lots of additional customization on top of the build system itself defeats the purpose of using a popular system in the first place.
The problem with fully declarative systems is that, if you need something that is beyond the scope of the declarative semantics, you are screwed.
Build systems have the annoying property that they are ultimately highly declarative, but they need some pretty extreme scope to handle edge cases. Any C/C++ project is going to boil down to "compile these N source files, and link them into a library/executable", but the necessary build flags are going to come from several locations, there are multiple variations of build flags and compilers that need to be used at the same time, and there is going to be project-custom code to generate source files that involves potentially building the tools to generate the files as part of the build system. And mixed-language projects add more layers of pain.
bazel in particular is pretty nice with that. There are multiple extension points -- there is an embedded Python-like language, you can wrap arbitrary commands, and if this is not enough, you can dynamically generate build files as well.
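The "wrap arbitrary commands" point is just genrule; for example (made-up target), generating a header from a VERSION file:

```
genrule(
    name = "gen_version_header",
    srcs = ["VERSION"],
    outs = ["version.h"],
    # $< is the single input, $@ the single output; $$ escapes $ for the shell.
    cmd = "echo '#define APP_VERSION \"'$$(cat $<)'\"' > $@",
)
```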
> The typical solution is to become more conservative in how you build things such that you can be sure that Y is always built before X…but typically by making the dependency implicit by, say, ordering the build commands Just So, and not by actually making the dependency explicit to make itself. Maybe specifying the explicit dependency is rather difficult, or maybe somebody just wants to make things work. After several rounds of these kind of fixes, you wind up with Makefiles that are (probably) correct, but probably not as fast as it could be, because you’ve likely serialized build steps that could have been executed in parallel. And untangling such systems to the point that you can properly parallelize things and that you don’t regress correctness can be…challenging.
That might make the installed size large, but if it's isolated to Bazel, managed by Bazel, updated by Bazel, etc, then why does it matter whether it's a JVM, a Python binary, a JS VM, or anything else?
E.g. Debian put a lot of work into making a more trustworthy build chain by introducing reproducible builds.
There's no Bazel on Debian. And not on many other distros. Because getting Bazel to be built itself is a complexity nightmare.
"Here, download this big blob where you have no idea how it's been built" doesn't exactly help.
I don't think bundling in components inherently changes the reproducible builds equation.
In fact I think that bundling in a JVM that they know works in a particular way will likely improve the reproducibility of Bazel _setups_ and therefore builds that come out of Bazel, which is their aim.
"every time the NSA's compiler binary builds my code I get the same output" isn't really all that comforting. If you can't build the compiler (and/or build tool) how can you be sure that there isn't anything malicious in it? Trusting trust all over again
Could the Bazel folks better define the composition of their blob? That way they can distribute their blob, but someone could substitute for another instance of the JVM. I'm assuming ownership of this lies with Bazel and this is more of a political/motivation problem than a technical one. Or, is that incorrect?
Next thing we know, every program is gonna ship with either a slimmed-down (!) 20MB JRE (if you use modules, otherwise it's gonna be worse), 130MB Electron runtime (this makes bundling the JRE seem like a good option), or what have you..
Sure I'll bite. There are cases where this matters, absolutely, it would also matter if the difference was between 10MB and 10GB, but we're talking about a pretty small relative difference.
In return, you get the reliability of not having to worry about how your OS/configuration/other packages may interact with the software. In my opinion that's a huge productivity win, and also fits nicely with the fact that build systems should be thoroughly reproducible and do the same thing everywhere they run.
There are some good, niche, reasons for not bundling things like a JVM:
- If you're a strong free software advocate and need total control over the freedom in your dependencies. Debian do a lot of dependency unbundling for this reason. They're good at it, but it's not something most people care about or want to deal with the fallout of.
- Memory usage optimisation by sharing libraries in memory. This has a nice benefit for native code, but unlikely to do a lot for Java processes from what I understand. This likely _could_ have a significant benefit for the Electron ecosystem given that many people now run 2-3 Electron apps at a time.
"Small relative difference" - Most programs, probably including Bazel, have <1MB worth of code. Some programs have additional media files, but not all of them. And even in 2019, small programs do make a difference in convenience. Sometimes when you download things, sometimes when you copy files around, sometimes when you search the files on your computer...
A factor of 20, or 130, makes a huge difference if you consider all programs installed on your computer.
Currently Bazel has 48MB of source code in the `src` directory, plus another 125MB of integrations. Some of this will be tests, some will be build config, but that's probably in the ballpark of 100MB of source code. Adding somewhere between 20-100% for reliable running of that in ~all circumstances feels like a pretty solid engineering decision to me?
I haven't looked at the project. In my experience, 100K lines of C source code roughly corresponds to 1MB of native code. That might not be representative, though. But maybe it's just that that the rest of this project has some more of these "20-100% expansion will be ok" decisions.
I just took these sizes from the master branch when I wrote the above comment. While that may have been true at one point, or is for C code, I suspect that is no longer the case in general. Between data files, resources, modern compilers optimising for speed over size, and more, I'd expect this to be significantly higher now.
I still remember when dynamic linking was being introduced into OSes, Amiga libraries, Windows 3.x, the very first Linux kernel to support dynamic linking,....
Because the large majority of OSes only allowed static linking.
When dynamic linking was finally possible, it was welcomed with pleasure, because not only did it allow saving disk and memory space, it also opened the door to extensibility via plugins and to dynamic behavior without restarts, instead of slow IPC with one process per plugin/action.
Now, thanks to missteps in glibc's design and .so hell on Linux (even worse than on Windows, due to all the distributions), many devs embrace static linking as some golden solution.
You can choose to do a self-contained deployment, which bundles the runtime with your app, or a framework-dependent deployment, which requires a pre-existing shared install. Pick your poison.
Does it also ship a java binary?
Is the java binary statically linked?
If not, how can we build on platforms with a non-mainstream libc (read: Linux with musl libc, like Alpine or Void)?
Sometimes it is; in enterprise settings it's very common to have invasive proxies, and just yesterday I was fighting to get Bazel to use a cert store other than its built-in one. (It tries to fetch some of its components from GitHub, and that fails if you don't use the proper certs.)
But it doesn't have to be right for you, it only has to be right for Bazel. You wouldn't use it for anything else except Bazel, so is it a problem if it's out of date or lacking features? I think that's only a problem for the Bazel developers.
It is a problem if it has a security hole. Or if you have reason not to trust the Bazel developers. (Or trust people who might have illegitimate access to said developers' machines - for example the NSA.)
Probably better than Nodejs Gulp Yo man ugabuga npm here's 3.4 million files for you to download or Python is it 2.7 or 3.1? Pip too and Ansible let's make that terminal a disco night of text! or some Go "Works great for my static file blog" build tool.
You aren't wrong about that. I've lost countless hours to people doing some random thing to get a newer Python version (even if they didn't even need it), only to completely hose both the new one and their system one. And the npm situation is really messy for many people who are new to it.
At some point someone will just bundle all the software into a single docker container, and then we are finally done with the entire installation quagmire.
It doesn't matter whether I trust it or not: these binaries are typically linked against a different C library, assume a different filesystem layout and dynamic linker location, and are often only available for x86-64 in any case.
I build binaries locally to get something that's marginally more likely to work on my system, rather than because I believe I could somehow catch nasties buried in the source code even if I read every line. Too many years of seeing the ingenuity of IOCCC entries has taught me I won't succeed at that. :)
What's wrong with the Java dependency? There's even a single-binary Bazel executable[1] that brings its own embedded OpenJDK. My company is using Bazel and the Java dependency has never been an issue.
Even if you write custom rules, you use the Python-like Skylark language and never interact with Java code.
No, that's generated also, but is highly reusable whereas some people get along completely fine without Java (unless you make them pull it in for Bazel for Firefox).
Well, no. Reproducible builds are a security concern and a source base distro is highly optimizable and configurable. Just because you do not see the purpose does not mean one is absent.
Java's build artifacts are too painless. GNU/Linux distributions tend to have a policy of "we build things from source, we don't simply re-publish the authors' build artifacts." Because Java's build artifacts are so painless, source artifacts and build processes are neglected. I'm convinced that there are many "foundational" Java libraries that have never been successfully compiled on anything but the original author's computer, right before they uploaded the .jar to Maven Central.
With a Java package, the final distro package will mostly be a single .jar file under /usr/share/java/, but getting to that .jar file from source is often a nightmare.
With a Python package, the final distro package will mostly be a sprawling directory under /usr/lib/python/site-packages/, but getting that directory from source is very rarely anything more than `./setup.py install --optimize=1`.
In my experience, large, complicated Python packages with C dependencies can be a huge nightmare just like large Java applications, and simple cases are simple in either case. With many dependencies, you need to unbundle every single one and package it separately, in either case.
This is an irrelevant statement. You need the C compiler anyway. Your Java or Python is built with C. You add the complexity of your other environment on top of it.
The Arduino IDE calls out to avr-gcc, and mbed-os is written in C++ so requires an embedded C++ compiler like arm-gcc or something similar. ChromeOS also runs the Linux kernel, so you do need it to compile the OS, and it can hypothetically run binaries compiled for the Linux platform. Dig deep enough through all the layers of bootstrapping and you'll find C basically everywhere.
"Works" can mean very different things on Windows. On one extreme you can have proper integration that covers Windows' idiosyncrasies such as automatically finding and setting up the Visual Studio environment instead of making the user jump through hoops like starting the dev command prompt. And on the other extreme you can have something hacky like "requires Cygwin" which I personally wouldn't even call "works".
Bazel is excellent on Linux and macOS, but it's pretty bad on Windows. I'm not sure what (if anything) they're going to do about that. Maybe they'll fix it on Windows though. That'd be pretty cool.
Microsoft internally uses something pretty similar to Bazel. I'm not familiar enough with the two to fully understand motivations for the divergence. It supposedly initially ran poorly on Mac, just due to differences in what is cheap on different platforms. I wonder what kind of inherent performance differences you would find in something "Windows first" vs "Linux/MacOS first"
https://github.com/microsoft/BuildXL
The differences between bazel and BuildXL are less about platform and more about philosophy. Bazel is about opinionated builds - declare everything, and you get fast correct builds. Hard to adopt, but super powerful. BuildXL is more about meeting codebases where they are - take an existing msbuild|cmake|dscript build with its under-specified dependencies, and figure out how to parallelize it safely. This delivers quick wins, but comes with real costs as well (notably: no caching; labour-intensive magic to guess an initial seed of deps, with a performance cliff when something changes).
(Disclaimer: Googler, work closely with the bazel team. I like bazel; I like BuildXL for expanding the space of build ideas, but have long-term concerns it settles in a local maxima that'll be hard to get out of).
I think there's nothing insurmountable in getting Bazel to work well on Windows. It's just a chicken and egg problem: few people use it there, so it's not up to par, so few people use it there because it's not up to par.
I wonder if they considered build2[1]? It seems to tick most of their boxes. In particular, it has first-class Windows support and, in the upcoming release, even native Clang-targeting-MSVC support (not the clang-cl wrapper)[2]. While some of the features they are looking for (like faster build times which I assume means distributed compilation and caching) are not there yet, we are working on them (and would welcome help)[3].
I think part of the reason is battle-testing. Blaze (the Google internal version of Bazel) has been used by 20k people per day for well over five years to continually build a single mammoth shared C++, Java, Python, Shell (plus others) code base complete with a globally distributed cache system and thousands if not tens of thousands of build machines.
One of the important parts in the post seems to be that it is not just a C++ build system, there are many other tasks executed through the build system. In that sense, it seems like build2 isn't a great fit.
On the contrary, build2 is a language-agnostic, general-purpose build system. There is the bash module[1], for example. And now that we have support for dynamically buildable/loadable build system modules, we can develop support for more languages, code generators, etc., without bundling it with the build system core. I myself am itching to take a stab at Rust. Hopefully will find time after the release.
Yes, we need to change that. Our initial focus was on taking really good care of C/C++ compilation but now we are at the stage (i.e., we have support for external build system modules) where this is starting to change.
It's a very general tool. IIRC from playing with it, it doesn't give a fig about whether it's running a compiler at all. I think I was using it for a toy compiler+nasm, actually...
That would be pretty easy to fix, TBH. Tup's author works for Mozilla, and Mozilla has the expertise to add that functionality. As always, it's a matter of will and resourcing.
Tup runs what is effectively a bash build script, but does so under ptrace (or equivalent) so it knows about every file that has been checked (and IIRC even those that were missing). So you basically write a "from scratch" build script and that's it.
To rebuild, it will rerun the script, except that it knows which files each stage reads (and which must not exist), so if nothing has changed, it can skip that stage. There is no inherent difference between the compiler binary (say, /usr/bin/gcc), the system includes (/usr/include/stdio.h), and the project source.
It makes for a super simple build script which, after one successful build, gives tup enough information about dependencies and parallelization opportunities, while also tracking toolchains better than you can manually (does /usr/bin/gcc shell out to /usr/bin/x86-gcc or /usr/bin/amd64-gcc? did one of them change but not the other? was it the one we need? tup knows).
Unfortunately, it is nontrivial; it uses ptrace or FUSE to trace file access on Linux, IIRC has some other hack on Windows, and has little to no support for other systems. Also, the overhead was in the single-digit percent range last time I checked.
I'm trying to wrap my head around whether Bazel is worth looking into for the kind of workloads I typically work on. There doesn't seem to be much benefit if you have a single-language codebase, especially if you already have a single build.
I suppose I see the benefit of this setup if you keep many libraries and want to pin to the latest instead of an explicit lib version. If you're in a multi-repo system you're probably not doing this, but I can see how Bazel could actually make this possible.
Can someone shed some light on the CI story? Can Bazel do code deployment as well? Is that a smart thing to do with Bazel?
Bazel is worth looking into as soon as you have dependency graphs that your language's native build system can't deal with efficiently, or you have multiple languages that have dependencies on each other's artifacts.
A pure Go codebase? Not worth it.
Personally, I use Bazel even for small projects as soon as they involve things like generated code or gRPC.
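For example, the BUILD side of generated protobuf code is just a few declarations (names made up; the rules are the standard proto ones, and the gRPC rulesets layer their own *_grpc_library rules on top):

```
proto_library(
    name = "greeter_proto",
    srcs = ["greeter.proto"],
)

cc_proto_library(
    name = "greeter_cc_proto",
    deps = [":greeter_proto"],
)

cc_binary(
    name = "server",
    srcs = ["server.cc"],
    deps = [":greeter_cc_proto"],
)
```

Bazel then regenerates and recompiles the stubs whenever the .proto changes.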
Yes, there are multiple rulesets for deployments, like rules_k8s[1] and rules_docker[2]. Of course, you can easily build your own custom deployment pipeline.
Other build systems (CMake, Meson, etc) should be able to gain the sandboxing feature. The remote execution feels more fundamental to me and not so easy to replicate.