SQLite is easy to compile (jvns.ca)
192 points by nikbackm on Oct 28, 2019 | 54 comments


> But then I tried to run it on a build server I was using (Netlify), and I got this extremely strange error message: “File not found”. I straced it, and sure enough execve was returning the error code ENOENT, which means “File not found”. This was kind of maddening because the file was DEFINITELY there and it had the correct permissions and everything.

This is an infuriating property of the runtime linkers on both Linux and Windows: if you're trying to load file A, and dependency B of file A does not exist, you just get "file not found" with no indication of which file is missing, and it's extremely hard to debug. At least on Linux the "show dependencies" tool (ldd) is built in.
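
For example (a sketch; the binary name is made up):

    ldd ./myprog    # prints "libfoo.so.1 => not found" for each missing dependency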


There is also LD_DEBUG, which is quite helpful.
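
For example (program name assumed):

    LD_DEBUG=libs ./myprog    # traces every library lookup as the loader resolves it
    LD_DEBUG=help ./myprog    # lists all the available debug categories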


I ran into this last week with Go. I’m used to building statically-linked executables, and forgot to set `CGO_ENABLED=0` to force this when compiling. I was doing a multi-stage Docker build from `golang:1.13` and copying the final executable to `alpine:3.10`, only to see `shell: main: file not found`. `ls` and just about everything else showed it there. It wasn’t until I rabbit-holed into dynamically-linked Go executables that it started to make any sense at all. A simple message of `main: linked '/lib64/...' not found` would have saved me hours.
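
The fix, for anyone else who hits this (a sketch; the output name and module layout are assumed):

    CGO_ENABLED=0 go build -o main .
    file main    # should now report "statically linked"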


ld.so on Linux does show you which library is missing.

In this case, it was ld.so itself that was missing, so it didn't have a chance to tell you what's wrong.
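
You can check which interpreter a binary expects without running it (a sketch; binary name assumed):

    readelf -l ./main | grep interpreter
    #   [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

If that path doesn't exist on the target system, execve fails with ENOENT before anything gets a chance to print a useful error.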


On Windows, you get a graphical dialog in those cases, but no CLI feedback...


vcdepends used to be part of the SDK as well.


> All the code is in one file (sqlite.c), and there are no weird dependencies! It’s amazing.

This is because the author of SQLite publishes it this way as a convenience to integrators; the actual day-to-day coding is not done in a single C file.

In fact, the README[1] calls out twelve "key files", and explicitly warns that SQLite "will not be the easiest library in the world to hack."

https://sqlite.org/src/doc/trunk/README.md


There's quite a bit of build management going on with SQLite. Apart from that script, Dr. Hipp wrote his own source code management system[1] to fit his needs, and if I'm not mistaken they don't write headers by hand[2].

[1]: https://www.fossil-scm.org/

[2]: https://www.hwaci.com/sw/mkhdr/


Fossil intrigues me every time I see it mentioned. I've never found a use for it, but it's so neat that I want to find a use for it.


The main features could be emulated with a small wrapper around git, no?


A full web ui with issue tracker is a bit more than a small wrapper.


> the actual day-to-day coding is not done in a single C file.

That's true - but in order for it to be reducible to a single C file, a lot of thought had to go into the process.


You just don't depend on weird libraries that mess up your preprocessor state. Then you make sure that even your static (file local) variables have unique names. And then you concatenate all the header and source files.

It shouldn't be very hard, but most projects fail already at the dependency stage.
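
The naive version of the idea (source layout assumed):

    # headers first, then sources, into one translation unit
    cat src/*.h src/*.c > amalgam.c
    gcc -O2 -c amalgam.c

In practice you also need to strip or deduplicate the #include lines, which is why SQLite uses a script for it.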


And also as a gift to optimizing compilers, from discussions I’ve had with him. The idea is that a single compilation unit (the “amalgamation”) is easy for the compiler to reason about.


Sure, but now we have -flto, somewhat negating that advantage?
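
For comparison, the two approaches look roughly like this (file names made up):

    gcc -O2 -flto a.c b.c -o prog    # LTO: optimize across units at link time
    gcc -O2 amalgam.c -o prog        # amalgamation: one big unit, no LTO needed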


Perhaps. That assumes all compilers support it, but otherwise it's a testable case.


Except LTO is slow to compile and a single compilation unit (SCU) is fast.


I'm not a C/C++ developer, but why can't more C/C++ projects be distributed as source amalgamations? What prevents this from happening?

As in, source to source amalgamation rather than a more complicated build system to generate a binary.


> why can't more C/C++ projects be distributed as source amalgamations

* Build time and memory increases.

* Debugger locations are less clear. MSVC debugging doesn't even work past 64K lines.

* You still need to link it anyway when using as a library.

* Linker optimization has improved (especially clang).

* It's more difficult to modify.

SQLite has some decent reasons for it especially historically; I'm not sure those reasons are as strong for other projects.

For applications, distributing binaries is even more convenient than source code (assuming they exist for your platform).

And for libraries, "header-only" can be more of a convenience than "single-file".


Interesting, thank you!

Are there any ways of doing that sort of distribution header-only then, rather than single file?


For one, I think this would be cumbersome from a debugging perspective. All of the address-to-source-mappings in your DWARF section would point to things like "lib.c:237652" instead of "lib-module.c:347".

It also means that you're effectively linking all your dependencies statically. In some cases this can be a plus, in some cases a minus. I think having the choice is important though.

I imagine there are lots of other parts of the toolchain that would have to be updated to support this workflow. Seems like a lot of hassle for something that, though at times complicated, mostly works.


There's the #line directive that can keep the original file names and line numbers.
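
A sketch of how an amalgamation script can emit those (paths assumed):

    out=amalgam.c
    : > "$out"
    for f in src/*.c; do
        printf '#line 1 "%s"\n' "$f" >> "$out"    # debugger maps back to the original file
        cat "$f" >> "$out"
    done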


For larger projects there is a danger of blowing out the memory on older machines when you combine everything into one monstrous C/CPP file.

It's already impossible to compile Firefox on a machine with 1GB of RAM because some of the files are just too big. Even with swapping enabled the build process crapped out for me last time I tried it. While 1GB machines are pretty rare these days outside of SBCs, you could easily blow out a 4GB machine if you tried to compile the whole damn thing at once, and 4GB machines are still commonly found.


Is that much of a problem though? For SBCs like the Raspberry Pi you can do builds on the SBC itself. While this is neat, it is far from optimal because builds are very slow. It's much more typical for embedded systems to do your build on a host PC with a cross-compiler.


Sometimes getting that cross compiler environment set up properly, especially with all of the dependencies and build tools, is more hassle than just letting the build run overnight on the SBC itself. Especially when it has a configure script that wants to check on various properties of the target architecture as part of the build.


I'm not sure anyone thought that it was made that way; this is very explicitly for distribution, and it works very well.


In a vacuum, reading that statement, that was the assumption I made. Certainly strikes me as a bit odd, but then, SQLite is a bit of an odd project. And sometimes odd projects do things differently in a way that works out well because they have the discipline to pull it off.

But it does make much more sense that it is worked on in pieces and then put back together.


Short Tcl script that combines everything into one file:

https://www.sqlite.org/src/artifact/5fed3d75069d8f66

Some other details / rationale:

https://www.sqlite.org/amalgamation.html

Combining all the code for SQLite into one big file makes SQLite easier to deploy — there is just one file to keep track of. And because all code is in a single translation unit, compilers can do better inter-procedure optimization resulting in machine code that is between 5% and 10% faster.


One of the benefits of this is Alon Zakai's wonderful compilation of the SQLite C code with Emscripten to generate sql.js, which I have found very useful for teaching SQL.

https://github.com/kripken/sql.js


Awesome! "Web SQL Database" rides again!

https://en.wikipedia.org/wiki/Web_SQL_Database


Yeah, so, this just blew my mind. I compiled it with an older version of gcc that I had on my Windows machine and it worked just as easily (without threading or -ldl).

But there it is, a fully featured SQL engine in an EXE file that I can use with any application I want.

In a world that requires a million SDK's, DLL's, dependencies, etc, this is the most refreshing thing in the world.
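
For reference, the compile lines look something like this (the Linux flags are as given in the SQLite docs; the Windows one is my assumption for MinGW):

    gcc shell.c sqlite3.c -lpthread -ldl -o sqlite3    # Linux
    gcc shell.c sqlite3.c -o sqlite3.exe               # MinGW on Windows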


Being easy to compile is a very nice property for any software to have. Redis, for example, is also very easy to compile, and quick too. I think a lot of software using autotools is easy to compile (at least on POSIX-compliant systems). Even PostgreSQL is easy to compile, although not very quickly.


> run ./configure
> realize i’m missing a dependency

If only there was like, a tool, to help you manage compile-time dependencies.

I wish there was a `yarn` equivalent for C projects.


This problem is 1000% harder than you think it is, and everyone has a different idea on how to solve it.


Sure, but like... `package.json` as a source of truth for dependencies (package names + semver criteria to match) followed by abstracting fetch + build away through one command `yarn install`

Those two concepts will always remain the same despite the underlying workings, no?


That's what distributions do. For example in Debian, the lists of build and run-time dependencies are in ./debian/control
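
So on Debian, fetching everything a package needs to build is one command (assuming deb-src sources are enabled):

    sudo apt build-dep sqlite3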


In some ways the entire concept of "Linux distribution" developed to address this. Certainly if you stay within a distro ecosystem it will have tools to do this for you in both Debian and Redhat.


But doesn't this make compiling programs on systems without these tools hard? For example, it is not easy to compile those programs on Windows without WSL or Cygwin.


Well, it doesn't make it easier. I suppose you could try building a set of RPM packages for the Windows platform, which is basically what Cygwin is.

If you're saying "wouldn't it be great if C the language came with an integrated package manager that worked the same on all platforms and was available everywhere", then yes, that would be great, and I too would like a pony. I just think the ecosystem is way too fragmented to ever achieve that. Doubly so if you have to start thinking about cross-compilers and the long tail of little embedded targets.


Working with other languages was how I realized how brilliant the combo of Maven and Java was ;-)

(PS: today Yarn and NuGet do some of this for the Node/JS and .NET ecosystems, but back when Maven arrived those weren't even planned.)


I wish we all just stuck to a single language-agnostic build system like Buck, Pants, or Bazel and distributed everything with BUILD files.


I find pkgconfig + dnf install pkgconfig(foo) useful
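
For example (package name assumed):

    sudo dnf install 'pkgconfig(sqlite3)'    # resolves to whichever package provides the .pc file
    pkg-config --cflags --libs sqlite3       # emits the flags for your compile line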


Haha, I actually feel exactly the opposite! I wish JS, Ruby, Python, etc. deferred to OS-specific package management. Mandatory xkcd reference, etc.


Another good example is the Go compiler. Assuming you have any version of Go in your PATH, it's:

    git clone https://github.com/golang/go
    cd go/src
    ./make.bash     # or make.bat for Windows
And that's it. You can then use "bin/go" to compile your projects.


It's a whole lot more complex to run a build script like this than to make a simple call to a C compiler, and it has many more dependencies: you even need bash installed, and it looks like the script would fail if the kernel isn't compiled with SELinux support... The script itself isn't that complicated, but complicated enough that I didn't want to spend too much time reading it.

Moreover, you cannot compare cloning a git repository with downloading a single c source file... This seems equivalent in complexity to the good old "git clone; ./configure; make"


You just reminded me of GOPATH :(


The amalgamation file is a really interesting idea -- is this common in the world of C applications? The documentation is quite clear, however, that the amalgamation file and the source files are not the same thing. The source code (1,848 files in 40 folders) can be pulled down here -- https://www.sqlite.org/cgi/src/doc/trunk/README.md -- but more assembly will be required if you're planning to build the project.

UPDATE: maybe not so much assembly is required... just running "make" built the project without any drama (I'm on macOS with Xcode and tooling for Xamarin already installed - YMMV in terms of whether you might need to install something to compile from source).
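
For anyone trying this, the sequence is roughly (target names per the SQLite README; details may vary by platform):

    ./configure
    make sqlite3.c    # generates the amalgamation
    make              # builds the library and shell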


It's not common, but there are other projects like this for easy integration, especially on embedded systems. You see some C++ libraries that are a single header file include. And here is a collection of single file headers for graphics tasks, like loading images, resizing images, rendering font glyphs, etc:

https://github.com/nothings/stb


Every time an article about SQLite hits the front page here, I find myself stunned at how well-designed it is.


I haven't checked yet, but I was curious whether it post-processes the source into a single file or whether that's how it's developed. The former sounds useful for builds (as her blog suggests) whereas the latter sounds frightening to audit. (Unless it's written in a literate style?)


I believe they use Tcl scripts to generate the amalgamated .c file. Oddly enough there is a post about Tcl on the front page as I type this.


If there is a way to run the preprocessor so that it expands #includes without any recursion, that seems like it would be an easy way to do this without an extra tool.


It's even easier than that. I had an sqlite2 database on an old system but nothing to convert it or even dump it. I had to download the sqlite2 source and delete all the Tcl .c files. Then I ran

    gcc *.c -o sqlite2

and it was done. It's that simple.


I love a happy ending!

(Especially when many of my compiling-from-source experiences resemble what the author was anticipating)



