Hacker News

This post got me thinking: it'd be interesting if, instead of being given an explicit list of files to watch for changes, the watcher inferred the list with an LD_PRELOAD shim that intercepted the `open` / `stat` / etc. system calls the process makes.

For example, in the example on the blog post, `git ls-files` will almost certainly ignore autogenerated build files, but it's possible for one of those files to change without the output of `git ls-files` changing. Similarly for things like third party packages that are installed system-wide.

With an LD_PRELOAD, all you'd need to do is

    my-watcher ruby foo.rb
and the watcher would figure out which other Ruby files were opened, be they git-versioned Ruby files in the current folder, Ruby gems / Ruby VM dependencies in $HOME/.rbenv/, system packages in /usr/local, config files in /etc, ...

I guess I wouldn't actually be surprised to hear if someone has built this already.




Some build systems work that way; Tup is one, I think. They use strace to intercept file I/O and figure out what has been updated, and thus can work out an optimal way to rebuild.


I think Tup uses FUSE rather than strace (which is why it doesn't track external dependencies, and requires relative paths for all internal files), but I might be wrong.


I built a dependency checker that worked similarly to this once.

By hooking the filesystem calls, you can make a list of all files that a given process touches. When that process finishes, serialize a dependency file containing the hashes, timestamps, and sizes of all those files. Next time you run that same command line, read the dependency file from the last run and compare to the current filesystem state. If it's the same, and you know your command is idempotent, you can skip execution entirely.

Now, if you put that logic in a DLL, you can inject it into arbitrary third-party processes you don't have the source code for, and it will still work. Name the dependency file after the hash of the command line.


The ClearCase SCM does something similar - it has the notion of "derived objects" and "audited builds".

You can run an audited build command under a special ClearCase wrapper and it will look at the versions of the elements used in the build - if that build has already occurred with the same input elements before, even if in a different view, it can "wink in" the derived object result - that is, it can cause that previous build to be visible in your current view. This can save a lot of time when you're building a large codebase.


A less intrusive approach could use kernel queues[0] on most Unix-like systems. That way LD_PRELOAD isn't needed, and the process responding to the disk I/O of interest is independent of the processes performing it.

0 - https://www.freebsd.org/cgi/man.cgi?query=kqueue&apropos=0&s...


fabricate.py (https://github.com/brushtechnology/fabricate), based on the now-ancient memoize.py.

In my experience with C/C++, it is faster to combine Make and ccache: just have every C file depend on every header file, and let ccache decide whether anything actually needs to be rebuilt.
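The pattern described above might look like this (a sketch; file and target names are made up):

```make
# Every object depends on every header; ccache then skips recompiles
# whose preprocessed input hasn't actually changed.
CC      := ccache gcc
HEADERS := $(wildcard *.h)
OBJS    := $(patsubst %.c,%.o,$(wildcard *.c))

app: $(OBJS)
	$(CC) -o $@ $^

%.o: %.c $(HEADERS)
	$(CC) -c -o $@ $<
```

The Makefile stays tiny because no per-file dependency lists are maintained; the cost is that Make re-invokes the compiler for every object on any header change, and only ccache saves you.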


I really disagree there, especially as the project gets complex enough. ccache won't speed up anything related to linking, for example, but it will bump the mtime of all your object files as it writes them, even when they come from the cache. So at some point even archive creation dominates your "incremental" build time.

I suggest you don't wait until it's too complex to fix the mistake; do proper dependency tracking from the very beginning. It's not that hard in C/C++.
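For reference, "proper dependency tracking" in Make usually means letting the compiler emit per-file header dependencies (gcc/clang's `-MMD -MP`) and including them, so touching one header only rebuilds its actual users. A sketch with made-up file names:

```make
CC     := gcc
CFLAGS := -MMD -MP
SRCS   := $(wildcard *.c)
OBJS   := $(SRCS:.c=.o)

app: $(OBJS)
	$(CC) -o $@ $^

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

# Pull in the .d files the compiler generated on previous runs;
# they list each object's real header dependencies.
-include $(OBJS:.o=.d)
```

On the first build the `.d` files don't exist yet, which is fine: everything builds anyway, and subsequent builds use the generated dependency lists.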


My project is ~3.6 million lines of code with a 300 ms incremental null build and a 2-10s touch-a-header-file build. I only generate a few hundred exes, and most of the code lives in a single dylib.

I know that's not large, but I've got one Makefile that's only about 200 lines long. It's a pretty good trade-off.


Yes, it's not large at all, but I'm already surprised by the claim that you can link ~4 million lines of code in 300 ms. For comparison, around 10x that much C++ takes 2 minutes here with gold, on Xeon machines. Even writing out the main executable takes a good chunk of time (stripped, it already measures almost a quarter of a gigabyte).


Most of the code is tuned C code, tuned in the sense of being fast to compile, with a nice C++ wrapper to make it pleasant to use. I'm seeing compilation speeds of around 100 kloc/s on an older laptop.


Cool idea! One minor tweak: I'd rather have the watcher run separately, but I think that would still be easy enough if you resolved file paths against the project root.



