This post got me thinking that it'd be interesting if, instead of being given an explicit list of files to watch for changes, the list of files were inferred with an LD_PRELOAD shim that listens for the `open` / `stat` / etc. system calls a process makes.
In the example from the blog post, for instance, `git ls-files` will almost certainly ignore autogenerated build files, but it's possible for one of those files to change without the output of `git ls-files` changing. The same goes for things like third-party packages that are installed system-wide.
With an LD_PRELOAD, all you'd need to do is
my-watcher ruby foo.rb
and the watcher would figure out which other Ruby files were opened, be they git-versioned Ruby files in the current folder, Ruby gems / Ruby VM dependencies in $HOME/.rbenv/, system packages in /usr/local, or config files in /etc, ...
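As a rough sketch of the inference, strace can stand in for the LD_PRELOAD shim (the paths here are illustrative):

# record every open/openat/stat the process (and its children) performs
strace -f -e trace=open,openat,stat -o /tmp/trace.log ruby foo.rb
# extract the quoted path arguments and dedupe them into a watch list
grep -o '"[^"]*"' /tmp/trace.log | tr -d '"' | sort -u > /tmp/watchlist
# feed the inferred list to an existing watcher such as entr
entr -c ruby foo.rb < /tmp/watchlist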
I guess I wouldn't actually be surprised to hear that someone has built this already.
Some build systems work that way; Tup is one, I think. They use strace to intercept file I/O and figure out what has been updated, and thus can figure out an optimal way to rebuild.
I think Tup uses FUSE rather than strace (which is why it doesn't track external dependencies, and requires relative paths for all internal files), but I might be wrong.
I built a dependency checker that worked similarly to this once.
By hooking the filesystem calls, you can make a list of all files that a given process touches. When that process finishes, serialize a dependency file containing the hashes, timestamps, and sizes of all those files. Next time you run that same command line, read the dependency file from the last run and compare to the current filesystem state. If it's the same, and you know your command is idempotent, you can skip execution entirely.
Now, if you put that logic in a DLL, you can inject it into arbitrary third-party processes to which you don't have source code, and it will still work. Name the dependency file after the hash of the command line.
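On Unix you can sketch the same skip-if-unchanged logic in shell, with strace standing in for the injected DLL (names and paths are made up, and this simplified version checks only hashes, not timestamps or sizes):

#!/bin/sh
# depwrap: run "$@" only if any file it touched last time has changed
depdir="$HOME/.depcache"; mkdir -p "$depdir"
# name the dependency file after the hash of the command line
dep="$depdir/$(printf '%s' "$*" | sha256sum | cut -d' ' -f1)"
if [ -f "$dep" ] && sha256sum --check --status "$dep"; then
    exit 0   # inputs identical to last run; command assumed idempotent
fi
strace -f -e trace=openat -o "/tmp/deps.$$" "$@"
# record the hash of every file the command opened
grep -o '"[^"]*"' "/tmp/deps.$$" | tr -d '"' | sort -u \
    | xargs -r sha256sum 2>/dev/null > "$dep"
rm -f "/tmp/deps.$$"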
The ClearCase SCM does something similar - it has the notion of "derived objects" and "audited builds".
You can run an audited build command under a special ClearCase wrapper and it will look at the versions of the elements used in the build - if that build has already occurred with the same input elements before, even if in a different view, it can "wink in" the derived object result - that is, it can cause that previous build to be visible in your current view. This can save a lot of time when you're building a large codebase.
A less intrusive approach could use kernel queues[0] on most Unix-like systems. That way LD_PRELOAD would not be needed, and the process responding to the disk I/O of interest stays independent of the processes performing it.
In my experience with C/C++, it is faster to combine Make and ccache: just have every C file depend on every header file, and let ccache decide if it needs to be rebuilt.
I really disagree there, especially as the project gets complex enough. ccache will not speed up anything related to linking, for example, but it will bump the mtime of all your object files as it writes them, even when they come from the cache. So at some point even archive creation dominates your "incremental" build time.
I suggest you don't wait until it's too complex to fix the mistake, and do proper dependency tracking from the very beginning. It's not that hard in C/C++.
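For the record, the compiler does the bookkeeping for you; a minimal sketch with GCC/Clang:

# -MMD emits foo.d listing every header foo.c actually included, as a
# side effect of compilation; -MP adds phony targets so that deleting a
# header doesn't break the build. A Makefile then pulls the .d files in
# with: -include $(OBJS:.o=.d)
cc -MMD -MP -c foo.c -o foo.o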
My project is ~3.6 million lines of code with a 300 ms incremental null build and a 2-10s touch-a-header-file build. I only generate a few hundred exes, and most of the code lives in a single dylib.
I know that's not large, but I've got one Makefile that's only about 200 lines long. It's a pretty good trade-off.
Yes, it's not large at all, but I'm already surprised by the claim that you can link ~4 MLOC in 300 ms. For comparison, around 10x that many lines of C++ takes 2 minutes here with gold, on Xeon machines. Even writing out the main executable takes a good chunk of time (stripped, it already measures almost a quarter of a gigabyte).
Most of the code is tuned C, tuned in the sense of being fast to compile, with a nice C++ wrapper to make it pleasant to use. I'm seeing compilation rates around 100 kloc/s on an older laptop.
Cool idea! Minor tweak, though: I'd rather have the watcher run separately, but I think that would still be easy enough if you resolved file paths relative to the project root.
Entr and many other similar alternatives do not satisfy my needs. The main fault is that they take a list of files instead of a root directory and, possibly, include/exclude patterns. So when I create or rename files, these tools either don't register the new files, or they just fail on the missing ones.
I'd rather reuse existing stuff, but seeing how every scriptable filesystem watcher misses the point, I'm inclined to write my own inotifywait/inotifywatch wrapper.
Terminal 1:
mkdir /tmp/project
cd /tmp/project
touch file{1,2}
ls | entr -d echo "a change"
Terminal 2:
rm file2
Terminal 1:
entr: cannot open 'file2': No child processes
Regarding `-d`:
> Track the directories of regular files provided as input and exit if a new file is added. This option also enables directories to be specified explicitly. Files with names beginning with ‘.’ are ignored.
So first, it doesn't track NEW directories, and second, it exits if a new file is added. Exactly how is this useful?
EDIT: All I want from a filesystem watcher is to track files by pattern and just re-run the command, however complex that may be to implement via OS interfaces. When I'm working on a Python project that has packages (directories), I may also refactor (rename files), and this simplistic behavior does not catch that.
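Roughly the interface I have in mind; watchexec, which comes up downthread, is close (a sketch, assuming its -e extension filter):

# re-run the tests whenever any *.py file under the current tree changes
watchexec -e py -- pytest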
> All I want from filesystem watcher is to track files by pattern and just re-run the command, however complex that may be to implement via OS interfaces.
I can attest to modd too; it uses a per-project configuration file to define (recursive) include/exclude globs, along with respective blocks of commands.
I would also like to extol the virtues of its sister project devd[0] from the same author, which makes web development palatable to me. It's a little websocket server that injects a tiny script into the HTML to reload the page in the browser once modd detects a file change, rebuilds the front-end, and sends devd a SIGHUP.
> All I want from filesystem watcher is to track files by pattern and just re-run the command, however complex that may be to implement via OS interfaces.
I assume I'm misunderstanding something, so this is probably a naive question and maybe you can explain further, but if that is all you need, why not just write it yourself? It sounds like a <1 hour project to write something cross-platform in Qt that monitors the filesystem recursively with QFileSystemWatcher (https://doc.qt.io/qt-5/qfilesystemwatcher.html), checks for the configured regexen, and runs the configured command.
I have implemented stuff like this. It is not even remotely a <1 hour project. Not if you want it to work reasonably well and not be broken all the time. It’s a full workday for someone who already understands the problem. There are a number of cases you often want to handle:
- Multiple files may change in quick succession, for example, if you hit “save all” in an editor. You might want to delay triggers to see if more events arrive.
- You will sometimes see incomplete / corrupted versions of files, because you ran a command while another program was writing the file. The other program probably should atomically rewrite the file, but you’re stuck fixing the problem. The user doesn’t want to see errors from this and you generally want to rerun the command.
- You need to execute commands in the correct order.
- A command may change state from being queued & ready to having outdated inputs.
If it’s just one command, sure, you could probably do it in an hour. But if there’s more than one command (usually the case!) then it gets far more complicated.
Just looking at the last time I implemented something like this: it was watchman plus about 500 lines of code figuring out which actions to run and when. And that's not even with any parallel execution.
I've run into this issue myself, and it's not filesystem watchers missing the point; it's something a number of operating systems simply don't support. You then either have to resort to nasty user-space hacks (which will be comparatively slow and resource-hungry) or accept that you can't watch nested directories.
Sure, there may be races due to limitations of the underlying OS interfaces, but no such races really matter when you edit at mammal speed, so it's perfectly sufficient, and it is perfectly possible to make a good filesystem watcher without any power-hungry user-space hacks.
inotifywait is a user-space program that borrows its name from inotify (because it uses inotify), and when using the -r (recursive) flag it sets up multiple inotify watches to work around the very problem I just described.
In fact its man page comes with a big fat warning about using -r:
> Warning: If you use this option while watching the root directory of a large tree, it may take quite a while until all inotify watches are established, and events will not be received in this time. Also, since one inotify watch will be established per subdirectory, it is possible that the maximum amount of inotify watches per user will be reached. The default maximum is 8192; it can be increased by writing to /proc/sys/fs/inotify/max_user_watches.
> inotifywait is a user-space program that borrows its name from inotify (because it uses inotify), and when using the -r (recursive) flag it sets up multiple inotify watches to work around the very problem I just described.
What problem?
What you quoted is totally irrelevant to my workflow. On my crappy laptop, across all the projects in my work directory, the watches are established almost instantly, without any spike in CPU.
There are 4677 directories in the Linux kernel tree, and you can just drop a knob change in /etc/sysctl.d/ if your code base is bigger than the kernel.
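For example, persistently (as root; the file name is arbitrary):

# raise the inotify watch limit, then reload sysctl settings
echo fs.inotify.max_user_watches=524288 > /etc/sysctl.d/90-inotify.conf
sysctl --system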
You're just inventing OS limitations to justify limitations of lazy user-space programming.
EDIT: Setting up watches on the kernel tree, including the directories under .git:
~/src/linux % inotifywait -r -m -e close_write --format %f . |& ts
Jul 01 16:03:10 Setting up watches. Beware: since -r was given, this may take a while!
Jul 01 16:03:10 Watches established.
Your workflow might be OK; others' might not. But blaming developers for being "lazy user-space [programmers]" when Linux lacks a feature that Windows and macOS support, and when working around that lack invites complications, is a really unproductive way to hold a discussion on HN.
You still haven't specified which feature Linux lacks that Windows and macOS support. I have shown you that there's no need for any workaround, that the OS feature works as intended, but it seems you're keen on believing FUD scraped from the web pages of lazy user-space programmers instead of considering my arguments and evidence.
FYI: What OS interface do you think that `entr` uses on Linux? That's right, inotify.
Not the original poster, but Windows allows you to get all changes in an NTFS volume, effectively allowing you to monitor notifications for an unlimited number of files and directories (I didn't remember what the feature was called; a quick googling led me to https://docs.microsoft.com/en-us/windows/win32/fileio/change... ).
This allows software like Voidtools' Everything to index the entire filesystem and update that index in real time. Using inotify for that is not possible, since you'd have to recursively register hundreds of thousands of directories, which far exceeds the limit of allowed registrations (and either way, it would be quite wasteful to hold 100,000 registration handles). As a result, the Linux equivalent of Everything is unable to find recently created files, erroneously finds deleted files, and has outdated metadata for recently changed files. Alternatively, you can attempt to reindex right before searching (e.g. rerun updatedb if you are using locate), but then your searches are much slower and the benefit of indexing for a quick search is reduced significantly.
A nicer API would allow one registration handle to deliver notifications for an entire directory tree, or at least offer a separate API for an entire filesystem (like the NTFS option mentioned above).
At least, that was the conclusion I came to when I last researched the topic. If you know of a supported way of doing this on Linux I'd really love to know! It will allow me to finally make the program I wanted to make!
> As a result, the Linux equivalent of Everything is unable to find recently created files, erroneously finds deleted files, and has outdated metadata for recently changed files.
You're being quite aggressive about your disagreement here. I don't think it's "FUD" to rightfully point out that using inotify on a large directory tree takes extra userspace code that is explicitly called out in the inotifywait docs as performing poorly enough to worry about race conditions. Not to mention that it requires a limits tweak. This isn't "lazy"; it's wishing for the mechanism to be more capable. Would you have this same attitude if the default limit were 1024? 512?
Why wonder what my attitude would be with a lower limit? The default limit is 8192, which is fine for the use case exemplified by entr, and you can easily change it.
I agree with your frustrations around this limitation and think file watchers should do their best to support arbitrary file addition/removal in directories. That being said, the races do matter when you edit at "mammal speed" but use source control to merge/rebase commits that affect your watched directories. Especially in a team setting where multiple coworkers are working on the same directories, git can be blasting rapidly through a lot of file moves, deletions, etc., so it's not really an edge case.
Git is a good point. However, the only time I want commands triggered by changes in the source tree is during actual code writing, to instantly test my changes. On the other hand, stopping the watcher during Git operations could harm my workflow. This could be easily solved by a simple 1s debounce, with a hypothetical debounce filter along these lines.
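A sketch of such a filter, assuming bash (for the read timeout) and inotifywait as the event source:

# re-run on change, but coalesce any burst that arrives within 1s
inotifywait -m -r -e close_write --format '%w%f' . | while read -r path; do
    while read -r -t 1 path; do :; done   # drain the rest of the burst
    your-command-here                     # placeholder for the real command
done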
This seems to fill a similar role to watchman[1] (specifically with watchman-make[2]).
I've heard good things about watchman because it makes a best effort to let filesystem changes "settle" before running the specified command. If there's a comparison of the two written up, or if someone can give a testimonial, I'd love to hear it.
watchexec was born out of a few frustrations with entr, mostly around how it handles new files being created. However, from a pure design standpoint, entr is just better than most anything out there due to how closely it hews to the UNIX philosophy, and it gets a long way on just that.
The only real improvement that can be made to these tools (currently) would be to have perfect information about what caused a file to change. Currently, most tools require you to tell it files/patterns to ignore to avoid triggering loops where the file watcher changes files and ends up triggering itself over and over. watchexec did good work here by ingesting your .gitignore and using that.
Unfortunately OSes don't provide great info when it comes to file modifications. On Linux, ptrace/LD_PRELOAD would enable us to know the set of all files changed as a result of running the file watcher (and thus ignoring them automatically). DYLD_INSERT_LIBRARIES is a thing on macOS, though it is subject to SIP restrictions with some binaries. I'm unsure what mechanism exists on Windows. The highly platform-dependent nature of this is one reason why I haven't really pursued this line of work in watchexec.
Does make have watch functionality built in? Sometimes I use make and entr together. entr detects filesystem changes and make efficiently rebuilds. But nowadays a lot of languages have their own build tools besides make, and entr works directly with any of them.
Which files are interesting is already encoded in your Makefile, and make is pretty good at figuring out whether a given file changed, so instead of duplicating effort, run make every second or two and let it figure out what needs to be done.
It won't eat your memory or your pool of file descriptors, and your CPU will barely feel it.
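In other words, something as simple as:

# a null make returns almost instantly, so polling is cheap
while sleep 2; do make; done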
That's a solid approach if your dependencies are already managed by make, but languages like Rust and Go do their own dependency management, and Rust in particular is pretty slow to compile. I wouldn't want it running every 2 seconds.
Well, it knows from the filesystem which files have changed.
If you mean "invoke as soon as a file has changed": I certainly don't want that behavior, as many code changes require multiple files to be edited, and there's no automatic way to know a change is complete while part of it hasn't been written yet.
Guard [1] is a great tool in the Ruby world for rerunning tests, reloading your browser, or running any arbitrary task when files change in your file system. It's built on top of Listen [2] to do the fs piece, which has been pretty well supported across at least Linux and macOS over the years.
> The inotify cron daemon (incrond) is a daemon which monitors filesystem events and executes commands defined in system and user tables. It's use is generally similar to cron(8).
Thanks for sharing this. I'm continuously surprised by how many tools exist in the *nix ecosystem that I've never come across. It makes me question if there's a discoverability problem here. Is there any website that broadly categorizes what utilities are available given a desired goal?
Entr requires one watch per file. One tool that I haven't seen mentioned here yet is fswatch, which takes directory-based patterns and works more like a find: it outputs changed file names to stdout.
An interesting thing to look at, to see if it maybe suits you better.
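Typical usage looks something like this (a sketch; `make` stands in for whatever you want re-run):

# recursively watch the tree, re-running the build for each reported change
fswatch -r . | while read -r changed; do
    make
done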
I use a shell script named "onsave" that wraps the following incantation:
inotifywait --event modify --recursive . --quiet
If anyone wants the full script I can post it here. One thing many of these tools miss is that I typically want the build command run once at the start.
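Not the poster's actual script, but a minimal sketch of the same idea, including the run-once-at-start behavior:

#!/bin/sh
# onsave-style wrapper: run the command once up front, then after each change
"$@"
while inotifywait --event modify --recursive . --quiet; do
    "$@"
done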
I use this kind of thing regularly with web stuff like Django, Flask and frontend/JS stuff using parcel, but I wonder what other things this is appropriate for? I can't imagine it being very useful for C/C++ code. I don't always want to rebuild every time I save a file.
One of the issues I've seen with entr, fswatch, etc. is that they don't respond to ATTR changes (e.g. via touch).
When developing in a VM, you sometimes have to "forward" filesystem events to the guest. Usually that happens via TCP/UDP plus a touch, which won't trigger the reload.
I used to use `inotifywait`. It met all my requirements, so I just stuck with it. It allowed me to have LaTeX rendered to PDF in Evince on one side while writing in Vim on the other.
Since Evince reloads the file if it changes on disk, that meant I had immediate feedback.
entr is a great tool; I use it all the time to automate various little things when I'm developing. The coolest part is that it works in every environment I care about, which is FreeBSD, Linux, and macOS.
I created https://crates.io/crates/runwhen a while back, and it's cross-platform. Unfortunately you have to build it yourself :-p I never got around to creating an automated build pipeline for it.
I will say, inotify is one of the best-designed kernel facilities; it avoids so many of the pitfalls of other facilities used for the same task, and is generally easier to use. I wonder if any BSDs plan to implement it, outside of Linux ABI layers.