Hacker News
Hermit is a hermetic and reproducible sandbox for running programs (github.com/facebookexperimental)
180 points by PaulHoule 8 months ago | 16 comments



It seems like this tool creates neither a fully deterministic nor a reproducible environment. Hermit seems only to intercept and modify syscalls, but syscalls are not the only source of non-determinism and randomness. For example, the layout of environment variables in memory is another source of non-determinism, since it depends on both the contents of the variables and their order in memory. CPU instructions like RDTSC, RDRAND, RDSEED and similar also introduce randomness. It seems like Hermit ignores some of these sources of randomness, but I can't test it, because it doesn't build on a current Arch system with the Rust toolchain from the repo.

At least it seems Hermit masks RDRAND and RDSEED via CPUID, but not every program is written to support ancient architectures that lack these instructions, so not every program checks for their availability via CPUID before using them.

In addition, even if all of this were deterministic, the CPU flags that various instructions leave "undefined" according to the CPU manual can differ slightly between microarchitectures. A "normal" program should not be influenced by this, but it is still a source of non-reproducibility, which might matter for certain rare compiler bugs.


In practice, those "undefined flags" are set consistently. In Pernosco we move rr recordings across architectures quite often and those flags have never been a problem.


It's a really interesting project, but it hasn't worked for non-trivial programs for me. I tried to use it on my Raft implementation, and Hermit crashed with error messages that were obscure (at least to me).

As others have commented here before, it admittedly doesn't seem to be actively maintained.

> Just to let you know we’re not actively working on Hermit in the team

https://github.com/facebookexperimental/hermit/issues/34#iss...


That's been my experience as well. It lacks support for certain clone(2) flags like CLONE_VFORK[1], which limits the set of non-trivial programs it can run, and since running non-trivial programs is most of the point, I haven't revisited it since it was first announced.

[1] https://github.com/facebookexperimental/hermit/blob/bd3153b4...


I'm curious what the performance impact is like; I assume there must be some slowdown because of the interception of system calls?


It uses Reverie under the hood, which itself relies on ptrace (at least for the current, sole implementation).

> Since ptrace adds significant overhead when the guest has a syscall-heavy workload, Reverie will add similarly-significant overhead. The slowdown depends on how many syscalls are being performed and are intercepted by the tool.

> The primary way you can improve performance with the current implementation is to implement the subscriptions callback, specifying a minimal set of syscalls that are actually required by your tool.

https://github.com/facebookexperimental/reverie
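To make the quoted advice concrete, here is a toy cost model, not Reverie's API: the function name, the per-stop cost, and the syscall trace are all hypothetical. It only assumes that a tracer pays a fixed cost each time it is stopped for a syscall, and that subscribing to fewer syscalls means fewer stops.

```python
from typing import List, Optional, Set

# Hypothetical cost of one tracer stop (e.g. a ptrace round trip), arbitrary units.
TRAP_COST = 1


def tracing_overhead(trace: List[str], subscriptions: Optional[Set[str]]) -> int:
    """Toy overhead of tracing a sequence of syscalls.

    subscriptions=None models "stop on every syscall"; otherwise only the
    syscalls named in the set cause a stop in the tracer.
    """
    if subscriptions is None:
        return TRAP_COST * len(trace)
    return TRAP_COST * sum(1 for name in trace if name in subscriptions)


# A made-up, syscall-heavy workload: lots of I/O, a little time and randomness.
trace = ["read"] * 1000 + ["write"] * 1000 + ["clock_gettime"] * 50 + ["getrandom"] * 5

print(tracing_overhead(trace, None))                            # 2055
print(tracing_overhead(trace, {"clock_gettime", "getrandom"}))  # 55
```

In this made-up workload, narrowing the subscription set cuts the number of stops from 2055 to 55; real savings depend on the workload and on which syscalls the tool actually has to observe.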


Tangent: running old OSes (with no virtio support) under QEMU on Linux has the peculiar property that I/O-heavy portions such as installation can run faster under TCG (JIT) than under KVM (hardware virtualization), presumably due to all the trapping. It’s a toss-up when those also include CPU-heavy parts (decompression).


> all thread executions are serialized so that there is effectively only one CPU

This definitely isn't intended for general-purpose sandboxing. It's an interesting tool for analysis and debugging.


Ah, I had missed that it effectively forces you down to one CPU. I already wouldn't use it for anything but testing, on account of it intentionally un-randomizing things; I suspect, for instance, that it's unsafe to generate cryptographic keys under it.
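The concern can be illustrated with a sketch that assumes the sandbox answers every randomness request from a PRNG with a fixed seed. The `SeededEntropy` class, `run_program`, and the seed values are hypothetical and are not Hermit's actual mechanism. Successive calls within one run still differ, but the whole sequence repeats on every run, so a "secret" key comes out identical each time.

```python
import hashlib
import random


class SeededEntropy:
    """Hypothetical stand-in for a sandbox that answers getrandom(2)
    from a fixed-seed PRNG: calls within a run differ, but the whole
    sequence repeats exactly on every run."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)

    def getrandom(self, n: int) -> bytes:
        # random.Random.randbytes requires Python 3.9+.
        return self._rng.randbytes(n)


def run_program(seed: int) -> str:
    entropy = SeededEntropy(seed)
    # Toy "key generation": hash 32 bytes of sandbox-provided entropy.
    return hashlib.sha256(entropy.getrandom(32)).hexdigest()


print(run_program(seed=42) == run_program(seed=42))  # True: same key on every run
```

Anyone who can rerun the program under the same seed can reproduce the key, which is exactly the property you want for debugging and do not want for cryptography.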


It sounds similar to the Antithesis testing service that was on the front page recently, which also claimed to be able to run programs deterministically. I wonder if the two projects are related at all.


Our projects have some features in common, but are pretty much unrelated. Hermit is a deterministic userland, whereas we enforce reproducibility at the hypervisor level and with the right device drivers can support any OS.

The most interesting part of Antithesis (to me) isn’t even the perfect reproducibility, but the autonomous state space exploration that finds the bugs in the first place. AFAIK Hermit doesn’t do that, though you might be able to get somewhere by running your program plus a conventional fuzzer under Hermit together?

Disclosure: I am one of the co-founders of Antithesis.



I think this tool must share a lot of techniques and use cases with rr. I wonder how it compares in various respects.

https://rr-project.org/

rr "sells" itself as a "reversible debugger", but it obviously needs determinism for its record and replay to work, and AFAIK it uses similar techniques: system call interception and serializing execution on a single CPU. The reversible-debugging aspect is built on top of that, AFAIK, by taking periodic snapshots and replaying from them. They package it behind a gdb-compatible interface.

Hermit also lists record/replay as a motivation, although it doesn't list reversible debugging in general.
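The core record/replay idea can be sketched in a few lines: while recording, every nondeterministic result (here, the clock and a random number; in a real tool, things like syscall results) is appended to a log, and while replaying, the same calls return the logged values instead of running again. The `Recorder`/`Replayer` classes and `program` function are hypothetical illustrations, not rr's or Hermit's code.

```python
import random
import time
from typing import Any, Callable, List, Union


class Recorder:
    """Runs nondeterministic operations for real and logs their results."""

    def __init__(self) -> None:
        self.log: List[Any] = []

    def call(self, op: Callable[[], Any]) -> Any:
        result = op()
        self.log.append(result)
        return result


class Replayer:
    """Skips the real operation and returns logged results in order."""

    def __init__(self, log: List[Any]) -> None:
        self._log = iter(log)

    def call(self, op: Callable[[], Any]) -> Any:
        return next(self._log)


def program(env: Union[Recorder, Replayer]) -> str:
    # All nondeterminism is routed through env.call(...).
    t = env.call(time.time_ns)
    r = env.call(lambda: random.randint(0, 10**9))
    return f"{t}:{r}"


rec = Recorder()
recorded_output = program(rec)
replayed_output = program(Replayer(rec.log))
print(recorded_output == replayed_output)  # True
```

A real implementation also has to detect divergence (the replayed program asking for something the log doesn't contain) and capture sources that never pass through an explicit call, which is where most of the difficulty lies.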


"Hermit is no longer under active development within Meta and is in maintenance mode. There is a long tail of unsupported system calls that may cause your program to fail while running under Hermit. Unfortunately, we (the team behind this project) don't have the resources to triage issues, fix major bugs, or add features at this point in time."


What's the difference between this and a container?


Hermit executes your program deterministically. This means that it accounts for sources of non-determinism like thread scheduling. The idea is that you will be able to investigate executions in a fully reproducible manner.
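One way to get deterministic thread scheduling can be sketched under simplifying assumptions: Python generators stand in for real threads, and a hypothetical seeded scheduler (`run_serialized`) stands in for the sandbox. This is not Hermit's actual scheduler. Only one "thread" runs at a time, and the next one is picked by a PRNG with a fixed seed, so the same seed always yields the same interleaving.

```python
import random
from typing import Dict, Generator, List

Thread = Generator[str, None, None]


def worker(name: str, steps: int) -> Thread:
    """A toy thread that yields control back to the scheduler after each step."""
    for i in range(steps):
        yield f"{name}{i}"


def run_serialized(seed: int) -> List[str]:
    """Run all threads on one 'CPU', choosing who runs next with a seeded PRNG."""
    rng = random.Random(seed)
    threads: Dict[str, Thread] = {
        "A": worker("A", 3),
        "B": worker("B", 3),
        "C": worker("C", 3),
    }
    trace: List[str] = []
    while threads:
        # Sort the names so the choice depends only on the seed.
        name = rng.choice(sorted(threads))
        try:
            trace.append(next(threads[name]))
        except StopIteration:
            del threads[name]  # this thread has finished
    return trace


print(run_serialized(seed=7) == run_serialized(seed=7))  # True: same seed, same interleaving
```

Changing the seed explores a different interleaving, which is what makes this style of scheduler useful for reproducing concurrency bugs, at the cost of never using more than one CPU.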



