More likely you're running your containers using overlay or some FUSE thing, and that's causing the I/O slowdown.
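
You can check that from inside the container. A minimal Go sketch, assuming a hypothetical data path /var/lib/data; the magic constant is OVERLAYFS_SUPER_MAGIC from linux/magic.h, defined locally:

    package main

    import (
        "fmt"
        "syscall"
    )

    // OVERLAYFS_SUPER_MAGIC from linux/magic.h, defined locally.
    const overlayfsSuperMagic = 0x794c7630

    func main() {
        var st syscall.Statfs_t
        // /var/lib/data is a placeholder; point this at your data directory.
        if err := syscall.Statfs("/var/lib/data", &st); err != nil {
            panic(err)
        }
        if st.Type == overlayfsSuperMagic {
            fmt.Println("on overlayfs: expect copy-up and metadata overhead")
        } else {
            fmt.Printf("filesystem magic: 0x%x\n", st.Type)
        }
    }

If it is overlayfs, bind-mounting a volume from the host filesystem usually takes that layer out of the I/O path entirely.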

What syscalls do you think are intercepted, how? Speaking as someone who can write kernel code, I'm not aware of any such thing specific to containers. (As far as the linux kernel is concerned, there's no such thing as a container.)

If you're talking about BPF, that can be used outside of containers too, e.g. systemd can limit any unit, and using it is not part of what defines a container.
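
To make the point concrete: what a runtime calls a container is just a process started in a set of namespaces, plus cgroup limits. A minimal sketch (Linux only; needs root or a user namespace) that launches a shell the same way runtimes do, with no syscall interception anywhere:

    package main

    import (
        "os"
        "os/exec"
        "syscall"
    )

    func main() {
        cmd := exec.Command("/bin/sh")
        cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
        cmd.SysProcAttr = &syscall.SysProcAttr{
            // New PID, mount, and UTS namespaces: the building blocks
            // container runtimes compose via clone(2).
            Cloneflags: syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWUTS,
        }
        if err := cmd.Run(); err != nil {
            panic(err)
        }
    }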




I have decades of experience writing this type of low-level high-performance data infrastructure code directly against the Linux kernel, deployed in diverse environments. I’ve seen almost every rare edge case in practice.

You can find many examples in the wild of reputable software that loses significant performance once containerized, no matter how it’s configured. Literally no one has demonstrated state-of-the-art data infrastructure software that works around this phenomenon, and at this point you’d think someone would be able to if it were trivially possible. I test database kernels in a diverse set of environments, and currently popular containers aren’t remotely competitive with VMs, never mind bare metal. The reasons for the performance loss are actually pretty well understood at a technical level, albeit esoteric.

Every popular container system has runtimes that intercept syscalls. Whether or not Linux requires containers to intercept syscalls is immaterial because in practice they all do in a manner destructive to I/O performance.
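
If you want to see the cost yourself, measure raw syscall latency with the same binary on bare metal and in a container; Docker's default seccomp profile can be dropped with --security-opt seccomp=unconfined to isolate that variable. A rough Go sketch, not a rigorous benchmark (no warmup, no core pinning):

    package main

    import (
        "fmt"
        "syscall"
        "time"
    )

    func main() {
        const n = 1_000_000
        start := time.Now()
        for i := 0; i < n; i++ {
            syscall.Getpid() // always enters the kernel; not served by the vDSO
        }
        per := float64(time.Since(start).Nanoseconds()) / n
        fmt.Printf("%.1f ns per getpid()\n", per)
    }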

There used to be a similar phenomenon with virtual machines, such that for many years no one deployed databases on them. Then clever people learned how to trick the VM into letting them punch a hole through the hypervisor, and we’ve been using that trick ever since. It isn’t as fast as bare metal, but it is usually within 10%. No such trick exists for containers, and as a consequence performance in containers is quite poor.


How can an unprivileged runtime intercept the syscalls of an application talking directly to the kernel? I'll go browse through the containerd code to see if I can find such a thing, since I know Go pretty well, but I have never heard of a runtime intercepting syscalls. That's why application kernels like gVisor exist.
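
For reference, gVisor's ptrace platform does intercept syscalls, using plain ptrace(2): the tracer gets a stop at every syscall boundary, which is also why it is slow and why ordinary runtimes like runc don't work that way. A minimal strace-style sketch in Go (linux/amd64 only; Orig_rax is amd64-specific, and each syscall prints twice, once at entry and once at exit):

    package main

    import (
        "fmt"
        "os"
        "os/exec"
        "runtime"
        "syscall"
    )

    func main() {
        runtime.LockOSThread() // all ptrace calls must come from one OS thread

        cmd := exec.Command("/bin/true")
        cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
        cmd.SysProcAttr = &syscall.SysProcAttr{Ptrace: true}
        if err := cmd.Start(); err != nil {
            panic(err)
        }
        _ = cmd.Wait() // child is stopped by SIGTRAP at execve; Wait returns

        pid := cmd.Process.Pid
        var regs syscall.PtraceRegs
        for {
            // Resume until the next syscall boundary, then read registers.
            if err := syscall.PtraceSyscall(pid, 0); err != nil {
                break
            }
            var ws syscall.WaitStatus
            if _, err := syscall.Wait4(pid, &ws, 0, nil); err != nil || ws.Exited() {
                break
            }
            if err := syscall.PtraceGetRegs(pid, &regs); err != nil {
                break
            }
            fmt.Printf("syscall %d\n", regs.Orig_rax)
        }
    }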


>Then clever people learned how to trick the VM into letting them punch a hole through the hypervisor, and we’ve been using that trick ever since

Can you say more about this trick? Is it available by default on cloud VMs, or is it something that has to be done in the user's code? I have seen tweets from DirectIO developers saying AWS Nitro machines are better for kernel-bypass I/O compared to Google Cloud VMs, but my understanding was that it was something done by Amazon's Nitro card engineers/developers, and that Google was working on something similar to improve performance.


Lots of words, very little said. How do you think e.g. docker "intercepts syscalls"?


No idea, don’t really care. That these performance disparities exist isn’t controversial; several well-known companies have written about it, e.g. nginx [0]. I deploy on Kubernetes, but all of our I/O-intensive infrastructure is deployed on virtual machines for performance reasons. We still lose ~10% on a virtual machine compared to bare metal, but a container on bare metal is significantly worse.

All high-performance data infrastructure bypasses the operating system for most things, taking total control of the physical hardware it uses: cores, memory, storage, network. Linux happily grants that control, with caveats. It doesn’t work the same inside Kubernetes and Docker, and no one can figure out how to turn the interference off. Maybe you don’t work on I/O-intensive applications, but for people who do, this is a well-documented phenomenon. My teams have wasted far too many hours trying to coax good performance out of code in containers that works just fine on bare metal and virtual machines.

[0] https://www.nginx.com/blog/comparing-nginx-performance-bare-...
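
For a sense of what taking control of the hardware looks like in code: pinning the calling thread to a core and reading with O_DIRECT to bypass the page cache. A minimal sketch using golang.org/x/sys/unix; the file path and core number are placeholders, and buffer alignment is handled crudely by over-allocating:

    package main

    import (
        "fmt"
        "runtime"
        "unsafe"

        "golang.org/x/sys/unix"
    )

    func main() {
        // Pin this goroutine's OS thread to core 3 (placeholder core number).
        runtime.LockOSThread()
        var set unix.CPUSet
        set.Set(3)
        if err := unix.SchedSetaffinity(0, &set); err != nil {
            panic(err)
        }

        // O_DIRECT bypasses the page cache; buffer address, length, and file
        // offset must all be aligned to the device block size (4 KiB here).
        fd, err := unix.Open("/data/testfile", unix.O_RDONLY|unix.O_DIRECT, 0)
        if err != nil {
            panic(err)
        }
        defer unix.Close(fd)

        const align, size = 4096, 64 * 1024
        raw := make([]byte, size+align)
        off := align - int(uintptr(unsafe.Pointer(&raw[0]))%align)
        buf := raw[off : off+size]

        n, err := unix.Read(fd, buf)
        if err != nil {
            panic(err)
        }
        fmt.Printf("read %d bytes via O_DIRECT\n", n)
    }

io_uring and SPDK go much further down the same path, but core pinning plus O_DIRECT is about the minimal version of this technique.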



