Podman: Pasta User-Mode Networking

jnardiello · on Oct 18, 2022

As an opensource lover, you gotta appreciate redhat registering a "containers" organisation with rh-only maintained projects. True OSS philosophy at work here!

markkrj · on Oct 18, 2022

That's not true. There are projects which are not maintained by RH. Also, anybody could have registered that name before RH... You could have done so.

jnardiello · on Oct 18, 2022

Mate, I've been working in this space for long enough to know exactly what I'm talking about.

Podman, Buildah, Skopeo and Crun are ALL redhat projects. OSS = Open code (ok) + Open Governance (not ok). In the "containers" org there are currently 23 maintainers, 99% RedHat employees (with a few indies - you always want students contributing for free). This is a great example of the enterprise OSS shitshow, where a large org registers a fake org on github with an intentionally "open" name - but keeps close control of the governance of the projects and tries to push (implicitly) its technological stack as a standard.

Disclaimer: I fucking hate enterprise vendors.

zoobab · on Oct 18, 2022

Not to mention Openshift.

encryptluks2 · on Oct 18, 2022

That is if GitHub doesn't let a big organization with a lot of money take it from you.

kmlben65 · on Oct 19, 2022

That's a "non-issue": commercial companies are vendor-locked to the cloud by conscious decision, so it doesn't matter. They pay for everything (and sometimes multiple times).

And any orgs that treasure the open source philosophy are either forking or not touching anything of redhat with a long pole.

unnah · on Oct 18, 2022

Ok, a stupid question. From what I understand this PR provides a tunnel, so that code in the container's private network namespace sees the external network, as if the code was running on the host. Why is this necessary - is it not possible to make the container use the initial network namespace, and get the same end result in a simpler way?

rwmj · on Oct 18, 2022

SLIRP is a clever tool that turns network traffic (raw IP frames on one side) into connect(2), read, write etc system calls on the other side. eg. If it sees a virtual machine is making a connection by sending a TCP SYN packet, it will on one side turn that into a connect(2) system call, and on the other side send back the appropriate TCP packets to complete the connection. It also does ARP, DNS, etc. If you think of a kernel as something that turns socket system calls into packets, then SLIRP is the opposite.
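To make the "packets in, system calls out" idea concrete, here is a minimal Python sketch. It is not how libslirp is implemented: the helper names (`parse_ipv4_tcp`, `handle_packet`, `build_syn`) and the guest address 10.0.2.15 are made up for illustration, checksums are left at zero, and the reverse direction (synthesizing the SYN-ACK back to the guest) is omitted entirely.

```python
import socket
import struct

TCP_SYN = 0x02
TCP_ACK = 0x10

def parse_ipv4_tcp(packet: bytes):
    """Extract (dst_ip, dst_port, tcp_flags) from a raw IPv4+TCP packet."""
    ihl = (packet[0] & 0x0F) * 4                 # IPv4 header length in bytes
    dst_ip = socket.inet_ntoa(packet[16:20])     # destination address field
    _src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    flags = packet[ihl + 13]                     # TCP flags byte
    return dst_ip, dst_port, flags

def handle_packet(packet: bytes):
    """On a bare SYN from the guest, open a real host socket via connect(2)."""
    dst_ip, dst_port, flags = parse_ipv4_tcp(packet)
    if flags & TCP_SYN and not flags & TCP_ACK:
        return socket.create_connection((dst_ip, dst_port), timeout=2)
    return None                                  # other packets ignored in this sketch

def build_syn(dst_ip, dst_port, src_ip="10.0.2.15", src_port=40000):
    """Build a minimal IPv4+TCP SYN (no options, zero checksums) for the demo."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64,
                     socket.IPPROTO_TCP, 0,
                     socket.inet_aton(src_ip), socket.inet_aton(dst_ip))
    tcp = struct.pack("!HHIIBBHHH", src_port, dst_port, 0, 0,
                      5 << 4, TCP_SYN, 65535, 0, 0)
    return ip + tcp
```

Feeding `handle_packet` a SYN aimed at a listening port yields an ordinary connected socket on the host side, with no raw-socket privileges involved, which is the property that makes this approach usable without root.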

SLIRP is mainly useful when you don't have (or don't want) root permissions to send raw packets. I may be one of the few people here to have used SLIRP back in the early 90s for its original purpose: You have dial up access to a shared SunOS terminal login, how do you turn that into a full network connection for your local Linux PC? SLIRP (+ expect, SLIP and some scripting) solved this exact problem.

Passt (https://passt.top/passt/about/) is a more modern replacement for SLIRP that amongst other things fully supports IPv6 and is more secure architecturally (runs in a separate process, uses modern Linux mechanisms for isolation etc). There was a talk about it here: https://kvmforum2022.sched.com/event/15jJY/slirp-is-dead-lon...

This pull request is using Pasta which is something on top of Passt that does something with network namespaces which I'm not entirely clear on, but some docs here: https://passt.top/passt/about/#pasta-pack-a-subtle-tap-abstr...

jasonjayr · on Oct 18, 2022

> I may be one of the few people here to have used SLIRP back in the early 90s for its original purpose

While I was still in college in 1999 -- I figured out how to pair a local ppp device with SLIRP on the Sun machines we had shell access on over ssh, enabling me to bounce internet access through our CS network, which got access to a few things we couldn't directly access from our dorms.

Later in my career I learned about the TCP-over-TCP meltdown effect, which explained why I had to restart it every once in a while.

Good times :)

apitman · on Oct 18, 2022

I have a project that relies on user mode networking. passt is a very welcome development. Really hoping it supports Windows QEMU hosts at some point.

bonzini · on Oct 18, 2022

Very difficult, it's not portable so it would be basically a rewrite. I read that similar APIs are present in FreeBSD, but at least it's POSIX-based so that does not really say anything about the complexity of a Windows port.

Anyhow, the presentation mentioned above explains which Linux extensions are needed.

sbrivio · on Oct 18, 2022

You might be right, but I hope you're not. :)

I had a look at Winsock documentation a while ago, that doesn't look so bad in terms of what we need. See also: https://lore.kernel.org/qemu-devel/20220919232441.661eda8d@e.... Replacing epoll might be messy, though.

sascha_sl · on Oct 18, 2022

Because you want containers to be able to allocate ports that are already in use on your host. Or at least you don't want that to be a source of errors.
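The port clash is easy to reproduce within a single network namespace. A small Python sketch (the `try_bind` helper is made up for illustration): the second bind to the same address and port fails, which is what a container sharing the host's namespace would hit if the host already used that port.

```python
import socket

def try_bind(port: int):
    """Return a listening socket on 127.0.0.1:port, or None if the bind fails."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
        s.listen(1)
        return s
    except OSError:
        # Typically EADDRINUSE: something in this namespace already owns the port.
        s.close()
        return None

first = try_bind(0)                  # port 0: let the kernel pick a free port
port = first.getsockname()[1]
second = try_bind(port)              # same namespace, same port: rejected
```

With a separate network namespace per container, each one gets its own port space, so both binds would succeed.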

sbrivio · on Oct 18, 2022

On top of that, you usually want to isolate the container workload with an observable network abstraction instead of granting it full (albeit non-root) access to host network facilities (including sockets).

See https://medium.com/nttlabs/dont-use-host-network-namespace-f... for just an example of what can go wrong otherwise.

bogomipz · on Oct 18, 2022

This looks like a replacement for something called slirp4netns which is "User-mode networking for unprivileged network namespaces." I wasn't familiar with this or libslirp. Can someone say what the practical use-case is for User-mode networking? Is this just to complement Podman's existing security posture or something else?

bongobingo1 · on Oct 18, 2022

> Is this just to complement Podman's existing security posture

To my understanding, yes. You can run Podman containers as non-root, but containers often have their own network namespace which would require root privileges to create without slirp4netns. I don't believe there are really practical reasons to use it beyond that pretty big one. It does (used to?) incur some performance hit even (but only at some multi-gb rate and even then only a fractional penalty IIRC). e: I was thinking about rootlesskit here, which is somehow combined with slirp4netns in some cases.

I remember looking into Pasta a while back when I wanted to get client-ip-addresses in a container and the current Podman implementation for user-networks obscures that value. I think this might fix that along with IPv6 and other improvements.

bogomipz · on Oct 18, 2022

Thanks. I don't understand what you mean by Podman obscuring the value of the client IP?

sbrivio · on Oct 19, 2022

There are two port forwarding modes allowed with slirp4netns. One uses slirp4netns itself: data is passed across a tap device, libslirp translates the destination address and preserves the source address.

The second one uses rootlesskitport (while slirp4netns still takes care of outbound connections): it opens sockets directly in the detached network namespace and passes data between sockets without going through the tap device. It's faster, because you avoid 1. the tap device 2. Layer-4/Layer-2 translations. But those sockets are local to the namespace, so destination and source address become loopback addresses. That might be unexpected in some cases, see also https://nvd.nist.gov/vuln/detail/CVE-2021-20199.
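The loss of the source address in the socket-to-socket mode can be shown with a toy relay in Python. This is only an illustration of the effect, not rootlesskit's code: the function names are invented, everything runs in one namespace on 127.0.0.1, and the relay handles a single connection and a single message.

```python
import socket
import threading

def start_backend():
    """Service 'inside the namespace': echoes once, records the peer it sees."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]
    seen = {}

    def run():
        conn, peer = srv.accept()
        seen["peer"] = peer                      # what the service believes the client is
        conn.sendall(conn.recv(1024))
        conn.close()
        srv.close()

    threading.Thread(target=run, daemon=True).start()
    return port, seen

def start_relay(backend_port):
    """Forwarder: accepts a client, opens its own socket to the backend."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]
    info = {}

    def run():
        client, _ = srv.accept()
        upstream = socket.create_connection(("127.0.0.1", backend_port))
        info["upstream"] = upstream.getsockname()
        upstream.sendall(client.recv(1024))      # copy payload only, no addresses
        client.sendall(upstream.recv(1024))
        upstream.close()
        client.close()
        srv.close()

    threading.Thread(target=run, daemon=True).start()
    return port, info

def demo():
    backend_port, seen = start_backend()
    relay_port, info = start_relay(backend_port)
    client = socket.create_connection(("127.0.0.1", relay_port))
    client.sendall(b"ping")
    reply = client.recv(1024)
    client.close()
    return seen["peer"], info["upstream"], reply
```

The backend's recorded peer is the relay's own loopback socket, whatever address the original client connected from. Only the payload crosses the relay.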

pasta implements both modes (it's the "tap bypass" in https://passt.top/passt/about/#pasta-pack-a-subtle-tap-abstr...), and selects the appropriate one based on the original source address, so that you don't need to choose one. Local connections skip the tap device, non-local ones go through it (you can have non-loopback source addresses only for traffic coming through a non-loopback interface).
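The selection rule described above can be written down as a tiny Python sketch. This is not pasta's code: `choose_path` is a hypothetical name, and treating "local" as "loopback source address" is an assumption made for the example.

```python
import ipaddress

def choose_path(src: str) -> str:
    """Pick the forwarding path from the connection's original source address."""
    if ipaddress.ip_address(src).is_loopback:
        return "bypass"      # local connection: socket-to-socket, no tap device
    return "tap"             # non-local: go through the tap, source address preserved

paths = {src: choose_path(src) for src in ("127.0.0.1", "::1", "192.0.2.10")}
```

Under that assumption nothing is lost by taking the fast path for local connections: the source would be a loopback address on either path.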

bogomipz · on Oct 19, 2022

Ah OK, that makes sense. Slirp4netns and Pasta both seem pretty interesting. I'm looking forward to spending some time with Podman networking. Cheers.

znpy · on Oct 18, 2022

As an Italian… I love that name choice.

encryptluks2 · on Oct 18, 2022

Maybe I'm wrong but I thought netavark already replaced slirp4netns

bonzini · on Oct 18, 2022

Does it support rootless containers?

encryptluks2 · on Oct 18, 2022

Yes, according to their GitHub page. I use it as the networking stack for Podman.

sbrivio · on Oct 18, 2022

See podman-network-create(1), I think the description of the --driver option should clarify this. You can use macvlan there, but in rootless operation you don't have access to the network interfaces of the host -- unless you set up that networking separately beforehand (as root), that is.

So yes, it supports rootless, but by (kernel) design, there's no magic way to bridge (no pun intended) that gap (even with macvlan).

noodlesUK · on Oct 18, 2022

My main question about this is why this instead of SLIRP? There's not really a tl;dr for that obviously on the repo?

I've definitely had problems with SLIRP before but which ones does this aim to solve?

sbrivio · on Oct 18, 2022

I have specifically not added that out of respect for what Slirp represented to dialup shell users in the late 1990s and for what it did for the ecosystem in the past 15 years. I used it for more than a decade myself, and if you consider that it's now used for containers and virtual machines, given the original purpose, it's really an impressive piece of software. Feature comparisons can be risky at times.

It's mostly in the direction of IPv6 support, performance, security. Those topics are covered in slides: https://static.sched.com/hosted_files/kvmforum2022/01/passt_... or recording: https://www.youtube.com/watch?v=U89bWP1HNgU of a recent talk from KVM Forum 2022.