This is...not simple. I agree that container networking is not all that much different from other Linux networking, but that doesn't make it simple. A lot of application developers are switching to working with containers and haven't historically had to do any manual network configuration. It's all new to them. Linux networking conventions change every few years and simply keeping up with the basics is a chore. netplan is the current flavor of the week but it's still new to plenty of people who've never had to worry about anything more than auto-configured DHCP or cloud-provider-default VPS networking.
There are also important security considerations with container networking. Docker, by default, punches massive holes through your firewall in non-obvious ways. People don't realize that with a default Docker configuration, containers are ignoring any normal firewall rules you may have set up with iptables or ufw. Locking that down is only easy if you already know iptables well, and even if you do, managing it is a pain.
The article doesn't touch on Kubernetes, but that's a whole other can of worms. You have to pick a CNI and manually configure it; DNS doesn't just magically work; default CIDR allocations often conflict with existing networks; load balancer ingress for a development/single-host/non-cloud environment is a horror show.
This is a good and helpful article, but container networking is not simple by any stretch.
... for Ubuntu, and Ubuntu only. It is an invention of Canonical that, generally speaking, has not been adopted by the rest of the Linux distros (except Ubuntu derivatives).
I have started spinning up “bare metal” k8s on a cloud VM and it’s not that hard to get going, until you hit anything networking related; then I feel like I’ve just jumped off a cliff. I have no knowledge there, and the online resources seem to be nonexistent because you’re expected to just use a prebaked solution from cloud providers.
I ended up just installing k3s, but I have yet to figure out why the Traefik packaged with k3s can listen directly on ports 80 and 443 while the basic Traefik installed via Helm cannot.
To be fair, starting with something like a BSD might be easier in terms of networking. Mainly because the tooling hasn't been all over the place in the last 2 decades.
Also, a lot of network knowledge is not OS specific. Learning about IP, Ethernet, routing protocols, etc. is valuable no matter which OS you use.
I've had years of Linux experience and decided my home network + router was quite poor, so I set up my own home network router with OpenBSD, as an excuse to look at a BSD and fix my network issues.
OpenBSD has been an absolute pleasure to work with. The man pages are well written and complete, the filesystem is well organized, and there are none of the issues people have with systemd; the simplicity of OpenBSD's init system shows you don't need a complex init system. Then you have PF, which is far easier to deal with than iptables.
It took me a few hours of playing to set OpenBSD up with multiple VLANs, a DHCP server, a firewall, cross-VLAN routing, mDNS, etc. The hardest part was figuring out the correct parameters for my ISP broadband connection, as the ISP didn't publish some of the information, so I needed to sniff the network traffic to find that I had to set a VLAN for my ISP.
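For a sense of scale, each VLAN is only a few ifconfig lines (a rough sketch; the interface name and VLAN id here are made up):

    # a guest VLAN (id 7) stacked on the physical interface em0
    ifconfig vlan7 create
    ifconfig vlan7 parent em0 vnetid 7
    ifconfig vlan7 inet 192.168.7.1 netmask 255.255.255.0 up
    # to make it persistent, the same parameters go into /etc/hostname.vlan7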
The simplicity of OpenBSD makes it very easy to learn networking, and if you are familiar with Linux you will become familiar with BSD very quickly, as the BSDs are by their nature less fragmented than Linux.
Well, I first checked /etc/resolv.conf, which told me to change the DNS settings in systemd. So I read up on the systemd documentation and tried to change the DNS setting using systemd-resolved, but got a permissions error. So I did a bunch of reading online and found out that netplan is actually what configures the DNS, so I spent some time reading the netplan documentation on how to change the DNS server. This process took me about an hour.
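For anyone hitting the same wall, the change itself boils down to a few lines of netplan YAML (a rough sketch; the file name, interface name and servers are placeholders, and the dhcp4-overrides part assumes the systemd-networkd renderer):

    # /etc/netplan/01-netcfg.yaml
    network:
      version: 2
      ethernets:
        eth0:
          dhcp4: true
          dhcp4-overrides:
            use-dns: false       # don't let DHCP-supplied servers take precedence
          nameservers:
            addresses: [1.1.1.1, 8.8.8.8]
    # then apply it with: sudo netplan apply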
I have done some learning about more generalized network concepts (IP addresses, protocols, routing), but now I want to make the jump to actually applying it in some VMs.
I think my next stop is the network section of Unix and Linux System Administration Handbook
Strangely, there are no current books on state-of-the-art Linux networking.
The books on Linux networking are few and far between, and seriously outdated [1][2]. I think it's about time someone wrote an authoritative book on Linux networking in light of the recent changes around Netfilter, bpfilter, eBPF and LXC containers.
[1] Linux Kernel Networking: Implementation and Theory, 2013
[2] Linux Routers: A Primer for Network Administrators, 2nd Edition, 2002
Correction: chapter 4 is in the printed book. On the web site, it's found at http://policyrouting.org/iproute2-toc.html. The links from the other chapters of the web site don't work but browsing the ToC of the book, it looks like this link on the web site has all the info from the book plus a bit more...
A -- "A lot of application developers are switching to working with containers and haven't historically had to do any manual network configuration. It's all new to them."
B -- "Linux networking conventions change every few years and simplying keeping up with the basics is a chore. netplan is the current flavor of the week but it's still new to plenty of people who've never had to worry about anything more than auto-configured DHCP or cloud-provider-default VPS networking."
My perspective has been that "A" directly leads to "B", for reasons similar to why new JavaScript frameworks keep being released. Linux networking conventions don't change rapidly (more like every 5-10 years), but new management tooling keeps getting created to sit alongside the existing tooling: OS tools that are already abstractions get yet another layer of tooling to abstract system management further, which just creates unnecessary complexity.
> Linux networking conventions change every few years and simply keeping up with the basics is a chore.
Debian has used /etc/network/interfaces for as long as I know. (at least 15+ years) It's sane, short and straightforward. For servers, this has been more than ideal for every setup I could conceive.
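For reference, a typical static stanza looks something like this (addresses are placeholders; the dns-nameservers line needs the resolvconf package):

    auto eth0
    iface eth0 inet static
        address 192.168.1.10
        netmask 255.255.255.0
        gateway 192.168.1.1
        dns-nameservers 192.168.1.1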
> through your firewall in non-obvious ways. People don't realize that with a default Docker configuration, containers are ignoring any normal firewall rules you may have set up with iptables or ufw.
What do you mean by non-obvious? If I bind a port of the container to the host, e.g. -p 8080:80, only this port is a hole. Do you have something different in mind? (I'm a Docker beginner)
That alone is non-obvious. Docker adds prerouting and masquerade changes. They don't show up in the default iptables rule list (iptables -L). If you don't know that Docker creates rules in other tables and chains, and how to list those explicitly, you won't see what it has added. Docker binds to the public interface by default, rather than to much safer private/localhost defaults. If you have an iptables INPUT DENY rule, it doesn't come into play at all, and that's certainly unintuitive for people who aren't iptables experts. For example, if you're using ufw (which most non-power-users rely on) and have it set to DENY by default, that won't protect you from the example you provided. The way Docker works is simply wrong. It should not mess with iptables at all. It does so because that makes it "easier" for novice users, but that ease of use is a real security risk.
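A quick way to see it for yourself (the chain and table names below are Docker's defaults):

    # the NAT rules live in the nat table, which plain `iptables -L` (filter table) never shows
    iptables -t nat -L -n -v     # DOCKER chain with the DNAT port mappings, MASQUERADE in POSTROUTING
    iptables -L FORWARD -n -v    # jumps into the DOCKER / DOCKER-USER chains
    # publishing only on loopback at least keeps the port off the public interface:
    docker run -d -p 127.0.0.1:8080:80 nginx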
Docker's custom iptables chains, upon restart of the network or of Docker, can and most likely WILL clobber rules that used to work. Docker adds the DOCKER-USER chain specifically so you can re-add your own rules after Docker has set up its rules. Most engineers never have to deal with it unless they are deploying to production boxes.
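The usual pattern there is along the lines of the example in Docker's own docs (interface name and subnet are placeholders):

    # drop traffic to published ports unless it comes from a trusted subnet
    iptables -I DOCKER-USER -i eth0 ! -s 203.0.113.0/24 -j DROP
    iptables -L DOCKER-USER -n -v
    # Docker evaluates DOCKER-USER before its own rules and leaves its contents alone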
I hope k8s (or some managed subset) eventually provides a stable networking abstraction. There is so much to consider when it comes to network hardening, and I'd like to not have to constantly stress about it, feeling like I don't know enough.
I think it’s time for us to get past “X is simple” or “X is easy.” It’s a highly subjective thing to say, no matter the topic. It’s never helpful, it just makes other people feel bad.
Well, to be honest, some level of sarcasm was meant to be in this title, because it precedes a 4000+ word explanation of the topic... But it seems like I'm pretty bad at writing good titles. For sure I didn't mean to make anyone feel bad.
FWIW, I thought it was obviously either stupidity[0] or sarcasm based on the title, and the 4000+ word explanation confirmed it as the latter. Can't speak for anyone else, though.
Edit: 0: well, or lies / container vendor shilling
I know a guy, he hated networking, so he decided to set up a kubernetes cluster on his home server to abstract all his networking difficulties away. Now he occasionally complains that the network inside the network inside the network is annoying because it doesn't play nicely with the network outside the network on top of the main network. Overall I feel the rate at which he talks about networking issues has increased, but his enthusiasm for talking about them has gone up, so I guess that's a win.
Yeah, I feel cursed by the same habit. I get to about a 75% understanding of a domain and feel like I'm about to run out of things to learn...so I expand the domain and continue until I'm at that 75% mark again...and repeat...
Brilliant write up. Lots of this topic is so much easier to understand when it’s presented from first principles, without any of the LXC or Docker helpfulness hiding the details.
(If the author is reading, thank you! I’ll likely use this material for the pupils in my computer club.)
Managing an IPv6 stack alongside IPv4 is also very informative. IPv6 is still not widely deployed — SMTP is likely tied to v4 for all eternity — but it’s incredibly useful for managing multiple sites of inventory over the internet. Seeing RFC1918 style 10.x.y.z private addresses and IPAM in use by internal ops and IT in 2021 brings tears to my eyes.
Adding a section on using conntrack to watch the way in which the kernel handles MASQUERADE and DNAT would be illustrative as well.
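A minimal sketch of that (needs conntrack-tools, run as root):

    # list tracked connections that were source- or destination-NATed
    conntrack -L --src-nat
    conntrack -L --dst-nat
    # or watch translations appear live while a container opens connections
    conntrack -E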
I especially like all of the replace verbs, which I wish I’d known about sooner. They make idempotency much simpler to express without any if ! ip thing get <long list of route args>; then ip add <same list of args> ; fi stuff.
That's cool! Here's a synopsis of `ip replace`: "replace will either modify an existing address or create a new one if the specified address does not exist"
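A couple of examples of the pattern (device and addresses are placeholders):

    # both are idempotent: create if missing, update in place if it already exists
    ip addr replace 192.0.2.10/24 dev eth0
    ip route replace 198.51.100.0/24 via 192.0.2.1 dev eth0
    # running either a second time succeeds quietly instead of failing with "File exists"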
Container networking is as simple as Linux networking, which isn't.
It's as much of a faff now as it was when I was doing KVM virtualisation professionally. (Don't; pay for and use VMware. You'll be much happier, and have lots of free time.)
Debugging it is a massive arse, and it lacks any useful and friendly debugging tools.
One thing that does help is either to use VLANs or a second adaptor to separate container traffic from control. It makes debugging slightly easier. By easier, I mean that when you accidentally misconfigure it, there is a better chance that you'll be able to get control of the host still.
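On Linux that can be as little as a VLAN sub-interface (the id, names and address here are made up):

    # dedicated VLAN for container traffic; management stays on plain eth0
    ip link add link eth0 name eth0.100 type vlan id 100
    ip addr add 10.100.0.1/24 dev eth0.100
    ip link set eth0.100 up
    # the container bridge then attaches to eth0.100 rather than to eth0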
If you're running untrusted code in your container, you've pretty much already lost.
Containers are useful for deployment and configuration, they are not a robustly secure sandbox. For that you still need to go with a VM.
No cloud provider will offer to run your containers alongside other customers' containers on a shared kernel. Your containers always run within your own VM.
No, it is a matter of levels of security, not a black-and-white issue. For some customers and providers, containerized processes are enough isolation. For many small businesses, even shared hosting with chroot is enough.
Generally speaking, Xen or Firecracker VMs do have a smaller attack surface than containerized processes on a shared Linux kernel. But configuration and exposed capabilities matter: it is possible to have a container better secured than a VM (e.g. a minimal Zones/jails env + correct MAC config vs. a general Qemu/VMware VM with many default legacy devices and bad/no MAC config).
Motivated attackers can escape even these VMs. So they are not a magical solution.
Common hypervisors are too big and buggy to be pronounced a security panacea. From time to time, VM escapes surface publicly, but most are probably kept private and exploited quietly. As we know after Spectre and Meltdown, standard computing technology is buggy/bugged all the way down to hardware.
If you want really "robustly secure" server environment, such do exist: for example, separation kernels like the L4 family or the Green Hills INTEGRITY systems. But for web apps, almost nobody bothers.
> No cloud provider will offer to run your containers alongside other customers' containers on a shared kernel. Your containers always run within your own VM.
> From time to time, VM escapes surface publicly, but most are probably kept private and exploited quietly. As we know after Spectre and Meltdown, standard computing technology is buggy/bugged all the way down to hardware.
To my knowledge there has never been a successful escape from the VMs offered by AWS, GCP, or Azure. That would be a pretty big story.
> If you want really "robustly secure" server environment, such do exist: for example, separation kernels like the L4 family or the Green Hills INTEGRITY systems. But for web apps, almost nobody bothers.
What's the reason none of the major cloud providers use seL4? Missing features? By seL4's own account their performance is exceptional, but perhaps its performance can't compete against a hardware-assisted system like AWS Nitro?
You won't hear much about the VM escapes in those clouds because there is a strong incentive for almost everyone to be quiet about them and patch as soon as possible so there "isn't a story". Some researchers do want to make a name for themselves and publish before the patch, but those are a minority. And publicly attacking a big corporation is quite risky, so they prefer presenting the bug as a general problem in the technology. But the bugs in hypervisors exist; see the CVEs for Xen over the past decade.
> Some researchers do want to make a name for themselves and publish before the patch, but those are a minority. And publicly attacking a big corporation is quite risky, so they prefer presenting the bug as a general problem in the technology.
That doesn't sound right to me. The industry norm for security research is 'responsible disclosure', which is intended to give the vendor reasonable time to implement the fix, while eventually publishing the knowledge for all to know.
Unless they're simply being paid for their silence, I can't imagine a security researcher wanting to keep quiet about a major achievement like that.
> bugs in hypervisors exist; see the CVEs for Xen over the past decade
Sure, but I'm talking specifically about the big 3 cloud providers, not vanilla Xen. Amazon in particular have gone to pretty extreme lengths with their Nitro system.
I have to say, it would be cool if they did. If I could get a dirt-cheap rate for running batch workloads in a potentially antagonistic environment, I could make use of that. Not all data is sensitive.
I know many people use single tenant hardware to run containers, so I wouldn’t say that it defeats the purpose of containerization, just one of the benefits.
The confusion and disagreement on this topic is rooted in faulty assumptions about other organizations' requirements. Many people use containers with host networking, without uid namespaces, without pid namespaces, without bind mounts. What you think is "a container" may not be at all universal. Also the idea that host networking grants all processes CAP_NET_RAW is just weird and wrong.
It's not rocket science, but it's not entirely simple either, because there are a lot of moving parts, because IP networking is annoyingly complicated, and because we are stuck with IP addresses and the interfaces they are attached to, with many of these low-level details percolating up to user code. I like Linux's a-la-carte container APIs, but their flexibility seems to increase complexity, as you end up dealing with multiple namespaces as well as cgroups, etc.
Then there are Kubernetes and Docker which add a lot of complexity on top, while providing some levels of abstraction which don't entirely eliminate the need for understanding the underlying internals.
Having some more mid-level container and networking APIs would be nice, but I'm not sure it would solve the complexity problem.
Guaranteed employment for networking/container/Linux experts, I guess.
IPv6 doesn't solve the difficulties of container networking. Containers have private network namespaces, which means that a container can't communicate with other hosts without (1) a dedicated interface or (2) a bridge (basically a software switch) for host-to-container communication.
These problems are Layer 2 problems, while IPv4 and IPv6 are at layer 3.
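A bare-bones sketch of option (2), using nothing but iproute2 (the names and addresses are made up):

    # a namespace standing in for the container, plus a veth pair and a bridge
    ip netns add demo
    ip link add veth-host type veth peer name veth-ctr
    ip link set veth-ctr netns demo
    ip link add br0 type bridge
    ip addr add 10.0.0.1/24 dev br0
    ip link set br0 up
    ip link set veth-host master br0
    ip link set veth-host up
    # bring up the container side and point its default route at the bridge
    ip netns exec demo ip link set lo up
    ip netns exec demo ip addr add 10.0.0.2/24 dev veth-ctr
    ip netns exec demo ip link set veth-ctr up
    ip netns exec demo ip route add default via 10.0.0.1
    # reaching beyond the host still needs IP forwarding plus NAT or routed addresses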
With IPv6 there's not enough tinkering with service meshes, port forwardings, workarounds for private ip address conflicts and VPCs to write articles about.
Containers don't (generally) run in their own operating system, just their own userland. It can actually be a source of security vulnerabilities to assume that containers are completely isolated from each other, such as running a root-user container in production and assuming it can't get privileged access to the host. It's less similar to a bare-metal MS-DOS application than it is to a glorified chroot jail.
> Containers don't (generally) run in their own operating system
Right, but a containerized application can't (ideally) talk to other applications on the same machine; that's how it's similar to a single-process OS with a single app running. Of course there are details, like a single application possibly still containing multiple processes from the OS standpoint, but the overall comparison stands.
> It's less similar to a bare-metal MS-DOS application than it is to a glorified chroot jail.
These two cases are similar enough from the containerized application's standpoint (only the OS services are different from those of MS-DOS).