Container networking is simple (2020) (iximiuz.com)
271 points by zdw on Jan 19, 2021 | 67 comments



This is...not simple. I agree that container networking is not all that much different from other Linux networking, but that doesn't make it simple. A lot of application developers are switching to working with containers and haven't historically had to do any manual network configuration. It's all new to them. Linux networking conventions change every few years and simply keeping up with the basics is a chore. netplan is the current flavor of the week but it's still new to plenty of people who've never had to worry about anything more than auto-configured DHCP or cloud-provider-default VPS networking.

There are also important security considerations with container networking. Docker, by default, punches massive holes through your firewall in non-obvious ways. People don't realize that with a default Docker configuration, containers are ignoring any normal firewall rules you may have set up with iptables or ufw. Locking that down is only easy if you already know iptables well, and even if you do, managing it is a pain.

The article doesn't touch on Kubernetes, but that's a whole other can of worms. You have to pick a CNI and manually configure it; DNS doesn't just magically work; default CIDR allocations often conflict with existing networks; load balancer ingress for a development/single-host/non-cloud environment is a horror show.

This is a good and helpful article, but container networking is not simple by any stretch.


> netplan is the current flavor of the week

... for Ubuntu, and Ubuntu only. It is an invention of Canonical not adopted by the rest of Linux distros (except Ubuntu derivatives), generally speaking.


Long live wicked, easily managed with YaST on openSUSE.


Yes, but how do you manage it without yast?


try: wicked --help

... but the beautiful thing about yast is that you get the interface via GUI or console, and it's largely identical.


Any good books on Linux networking?

I have started spinning up “bare metal” k8s on a cloud VM, and it's not that hard to get going until you hit anything networking related; then I feel like I've just jumped off a cliff. I have no knowledge there, and the online resources seem to be nonexistent because you're expected to just use a prebaked solution from a cloud provider.

I ended up just installing k3s, but I have yet to figure out why the Traefik packaged with k3s can listen directly on ports 80 and 443 while the stock Traefik installed via Helm cannot.


To be fair, starting with something like a BSD might be easier in terms of networking. Mainly because the tooling hasn't been all over the place in the last 2 decades.

Also, a lot of network knowledge is not OS specific. Learning about IP, ethernet, routing protocols etc is valuable no matter which OS you use.


I agree.

I've had years of Linux experience and decided my home network + router was quite poor, so I set up my own home router with OpenBSD, as an excuse both to look at a BSD and to fix my network issues.

OpenBSD has been an absolute pleasure to work with. The man pages are well written and complete, the filesystem is well organized, and the simplicity of OpenBSD's init system (with none of the issues people have with systemd) shows you don't need a complex init system. Then there's PF, which is far easier to deal with than iptables.

It took me a few hours of playing to set OpenBSD up with multiple VLANs, a DHCP server, a firewall, cross-VLAN routing, mDNS, etc. The hardest part was figuring out the correct parameters for my ISP broadband connection: the ISP didn't publish some of the information, so I needed to sniff the network traffic to find that I had to set a VLAN for the ISP link.

The simplicity of OpenBSD makes it very easy to learn networking, and if you are familiar with Linux you will become familiar with BSD very quickly, as the BSDs are by their nature less fragmented than Linux.


I was amazed when my OpenBSD network config worked on the first try using only the man pages as reference.

On the other hand it took me the better part of an hour to figure out how to change the DNS server used by an Ubuntu install.


> On the other hand it took me the better part of an hour to figure out how to change the DNS server used by an Ubuntu install.

Might I ask why/how exactly?


Well, I first checked /etc/resolv.conf, which told me to change the DNS settings in systemd. So I read up on the systemd documentation and tried to change the DNS setting using systemd-resolved, but got a permissions error. After a bunch of reading online I found out that netplan is actually what configures the DNS, so I spent some time reading the netplan documentation on how to change the DNS server. The whole process took me about an hour.
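
For anyone else who hits this, the change itself ends up being a tiny netplan file. A sketch along these lines (the interface name and resolver addresses are just placeholders, and whether DHCP-provided resolvers still take priority depends on the rest of your config):

    # drop a file in /etc/netplan/ and apply it
    sudo tee /etc/netplan/99-dns.yaml <<'EOF'
    network:
      version: 2
      ethernets:
        eth0:
          nameservers:
            addresses: [1.1.1.1, 9.9.9.9]
    EOF
    sudo netplan apply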


I have done some learning about more general network concepts (IP addresses, protocols, routing), but now I want to make the jump to actually applying it in some VMs.

I think my next stop is the network section of Unix and Linux System Administration Handbook


I was in the same boat, and I found Michael W. Lucas's book quite good for getting practical working knowledge: https://books.google.co.in/books/about/Networking_for_System...

It addresses the concepts, and the book covers multiple operating systems (Windows, Linux, BSD) and even has some Solaris tidbits. :-)


Strangely, there are no current books on state-of-the-art Linux networking.

The books on Linux networking are few and far between, and they are seriously outdated [1][2]. I think it's about time someone wrote an authoritative book on Linux networking in light of the recent changes around Netfilter, bpfilter, eBPF and LXC containers.

[1] Linux Kernel Networking: Implementation and Theory, 2013

[2] Linux Routers: A Primer for Network Administrators, 2nd Edition, 2002


https://access.redhat.com/documentation/en-us/red_hat_enterp... though it's a bit task based

http://policyrouting.org/PolicyRoutingBook/ONLINE/TOC.html but chapter 4 which introduces/explains how to use the ip command never got written!


Correction: chapter 4 is in the printed book. On the web site, it's found at http://policyrouting.org/iproute2-toc.html. The links from the other chapters of the web site don't work but browsing the ToC of the book, it looks like this link on the web site has all the info from the book plus a bit more...


A -- "A lot of application developers are switching to working with containers and haven't historically had to do any manual network configuration. It's all new to them."

B -- "Linux networking conventions change every few years and simplying keeping up with the basics is a chore. netplan is the current flavor of the week but it's still new to plenty of people who've never had to worry about anything more than auto-configured DHCP or cloud-provider-default VPS networking."

My perspective is that "A" directly leads to "B", for much the same reasons new JavaScript frameworks keep being released. Linux networking conventions don't actually change rapidly (think 5-10 years), but new management tooling keeps getting created to sit next to the existing tooling: OS tools that are already abstractions get yet more tooling layered on top to further abstract system management, which just creates unnecessary complexity.


> Linux networking conventions change every few years and simply keeping up with the basics is a chore.

Debian has used /etc/network/interfaces for as long as I can remember (at least 15+ years). It's sane, short and straightforward. For servers, it has been more than adequate for every setup I could conceive.
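
For anyone who hasn't seen it, a static stanza in that file is about as short as network configuration gets. A sketch (interface name and addresses are made up):

    # appended to /etc/network/interfaces; bring the interface up with ifup
    cat <<'EOF' | sudo tee -a /etc/network/interfaces
    auto eth0
    iface eth0 inet static
        address 192.0.2.10
        netmask 255.255.255.0
        gateway 192.0.2.1
    EOF
    sudo ifup eth0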


> through your firewall in non-obvious ways. People don't realize that with a default Docker configuration, containers are ignoring any normal firewall rules you may have set up with iptables or ufw.

What do you mean by non-obvious? If I bind a container port to the host, e.g. -p 8080:80, only that port is a hole. Do you have something different in mind? (I'm a Docker beginner)


That alone is non-obvious. Docker adds prerouting and masquerade changes that don't show up in the default iptables rule list (iptables -L). If you don't know that it adds chains in other tables, and how to list them explicitly, you won't see these rules. Docker also binds to the public interface by default, rather than to the much safer private/localhost defaults. If you have an iptables INPUT DENY rule, it doesn't come into play at all, and that's certainly unintuitive for people who aren't iptables experts. For example, if you're using ufw (which most non-power-users rely on) and have it set to DENY by default, that won't protect you from the example you provided. The way Docker works is simply wrong. It should not mess with iptables at all. It does so because that makes things "easier" for novice users, but that ease of use is a real security risk.
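
To make that concrete, two things worth knowing (assuming a stock Docker install; the image name is just an example):

    # Docker's port-publishing rules live in the nat table, which a bare
    # `iptables -L` never shows you
    sudo iptables -t nat -L DOCKER -n -v

    # binding the published port to loopback keeps it off the public interface
    docker run -d -p 127.0.0.1:8080:80 nginx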


> The way Docker works is simply wrong.

And that's not only the networking. Everything about Docker is like that. It's a mess. Don't use.


Docker's custom iptables chains, upon a restart of networking or of Docker itself, can and most likely WILL clobber rules that used to work. Docker provides the DOCKER-USER chain precisely as a place to put your own rules that Docker won't overwrite. Most engineers never have to deal with it unless they are deploying to production boxes.
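
For anyone hitting this, the usual pattern is something like the following; a sketch, not a drop-in ruleset (the interface name and source range are placeholders):

    # DOCKER-USER is consulted before Docker's own forwarding rules and is
    # left alone by Docker, so host-level policy for published ports goes here
    sudo iptables -I DOCKER-USER -i eth0 ! -s 10.0.0.0/8 -j DROP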


I hope k8s (or some managed subset) eventually provides a stable networking abstraction. There is so much to consider when it comes to network hardening, and I'd like to not have to constantly stress about it, feeling like I don't know enough.


I think it’s time for us to get past “X is simple” or “X is easy.” It’s a highly subjective thing to say, no matter the topic. It’s never helpful, it just makes other people feel bad.


Well, to be honest, some level of sarcasm was intended in the title, because it precedes a 4000+ word explanation of the topic... But it seems I'm pretty bad at writing good titles. For sure, I didn't mean to make anyone feel bad.


FWIW, I thought it was obviously either stupidity[0] or sarcasm based on the title, and the 4000+ word explanation confirmed it as the latter. Can't speak for anyone else, though.

Edit: 0: well, or lies / container vendor shilling


Appreciate the response. Thank you.


I know a guy, he hated networking, so he decided to set up a kubernetes cluster on his home server to abstract all his networking difficulties away. Now he occasionally complains that the network inside the network inside the network is annoying because it doesn't play nicely with the network outside the network on top of the main network. Overall I feel the rate at which he talks about networking issues has increased, but his enthusiasm for talking about them has gone up, so I guess that's a win.


Yeah, I feel cursed by the same habit. I get to about a 75% understanding of a domain and feel like I'm about to run out of things to learn...so I expand the domain and continue until I'm at that 75% mark again...and repeat...


Brilliant write-up. Lots of this topic is so much easier to understand when it's presented from first principles, without any of the LXC or Docker helpfulness hiding the details.

(If the author is reading, thank you! I’ll likely use this material for the pupils in my computer club.)

Managing an IPv6 stack alongside IPv4 is also very informative. IPv6 is still not widely deployed — SMTP is likely tied to v4 for all eternity — but it’s incredibly useful for managing multiple sites of inventory over the internet. Seeing RFC1918 style 10.x.y.z private addresses and IPAM in use by internal ops and IT in 2021 brings tears to my eyes.

Adding a section on using conntrack to watch the way in which the kernel handles MASQUERADE and DNAT would be illustrative as well.
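
Something along these lines is enough to watch those translations happen, assuming the conntrack-tools package is installed:

    # list connections that have been source-NATed (i.e. MASQUERADE at work)
    sudo conntrack -L --src-nat

    # or stream entries as they are created, updated and destroyed
    sudo conntrack -E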

I really like the diagrams too.


Thank you very much for your feedback! I appreciate it a lot because at the end of the day that's what keeps me motivated!


The ip tool chain is a worthy thing to promote.

I especially like all of the replace verbs, which I wish I’d known about sooner. They make idempotency much simpler to express without any if ! ip thing get <long list of route args>; then ip add <same list of args> ; fi stuff.
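
For example, a sketch (the prefix, gateway and device are made up):

    # idempotent: creates the route if it is missing, updates it if it exists
    sudo ip route replace 10.20.0.0/24 via 192.168.1.254 dev eth0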


That's cool! Here's a synopsis of `ip replace`: "replace will either modify an existing address or create a new one if the specified address does not exist"

https://serverfault.com/questions/476926/understanding-ip-ad...


This is a very good article.

Container networking is as simple as Linux networking, which isn't.

It's as much of a faff now as it was when I was doing KVM virtualisation professionally (don't; pay for and use VMware. You'll be much happier and have lots of free time).

Debugging it is a massive arse, and it lacks any useful and friendly debugging tools.

One thing that does help is either to use VLANs or a second adaptor to separate container traffic from control. It makes debugging slightly easier. By easier, I mean that when you accidentally misconfigure it, there is a better chance that you'll be able to get control of the host still.


Besides VLANs, another way to separate control and data plane traffic is using VRFs.

https://people.kernel.org/dsahern/management-vrf-and-dns

https://people.kernel.org/dsahern/docker-and-management-vrf
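
A rough sketch of what a management VRF looks like with iproute2 (device names and the table number are placeholders; `ip vrf exec` also needs a reasonably recent kernel):

    # put the management NIC into its own routing domain
    sudo ip link add mgmt type vrf table 10
    sudo ip link set dev mgmt up
    sudo ip link set dev eth0 master mgmt

    # run a shell whose sockets are scoped to that VRF
    sudo ip vrf exec mgmt bash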


This blog post reminded me of the post [1] I wrote about 7 years ago o_O

[1] https://cybernetist.com/2013/11/19/lxc-networking/


A lot of discussion around Docker in general is rehashing stuff that lxc did a long time ago, so this fits the mold.


It is if you use:

    --net=host


Which defeats the purpose of containerization because now the container can sniff traffic on the host, hijack TCP connections, etc.
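
Easy to demonstrate: with host networking the container sees the host's interfaces directly (using the stock alpine image here purely as an example):

    # prints the host's interfaces and addresses, not container-private ones
    docker run --rm --net=host alpine ip addr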


If you're running untrusted code in your container, you've pretty much already lost.

Containers are useful for deployment and configuration, they are not a robustly secure sandbox. For that you still need to go with a VM.

No cloud provider will offer to run your containers alongside other customers' containers on a shared kernel. Your containers always run within your own VM.


No, it's a matter of levels of security, not a black-and-white issue. For some customers and providers, containerized processes are enough isolation. For many small businesses, even shared hosting with chroot is enough.

Generally speaking, Xen or Firecracker VMs do have a smaller attack surface than containerized processes on a shared Linux kernel. But configuration and exposed capabilities matter: it is possible to have a container better secured than a VM (e.g. a minimal Zones/jails environment with a correct MAC config vs. a general QEMU/VMware VM with many default legacy devices and a bad or missing MAC config).

Motivated attackers can escape even these VMs. So they are not a magical solution.

Common hypervisors are too big and buggy to be pronounced a security panacea. From time to time VM escapes surface publicly, but most are probably hoarded and quietly exploited. As we learned from Spectre and Meltdown, standard computing technology is buggy/bugged all the way down to the hardware.

If you want a really "robustly secure" server environment, such things do exist: for example, separation kernels like the L4 family or the Green Hills INTEGRITY systems. But for web apps, almost nobody bothers.

> No cloud provider will offer to run your containers alongside other customer's containers, on a shared kernel. Your containers always run within your own VM.

Joyent does - via SmartOS zones.


> From time to time VM escapes surface publicly, but most are probably hoarded and quietly exploited. As we learned from Spectre and Meltdown, standard computing technology is buggy/bugged all the way down to the hardware.

To my knowledge there has never been a successful escape from the VMs offered by AWS, GCP, or Azure. That would be a pretty big story.

> If you want a really "robustly secure" server environment, such things do exist: for example, separation kernels like the L4 family or the Green Hills INTEGRITY systems. But for web apps, almost nobody bothers.

What's the reason none of the major cloud providers use seL4? Missing features? By seL4's own account its performance is exceptional, but perhaps it can't compete against a hardware-assisted system like AWS Nitro?

> Joyent does - via SmartOS zones.

Thanks I'd not heard of that.


You won't hear much about the VM escapes in those clouds, because almost everyone has a strong incentive to keep quiet about them and patch as soon as possible so there "isn't a story". Some researchers do want to make a name for themselves and publish before the patch, but those are a minority. And publicly attacking a big corporation is quite risky, so they prefer presenting the bug as a general problem in the technology. But the bugs in hypervisors exist; see the CVEs for Xen over the past decade.

See also

https://security.stackexchange.com/questions/130274/how-do-b...

https://nakedsecurity.sophos.com/2015/05/14/the-venom-virtua...

Regarding L4, I do not know. Probably it is very different and cumbersome to work with compared to Linux.


> Some researchers do want to make a name for themselves and publish before the patch, but those are a minority. And publicly attacking a big corporation is quite risky, so they prefer presenting the bug as a general problem in the technology.

That doesn't sound right to me. The industry norm for security research is 'responsible disclosure', which is intended to give the vendor reasonable time to implement the fix, while eventually publishing the knowledge for all to know.

Unless they're simply being paid for their silence, I can't imagine a security researcher wanting to keep quiet about a major achievement like that.

> bugs in hypervisors exist; see the CVEs for Xen over the past decade

Sure, but I'm talking specifically about the big 3 cloud providers, not vanilla Xen. Amazon in particular have gone to pretty extreme lengths with their Nitro system.


I have to say, it would be cool if they did. If I could get a dirt-cheap rate for running batch workloads in a potentially antagonistic environment, I could make use of that. Not all data is sensitive.


It wouldn't save you that much compared to just going with a VM, especially if it's computationally intensive.


It is how Heroku has operated since forever. Containers can be secure.


I didn't know that about Heroku. They give you a choice:

> Performance and Private dynos do not share the underlying compute instance with other dynos

https://devcenter.heroku.com/articles/dynos#isolation-and-se...


I know many people use single-tenant hardware to run containers, so I wouldn't say that it defeats the purpose of containerization, just one of the benefits.


The confusion and disagreement on this topic is rooted in faulty assumptions about other organizations' requirements. Many people use containers with host networking, without uid namespaces, without pid namespaces, without bind mounts. What you think is "a container" may not be at all universal. Also the idea that host networking grants all processes CAP_NET_RAW is just weird and wrong.


I believe the purpose of containers is different for different people: build vs. deploy.


It's not rocket science, but it's not entirely simple, because there are a lot of moving parts, because IP networking is annoyingly complicated, and because we are stuck with IP addresses and the interfaces they are attached to, with many of these low-level details percolating up to user code. I like Linux's a-la-carte container APIs, but their flexibility seems to increase complexity, as you end up dealing with multiple namespaces as well as cgroups, etc.

Then there are Kubernetes and Docker which add a lot of complexity on top, while providing some levels of abstraction which don't entirely eliminate the need for understanding the underlying internals.

Having some more mid-level container and networking APIs would be nice, but I'm not sure it would solve the complexity problem.

Guaranteed employment for networking/container/linux experts I guess.


Why does no one talk about IPv6 in this day and age?


IPv6 doesn't solve the difficulties of container networking. Containers have private network namespaces, which means that a container can't communicate with other hosts without (1) a dedicated interface or (2) a bridge (basically a software switch) for host-to-container communication.

These are layer 2 problems, while IPv4 and IPv6 sit at layer 3.
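
Stripped down, the usual bridge-based plumbing looks roughly like this, regardless of address family (names and addresses are arbitrary):

    # a namespace, a veth pair and a bridge: the minimum for host<->container traffic
    sudo ip netns add demo
    sudo ip link add veth0 type veth peer name ceth0
    sudo ip link set ceth0 netns demo
    sudo ip link add br0 type bridge
    sudo ip link set veth0 master br0
    sudo ip link set veth0 up
    sudo ip link set br0 up
    sudo ip addr add 172.18.0.1/16 dev br0
    sudo ip netns exec demo ip link set ceth0 up
    sudo ip netns exec demo ip addr add 172.18.0.2/16 dev ceth0
    sudo ip netns exec demo ip route add default via 172.18.0.1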


Maybe they find it too cumbersome to memorize and type an IPv6 address. Or they grew up and cut their teeth on v4.


With IPv6 there's not enough tinkering with service meshes, port forwarding, workarounds for private IP address conflicts, and VPCs to write articles about.


Benefits don't yet outweigh the cost for many, and years of familiarity/habit mean that it would need some really compelling advantages to get used.


Came here to say this. ULA would have been stacks better than this.


Indeed... VIMAGE and Crossbow are such great technologies for virtualizing container network stacks. Couldn't imagine life without them anymore.


You can have a lot of fun with network namespaces in Linux, like running some programs in their own network/VPN.
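
For example, a quick sketch of just the namespace part (no actual VPN here; moving the VPN's tun interface into the namespace is what gives you the per-program VPN effect):

    # a program started in here gets its own, initially empty, network stack
    sudo ip netns add isolated
    sudo ip netns exec isolated ip link set lo up
    sudo ip netns exec isolated ping -c1 127.0.0.1          # works: loopback only
    sudo ip netns exec isolated curl https://example.com    # fails: no routes out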


I'd rather learn the real network stack. Containers are not at all simple. In fact, it's made me adopt FreeBSD for my next project, and I love it.


Are macvlan and macvtap not container things? Are they more for VM networking?


This is the first article I've seen on HN with (2020) in parens to show that it's a dated article.

=\


how is that simple? :O


> Working with containers always feels like magic.

Containers are just applications (which have existed for many decades) in a (roughly) single-process operating system, like MS-DOS.

Nothing magical, except that containerized processes can't talk to each other directly: there are security boundaries.


> in a (roughly) single-process operating system.

Containers don't (generally) run in their own operating system, just their own userland. It can actually be a source of security vulnerabilities to assume that containers are completely isolated from each other, such as running a container as root in production and assuming it can't get privileged access to the host. It's less similar to a bare-metal MS-DOS application than it is to a glorified chroot jail.


> Containers don't (generally) run in their own operating system

Right, but a containerized application can't (ideally) talk to other applications on the same machine; that's how it's similar to a single-process OS with a single app running. Of course there are details, like a single application possibly still containing multiple processes from the OS's standpoint, but the overall comparison stands.

> It's less similar to a bare-metal MS-DOS application than it is to a glorified chroot jail

These two cases are similar enough from the containerized application's standpoint (only the OS services differ from those of MS-DOS).



