
Julia writes: "I think 'violates a lot of normal Unix assumptions about what normally happens to normal processes' is basically the whole story about containers."

This is a key point. Lots and lots of standard Unix invariants are violated in the name of abstraction and simplification; the list of those violations isn't well publicized, and most of the current systems have different lists.

For example, in Kubernetes (my current love affair), the current idea of PetSets (basically, containers that you want to be carefully pampered -- Paxos members, database masters, etc.; stuff that needs care) /still/ has the notion that a netsplit can cause the orchestrator to create anywhere from 1 to #-of-nodes exact doppelgangers of your container, all of which believe they are the one true master. You can imagine what this means for database masters and Paxos members, and it is going to be, as the kids say, surprising af to the first enterprise Oracle DBA who encounters it.

If you believe in containers, then one thing you really do have to accept is that most of your existing apps should not be in them yet. If your app is not (a) stateless, (b) strongly 12-factor, (c) designed for your orchestrator, and (d) written not to do things like fork() or keep strong references to IP addresses, then you should probably wait 3-4 years and use VMs in the meantime.
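To make (d) concrete, here's a minimal Go sketch (mine, not anything from the thread) of the difference between caching a resolved IP at startup and dialing by name each time; the Service hostname is a made-up placeholder. The first pattern breaks the moment the orchestrator reschedules whatever is behind that address; the second lets cluster DNS absorb the move.

    package main

    import (
    	"fmt"
    	"net"
    	"time"
    )

    // Hypothetical Kubernetes Service name; placeholder, not a real endpoint.
    const service = "db.default.svc.cluster.local:5432"

    // Anti-pattern: resolve once at startup and reuse the IP forever. If the pod
    // behind the Service is rescheduled, this cached address goes stale and
    // every later dial fails.
    var cachedAddr string

    func dialCached() (net.Conn, error) {
    	if cachedAddr == "" {
    		host, port, err := net.SplitHostPort(service)
    		if err != nil {
    			return nil, err
    		}
    		ips, err := net.LookupHost(host)
    		if err != nil {
    			return nil, err
    		}
    		cachedAddr = net.JoinHostPort(ips[0], port)
    	}
    	return net.DialTimeout("tcp", cachedAddr, 2*time.Second)
    }

    // Container-friendly: dial by name every time, so each connection picks up
    // whatever the cluster DNS currently says the Service resolves to.
    func dialByName() (net.Conn, error) {
    	return net.DialTimeout("tcp", service, 2*time.Second)
    }

    func main() {
    	if _, err := dialByName(); err != nil {
    		fmt.Println("dial failed (expected outside a cluster):", err)
    	}
    }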




Oracle has had multi-homed master-master RDBMS setups for > 10 years. I'm pretty sure a half-competent Oracle administrator wouldn't really be 'surprised af' at functionality that's been in Oracle for at least a decade.

For things that need 'care', this has been a solved problem for decades. Banks[0] with systems homed in the WTC on Sept 11 kept running because OpenVMS has had NUMA clusters and multi-node replication since the DEC Alpha days, with 100% transactional integrity maintained and DC failovers on the order of 500 ms to 5 s. (Obviously banks don't all run on VMS.)

Platforms like IBM z Systems let you live-upgrade z/OS in a test environment hosted within the mainframe to see if anything breaks (in complete isolation from production, of course), revert snapshots, and do basically everything the ESX suite does, from live migrations a la vMotion to newer features like growing RAID arrays transparently and virtual storage where FC storage can be added dynamically and transparently to the end user. Their stock systems let you live-upgrade entire mainframes without a blip. They're built to withstand total component failure: literally, the processors, RAM, NICs, and PSUs could all fail on one z13 and you'd fail over to a hot backup without losing any clients attached to the server. HP's NonStop, with which I have no experience, offers a similarly comprehensive set of solutions.

[0] On Sept 11, a bunch of servers went down with those buildings. "Because of the intense heat in our data center, all systems crashed except for our AlphaServer GS160... OpenVMS wide-area clustering and volume-shadowing technology kept our primary system running off the drives at our remote site 30 miles away." -- Werner Boensch, Executive Vice President, Commerzbank North America. http://ttk.mirrors.pdp-11.ru/_vax/ftp.hp.com/openvms/integri...


I'm saying that an arbitrary number of exact replicas of a master can magically appear on the network believing they are the one true master, identifying themselves as such, and expecting to act that way. Additionally, an arbitrary number of database masters expecting to participate in the cluster may show up or leave at any time. That is somewhat nontrivial for even modern databases to deal with.


Why run your database inside Kubernetes, though? We've always white-gloved our database (and a few other special services). You don't have to put 100% of your infrastructure in Docker/Kubernetes.


That's felixgallo's point exactly.


If you're running multiple copies of anything that cares about the concept of a master, it had better have its own consensus algorithm. Luckily such things exist and are open source.
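For the curious, this is roughly what that looks like with etcd's election API; a sketch under my own assumptions, with the endpoint, TTL, and key prefix all placeholders. A replica only acts as master while it holds the lease-backed election key, so a partitioned replica loses the lease instead of carrying on as a second master.

    package main

    import (
    	"context"
    	"log"
    	"time"

    	clientv3 "go.etcd.io/etcd/client/v3"
    	"go.etcd.io/etcd/client/v3/concurrency"
    )

    func main() {
    	cli, err := clientv3.New(clientv3.Config{
    		Endpoints:   []string{"http://etcd:2379"}, // placeholder endpoint
    		DialTimeout: 5 * time.Second,
    	})
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer cli.Close()

    	// The session's lease expires if this process dies or is partitioned
    	// away, which is what lets another replica win the election.
    	sess, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
    	if err != nil {
    		log.Fatal(err)
    	}
    	defer sess.Close()

    	e := concurrency.NewElection(sess, "/my-db/leader") // placeholder key prefix
    	if err := e.Campaign(context.Background(), "replica-1"); err != nil {
    		log.Fatal(err)
    	}
    	log.Println("acting as master")

    	<-sess.Done() // lease lost: stop acting as master immediately
    	log.Println("lost leadership, stepping down")
    }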


I think Kubernetes does a good job creating a normal "Unix process environment".

The Pod concept allows for:

    - Container processes sharing localhost, mount points, etc.
    - Providing a "normal" IP address that is routable
    - Ensuring a PID 1 can monitor the group of processes (as done by the rkt integration)
    - Allowing normal POSIX IPC (signals, etc.; see the sketch below)
More here: http://kubernetes.io/docs/user-guide/pods/
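The signals point is the one people trip over most often: on pod termination the kubelet delivers a plain SIGTERM to the container's main process and follows up with SIGKILL after the grace period, so a process handles it the normal Unix way. A tiny Go sketch of my own, nothing official:

    package main

    import (
    	"log"
    	"os"
    	"os/signal"
    	"syscall"
    )

    func main() {
    	sigs := make(chan os.Signal, 1)
    	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

    	log.Println("serving...")
    	sig := <-sigs // blocks until the orchestrator (or a user) signals us
    	log.Printf("got %s, draining connections and exiting", sig)
    	// ... flush state, close listeners, etc. ...
    }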

As for PetSets, I do agree that they need more work to support things that are replicated but not cluster-aware. They don't magically solve the issues of distributed systems. Also, natively cluster-aware things might be better served by controllers. See this demo of an etcd controller:

https://youtu.be/Pc9NlFEjEOc?t=18


It definitely does better than many of the rest, in my experience, and it certainly has better defaults and chooses its violations carefully and generally wisely. In fact, I wrote the first draft of a paper on this specific topic:

https://docs.google.com/document/d/1hw_0edCtZ8D4FYhc6oNRTAXm...

delineating some of the more difficult and surprising violations and some possible remediation steps.


Having been inside Google when Docker started to get big, there's a really simple explanation for all of this:

Kubernetes is a well-designed descendant of a well-designed API, with pretty specific tradeoffs for distributed systems (tradeoffs that mostly still work at small scale).

Docker is a reverse-engineered mishmash of experiments attempting to replicate the same ancestor. Take the horrible network abstraction layer: Google had the advantage of being able to move all of its apps to a well-understood naming scheme rather than treating IP addresses as immutable. Any app that treats IPs as immutable is carrying technical debt; that worked for a long time, but it doesn't anymore.

Docker has tried to fix these things by wrapping them rather than fixing the underlying debt. That only ever accumulates more debt, and it rarely even provides the stopgap that's required. It's an admirable effort, and they've done a fantastic job, but a fantastic job at a fool's errand is still not behavior to emulate.



