Thundernetes makes it easy to run your game servers on Kubernetes (github.com/playfab)
64 points by pjmlp on March 27, 2022 | 44 comments



Hey there, this is Dimitris from the Azure PlayFab Multiplayer Servers (MPS) team, where we develop both the MPS service and Thundernetes. Happy to answer any questions! For realtime communication, you can find us on the Microsoft Game Dev Discord: https://aka.ms/msftgamedevdiscord


Hi. Can you say how Thundernetes compares to Agones for running game servers on K8s?

https://agones.dev/site/


Is this effectively v3 of Thunderhead? Good to see the pivot towards using k8s; the 20+ minute propping of a 1-2GB container on vanilla PlayFab/THv2 really made it a non-starter for our company's internal playtesting.

Also is Hesky still around these days?


Yup, he is still around :) I'd call Thundernetes a "spin-off" more than a pivot, tailored to the folks that want full control over their game servers, compared to the more "SaaS" model of MPS (code name "Thunderhead"). Have you tried the MPS service recently? We've done lots of work to improve startup times.


I feel like the secret sauce here and in Agones is basically public IPs per node, a pre-warm mechanism (plus the software to route games to warmed servers), and an assumption that games are short-lived enough that nodes can eventually be scaled down as games fall off.
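
Roughly, and purely illustratively (this is not the actual Thundernetes or Agones API, just the shape of the idea in Go):

    package main

    import (
        "errors"
        "fmt"
    )

    // Illustrative types only, not the real Thundernetes or Agones API.
    type GameServer struct {
        NodePublicIP string
        HostPort     int
        State        string // "StandingBy" (pre-warmed) or "Active"
    }

    type Allocator struct {
        pool []*GameServer
    }

    // Allocate hands a pre-warmed server to a new match and marks it Active;
    // clients then connect straight to NodePublicIP:HostPort, no LB in the path.
    func (a *Allocator) Allocate() (*GameServer, error) {
        for _, gs := range a.pool {
            if gs.State == "StandingBy" {
                gs.State = "Active"
                return gs, nil
            }
        }
        return nil, errors.New("no warm servers left: scale up and retry")
    }

    func main() {
        a := &Allocator{pool: []*GameServer{
            {NodePublicIP: "203.0.113.7", HostPort: 7777, State: "StandingBy"},
        }}
        gs, err := a.Allocate()
        if err != nil {
            panic(err)
        }
        fmt.Printf("connect to %s:%d\n", gs.NodePublicIP, gs.HostPort)
    }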

Seems well and good. I'm not sure you lose any real DDoS protection compared to going through a load balancer.

I do have to wonder if it's worth running game servers in k8s rather than just using autoscaling instances without the pod concept. You're really fighting against what k8s does well, for what I assume is no benefit. Thoughts?


See their UDP proxy: you don't make the IPs public, you proxy to them instead.


Whose UDP proxy?


It's in the same org as Agones.



Don't game servers need to run on bare metal to minimize latency?


This is my field, so even though I’m literally titled “expert”[0]- take what I say with a pinch of salt.

When it comes to game server performance, the biggest issue tends to be predictable performance. So Kubernetes is not inherently antithetical to running game servers, as long as you don't share the CPU core(s) with anything else.
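
In Kubernetes terms, the usual lever for that is a Guaranteed QoS pod (whole CPUs, requests equal to limits) on a kubelet running the static CPU manager policy, which gives the container exclusive cores. A minimal sketch with the k8s.io/api types (the image name is made up):

    package main

    import (
        "fmt"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/resource"
    )

    // Guaranteed QoS: requests == limits, whole CPUs. With the kubelet's
    // static CPU manager policy (--cpu-manager-policy=static) the container
    // is pinned to exclusive cores instead of sharing them.
    func gameServerPodSpec() corev1.PodSpec {
        res := corev1.ResourceList{
            corev1.ResourceCPU:    resource.MustParse("2"), // whole cores; millicores would not pin
            corev1.ResourceMemory: resource.MustParse("4Gi"),
        }
        return corev1.PodSpec{
            Containers: []corev1.Container{{
                Name:      "gameserver",
                Image:     "example.com/gameserver:latest", // hypothetical image
                Resources: corev1.ResourceRequirements{Requests: res, Limits: res},
            }},
        }
    }

    func main() {
        spec := gameServerPodSpec()
        fmt.Println("cpu limit:", spec.Containers[0].Resources.Limits.Cpu())
    }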

In fact there is a project (which I was not a part of) spearheaded by Ubisoft in collaboration with Google for running game servers on Kubernetes, called Agones[1].

Ubisoft also used “Thunderhead” (which this project seems to be a variant of) on Azure with Rainbow Six Siege. So it's definitely not considered a show-stopping problem.

When it comes to containers/VMs, the biggest performance hits come, in order, from disk, network, memory access, and CPU. For game servers the importance of those resources runs in the reverse order, so the biggest losses land where they matter least. Network latency might seem like a big deal, but at best you're bounded by frame times, and geographic distance will eat much more than the combined weight of frame times and Kubernetes networking.

Agones itself bypasses a lot of the Kubernetes networking latency additions, though not all of them.

[0]: https://www.linkedin.com/in/jharasym

[1]: https://agones.dev/site/


The Siege team also did a pretty interesting talk about running game servers in the cloud and the trade-offs: https://www.gdcvault.com/browse/gdc-17/play/1024036

It’s quite old now but interesting for anyone whose experience is purely running on their own hardware.


Not necessarily in my experience! Most games don't hit the CPU particularly hard.

Mostly they're reliant on (hardware-offloaded) networking, memory, and storage, with consistency being the most important thing. A spike is felt; interpolation can only go so far.

CPU virtualization extensions make the latency cost of VMs hardly noticeable; containers are technically better, but it's splitting hairs.

On an unloaded host with a single VM you can achieve essentially identical performance to bare metal. CPU extensions, pinned cores, and huge pages are key.

Some performance metrics even improve! Disk operations tend to do better thanks to the added layer of memory caching.

Edit: there are options for multi-tenant fairness whichever way you go: bare metal, VMs, or containers.

Performance isn't much of a worry; it's more about the 'handles'.


How important is GPU access for game servers?


Not important at all if the game server is properly isolated from the game client.


Very little; most of them run headless with no graphics.

There are exceptions. For years you could only run Satisfactory servers with the server acting like a client, rendering and all.


Using container technology doesn't necessarily mean adding virtual network devices/proxying/forwarding. In practical terms, you can run in "host" networking mode (no separate network namespace) on Docker or Kubernetes, and most people would still consider those "containers".

In this case they seem to be NAT'ing packets from a host port on the correct Kubernetes node (the one running the container) to a port in the container, which can be done fast enough with iptables (or whichever similar mechanism Kubernetes is using).
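
A concrete sketch of both options with the k8s.io/api types (image names made up):

    package main

    import (
        "fmt"

        corev1 "k8s.io/api/core/v1"
    )

    // Two ways to expose a UDP game port without a Service or LoadBalancer.
    func specs() (hostNet, natted corev1.PodSpec) {
        // 1) Host networking: no separate network namespace; the server
        //    binds the node's IP directly.
        hostNet = corev1.PodSpec{
            HostNetwork: true,
            Containers:  []corev1.Container{{Name: "gs", Image: "example.com/gs:latest"}},
        }
        // 2) hostPort: iptables rules (via the CNI portmap plugin or similar)
        //    NAT node:7777/udp to the container's port.
        natted = corev1.PodSpec{
            Containers: []corev1.Container{{
                Name:  "gs",
                Image: "example.com/gs:latest",
                Ports: []corev1.ContainerPort{{
                    ContainerPort: 7777,
                    HostPort:      7777,
                    Protocol:      corev1.ProtocolUDP,
                }},
            }},
        }
        return
    }

    func main() {
        a, b := specs()
        fmt.Println("hostNetwork:", a.HostNetwork, "hostPort:", b.Containers[0].Ports[0].HostPort)
    }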


I think keeping that kind of stuff to a minimum would be important.

iptables is a good example: it can scale rather poorly! Packets are checked against the chain's rules one by one until a match is found.

For most configurations this isn't a problem; the rule list is short enough that packets are matched quickly.

If density reaches the point where you have thousands of forwards, it'll slow down a lot!

You'll want to look into optimizations (e.g. ipsets), offloading to hardware, or simply going to host networking.
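
The difference is easy to see in a toy model: a linear chain scan (iptables) versus a hash lookup (roughly what ipset gives you). Purely illustrative Go, no real packets involved:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        const rules = 10000
        chain := make([]int, rules)      // "iptables": one DNAT rule per forwarded port
        set := make(map[int]bool, rules) // "ipset": hash-based match
        for i := 0; i < rules; i++ {
            chain[i] = 30000 + i
            set[30000+i] = true
        }
        port := 30000 + rules - 1 // worst case: matches the last rule

        start := time.Now()
        for i := 0; i < 100000; i++ { // 100k "packets"
            for _, p := range chain {
                if p == port {
                    break
                }
            }
        }
        fmt.Println("linear scan:", time.Since(start))

        start = time.Now()
        for i := 0; i < 100000; i++ {
            _ = set[port]
        }
        fmt.Println("hash lookup:", time.Since(start))
    }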


If you're running a handful of game servers on each machine, each with a single NATed port, you're fine. At bigger scales it's a problem, but as you mention there are better solutions now.


Not really. The Minecraft server has notoriously run on the JVM for most of its existence. Container virtualization is unlikely to add more than a few microseconds of latency.


> The Minecraft server has notoriously run on the JVM for most of its existence.

Alas, it still does; at least for the Java client.


Nothing about kubernetes implies virtualization. People only use VMs because they're easier to provision. Technically you could provision physical servers with kubernetes and run containers on them with zero overhead.

Or, perhaps more realistically, you could provision a tier of very low-latency VMs.


Not really. The major concern is getting clients to one machine and running a stateful connection so you don't need to pass messages through some kind of slow, multi-hop broadcast system. The CPU overhead of containers or vms is pretty minimal compared to this main concern.


What about Agones? Until now I've known of more folks using it... https://agones.dev/site/


Should "easy" and "kubernetes" ever really be in the same sentence?

Not being snarky, just realistic. I feel like cloud providers sell "kubernetes" as the only way to properly develop apps: you can be cloud-agnostic, avoid lock-in, and keep full control over your infrastructure.

Without realising that, for most projects, running a Kubernetes stack requires a person just for tuning and maintenance. Kubernetes is great when you hit massive scalability problems, but most people aren't going to be facing the problems Kubernetes solves.


These days it feels like the real value of kubernetes is not the ability to "planet scale" but actually the control-plane API. With tools like kind, it actually is really easy to get going locally, no cloud needed.

I see no reason to not use it, even if your deployment cluster lives on a single vm. The API is worth it.


Delivering your service/app via Kubernetes is slowly becoming the standard way, almost like a universal binary format, so as a developer you should build towards delivering for Kubernetes. But you are certainly right that one should avoid running one's own Kubernetes infrastructure, at least until you can afford the accompanying expenses.


Very cool project, but I wonder about the economics of using public cloud providers as a gaming host, where compute and network are cornerstones of their profitability.


The vast majority of game servers are hosted on public cloud providers. They turn out to be pretty much the ideal cloud workload:

- Very elastic demand curve

- Generally ephemeral with any necessary persistent state stored off-host

- Benefit from geographic distribution of cloud regions because they're generally quite latency sensitive


That’s not true in my experience. The cost of running on public cloud is extremely prohibitive.

There are some things that can run on cloud, but the heavy hitting game servers should really be bare metal for the baseline.

I gave a talk about this in Stockholm once, unfortunately it wasn’t recorded but I can share the slides if you want.


It's very true in my experience, and we've looked pretty hard at all the options. Thanks for the offer of the slides, but I have first-hand experience with this at scale.

It certainly depends a lot on how sophisticated your autoscaling is and how closely you're able to follow demand to limit waste, how well you can manage per-host utilization, whether you are CPU or memory bound, and lots of other factors. But even at truly massive scale the cloud hosting option can be very competitive without nearly as much management overhead.


Interesting! I have the opposite experience and I also operated at quite a decent scale.

I wonder why our experiences are so different.

For context I was running 2,500+ instances at peak with 40vCPU and 256GiB of memory: the most expensive of those of course being in the regions with low density of players like South America and Australia.

We also had a predictive auto-scaler for our cloud components, I would estimate that our waste was 15% at any time, but if we ran bare metal only it would have been cheaper (except for the operational cost and the fact we needed lengthy commitments for hardware)


We run rather a lot more than 2500 instances at peak and a variety of instance types dependent on game mode. Our waste is on the order of 5-7% depending on the time of day -- we can run less waste on the downward slope of the demand curve for instance.

We're also seeing some pretty compelling results from ARM based instances: https://www.wired.com/sponsored/story/changing-the-game/


Aha, Fortnite. You guys will get discounts that we will never get; please keep that in mind.

Our waste was the inverse of yours, less waste on the upswing and more waste on the downswing, due to the long lived nature of our instances.

One instance takes roughly 1,000 players, but if you as a player decide to just keep playing, then we won't kick you out of the game for some hours. Some percentage of players will just continue playing as long as possible without matchmaking or map transfers.


Not nearly as ephemeral as web traffic though. Games usually last longer than a spot instance could normally support.


It might be more expensive to run, but you get savings in time to deployment, orchestration, and the overall maturity of the cloud itself.

Managing bare metal at scale is challenging.


This is very true. But never underestimate the cost of running game servers in the cloud. For us (with really good discounts and a fantastic custom-made predictive auto-scaler) it was still more than double the cost of our previous bare-metal outsourcing provider.

The cost difference would have been enough to hire 2.5x my team as additional resources, but that differs with scale (and so do the discounts)… So YMMV.


You can always run Kubernetes on bare metal. This seems easy in this case because you don't need persistent storage or load balancing to use Thundernetes, and you gain the ability to use the generic Kubernetes tools for self-healing, scaling, rolling deployments, etc.


I guess it makes sense for Microsoft to be able to put Xbox game servers in Azure, but game devs can also benefit from the scalability of Kubernetes: if load is low, you aren't paying for hardware you aren't using, and vice versa.


This is all so interesting. I've not touched game servers for 6-7 years but boy am I glad that things have progressed somewhat!


I would prefer MS to fix well-known bugs (AKS, Terraform, Azure resources, Windows) and to integrate Agones (and Windows) well, rather than producing their own. Why couldn't MS patch Agones?


Finally the hard problem of spinning up game servers is solved.


Would like to know your 2 cents :)


Pick a cloud platform and write your own system to manage instances. All clouds come with REST APIs for allocating instances, and you can manage a warm pool yourself; it's not that hard, and for a game this is a core competency. Why bring something as complex as Kubernetes into the mix if you don't strictly need it?

The larger concern here is the egress bandwidth cost from the large clouds, which, if you don't have Fortnite scale and favorable deals, will run you ~9c per GB. If you use bare metal, you usually don't have to pay that; the cost is baked into the instance.
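
Back of the envelope, with purely illustrative numbers: a 100-player server pushing 30 kbps per player is ~3 Mbps, or roughly 1.35 GB/hour, so ~$0.12/hour in egress alone, which can rival the hourly price of the instance running it.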

The cons of bare metal are difficult provisioning, generally having to purchase capacity ahead of time whether you need it or not, and having it sit around unused off peak.

Best to write your own service provisioning layer that can support both bare metal and cloud instances; that way you fully understand it, can make changes to it, and it becomes a core competency for your team.
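
Hypothetically, the shape is something like this: a provider interface that the warm-pool logic doesn't have to care about (all names made up):

    package main

    import "fmt"

    // Hypothetical homegrown provisioning layer: the warm-pool logic is
    // the same whether instances come from a cloud REST API or from a
    // bare-metal inventory system.
    type Provider interface {
        Provision(region string) (id string, err error)
        Terminate(id string) error
    }

    type WarmPool struct {
        p       Provider
        region  string
        standby []string // provisioned, idle instance IDs
        target  int      // warm instances to keep on hand
    }

    // Reconcile tops the pool back up to target; run it on a timer and
    // after every allocation.
    func (w *WarmPool) Reconcile() error {
        for len(w.standby) < w.target {
            id, err := w.p.Provision(w.region)
            if err != nil {
                return err
            }
            w.standby = append(w.standby, id)
        }
        return nil
    }

    // Allocate pops a warm instance for a new game session.
    func (w *WarmPool) Allocate() (string, error) {
        if len(w.standby) == 0 {
            return "", fmt.Errorf("pool empty in %s", w.region)
        }
        id := w.standby[0]
        w.standby = w.standby[1:]
        return id, nil
    }

    // fakeCloud stands in for the EC2/GCE/bare-metal API calls.
    type fakeCloud struct{ n int }

    func (f *fakeCloud) Provision(region string) (string, error) {
        f.n++
        return fmt.Sprintf("i-%04d", f.n), nil
    }

    func (f *fakeCloud) Terminate(id string) error { return nil }

    func main() {
        w := &WarmPool{p: &fakeCloud{}, region: "us-east", target: 3}
        if err := w.Reconcile(); err != nil {
            panic(err)
        }
        id, _ := w.Allocate()
        fmt.Println("allocated", id)
    }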



