You should factor in not only the license cost, but the total cost of maintaining and properly patching a Windows stack (as well as the mandatory unscheduled downtime that patching brings).
Even if they give you the licenses for free, the Windows stack is still much more expensive overall.
As little as possible. And again, in a high-availability infrastructure I can patch each machine without downtime. I was contesting your point that the downtime was mandatory and unscheduled. Windows never forces you to update at any particular time. It strongly suggests it, sure. By default, the option is auto-update, but that's configurable and not required.
Well... You have to factor in the cost of the added HA infrastructure when you talk Windows. Patching Linux seldom requires more than a couple of seconds of downtime for the system, and I can often patch and bring machines back up before memcaches and varnishes start expiring. Patching Windows often means half an hour of downtime per server, following a similarly long stretch of horrible performance while the patch installer does whatever it needs to do.
Designing an HA cluster that can sustain a minutes-long usage peak (from a node going down for maintenance) is very different from designing one that can sustain an hour-long outage of one of its nodes.
It all depends, in the end, on how the applications are designed to run - they are the ones that will decide how many servers you can take down at once. In any case, the shorter they are down, the better.
Consider a site with 3 front-facing web servers that are affected by a 0-day. If a server can be patched in 2 minutes, it will result in 6 minutes of 150% load on the servers not being maintained. If the patch takes an hour, you will have 3 hours of 150% workload. The internet user may not see the downtime, but you have to factor in the higher load. The best solution could be to use 4 or 5 servers instead of 3, but that would also increase your vulnerability window.
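The arithmetic above can be sketched in a few lines. This is a back-of-envelope model, assuming load is shared evenly across the remaining servers and nodes are patched one at a time (the function name and parameters are mine, just for illustration):

```python
def rolling_patch_load(total_servers: int, patch_minutes: float):
    """Estimate the strain of a rolling patch across a server pool.

    Returns (per-server load while one node is down, total minutes
    spent at that elevated load), assuming even load sharing and
    one node patched at a time.
    """
    remaining = total_servers - 1
    # The remaining nodes absorb the full pool's traffic:
    load = total_servers / remaining          # e.g. 3/2 = 1.5, i.e. 150%
    # Every node gets patched in turn, so the elevated-load window
    # is the per-node patch time times the number of nodes:
    elevated_minutes = patch_minutes * total_servers
    return load, elevated_minutes

# 3 servers, 2-minute patch: 150% load for 6 minutes total
print(rolling_patch_load(3, 2))    # (1.5, 6)
# 3 servers, 60-minute patch: 150% load for 180 minutes (3 hours)
print(rolling_patch_load(3, 60))   # (1.5, 180)
```

Plugging in 4 servers shows the trade-off mentioned above: the peak load drops to about 133%, but there is one more node to patch, so the total vulnerability window grows.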