They're all good questions. The thing that reads the config should have been fuzz tested with something like AFL. Likely should have a lot more tests. Maybe shouldn't run in a device driver. There's almost no doubt there are engineering process and culture issues here.
> Rollback is hard I guess once your OS can't boot.
This is why the client needs have enough error handling to realise it's latest update has now caused unsuccessful boot and roll that update back locally to the last known good configuration (or completely back to factory and pull all updates again).
How a common bug was rolled out globally with no controls, testing, or rollback strategy is the right question