> it sounds like they might have separate "validation" code
That's what stood out to me. From the CS post: "Template Instances are created and configured through the use of the Content Configuration System, which includes the Content Validator that performs validation checks on the content before it is published."
Lesson learned: a "Validator" that is not the same program that will parse the file in production is not a complete test. It's not entirely useless, but it doesn't guarantee anything. The production program could have a latent bug that a completely "valid" (by specification) file might trigger.
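Here is a minimal sketch of how that gap can happen. Everything in it is hypothetical: the record format, the `SPEC_MAX_FIELDS` limit, and all three function names are invented for illustration and say nothing about how the actual Content Validator or sensor code works. The spec-only check accepts a short record, while the "production" reader assumes every field is populated. The last function shows the alternative of validating by running the same code path production runs.

```python
# Hypothetical format, assumed for illustration only: a record is a list of
# string fields, and the "spec" allows between 1 and 8 fields per record.
SPEC_MAX_FIELDS = 8


def spec_validator(record: list[str]) -> bool:
    """Checks the record against the specification only."""
    return 1 <= len(record) <= SPEC_MAX_FIELDS


def production_parser(record: list[str]) -> str:
    """Hypothetical stand-in for the shipped reader, with a latent bug:
    it always reads the 8th field, assuming every record is fully populated.
    In Python this raises IndexError; in C the same mistake would be an
    out-of-bounds read, i.e. undefined behavior."""
    return record[7]


def validate_with_production_parser(record: list[str]) -> bool:
    """Alternative: reject anything the production code path cannot handle."""
    if not spec_validator(record):
        return False
    try:
        production_parser(record)
    except Exception:
        # Any failure in the real reader counts as a rejection.
        return False
    return True


if __name__ == "__main__":
    short = ["a", "b", "c"]
    print(spec_validator(short))                   # True: valid by spec
    print(validate_with_production_parser(short))  # False: production reader fails
```

The short record passes the spec check but fails once the production reader actually touches it, which is exactly the kind of latent bug a separate validator can't see.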
I'd argue that it is completely useless. They have the actual parser that runs in production and then a separate "test parser" that doesn't actually reflect reality? Why?
Maybe they have the same parser in the validator and the real driver, but the vagaries of the C language mean that when undefined behavior is encountered, it may crash or it may work just by chance.
I understand what you're saying. But ~8.5 million machines in 78 minutes isn't a fluke caused by undefined behavior. All signs so far indicate that they would have caught this if they'd had even a modest test fleet. And that's setting aside the ways they could have prevented it before it reached that point.
That's beside the point. Of course they need a test fleet. But in the absence of that, there's a very real chance that the existing bug triggered on customer machines but not in their validator. This thread is speculating on why their existing validation didn't catch this issue.