Some companies' policies require validation, and they require specs that are nailed down. In those cases you'd end up using a pre-HTML5 doctype (HTML 4 or XHTML 1.x), and that won't validate with data-x attributes.
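If you're stuck in that situation, it helps to know how many data-x attributes a page actually uses before deciding anything. Here's a rough sketch using Python's standard html.parser that lists them; the class name, function name, and sample markup are made up for illustration, and this is not a validator, just a way to find the attributes a pre-HTML5 validator would complain about.

```python
from html.parser import HTMLParser


class DataAttrFinder(HTMLParser):
    """Collects (tag, attribute) pairs for every data-* attribute seen."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, _value in attrs:
            if name.startswith("data-"):
                self.found.append((tag, name))


def find_data_attrs(html):
    """Return the data-* attributes in a markup string, in document order."""
    finder = DataAttrFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical sample markup, not taken from any real page.
sample = '<div id="cart" data-x="42"><a href="/buy" data-track="cta">Buy</a></div>'
print(find_data_attrs(sample))  # [('div', 'data-x'), ('a', 'data-track')]
```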
I understand the allure of an objective way to evaluate the "quality" of your code... but that seems ridiculously naive. I'm pretty confident I could come up with something that uses features in the spec that nobody ever implemented, so it would be fully validated and correct and yet totally nonfunctional for actual users.
HTML5 is a working draft spec. It hasn't been finalised yet. This isn't usually important for most of us, but companies that like calling themselves ISO 900X and so on don't usually want to work with draft specs.
The downvotes (I didn't even know you could downvote here!) are likely disagreeing with the idea of validation as a talisman: "It validates! Yay, it must work!" Since HTML5 formalises much of what already exists, it's hard to move to HTML5 and break things. HTML5 isn't just video, audio, canvas and so on; it's also a saner doctype, the ability to omit attributes that only ever have one value (type on script elements, for instance), the ability to nest things inside anchors, etc.
Also, as we all know by now, XHTML doesn't make any sense with the browsers that exist, particularly as it's rarely actually valid XML and even more rarely sent with the right MIME type (application/xhtml+xml).
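To make the "rarely actually valid XML" point concrete, here's a minimal sketch that runs markup through a real XML parser (Python's standard xml.etree.ElementTree). The function name and the two sample strings are my own; it only checks well-formedness, not validity against a DTD, and says nothing about the MIME type a server sends.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets: the first closes its <br/>, the second does not.
strict = "<p>Line one<br/>Line two</p>"
sloppy = "<p>Line one<br>Line two</p>"

print(is_well_formed_xml(strict))  # True
print(is_well_formed_xml(sloppy))  # False
```

A browser that gets the second snippet as text/html will render it without complaint, which is why so much "XHTML" in the wild never gets treated as XML at all.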
In short, XHTML doesn't exist in any practical sense, and HTML5 subsumes HTML4 while fixing the stupid bits to fit with what browsers actually do. There isn't really any logical reason not to use HTML5 syntax, though of course using the new features can be problematic.
I understand that your logic for validating is likely your company's decision and not your own view, and I'm not attacking your values or opinions in any way.