
I was under the assumption docx was simply an XML wrapper in a well known format. MS can't come in and break it without creating a new extension, invalidating deprecation concerns. That said - I do everything in markdown regardless so I can use a code editor more comfortably.



> I was under the assumption docx was simply an XML wrapper in a well known format. MS can't come in and break it without creating a new extension, invalidating deprecation concerns.

Sort of.

It's a ZIP containing a collection of XML files. The actual content is a single file, but you need separate ancillary XML files for things like styles, links, headers/footers, numbering schemes and so on. Each has complicated namespacing and nesting rules, and each file references items in the others.
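To make the layout concrete, here is a rough Python sketch that builds a stand-in package in memory and lists its parts. The part names are the ones Word typically writes, but the XML bodies are empty placeholders, so this is not a document Word would open, and the helper names (build_minimal_package, list_parts) are made up for illustration.

```python
import io
import zipfile


def build_minimal_package() -> bytes:
    """Build a stand-in .docx-shaped ZIP in memory (placeholder XML only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Package-level plumbing: content types and top-level relationships.
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("_rels/.rels", "<Relationships/>")
        # The single file that holds the actual body text.
        zf.writestr("word/document.xml", "<document/>")
        # Ancillary parts that the body refers to by ID.
        zf.writestr("word/styles.xml", "<styles/>")
        zf.writestr("word/numbering.xml", "<numbering/>")
        zf.writestr("word/_rels/document.xml.rels", "<Relationships/>")
    return buf.getvalue()


def list_parts(data: bytes) -> list[str]:
    """Return the names of all parts inside a ZIP given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


parts = list_parts(build_minimal_package())
for name in parts:
    print(name)
```

Running list_parts on the bytes of a real .docx should show the same general shape, usually with a few more parts (headers, footers, settings, themes).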

The area where it is most at risk (for me personally, in practice) is Word's own inconsistent handling of the file. As a specific example, there are nested XML elements where sibling child elements define properties on the main element, such as properties that define paragraph styles.

Being siblings, they should be accepted in any order, but in practice a generated DOCX file can fail to import into Word for no obvious reason until you reorder those elements in the raw XML (even though they stay at the same level in the hierarchy). Then it works.
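A workaround along those lines can be sketched in Python: sort the children of a paragraph properties element into a fixed order before writing the file. Treat this as an illustration under assumptions, not a definitive fix. PPR_ORDER is a short, partial list written from memory of the order Word appears to expect, so check it against the schema before relying on it. The helper names are hypothetical, and putting unknown children last is my own choice.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

# ASSUMPTION: partial ordering of w:pPr children, from memory; verify it.
PPR_ORDER = ["pStyle", "keepNext", "numPr", "spacing", "ind", "jc"]


def local_name(el: ET.Element) -> str:
    """Strip the '{namespace}' prefix from an element tag."""
    return el.tag.split("}", 1)[-1]


def reorder_ppr(ppr: ET.Element) -> None:
    """Reorder the children of a w:pPr element in place."""
    rank = {name: i for i, name in enumerate(PPR_ORDER)}
    # Stable sort: children not in PPR_ORDER keep their relative order
    # and are placed after the known ones.
    children = sorted(ppr, key=lambda el: rank.get(local_name(el), len(PPR_ORDER)))
    for child in list(ppr):
        ppr.remove(child)
    ppr.extend(children)


# Same three siblings, written in an order Word may reject.
SAMPLE = (
    f'<w:pPr xmlns:w="{W}">'
    '<w:jc w:val="center"/>'
    '<w:ind w:left="720"/>'
    '<w:pStyle w:val="Heading1"/>'
    "</w:pPr>"
)

ppr = ET.fromstring(SAMPLE)
reorder_ppr(ppr)
order = [local_name(el) for el in ppr]
print(order)
```

The elements and their attributes are unchanged; only their position among their siblings differs.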

In other words, it's less the 'spec' and more the MS implementation that makes it fragile. And different versions of Word can have different behaviour in that regard.



