Hacker News
8 Reasons why XML Sucks (firstclassthoughts.co.uk)
10 points by arthurk on July 15, 2008 | 7 comments



XML is character based

No, XML is binary. The <?xml at the beginning of the file is the "magic number" that lets the XML parser guess enough of the encoding to read the "encoding='whatever'" part, and then the rest of the file. Everything is binary, though; never treat it as text!
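For the curious, here's a rough Python sketch of that bootstrapping trick, loosely following the non-normative autodetection table in XML 1.0 Appendix F. `sniff_encoding` is a made-up helper name, the 200-byte look-ahead is arbitrary, and the UCS-4/EBCDIC rows of the table are skipped. A real parser already does this for you.

```python
import re


def sniff_encoding(data: bytes) -> str:
    """Guess an XML document's encoding from its first bytes (hypothetical helper).

    Loosely follows XML 1.0 Appendix F; UCS-4 and EBCDIC cases are skipped.
    """
    # 1. A byte order mark settles it immediately.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    # 2. No BOM, but "<?" laid out as 16-bit units: UTF-16 in that byte order.
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    # 3. "<?xm" as plain bytes: some ASCII-compatible encoding, so the
    #    declaration itself is readable. Latin-1 decodes any byte safely.
    if data.startswith(b"<?xm"):
        head = data[:200].decode("latin-1")
        m = re.match(r'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][\w.-]*)["\']', head)
        if m:
            return m.group(1).lower()
    # 4. No BOM and no declared encoding: the spec's default is UTF-8.
    return "utf-8"
```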

The author claims that most editors use Latin-1 character encodings; I don't think this is correct. Almost everything uses UTF-8 these days, and if your UTF-16-encoded XML has a BOM at the beginning, the editor will use UTF-16 instead. So this is not really a problem: you can use your text editor to edit XML.
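A quick check of the BOM point with Python's standard ElementTree parser (used here just as an example parser): the same document encoded as UTF-8 and as UTF-16 with a BOM parses to the same text, as long as you hand the parser bytes rather than an already-decoded string. The `<note>` document is made up.

```python
import xml.etree.ElementTree as ET

doc = "<note>caf\u00e9</note>"

# Same document, two encodings. Python's "utf-16" codec writes a BOM first.
samples = {
    "utf-8": doc.encode("utf-8"),
    "utf-16 with BOM": doc.encode("utf-16"),
}

parsed = {}
for label, raw in samples.items():
    # Pass bytes, not str, so the parser does its own encoding detection.
    root = ET.fromstring(raw)
    parsed[label] = root.text
    print(label, raw[:2], root.text == "caf\u00e9")
```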

In the end, it sounds like the author doesn't like file formats that use more than the minimum possible space to store the data. XML is pretty verbose, but since disk space is cheap and libraries do all of the generation and reading, I don't really care. XML is far from the solution to every problem, but it is much quicker than rolling my own format that nothing else understands. (BTW, did the author ever try gzipping the XML? That should make it much smaller. libxml2 can even operate directly on gzipped XML files.)
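libxml2 is a C library; as a stand-in, here's the same idea with Python's standard gzip module and ElementTree (not libxml2): compress the XML, then parse it straight out of the decompressing stream without writing an uncompressed copy anywhere. The `<ints>`/`<i>` layout and the data are invented for the example.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# A small XML document with lots of repeated markup (hypothetical data).
xml_bytes = b"<ints>" + b"".join(b"<i>%d</i>" % n for n in range(10000)) + b"</ints>"

compressed = gzip.compress(xml_bytes)
print(len(xml_bytes), "bytes raw,", len(compressed), "bytes gzipped")

# Parse directly from the gzip stream; decompression happens as the parser reads.
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
    root = ET.parse(stream).getroot()

print(len(root), "<i> elements parsed")
```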


So I tried gzipped XML versus just dumping out 4-byte longs; here's a script that makes an XML file with 1,000,000 integers and puts the same data in a binary file next to it:

http://scsys.co.uk:8001/16872

The gzipped XML uses about 5.5M; the binary file uses around 4M. So yes, XML uses more space, but you have to ask whether saving 1.5M per 1,000,000 records is worth inventing your own format that nobody else understands. Sometimes it might be; other times, dealing with XML's trade-offs might be a better use of your time.
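The pasted script isn't reproduced here, so this is a hypothetical Python reconstruction of the experiment, not the original. Assumptions: one `<i>` element per value, random non-negative 31-bit ints from a fixed seed, and 4-byte little-endian signed integers for the binary file. Exact sizes depend on the data and the element layout, so don't expect it to match the numbers above byte for byte.

```python
import gzip
import random
import struct

N = 1_000_000
random.seed(0)  # fixed seed so repeated runs give the same sizes
values = [random.randrange(2**31) for _ in range(N)]  # assumed data

# XML version: one element per integer (hypothetical layout).
xml_bytes = ("<ints>" + "".join(f"<i>{v}</i>" for v in values) + "</ints>").encode("ascii")
gz_size = len(gzip.compress(xml_bytes))

# Binary version: each value as a 4-byte little-endian signed integer.
bin_bytes = struct.pack(f"<{N}i", *values)

print(f"raw XML:     {len(xml_bytes):>12,} bytes")
print(f"gzipped XML: {gz_size:>12,} bytes")
print(f"binary:      {len(bin_bytes):>12,} bytes")
```

The binary size is fixed at 4 × 1,000,000 = 4,000,000 bytes, which is where the ~4M figure comes from; the 1.5M gap in the comment works out to roughly 1.5 extra bytes per record.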


Look, it's simple: XML is a markup language. It's for documents. Don't use it for data.


Some of those reasons seem a bit strange to pin on a technology that, as I understand it, has never claimed to be good at encoding binary data or non-hierarchical data.

XML is used to create custom markup languages for documents, so I don’t know why you would put binary data in it; surely you’re missing the point if that’s what you’re doing. Perhaps store a link to a binary file rather than encoding the binary inside the XML.
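To put a number on the cost of stuffing binary into XML: base64, the usual way to embed bytes as text, turns every 3 bytes into 4 characters, so the payload grows by about a third. Here's a small Python/ElementTree sketch comparing that with just storing a link; the element name, the `format` attribute, and the `images/photo.bin` path are all made up.

```python
import base64
import os
import xml.etree.ElementTree as ET

blob = os.urandom(3000)  # stand-in for some binary payload

# Option 1: embed the bytes as base64 text inside the element.
embedded = ET.Element("image", format="base64")
embedded.text = base64.b64encode(blob).decode("ascii")

# Option 2: store only a reference to an external binary file (hypothetical path).
linked = ET.Element("image", href="images/photo.bin")

print(len(blob), "bytes ->", len(embedded.text), "base64 characters")
print(ET.tostring(linked).decode())
```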

As for the verbosity, that’s a trade-off you consciously make when you choose XML; if you want binary data and small files, use a binary format. With XML you give up file size and efficiency, but you gain interoperability across hardware and software systems, which is probably the main reason you would want to use XML in the first place.
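One concrete example of the hardware side: a raw binary integer’s bytes depend on byte order, while the XML text doesn’t. A Python sketch using the struct module; the value is arbitrary.

```python
import struct
import xml.etree.ElementTree as ET

value = 1_000_000

# The same integer written as 4 bytes in each byte order.
little = struct.pack("<i", value)  # little-endian, as on x86
big = struct.pack(">i", value)     # big-endian, i.e. network byte order
print(little.hex(), big.hex())     # prints: 40420f00 000f4240

# A reader that assumes the wrong byte order silently gets a different number.
wrong = struct.unpack(">i", little)[0]
print(wrong)                       # prints: 1078071040

# The XML text reads back the same no matter which machine wrote it.
elem = ET.fromstring(f"<i>{value}</i>")
print(int(elem.text) == value)     # prints: True
```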

As for human readability, I don’t think the creators of XML claimed it to be readable like a book, but rather a human being could look at it and see language rather than gobbledygook! It might be hard to understand when there’s loads of XML but it’s still 'readable'.

Editor problems aren’t really the fault of XML either; that’s a bit like saying a song sucks because you’ve got a rubbish CD player.


It would be sweet if he showed his BML format so we could compare and contrast it against his list of grievances.


I agree with just about everything he said. I hate XML and find it extremely tedious, with almost no payoff. I think it's useful in the context of XSLT transformations, but as a protocol to exchange data between two services, etc., it's far more work than it's worth.


A ninth reason:

http://www.tomychen.org/?p=8

(Look at the includes)



