Hacker News

The problem with CSV is that it doesn’t specify the encoding at the data layer, which is somewhat counterintuitive given that it has the word “comma” in its name.

No, it’s more correctly thought of as a protocol for representing tabular structures as “delimited text”, but DTTF doesn’t have the same ring to it, unfortunately.

This faffing around specifics makes CSV as a concept more flexible and “well defined enough” for its main user base, at the cost of simplicity and portability.




The CSV RFC is oriented toward CSV being a MIME type. The line separator in CSV is required to be CR-LF. This can occur in the middle of a quoted datum, and the spec doesn't say whether it represents an abstract newline character or those two literal bytes.


My understanding was this would terminate the record unless enclosed by “encapsulators”, whereupon indeed it would be interpreted as literal text.
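To make that concrete, here is a minimal sketch using Python's standard `csv` module as one example implementation. The sample data is made up for illustration. It shows a CRLF inside a quoted field staying part of the value, while an unquoted CRLF ends the record. Other parsers may normalize the embedded line break differently, which is exactly the ambiguity raised above.

```python
import csv
import io

# Made-up sample: the second record has a CRLF inside a quoted field.
raw = 'id,note\r\n1,"first line\r\nsecond line"\r\n2,plain\r\n'

# newline="" stops the text layer from translating line endings,
# so the csv reader sees the original CR and LF characters.
rows = list(csv.reader(io.StringIO(raw, newline="")))

for row in rows:
    print(row)

# The quoted CRLF is kept as two literal characters in the field value.
embedded = rows[1][1]
```

In this implementation the embedded break comes back as the two characters CR and LF, so the "literal bytes" reading is what you get in practice here.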

Though defined as CRLF in the RFC, presumably for interoperability, you are typically free to define alternative record separators, as well as field separators and encapsulators, and most modern implementations would be smart enough to work with this.
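As a rough illustration of that flexibility, again assuming Python's standard `csv` module and invented sample data: the field separator and encapsulator are plain parameters, and the writer accepts a custom record separator. One caveat worth knowing is that Python's reader ignores `lineterminator` and always treats CR or LF as the end of a line, so "free to define" only goes so far in this particular implementation.

```python
import csv
import io

# Made-up sample: ';' separates fields, "'" encapsulates, LF ends records.
raw = "name;quote\n'Smith; J.';'said ''hi'''\n"

# Read with a non-default field separator and encapsulator.
reader = csv.reader(io.StringIO(raw, newline=""), delimiter=";", quotechar="'")
rows = list(reader)

# Write the same rows back out with a different dialect.
# lineterminator only affects the writer; the reader ignores it.
out = io.StringIO(newline="")
writer = csv.writer(out, delimiter="|", quotechar='"', lineterminator="\n")
writer.writerows(rows)
converted = out.getvalue()

# Reading the converted text with the matching dialect recovers the rows.
round_trip = list(
    csv.reader(io.StringIO(converted, newline=""), delimiter="|", quotechar='"')
)
```

Both sides have to agree on the dialect out of band, since nothing in the file itself declares it.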



