
Meanwhile, depending on your OS locale, Excel exports CSV as “semicolon-separated values”.



Albeit for fairly justifiable reasons


Justifiable how?


Well, Excel has a lot of common use-cases around processing numeric (and particularly financial) data. Since some locales use commas as decimal separators, using as a delimiter a character that frequently appears in the data itself is a bit silly; it would be hard to think of a _worse_ character to use.

So, that means that Excel in those locales uses semicolons as separators rather than the more-frequently-used-in-data commas. Probably not the decision I'd make in retrospect, but not completely stupid.
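A quick illustration of the point above, using a made-up row (item `apple`, price 3,50 in comma-decimal notation, quantity 2) and POSIX `awk` to count fields. It is only a sketch of why the semicolon helps, not how Excel itself parses files:

```shell
# Hypothetical row from a comma-decimal locale: item; price 3,50; quantity 2.
# Splitting on the semicolon keeps the price intact: 3 fields.
printf '%s\n' 'apple;3,50;2' | awk -F';' '{ print NF }'   # prints 3

# If the same row used commas as delimiters, the price would split in two.
printf '%s\n' 'apple,3,50,2' | awk -F',' '{ print NF }'   # prints 4
```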


Decimal separators being commas in some locales?


They could have just ignored the locale altogether, though: write numbers with dots as the decimal separator when exporting to CSV, and assume dots when importing.


This exactly. Numbers in XLS(X) are (hopefully) not locale-specific – why should they be in CSV?
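A minimal sketch of that locale-free approach, assuming a simple semicolon-separated export with no quoted fields, where every comma inside a field is a decimal separator. A text field that legitimately contains a comma would be corrupted, so a real converter would need a proper CSV parser:

```shell
# Assumption: no quoted fields, and any comma inside a field is a decimal comma.
printf 'item;price\napple;3,50\npear;12,75\n' |
  awk -F';' -v OFS=',' '{
    for (i = 1; i <= NF; i++) gsub(/,/, ".", $i)  # decimal comma -> dot
    $1 = $1                                        # rebuild the record with OFS
    print
  }'
# item,price
# apple,3.50
# pear,12.75
```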


CSV -> text/csv

Microsoft Excel -> application/vnd.ms-excel

CSV is a text format; xls[x], json, and (mostly) xml are not.


Commas are commonly used in text, too.


Clearly they should have gone with BEL as the delimiter.

  printf "alice\007london\007uk\nbob\007paris\007france\n" > data.bsv

I'm hoping no reasonable person would ever use BEL as punctuation or as a decimal separator.
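Reading the file back is easy enough with `awk`. This sketch recreates the same `data.bsv` and sets `FS` to BEL through the octal escape `\007` inside an awk string literal:

```shell
# Recreate the joke BSV file: fields separated by BEL (\007), records by newline.
printf "alice\007london\007uk\nbob\007paris\007france\n" > data.bsv

# Use BEL as the field separator and print the first two fields.
awk 'BEGIN { FS = "\007" } { print $1 " lives in " $2 }' data.bsv
# alice lives in london
# bob lives in paris
```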


If one was going to use a non-printable character as a delimiter, why wouldn't they use the literal record separator "\030"?


Every time you cat a BSV file, your terminal beeps like it's throwing a tantrum. A record separator (RS) based file would be missing this feature! In other words, my previous comment was just a joke! :)
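If you do want to look at a BSV file without the tantrum, `cat -v` (available in GNU coreutils and the BSDs, though not required by POSIX) shows control characters in caret notation, so BEL appears as `^G`:

```shell
# BEL (\007) is displayed as ^G instead of ringing the terminal bell.
printf 'alice\007london\007uk\n' | cat -v   # prints alice^Glondon^Guk
```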

By the way, RS is decimal 30 (not octal '\030'). In octal, RS is '\036'. For example:

  $ printf '\036' | xxd -p
  1e
  $ printf '\x1e' | xxd -p
  1e

See also https://en.cppreference.com/w/cpp/language/ascii for confirmation.
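The same base conversions can be checked with `printf` alone; in common shells such as bash and dash, numeric arguments accept C-style hex constants like `0x1e`:

```shell
# RS in three bases: decimal 30, octal 36, hexadecimal 1e.
printf '%d %o %x\n' 30 30 30   # prints: 30 36 1e

# And back from hex to decimal.
printf '%d\n' 0x1e             # prints: 30
```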


On the off chance you're not being facetious, why not ASCII 0 as a delimiter? (This is a rhetorical question.)


ASCII has characters more or less designed for this:

0x1C - File Separator

0x1D - Group Separator

0x1E - Record Separator

0x1F - Unit Separator

So I guess 1F would be the "comma" and 1E would be the "newline."
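A sketch of that mapping with POSIX `awk`, using made-up data: `RS` is set to 0x1E (octal `\036`) and `FS` to 0x1F (octal `\037`), and each record is rebuilt with commas only for display:

```shell
# Two records of two units each: US (\037) between units, RS (\036) after each record.
printf 'alice\037london\036bob\037paris\036' |
  awk 'BEGIN { RS = "\036"; FS = "\037"; OFS = "," } { $1 = $1; print }'
# alice,london
# bob,paris
```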


https://stackoverflow.com/questions/8695118/what-are-the-fil...

I'm pretty sure you've shifted the meaning: the decimal separator is part of the atomic data, so it doesn't need a control character.

You would use 1F instead of the comma/semicolon/tab and 1E to split lines (a record is a line, just like a row in SQL).

You could then use 1D to store multiple CSV tables in a single file.
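Extending that to several tables works along the same lines. In this sketch (hypothetical data) each row is terminated by RS, GS (0x1D, octal `\035`) separates the tables, and the empty piece left after a table's final RS is skipped:

```shell
# Table 1 has rows a,b and c,d; table 2 has row x,y. GS (\035) separates the tables.
printf 'a\037b\036c\037d\036\035x\037y\036' |
  awk 'BEGIN { RS = "\035" }                  # one awk record per table
       {
         n = split($0, rows, "\036")          # RS (\036) ends each row
         for (i = 1; i <= n; i++) {
           if (rows[i] == "") continue        # skip the empty piece after the last RS
           gsub("\037", ",", rows[i])         # US (\037) separates the units
           print "table " NR ": " rows[i]
         }
       }'
# table 1: a,b
# table 1: c,d
# table 2: x,y
```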


Yes, but then the text isn't human-readable or editable in a plain text editor.

This would confuse most CSV users: they aren't programmers, and at most they use text editors and Excel.
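For viewing or editing, such a file can be translated back into something readable with `tr`. This is lossy: any comma or newline already present in the data becomes ambiguous again, which is exactly the problem the separators were meant to avoid:

```shell
# Map US (\037) to commas and RS (\036) to newlines for display only.
printf 'alice\037london\036bob\037paris\036' | tr '\037\036' ',\n'
# alice,london
# bob,paris
```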


I'm not proposing this, but if you were to use ASCII separators, this is how you would do it.


There are some decent arguments for BEL over NUL; however, given that you posed it as a rhetorical question, I feel I can say little other than:

ding! ding! ding! winner winner, chicken dinner!

Although BEL would drive me up the wall if I broke out any of my old TTY hardware.


...and Excel macros


Sure, let's put quotation marks around all number values.

Oh wait.

lol



