Hacker News | ftp-bit's comments

Perhaps I'm misunderstanding or don't have a good enough grasp of this, but in what circumstances would you need to parse gigabytes of JSON? I've only seen it used in config files, so...


What usually happens is someone creates an API, one which did not initially have to handle much data, and then it just grew over time. (I guess it's similar to how a lot of the Internet's early application-layer protocols like HTTP, SMTP, etc. are text-based --- the text format was initially more "convenient" for a variety of reasons, but obviously is not very efficient at scale.)

Or, perhaps a more common scenario today, it was designed by people who simply had no knowledge of binary protocols or efficiency at all --- not too long ago I had to deal with an API which returned a binary file, but instead of simply sending the bytes directly, it decided to send a JSON object containing one array, whose elements were strings, and each string was... a hex digit. Instead of sending "Hello world" it would send '{"data":["4","8"," ","6","5"," ","6","C"," " ... '
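To make the overhead of that encoding concrete, here is a minimal Python sketch. It is a reconstruction from the fragment quoted above, not the actual API: the function names are hypothetical, and it assumes one string per hex digit with a " " element between bytes.

```python
import json


def encode_as_hex_digit_array(payload: bytes) -> str:
    """Hypothetical re-creation of the format: one JSON string per hex digit,
    with a " " element between consecutive bytes (assumed from the quoted fragment)."""
    elements = []
    for index, byte in enumerate(payload):
        if index:
            elements.append(" ")
        high, low = f"{byte:02X}"  # e.g. 0x48 -> "4", "8"
        elements.extend([high, low])
    return json.dumps({"data": elements}, separators=(",", ":"))


def decode_hex_digit_array(document: str) -> bytes:
    """Recover the original bytes by joining the digits back into a hex string."""
    elements = json.loads(document)["data"]
    # bytes.fromhex ignores the whitespace separators between byte pairs.
    return bytes.fromhex("".join(elements))


payload = b"Hello world"
encoded = encode_as_hex_digit_array(payload)
print(encoded[:44])                     # {"data":["4","8"," ","6","5"," ","6","C"," "
print(decode_hex_digit_array(encoded))  # b'Hello world'
print(len(encoded) / len(payload))      # size blow-up relative to the raw bytes
```

Under these assumptions each payload byte after the first costs about twelve characters of JSON (two quoted digits, one quoted space, three commas), before the parser has even allocated a string object per element.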


Log files? More and more places are switching to easily machine-parsable logs so they can run statistics and checks over them, and JSON is a common format (e.g. because it's still somewhat human-readable and works over logging infrastructure set up to transport lines of text).
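As a rough illustration of running statistics over such logs, here is a small Python sketch that streams one JSON object per line and tallies a field. The "level" and "msg" field names and the `count_levels` helper are assumptions made up for the example, not any particular logging system's schema.

```python
import io
import json
from collections import Counter


def count_levels(lines):
    """Tally the (assumed) "level" field across a stream of JSON log lines.

    Reads one line at a time, so memory use stays flat however large the log is.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparsable>"] += 1
            continue
        if not isinstance(record, dict):
            counts["<unparsable>"] += 1
            continue
        counts[record.get("level", "<missing>")] += 1
    return counts


# Any iterable of lines works: an open file, sys.stdin, or an in-memory sample.
sample_log = io.StringIO(
    '{"level": "info", "msg": "started"}\n'
    '{"level": "error", "msg": "disk full"}\n'
    'not json at all\n'
    '{"level": "info", "msg": "stopped"}\n'
)
print(count_levels(sample_log))
```

Since every line has to go through the JSON parser, a few gigabytes of logs per day means a few gigabytes of JSON parsing per day, which is where parser speed starts to matter.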


There are some quite big JSON files out there; you might also be interested in parsing megabytes but not spending more than 1ms to get through them.

