
The Unix Philosophy:

A) Tasks should be done by simple tools combined into an elegant pipe.

B) Each tool should do three things, at least one of them so-so well:

Those three are:

1) Hackily parse the output of the previous tool based on flaky assumptions, such as a certain delimiter always being present, spaces never occurring in such and such an item, or such and such a field running from this column to that one and never overflowing (sketched after this list).

2) Do the actual processing correctly and efficiently --- provided nothing in the data exceeds a 1023-character limit or overflows an addition of two ints or doubles.

3) Produce output in some way that is hard for the next tool to parse correctly, like columns that may be empty, contain items with spaces, or overflow so that column widths are not reliable.
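For instance, a minimal sketch of point 1 (assuming GNU ls and awk; any directory with spaces in its file names will do):

    # awk splits on whitespace, so a name like "my file.txt" comes out as "my",
    # and the "total N" header line yields an empty field -- yet pipelines
    # like this one are everywhere
    ls -l | awk '{print $9}'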




That's a very twisted definition of Unix.

1) This is not done by every tool. Instead, there are tools built for specific parsing purposes, and each tool is only in charge of interpreting its command-line arguments or, optionally, stdin in some suitable way. Tools are agnostic of the output of previous tools, and only care about processing data that makes sense to them. It's the task of the user to ensure this data is structured correctly.
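For example, a pipeline where the user has made sure the structure is reliable up front (assuming a standard /etc/passwd; counting login shells is just an illustration):

    # each tool interprets only its own arguments and stdin;
    # ':' is a delimiter the user knows cannot appear inside a field here
    cut -d: -f7 /etc/passwd | sort | uniq -c | sort -rn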

2) Where are you getting these limits? Shell limits can be controlled by ulimit(1p), but the standard file descriptors function as unlimited streams of data.
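As a rough illustration (assuming GNU coreutils), a pipe streams far more data than any fixed buffer or int field could hold, without anyone counting characters:

    # ten million lines flow through the pipe incrementally;
    # no 1023-character limit or integer overflow is involved
    yes 'an arbitrarily long and unremarkable line of text' | head -n 10000000 | wc -c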

3) Again, every process is free to choose the best way to output data. Some have flags that make the output easier for another tool to process; otherwise, they default to what makes sense for the user.
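A couple of common examples (assuming GNU find and git are installed):

    # NUL-delimited output stays parseable even with spaces or newlines in names
    find . -name '*.log' -print0 | xargs -0 gzip
    # a stable, machine-oriented format on request, instead of the human default
    git status --porcelain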

You seem to have a bone to pick with the fact that the data shared between processes is unstructured and that the user must handle this on their own, but this is what enables building independent tools that "do one thing well", yet are still able to work together.

Sure, in a tightly controlled environment, tools can share structured data (e.g. objects in PowerShell, JSON in NuShell, Murex, etc.), but this comes at the expense of added complexity, since each tool needs to handle this specific format, encoding, etc. This is difficult to coordinate and scale, and arguably the rich Unix ecosystem wouldn't exist if some arcane format had been chosen 40+ years ago, or if a new one had to be supported whenever something "better" than JSON comes along. Leaving the data unstructured, and up to each process and user to handle, ensures both past and future compatibility.
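And where structured data really is wanted, it can be layered on top of plain streams today, e.g. with jq (the file name and field names here are hypothetical):

    # no column guessing: select and extract records by field name from JSON
    jq -r '.[] | select(.active) | .name' users.json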


What if the format chosen 40+ years ago hadn't been arcane, and had arguably spared us the legitimate mess of parsing poorly structured data for 40+ years?


And what format would that have been? It would have predated XML and JSON, so it must've been a bespoke format created for this purpose.

Whatever it was, it would need updating, which means all tools would need to be updated to support the changes while maintaining backwards compatibility. This is a mess in practice, and probably only acceptable to a single project or organization that maintains all the tools, but it's not something that allows an open ecosystem to grow.

It's naive to think that a modern solution can "fix" this apparent problem. New shells and environments can be created that try to address it, but their future is uncertain. Meanwhile, the fact that Unix still exists today in many variations is a testament that those early decisions were largely correct.


The downsides you cite are even worse for unstructured data: it's an even bigger mess in practice, with poor "current compatibility".

And the fallacy of alive = right is also worse than "naivety", since it prolongs the pain for a few decades longer than necessary (it's a big part of the reason why all those much better tools face an uncertain future).


Plain is arcane?

Mime type: text/arcane



