This is something that reiserfs was trying to solve (though unfortunately we all know what happened with Hans and Nina Reiser), so most of the ideas have been left on the table.
But yeah, the solution he proposed was to start combining different information namespaces (such as the "metadata" namespace that you are alluding to) into the filesystem namespace, as well as allowing for the filesystem to be pluggable so people could write databases that could be searched with the filesystem.
PowerShell is an extremely nice, cross-platform option. Most people are discouraged by the overly long command names (get-childitem and the like), but there are three-letter aliases (e.g. gci) for a lot of the common ones, and it's really nice.
The second, because when the first fails, you're left with an ls command which is spewing some binary format you can't read without... wait for it... the tool which just failed.
I'd like to ensure my tools are all speaking a format which is too simple to ever fail.
For unixy tools... well, there are lots more, and some have made concessions, being less particular as to who considers them nice or friendly. `ls` is very much in the public relations camp: the flamboyant front man, the first point of contact. Behind `ls` is `stat`, which is more a unixy text-stream tool's tool than the human-friendly `ls`.
I wasn't aware that people hated it. I very rarely struggle through it (learning curve related, probably) but the objective behavior is something I wish Linux tools would move towards.
Honestly, I'd much rather have seen it be a bunch of console-oriented convenience libs for .NET and focus on making a good REPL console for .NET languages.
No, that was the point of the parent: that it would be nicer if we did have some object layer, rather than processing (unstructured) text in pipelines and losing metadata at each step.
* Implements almost all salient POSIX features and some Gawk extensions.
* awk expressions can be used anywhere an expression is allowed, including nested within another awk invocation. Awk variables are lexically scoped locals: each invocation has its own nf, fs, rec and others.
* awk expression returns a useful value: the value of the last form in the last :end clause.
* can scan sources other than files, such as in-memory string streams, strings and lists of strings.
* supports regex-delimited record mode, and can optionally keep the record separator as part of the record (via "krs" Boolean variable).
* unlike Awk, range expressions freely combine with other expressions, including other range expressions.
* ranges are extended with 8 semantic variations, for succinctly expressing range-based situations that would require one or more state flags and convoluted logic in Awk: http://nongnu.org/txr/txr-manpage.html#N-000264BC
* strongly typed: no duck-typed nonsense of "1.23" being a number or string depending on how you use it. Only nil is false.
I extended somebody else's programming language recently, then wrote a BASIC. Mostly to make sure that I understood lexing, parsing, and AST stuff.
While you're right that this is reinventing the wheel, it can make sense to reimplement old tools to improve safety and security, and to allow them to be embedded in new environments.
Have you ever run a fuzz-tester against (GNU) awk? I have. Even now you can segfault awk with bogus programs, for example:
I can't tell you what motivates others, but I know what motivated me with GoAWK: intellectual curiosity, and the desire to learn how to use Go's profiling tools on a non-trivial project.
I suppose I'm jaded. I've been writing software for so long and have been so underpaid for it, maybe I'm envious of your ability to follow through on your intellectual curiosity with such flourish. Anyway, great work. Looks good!
One of the cool things about Go is that pretty much everything is being developed as a library. This means you can integrate this in your app without external dependencies, portable to any platform Go supports, since it depends only on the stdlib.
I also really like such posts. I have never written an interpreter. What do experts recommend, i.e. how can someone without a formal CS background learn to write an interpreter? Any good sources, articles, or books? Thanks in advance.
Depending on how involved the language you are interpreting is, you might get by having only read chapter 6 of The AWK Programming Language[0] (linked in the article), which covers "Little Languages", including what it terms an assembler and interpreter.
If you are interested in more depth, either Crafting Interpreters[1] (mentioned in the article) or Writing an Interpreter in Go[2] looks promising. I've read more of Crafting Interpreters and really enjoy it, though it isn't yet finished. One of the aspects I really enjoy is that the language is implemented and re-implemented in different languages to gradually introduce lower level concepts.
Finally, this one may be a little more "out there" than what you are looking for, but if you are interested in designing a language more than the plumbing of an interpreter Beautiful Racket[3] is really good.
Sure! I also don't have a formal CS background (I studied electrical engineering without doing any programming / CS courses). Bob Nystrom's free online book http://www.craftinginterpreters.com/ is excellent. There's also the (much older but still enlightening) "Let's Build a Compiler" by Jack Crenshaw: https://compilers.iecc.com/crenshaw/
While I'm cool with writing other language processors in a new language (Lisp written in Cobol, anyone?), I'm missing the value of this beyond the bragging rights.
There was a similar article about writing the LuaVM in Go, to package it in bigger Go applications. I've done lots of C based systems and bolted Lua on, so the Go version makes sense.
But is embedding Awk into a program something that gets done on a regular basis?
I believe in this instance it is the author wanting to level up on AWK and Go. The value is learning and fun.
An AWK interpreter written in Go is unlikely to be an improvement. Except... well, here is another blog post you might be interested in that has a similar sense of adventurous tinkering (it's about improving on grep):
https://ridiculousfish.com/blog/posts/old-age-and-treachery....
That's from 2006, and the tl;dr was that graybeards did things a certain way for a reason. And yet nowadays we have things like rg (and ag and a bunch of others).
I think my GP's objection (which I share) is to this being shared and voted to the top when there is nothing to learn from it in terms of ideas, not to people hacking away.
This might be an avenue for programmers who are comfortable in Go but not C to extend a version of the Awk language.
For example, I have wished for a long time that Gawk could parse gzipped files so that filenames could be used directly. I could take a run at implementing that in this interpreter, whereas the C version would be more difficult for me personally.
That is what I do today. However in the case where metadata is part of the file name, the FILENAME variable is not populated without something else in the pipeline that passes it into the Awk script as a variable.
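The workaround described above can be sketched like this (file names here are purely hypothetical): since FILENAME is empty when awk reads from a pipe, the name has to be smuggled in with -v.

```shell
# FILENAME is not populated when awk reads stdin, so pass the
# name in explicitly as a variable.
printf 'hello\n' | gzip > /tmp/demo.gz
gzip -dc /tmp/demo.gz | awk -v fname=/tmp/demo.gz '{ print fname ": " $0 }'
# prints "/tmp/demo.gz: hello"
```

If the interpreter itself understood gzipped inputs, the pipeline and the extra variable would both disappear.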
I look forward to reading this article when I have the time because I am a Go newbie, so a well explained example program will be useful to me. Also, I find language implementation interesting, and I've always admired the AWK language. I for one am glad he did it and wrote an article explaining it.
Finally, I think it is good to see C software being rewritten in more robust languages like Go and Rust, if people are inclined to do so. I don't think rewriting all the C software should be our top priority, especially high-quality programs like AWK, but it's a good overall direction to go when people are interested.
Why do so many modern projects feel the need to include the language and/or the tech stack used as part of the project name? Is it a type of virtue signaling? Does “Go” or “JS” or “Swift” or “Node” make these project more attractive somehow to an end-user (even if the end-user is a programmer)?
>Why do so many modern projects feel the need to include the language and/or the tech stack used as part of the project name?
Because that was a great motivation for them being written in the first place: "Let me write X in Y, to have X easily used from Y, or as a learning exercise, or because Y is fun, or because Y brings benefits (e.g. Rust and safety, Go and easy parallelism/static builds)."
>Does “Go” or “JS” or “Swift” or “Node” make these project more attractive somehow to an end-user (even if the end-user is a programmer)?
Yes, very much. I prefer Go and Rust utilities now whenever I can find them over an equally good alternative.
(Plus, you seem to have forgotten that "if the end-user is a programmer" they might be interested to play with the source, and there the language is very important).
For developers who want to tinker, they can use different means to discover projects. I don't need to see the language or tech name in the title every time I use the tool just so developers can discover it.
To me it signals lack of creativity. The problem with these projects, it seems to me, is that they have no reason to exist (beyond being a playground for the developer) other than being "a tool written in that language". For example, here, awk already exists. Why would someone use GoAWK other than to indulge their love for Go? These are the cases that bother me the most: pushing a language or a tech to give your project validation because it has little value as a standalone.
>For developers who want to tinker, they can use different means to discover projects. I don't need to see the language or tech name in the file every time I use it just so developers can discover it.
Well, if you do see it, it hurts you how?
>To me it signals lack of creativity. The problem of these projects seems to me that they have no reason to exist (beyond playground for the developer) other than "a tool written in that language"
Well, to me "a tool written in that language" is quite important. As a developer I don't just care for the tool's functionality, but also its hackability, portability and ecosystem.
The language influences whether I can easily hack the utility or not.
The language influences how easy the utility is to port.
The language influences how easy the utility is to use as a lib, integrate with some other project.
The language influences how many memory bugs the utility might have (e.g. C vs Rust).
The language influences how fast a utility is to build (e.g. Go build times).
The language influences how easy a utility is to deploy, or to have many different versions of (e.g. static builds vs a mess of classpaths and virtualenvs and the like).
The language influences how easy it is to install (e.g. go get/install, or the cargo equivalent, vs messing with languages with no package managers like C, where new or obscure projects are almost never in the official distro repos).
The language influences how easy it is to build (e.g. go build vs the C/C++ autoconf/automake clusterfuck and the hunt for dependency libraries).
The language also influences how performant a utility will be (e.g. csv query/processing tools written in Python vs xsv -- or the canonical example, Electron monstrosities vs native tools).
Not all of these traits are guaranteed given a language (to preempt the first knee-jerk objection), e.g. a Python tool could be faster than a Go tool if written well enough.
But historically and statistically speaking, and in a regression to the mean sort of way for each language platform, I've found those things to work better in some language X over another Y.
Do you "hack" every utility you use? Is that a predicate on you using that utility? Would you not investigate a utility you like to see whether you can easily "hack" on it or not?
(I dislike the term “hack”, as it denotes to me an irresponsible way to throw code around, rather than properly designing features and implementing them in a considered manner.)
I agree with most of the things you say above; they are very important. I just don't think they should be part of the name and identity of a utility. A tool should not have to scream to the world "I am a fast tool because I was written in C++ and not JS"; that should be a given for every utility.
>Do you "hack" every utility you use? Is that a predicate on you using that utility?
No, but not all the utilities I use are open source either. When they are, and for some domains (e.g cli text manipulation tools and development utilities) I like to be able to hack them, even if I don't, so for that (and for the other reasons) the language they are written in is important to me.
I never said it's the sole criterion.
>I dislike the term “hack” as it denotes to me irresponsible way to throw code, rather than properly design features and implement them in a considerable manner
For deployment code, yes. For utilities, usually that's exactly what I want though: to irresponsibly throw code around for my own purposes, when I find it convenient to fix an annoyance or bug in a utility I use or add some small feature I want to it.
When I see that "it's written in Go" to me it means that I can actually look in that code and understand most of it and the compilation from source will not be a big deal.
But those are signalling compatibility with other tools. That's fine. It makes sense. But just imagine if someone posted about the original awk (which happens from time to time) and titled it "awk: a text processor written in C".
Thing is, we now have this notion that language X is not good for doing Y, so when someone does post software for doing Y in X, and it actually does a good job, it works as marketing material for language X.
Programming languages are software products as well, and their customers want to feel they have made the right choices sticking with their options.
It is meant to attract other developers interested in the language, but it also serves as a shortcut for what the project is going to be like. "An Awk interpreter written in Go", "An Awk interpreter written in Java", and "An Awk interpreter written in Idris" set different expectations about the product, the project's goals, and what the community of its contributors values and talks about. It tells you why you may or may not want to join the micro-club of its users.
Yeah definitely. The fact that this is written in Go immediately tells me it will be extremely easy to compile and deploy, the code will be easy to understand, and it will be reasonably fast. If it had been AWKjs, I would have immediately known it could run in the browser.
The language tells you loads! This is a stupid complaint.
Normally, I’d agree with you. But I think in this case it is acceptable because the author is trying to convey that it is an implementation of AWK in Go.
In general, I think that it is acceptable to include the language name in the project if you are trying to convey that the project is an implementation of another project in that language and is not really meant to differ significantly from the original. Such projects are typically meant as an experiment, a library implementation for the target language, etc. Anything else should be named differently, especially if it is going to differ significantly from the original.
Another thing to keep in mind is that by naming the project GoAWK, the author is as good as claiming that this is the canonical implementation of AWK in Go, something they may not have intended.
I've noticed this too. It's odd because if you've done your job correctly I shouldn't have to know which tools you've used to make it. I'm not going to start using a tool just because it was made with certain other tools.
I somehow have more trust in tools written in languages like C, Go or Python. If it's written in JavaScript, I tend to look for an alternative first, and I use JavaScript for 95% of what I do.
> Why do so many modern projects feel the need to include the language and/or the tech stack used as part of the project name?
It's advertising or propaganda for the language.
Whenever a language is "hot" and going through a full scale PR campaign, you see something like "[fill in the blank] written in Rust/Go/Ruby/etc" over and over again.
No different than all the current "facebook is evil" spam or the "bitcoin is great" spam of a few years ago.
Yeah, this fetishization of specific programming languages is silly. So many people these days are obsessed with learning specific languages and technologies (and sometimes denigrating others), instead of focusing on more fundamental aspects of programming.
This kind of person tends to do badly in generalist coding interviews that rely on fundamentals.
Because implementation details are interesting to craftspeople, and we are craftspeople. Especially in a project where reimplementation is one of the defining aspects of the project.
Why? GC is overkill for a good number of command line tools.
What advantage do you see in command line tool running in its own container? Given how important pipes are, that's going to be a lot of overhead punting data between containers.
Actually, the fine-grained control over memory that comes with the lack of a GC is overkill for most command line tools.
Having to manually track memory liveness in C adds a large amount of complexity to tools like awk, sed, and grep (which are already complex beasts themselves).
Many commands that only do a few things (perhaps not awk, since it runs full programs) don't need to free everything, since they will only allocate a few things and will soon terminate, freeing the entire process.
Nearly any command that you pipe into, or stream content out of, must allocate and free memory in some non-trivial way.
Sure, those commands could just allocate and never free memory (a-la early C compilers, or the D compiler), but now any use-case that involves a large amount of data will leak noticeably. Not going to fly if you need these commands to be durable and efficient. And unix commands need to be both.
A GC gets you the freedom to operate on large streams for free, without having to worry about memory management (modulo optimization, but that happens later anyways, regardless of GC presence or not).
In some cases, yes. But not in all kinds of programs. For example, my Farbfeld Utilities programs differ in how much buffer space is needed:
* Some deal with only one pixel at a time, or sometimes two. No dynamic allocation is needed.
* Some deal with one scanline at a time, or sometimes more than one (but a fixed number) at a time. The same buffer can be used for each scanline.
* Some deal with the entire picture (such as those that distort the picture).
But one possibility can be that a program loads multiple pictures, each needing the entire picture in memory at once but not all used simultaneously, in which case it makes sense to free each picture after it is used.
(Or maybe I somehow misunderstood your message or something else.)
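The scanline case above can be sketched like this (using plain grayscale bytes instead of real farbfeld 16-bit RGBA, purely for brevity): one fixed buffer is reused for every row, so no dynamic allocation happens per row at all.

```go
package main

import "fmt"

func main() {
	const width = 4
	// Two rows of a tiny fake image, flattened row-major.
	image := []byte{1, 2, 3, 4, 5, 6, 7, 8}
	// A single scanline buffer, allocated once and reused per row.
	buf := make([]byte, width)
	sum := 0
	for row := 0; row < len(image)/width; row++ {
		copy(buf, image[row*width:(row+1)*width])
		for _, p := range buf {
			sum += int(p) // process one scanline at a time
		}
	}
	fmt.Println(sum) // prints 36
}
```

With this pattern a GC buys nothing, which is exactly the parent's point: the right memory strategy depends on the program's shape.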
The point is, the default decision should be to not have to worry about memory management. Most applications shouldn't, because they're not realtime operating systems, or in an environment where memory must be allocated statically.
Almost all unix command line utilities fall into this category. Having to worry about pairing your `free`s with your `malloc`s is a strict increase in cognitive overhead, which should have been spent on verifying the program's semantics are correct.
Messing up low-level memory operations, when you just want to worry about semantic correctness, potentially leads to bugs like RCEs, or dosing somebody with too much radiation.
Thankfully, these days, you don't actually need to choose between GC and manually matching up your `free`s with your `malloc`s.
There is at least some data that GC does have an impact on command line tools like this: https://boyter.org/posts/sloc-cloc-code/ --- More experiments like that would be great to crystallize the exact trade offs here.
Go isn't just about GC. Go is a language which is checked more strictly by the compiler. So beyond direct memory management, Go code should be cleaner, safer, and of course easier to read than the corresponding C code.
If you're interested in correctness, cleanliness, and safety, Rust is a better choice. Go's lack of generics really hurts it when it comes to simplicity and cleanliness, and as far as correctness goes, the Go compiler isn't anywhere near as strict as Rust's (nor is it as strict as gcc or most other compilers, for that matter, as a conscious choice of favouring speed over strictness).
Easier to read is not something I'd particularly accuse Go of. If you want that, stick with python. Lack of generics can end up really hurting Go for cleanliness.
Something I ran into the other day when I was trying to produce a reverse sorted list of strings:
I wrote a sed in Go a few years back, just to do it. The engine is a Reader, so you can use a sed-processed stream anywhere a Reader is accepted. There's also a command-line driver, of course.
It's kinda nuts how far the unixy idea of "just streams of text" has gotten us.