Getting anything out of a subshell that isn't from STDOUT is impossible. So you can't define an array in a subshell and then use it outside the subshell, and you can't return an array (or anything that isn't a string) from a subshell. If you only use subshells and want to use any kind of data structure that isn't a string passed from STDOUT, you have to do it globally. And subshells are slow. So nobody uses subshells.
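For example, a minimal sketch (file names made up):

#!/usr/bin/env bash
# parentheses instead of braces: the function body runs in a subshell
collect() (
    files=(one.txt two.txt)    # array defined inside the subshell
    echo "${files[@]}"         # only this stdout text escapes
)
out=$(collect)
echo "captured via stdout: $out"               # a flat string, not an array
echo "files in parent: ${files[@]:-<unset>}"   # prints "<unset>"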
If you use Bash for programming, you have to stop thinking in terms of the holier-than-thou software engineer, whose ego believes that a superior, "clean" design makes a superior program. You should embrace globals. You should switch between using or not using the enforcement of set variables or program exit status. You should stop using Bashisms and subtle, obscure language features unless you absolutely have to.
Bash is not a "real" programming language, so do not treat it as one. Do not look for hidden features, or try to do things in cute ways that nobody else uses. There is no superior method or hidden knowledge. Just write extremely simple code and understand the quirks of the shell.
This reminds me of the arguments made against writing “real” code in JavaScript in the early days of the web, until Crockford came along and wrote “The Good Parts.” There is no reason to think that a few idioms and curated features couldn't go a long way toward a much better, less hacky paradigm for bash shell scripting.
Bash is much older than JavaScript. If it was going to turn into a real programming language, it would have by now. It hasn't.
Also, JavaScript really is not a very good programming language. We are just stuck with it because it's the only language the browser understands. (Well, until recently: things are going to change with the introduction of wasm).
But for the shell, we're not stuck with any one language. Whatever you want to do can be programmed in your favorite language. You can easily write a python script instead of a bash function.
I've seen some pretty interesting things like the use of Rust for front end development, like yew[1] and Seed[2].
There aren't many languages that are practical for WASM output. Scripting and managed languages need to ship their interpreters and runtimes with their WASM blobs, and can end up relatively large. JavaScript's interpreter and runtime are baked into every browser already.
That leaves only compiled and unmanaged languages as potentially good WASM targets. As mentioned before, Rust is seeing a lot of development in that space. If LLVM can compile it, then Emscripten can output it to WASM.
Wasm isn't really going to change the front-end / browser side of javascript for the better, but it's great for graphics and anything else cpu intensive.
WASM DOM manipulation will probably come soon, and that will place other languages in the exact same playing field as Javascript. For now, manipulating the DOM still requires calling Javascript.
Actually it turns out that having an interpreted runtime causes an endless stream of problems. We keep trying to patch them over but new problems keep coming up.
It turns out that a standardized low level byte code is the right way to ship cross platform applications. We already knew that. It just took a long time to go through the standardization process and be implemented by browsers.
If it seems like it's not in use today, it's largely because JavaScript has momentum and the majority of web programmers don't yet know how to take advantage of wasm (or maybe don't think they need it).
> It turns out that a standardized low level byte code is the right way to ship cross platform applications
So why hasn’t it taken over the world with Java and now with wasm?
> If it seems like it's not in use today, it's largely because JavaScript has momentum and the majority of web programmers don't yet know how to take advantage of wasm (or maybe don't think they need it).
There’s another possible reason. I’m sure you can come up with the answer yourself.
Javascript never really had a chance to "take over the world" before Google released Chrome and the V8 engine. Before that it was just too slow.
Java applets really sucked. They took a long time to load and initialize. They all looked ugly (probably because of the default libraries?).
Also Java itself as a language was really bad and the development experience was awful. I want to say that no one wants to program in Java but in reality many people do (I don't understand those people).
The important lesson here is the implementation is more important than the idea.
Good idea with bad implementation -> goes nowhere.
wasm is not java.
The important quality about wasm is that it's not garbage collected. It's pretty close to just good old assembly.
>Javascript never really had a chance to "take over the world" before Google released Chrome and the V8 engine.
Chrome was released at the end of 2008, JavaScript was thriving way before that. We had Gmail since 2004, jQuery since 2006. WebApps were the “sweet solution” considered for iPhone apps at first in 2007. Chrome exists because of the healthy ecosystem that Firefox and Safari provided, not the other way around.
>It's pretty close to just good old assembly.
Precisely, and it's yet to be proven that's the best solution for the Web. I love it from the computer science perspective, but historically that idea hasn't struck a chord.
> JavaScript really is not a very good programming language.
Since 2015, JavaScript really is a nice language. I never thought that I'd say that.
Go look at ES6 (the javascript version that came out in 2015) and even TypeScript. You'll be pleasantly surprised, no matter if you're coming from C++, Java, C# or even other "scripting" languages such as PHP and Python.
I use typescript almost daily. It's certainly better than Python or Ruby, I'll grant you that. But I don't think Python is a good language either.
Ultimately I want a language with value types like structs and arrays that you can use to do computations without allocating things on the heap.
In Go I can write a function that returns two numbers.
In Javascript I can only return two numbers by allocating an array on the heap with two items. These two items themselves are probably pointers to two number objects allocated separately. So that's 5 allocations to return two numbers. It's insane.
This means fundamentally it's impossible to create applications in javascript that are both sophisticated and high performance.
A great example to illustrate this is the tooling around Javascript itself.
There were all kinds of bundlers: webpack, rollup, parcel, etc. Rollup was considered the fastest and lightest. They were all programmed in Javascript itself. They were all slow. But no one really knew how bad it was because there was nothing really better.
Then esbuild comes along, and blows all of them completely out of the water, outperforming them by 100 times. And it's written in Go. A language that supports structs and arrays as value types.
It’s not fundamentally impossible to build fast apps in JS. It’s just not possible to do if you don’t do the legwork of writing code that the compiler will optimize for you. It’s not easy, and it’s not natural in a lot of cases, but saying it is fundamentally impossible goes too far: people build high performance applications like games in JS.
For example, you are worried about value types but under the hood the JIT compiler will actually generate efficient representations that are passed by value if you do not mess things up for it. Modern JS compilers are extremely sophisticated.
For returning values without an allocation, you’d create such a “struct” and then use a memory pool. It would still be heap allocated I believe but not totally sure. People who know more than I do could probably tell you a better method to return multiple values from a function efficiently.
> people build high performance applications like games in JS
Yes, and in my experience this kind of application (and the browser in general) is one of the only things these days that still pushes me to upgrade my hardware to have better CPUs and more RAM, as it's perfectly sufficient as is for pretty much everything else I do.
You can’t tell from the outside if you are running a well engineered JS app that would have been fast in another platform, or a non-optimized JS app. My point was that it’s not fundamentally impossible to write fast enough JS code for a game, even though you may come across many slow JS apps regularly. (Not surprising, given the wide distribution and low barrier to entry of the web.)
Yes, you can operate strictly on typed arrays and essentially implement something akin to a virtual machine inside your JS code to make it handle memory efficiently, so you're right - it's not "fundamentally impossible", but are you still really writing JavaScript at that point? :P
There's no distinct "modern JavaScript", it's still the same language. It just has a bunch of things added to make it more manageable, but it still contains all the footguns and gotchas it did before.
Its tooling got significantly better in the last decade or so, but you still need to intentionally rely on it in order to work with the language in any sensible way. You can't really hope for any other outcome when maintaining backwards compatibility.
The tooling only got better last year with esbuild.
Everything else is either unmanageably complicated (webpack) or slow (rollup, parcel, babel).
React mainstreamed the idea of using a virtual DOM to accelerate UI building (accelerate in the computational sense: by not dealing directly with the slow DOM).
> There is no reason to think that a few idioms and curated features couldn't go a long way toward a much better, less hacky paradigm for bash shell scripting.
A couple of years ago, I challenged myself to write some complex scripts and apps using only Bash, and came to this conclusion myself.
You can get pretty far with just Bash alone, especially if you strive to write readable and maintainable code, and not just one off scripts. If you need access to data structures, especially nested ones, you can shell out to another language and then print the results back to stdout so Bash can use them:
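For instance, something along these lines (just a sketch; jq and servers.json stand in for whatever tool and input you actually have):

#!/usr/bin/env bash
# Let jq deal with the nested JSON; hand Bash a flat, line-oriented result.
while IFS=$'\t' read -r name port; do
    echo "checking $name on port $port"
done < <(jq -r '.servers[] | [.name, .port] | @tsv' servers.json)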
Because the shell is used to do things that are easy in the shell. Examples: file manipulation, sorting, field based processing, pipes. Doing these things in a traditional language is more complex and not necessarily better.
Except the shell is bad for those things, except pipes I guess.
I just hate doing things with files in the shell, all those stupid escaping rules, and god forbid you have files with spaces or quotes in the names, or leading or trailing spaces.
It's just spending time learning hacks to get around shell limitations instead of actually getting things done, and at the end it's an ugly mess.
Or maybe I'm just bad at it, but I don't think it's only that.
That's the other part of the problem. The shell is available everywhere. Python needs to be installed. So, if you need shell-related functionality, it is the easiest route to use.
Windows, possibly. And as for the *nixes, including Macs, what version is installed? The way Python 2 eol was extended and extended and extended just hurt the whole ecosystem.
Windows doesn't have bash preinstalled so I'm not sure why you're bringing that up.
All the OSes I've used in the past 3-4 years have had Python 3 by default. The specific version is largely irrelevant, as I could ask you the same question about which version of bash is running on a given OS (e.g. on a Mac the default shell is now zsh and the preinstalled bash is stuck at 3.2 due to GPLv3).
I did a similar thing a couple of months ago, writing a static site generator for my blog in sh. You can write bigger programs as long as you keep discipline, though it is a pain.
> Getting anything out of a subshell that isn't from STDOUT is impossible.
You can use I/O redirection with arbitrary file descriptor numbers, and you can let the shell pick them. So you can `mknod` a pipe, `exec {pipefd_rw}<>"$pipe_name" {pipefd_ro}<"$pipe_name" {pipefd_wo}>"$pipe_name";` and now you have three open file descriptors that you can let the sub-shell use to communicate with the parent. And so you're not limited to stdin/stdout. I do wish there was a built-in for making an anonymous pipe instead of having to make one on the filesystem.
(You have to open the named pipe for read-write first because otherwise you'll block, but if you don't want a read-write FD for it you can close it after opening the read and write FDs.)
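A rough sketch of that pattern, boiled down to one read-write descriptor (error handling omitted, names made up):

#!/usr/bin/env bash
pipe=$(mktemp -u) && mkfifo "$pipe"
exec {fd}<>"$pipe"                # open read-write first so neither end blocks
(
    # inside the subshell: report a result somewhere other than stdout
    echo "result-from-subshell" >&"$fd"
)
read -r result <&"$fd"
echo "parent got: $result"
exec {fd}>&-                      # close the descriptor again
rm -f "$pipe"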
I'll count myself among the nobodies, I guess. I use them a fair bit when part of said subshell cd's somewhere and I don't feel like using pushd/popd, either of which might fail to operate based on my mistakes, but subshells seem to never fail to exit, eventually.
If you take the time and effort to understand how bash wants you to think, you can learn how to write elegant scripts that are as maintainable as anything else.
I've written a lot in bash over the years, and I feel like I understand it pretty well. But I would never say that even elegant bash scripts are as maintainable as "anything else". It is a clunky programming environment, born of compromises, with many traps that are easy to miss in code review.
* already packaged for your distro or package manager
* supported as an integrated linter in major editors
* available in CodeClimate, Codacy and CodeFactor to auto-check your GitHub repo
* written in Haskell, if you're into that sort of thing."
Sometimes you are not at your computer, the script does not have private information (e.g. open source, or something you don't mind being public). Sometimes the website is simply more convenient.
A random web form can exfiltrate the data you paste into it (and whatever your browser lets it gather). A local program can exfiltrate ... approximately everything of value on the machine?
That's why Linux users typically install things from their distro's package manager. The bar to get malicious software in there is very very high (though it is not impossible).
But if you're still using Windows, then yes, I agree.
While I understand the sentiment, I'm not sure how bash could ever be as maintainable as something written in e.g. Python (or even better, a strongly-typed language).
The thing with bash is, it's great for tying things together and quick bits and pieces, but it's not set up for writing maintainable code. Arrays, functions, even if-statement comparisons can all be done in bash (as first-class features), but they're just... easier in other languages. And then think about the refactoring, linting and testing tools available for bash vs other languages. And then on top of that, there's the issue of handling non-zero return codes from programs you call: do you `set -e` and exit on any non-zero return code even when you wanted to continue, or skip `set -e` and ignore errors as your script just continues?
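For example, a sketch of that trade-off (app.log is a placeholder):

#!/usr/bin/env bash
set -e
# grep exits 1 when it finds nothing; under set -e that would abort the script,
# even though "no matches" may be a perfectly normal outcome here...
count=$(grep -c 'ERROR' app.log || true)   # ...so you end up sprinkling "|| true" around
echo "errors: $count"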
Personally, when I feel I want to use a function (or array, or other similar, non-trivial thing), in bash, it's time to reach for another language.
Having said that, there are some nice programs written in bash. https://www.passwordstore.org/ being one that comes to mind.
You are kind of right, I don't write much bash, but I do write some simple scripts that I can call quickly and easily (e.g. start this program with these args, write the log file here with the filename as the current date, etc). Although regarding "Then you must not be writing any bash at all"; I'm not sure how you could have deduced this!
With regards to `print_usage()` and `die()`, yes, I would reach for Python 3 then. The `argparse` module and `raise` are first-class members of the stdlib/language and are better and more standard between programs than if I threw these together myself (and with exceptions you get a stack trace, which is nice).
This is out of necessity. I'm not the sharpest tool in the shed, so I have to go out of my way to write things such that when I come back to them in months or years, I still understand what they do.
For this same reason, I also never use shorthand flags for scripts.
I have no clue what, e.g., "-s -y -o" might do (or even worse, "-syo", dear god my eyes! Also -- is that one special flag, or a series of individual flags?!).
But "--silent --assume-yes --output-file" is pretty easy to grok immediately.
I would add a variable ENDPOINT, initialised to "" and set to " --endpoint $THEENDPOINTVALUE" if the endpoint value was passed. Then include that in every invocation?
Sorry if I missed something in the logic, reading on my phone, but from the comment, this feels like something I do frequently...
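Roughly this kind of thing, I think, though I'd use an array rather than a plain string so the quoting survives (names made up, some_tool is hypothetical):

#!/usr/bin/env bash
endpoint_args=()                                  # empty unless an endpoint was passed in
if [[ -n "${THEENDPOINTVALUE:-}" ]]; then
    endpoint_args=(--endpoint "$THEENDPOINTVALUE")
fi
# then include it in every invocation:
some_tool sync "${endpoint_args[@]}"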
I write a lot of bash, and can kinda agree that it’s full of footguns. Most of them are not even in bash, but in Unix tools, which you have to use due to the lack of standard libraries.
To have variable typos result in errors with `set -u`, then expand an array that is defined but may be empty, I need to write it as `${ARRAY[@]+"${ARRAY[@]}"}`. (Further explanation: https://stackoverflow.com/questions/7577052/bash-empty-array...) But maybe bash arrays are too new a feature, and should be avoided, so let's look at something simpler, like command-line arguments.
Parsing command-line arguments is easy, but parsing them correctly is ridiculously hard. `getopts` doesn't handle long flags, so it's out. `getopt` doesn't handle arguments with spaces in them, unless you have a version with non-standard extensions, so it's out. You're left with manual parsing, and it's a royal pain to make sure you handle every expected case (short/long flags, clusters of short flags, trailing arguments to be passed through to a subprocess, short/long options whose value is in the next argument, long options whose value is after an equals sign, and several others I'm probably forgetting about). And this is just to get the arguments into your script, before you actually do anything with it.
I agree that bash has effective idioms, and learning those idioms can make scripts easier to write. I strongly disagree that bash is "as maintainable as anything else", and scripts beyond a few hundred lines should be rewritten before they can continue growing.
Bash `getopts` handles short and long arguments gnu-style without any problem. The following code handles args like "-h", "-i $input-file", "-i$input-file", "--in=$input-file", and "--help":
while getopts :i:h-: option
do case $option in
# accept -i $input-file, -i$input-file
i ) input_file=$OPTARG;;
h ) print_help;;
- ) case $OPTARG in
help ) print_help;;
help=* ) echo "Option has unexpected argument: ${OPTARG%%=*}" >&2; exit 1;;
in=* ) input_file=${OPTARG##*=};;
in ) echo "Option missing argument: $OPTARG" >&2; exit 1;;
* ) echo "Bad option $OPTARG" >&2; exit 1;;
esac;;
'?' ) echo "Unknown option $OPTARG" >&2; exit 1;;
: ) echo "Option missing argument: $OPTARG" >&2; exit 1;;
* ) echo "Bad state in getopts" >&2; exit 1;;
esac
done
shift $((OPTIND-1))
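So, for example, `./script.sh -i data.txt`, `./script.sh -idata.txt` and `./script.sh --in=data.txt` all end up setting `input_file=data.txt` (script.sh being whatever file the loop above lives in).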
Thank you, I haven't heard of that one before. However, all the examples look like it gets called as an external library, which would either need to be distributed with a script, or to already be installed by a user.
Is there a feature I'm not seeing that would generate an argument parser without external dependencies?
Yes, the arguments::generate_parser function. Install bash-modules, then run `. import.sh log arguments`, then call `arguments::generate_parser '--foo)FOO;String'`, and it will generate a parser function, which includes handling of the --foo option in both `--foo VALUE` and `--foo=VALUE` forms:
...
--foo)
FOO="${2:?ERROR: String value is required for \"-f|--foo\" option. See --help for details.}"
shift 2
;;
--foo=*)
FOO="${1#*=}"
shift 1
;;
...
Because cross-OS (not just cross-linux-distro but even that) standards-making doesn't exist anymore in a real non-broken way, and we're stuck with whatever standards of 20+ years ago. Whether official or de facto (technically bash is just de facto although `bash --posix` or `sh` is an official real standard). There basically isn't any "innovation" happening in this space in a way that could really result in anything as pervasive as bash. Maybe also cause it's just not "interesting" anymore, the 'sexy' things are many levels of abstraction higher than they were when bash came to dominance. It feels like now we're just stuck with it forever, indeed.
Maybe it's time for some other body --- say, freedesktop.org --- to start standardizing things that have traditionally been in the domain of POSIX.
After all, HTML got much better when WHATWG took W3C's toys away and actually started innovating HTML again. Maybe it's time to do something similar with Unix: I'm sick and tired of people telling everyone that we have to stay in some 1995 time capsule because we need to follow The Standard and if The Standard hasn't changed, too bad.
You say "doesn't exist anymore" but when was it better? Pre-Mac-OS-X you didn't even have an easy-to-access shell there on Macs, and Windows wasn't bash either. So if there was a golden era for that, it never included the two predominant desktop OSes.
De-facto standardization has at least brought us free officially-supported options for running Bash or other *nix shells on both of those platforms(if terribly outdated on Mac).
I agree with your broader point that we seem stuck with old shells and no contender seems in a position to replace them. However some are certainly trying, like osh [1], which takes backwards compatibility with bash seriously, making adoption easier.
My feeling is that it is about convenience when typing at the terminal. Having a unified language for both scripting and entering commands is quite convenient. Also, bare strings while typing at the terminal are very convenient! It would be really annoying if we always had to add quotes for string args when we're typing at the terminal:
$ cat "file.txt"
...and I don't even know what options would look like... maybe:
$ ls "*.txt", ["l", "a"]
I dunno... this would mean we'd probably need parens now for precedence-type concerns:
$ ls("*.txt", ["l", "a"]) > "file.txt"
Anyway, that is why I understood that it's this way. I can't point to a resource about it, though.
Powershell is how I instrument cross-platform CI scripting; with the power of the module concept and manifests, you can now write really powerful, type-safe APIs that can run anywhere. It has its quirks though...
Any gotchas with running the same scripts on Windows and Linux? I've been thinking about switching to PowerShell. Love it on Windows. Never tried it on Linux but I did try it on macOS a few years ago and remember some quirks. Currently a bash (over)user.
One gotcha is that Powershell Core has some different APIs than 'Windows Powershell'; it's best to deploy Core on both systems to unify the API.
Also powershell has no support for automatically handling external shell commands that error, you must manually check $LASTEXITCODE (easily done with a custom invoke api that you use for everything).
Also well, some of the escaping in strings is really odd...
You have to add quotes in bash if the filename contains a space. Variables should also be quoted for the same reason, because their values might contain spaces. "${var}"
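A tiny illustration:

file="my notes.txt"
touch "$file"
ls $file      # unquoted: word-splits into "my" and "notes.txt", both missing
ls "$file"    # quoted: one argument, works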
> It would be really annoying if we always had to add quotes for string args when we're typing at the terminal
With bash (and I suspect with every other popular shell out there), it actually means something different. In your second example, if you don't use quotes, the shell will do the expansions, but if you do, the process will have to do them (and in the case of ls, it cannot).
In some cases it's good to have the shell take care of it, but sometimes (e.g., when using 'find') it's not.
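A small illustration of the difference:

ls *.txt              # the shell expands *.txt into filenames before ls ever runs
ls "*.txt"            # ls receives the literal string *.txt and will almost certainly fail
find . -name "*.txt"  # here the quotes are deliberate: find, not the shell, does the matching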
I was trying to point out that quoting parameters or not could bring a different meaning for a shell (as one could see it with Bash), with the implied consequence that one would have to carry over that nuance somehow to a shell where all parameters would have to be quoted.
The Bourne shell thus adopted ALGOL syntax, distinct from anything else in UNIX.
David Korn brought a pile of C, and wedged it into a max 64k program space (for Xenix), and his upward-compatible Korn shell (ksh88) had many, many features.
There are two very distinct places where Bash is not, that I have found.
Bash is not in busybox. It pretends to be, but what is really there is the Almquist shell, with a bit of syntactic sugar to silence the common complaints.
Bash is also not in Debian's shell (/bin/sh). In that place, there is no tolerance of bashisms.
It is important to know the POSIX shell for these reasons.
If you want to choose a programming language, there are hundreds to choose from. You could write a small program ("script") in Brainfuck. But that would be really annoying, and it would take you a long time to get anything done. Other languages may also take a long time to write, test, execute, and maintain. And they may be better at some things than others.
You have to step back and remember the point of all this. Why are you programming? To solve a problem, and make a human's life easier. What is the problem you are trying to solve? How can we make the human's life easier? In this case, it's "I want to combine bunch of programs together in a command-line interface, to make it easier to use these programs to get my work done on the command-line." So, what solution should we choose for this scenario?
Lots of different "scripting" languages exist (we used to call them "glue languages"). Today Python is the most popular, and before it was PHP and Perl. But who cares if they're popular? What are they good for? Why would you choose one over the other?
Python is a general purpose language which is easy to learn, easy to write, and easy to read. Well that sounds nice, but it's very "general" and not specific to our use case. PHP is a language designed for use in web development. Perl is a language designed for system administration-type tasks.
Bash scripting is designed for making it easy to combine already-existing programs in a command-line shell, with no modification or creation of programs required. The only "dependency" is the Bash interpreter (which happens to also be the command-line interface we started our problem with! how convenient!) and an existing collection of software tools that work with each other through a command-line interface. Every part of it is explicitly not designed to "make programming easier". Instead, it is designed to make "combining programs in the command-line" easier. The result is things like every single word you type is potentially just an outside program being executed, or arguments to that program. No other language has this quirky behavior, because they're not designed solely to make command-lines easier to use.
I know Perl well, and the language is fantastically useful as a system scripting language. It combines some of the best features of common shell scripting programs with more programming language-specific features. It is designed with shortcuts and "magic" to make quick work of command-line scripting tasks. But ultimately Perl was not built into the command-line, so its utility is always a bit stilted. Going between the shell and the Perl code and back to the shell isn't as flexible as if the Perl code was embedded in the command-line. Perl also doesn't have as simple a facility for pipes as a command-line shell does. Oh, and a Perl install is rather large, isn't available everywhere by default (anymore), and has all the usual dependency problems. So even though I know Perl very well, if I need to just combine existing programs in the command-line, I always use Shell scripting instead of Perl. (There are Perl shells which solve this problem, but then everyone needs to learn Perl, and Perl can be.... idiosyncratic)
You could also use Awk for a large number of general programming tasks, but again it's designed for a different specific problem: pattern-scanning and processing, not generally combining programs together.
So we use shell scripting to solve the problem because it was designed specifically for our problem, it's ubiquitous, and simple to use. You could use any other programming language, but it wouldn't fit our scenario as well. Once your use case changes, and you are no longer only trying to combine existing programs together, or the existing programs don't do what you want them to do, then you need to use a different language that better fits that new use case.
I tend to start writing things in bash (piping into things like grep, tac, sed, etc).
However if I’m not careful they get painfully complex and I tend to regret not writing it in Perl. As such I now switch to using Perl when I get to the function stage, or anything other than the most simple bash arithmetic.
I don't know why "we" are. Just use Python. Or whatever language you're using for your main system (I've written "scripts" in Scala, sure it takes a second or two to warm up the JVM but it's honestly fine). Or TCL if you really must have a language that can be used as a login shell.
The shell is good enough. When people started to use scripting languages, they just wanted to fill the gap between bash and C; no one wanted to replace the shell, so bash and other shells are still there.
> Getting anything out of a subshell that isn't from STDOUT is impossible.
Yes that is a known limitation that has horrible, horrible workarounds. Don't use them.
That said, for scripts whose main input _is_ from a file or STDIN, and also whose main output _is_ to a file or STDOUT, then bash is far more often the right tool for the job than it is given credit for. Of course, I'm talking about bash and a host of other utilities that are often packaged with it such as awk, grep, sed, cut, etc.
For processing text I find myself often choosing between bash and Python, and very often if I choose bash I'll feel that I've made the right choice.
djb has always used them, even in the shortest of scripts.
Otherwise I agree with everything in this comment.
There is a reasonable argument to avoid using bash for non-interactive scripts. The benefits of any additional bash features, so-called "bashisms", arguably do not outweigh the costs of making these scripts non-portable. One of the many advantages of shell scripts is that they are portable and have tremendous longevity; shell scripts can last a long, long time. There are no version changes and aggressive feature creep to worry about as is routinely the case with programming languages. The scripts just keep working, every day, and we can forget we are even using them, e.g., they are being used in the various ways by UNIX-like OS distributions.
One of today's programmer memes was "Get Stuff Done". Maybe the shell was not meant for today's programmers. But for "sysadmins", or "DevOps", or whatever term anyone comes up with in the future, people who can "administer/operate" computers running a UNIX-like OS for themselves or for someone else like a client or an employer, the Bourne shell works for its intended purpose, better than anything anyone has come up with in the last 50+ years. There's a lot of stuff that "gets done" with the shell, whether it is on someone's own computer, their client's or their employer's.
Attempts at "shell replacements", cf. alternative shells, usually look like interpreters for programming languages, not shells. Perhaps this is not a coincidence.
Bourne shell is relatively small and fast. Computers with small form factors often use UNIX-like OS and when they do they usually include shells. Many programmers seem to dislike the notion that such a layer exists. They often try to blame the shell instead of their own lack of interest in learning to use it.
The shell is boring, and "boring" is sometimes the wisest choice. Most software has expanded to consume available resources (often the developer's choice not the user's). This makes computer performance gains difficult for the end user to discern, e.g., decade after decade, routine tasks seem to take the same amount of time. However the shell and many "standard" UNIX utilities have not changed as much in the same period. Subshells may have been "slow" many years ago. IMHO, they do not feel slow today. Routine tasks performed with the shell seem to run faster, as one would expect after hardware upgrade.
I posit: Life is too short to learn every programming language du jour but it is long enough to learn the Bourne shell, reasonably well.
"Shell" as used here means the Bourne POSIX-like shell, such as NetBSD's Almquist shell, not userland utilities that the shell may or may not call in a script. Scripts can of course test for differences in utilities where there is uncertainty.
If scripts written for a POSIX-like "lowest common denominator" shell were not portable, methinks projects written for such a shell, such as autoconf, would not work on so many different systems. The amount of free software found in open source OS repositories that is built using "configure" shell scripts is not small.
Debian and other Linux distributions use a shell derived from NetBSD's Almquist shell. People who design and maintain these operating systems suggest that, for non-interactive use, this Bourne shell is faster than Bash.
One of the principal things that "got done" using Bash (at least until 2017 -- after that, I couldn't say) off the top of my head was the GFS at NCEP. Yes, it was predominantly Fortran for number crunching; but what tied all the individual programs together was a super massive, maintained, and constantly modified and improved Bash script. And yes, there were plans in the works to switch out for something else (which is why I cannot say, four years later, what the status quo is). I should also mention, it was this Bash script that ran parallelized on the supercomputer -- and it was ultimately this script which, every three hours, produced the next set of forecasts, 24/7.
Well you can use arbitrary file descriptors to get things out of a subshell, not just stdout. I had to do it when I had to get something out of band, while redirecting stdout through a pipeline.
Mind you, if that's possible, it is even uglier than using stdout.
Seems like that should work fine, and it's not even that odd looking or hard to understand. But I haven't tried, not sure if there's some surprise lurking in the depths of tempfiles or something.
You're right about the disadvantages of subshells, but wrong about this:
> avoid bashisms
What's holier-than-thou is insisting that people stick to a dialect of shell scripting that hasn't changed in decades because some people think that conforming to POSIX is its own reward even if it makes life harder and programs worse.
No, thanks. I'll stick to bashisms. Things like coprocesses and mapfile make certain classes of program much easier to write. And yes, shell scripts are programs, and the quickest way to turn these programs into unmaintainable balls of mud is to follow your advice to avoid normal programming best practices simply because the program you're writing happens to be a shell script.
Here is a script that shows exporting results from a subprocess to a parent shell using a temporary file:
#!/usr/bin/env bash
process_line() {
if [[ $* == *foo* ]]; then
echo "found_foo=1" >> "$IPC"
fi
}
IPC=$(mktemp)
find . -maxdepth 1 -type f | while read -r line; do
# note: this loop runs in a subprocess
process_line "$line"
done
source "$IPC"
echo "found_foo=$found_foo"
> Bash is not a "real" programming language, so do not treat it as one.
Bash is definitely a different paradigm to your average imperative or functional language and thus requires a different approach but I wouldn’t go so far as to say it’s not “real”.
It certainly fits the criteria of a programming language even if it does have more warts than a pantomime witch.
I don’t think it makes much sense to judge software by whether it is Turing complete because otherwise that would make Minecraft a programming language
So are you deprecating the term "Turing Complete"? Mighty bold move there. And on that same note, have you seen the stuff being done in Minecraft? It is quite up for debate whether it is a programming language.
> So are you deprecating the term "Turing Complete"?
No. It’s still an interesting yard stick. It just doesn’t describe programming languages in full. For example, and somewhat counterintuitively, it is actually possible to design a programming language that isn’t Turing complete. Some esoteric functional languages do actually fall into that category. But generally programming languages would be a subset of Turing complete software.
> And on that same note, have you seen the stuff being done in Minecraft?
I have, hence why I cited it as an example.
> It is quite up for debate whether it is a programming language.
One could debate anything but it doesn’t mean the argument is made in good faith.
Minecraft is a game. I’m happy to even extend the definition and say it’s supports some visual programming mechanics. But that doesn’t mean it is a programming language.
Just because something has 4 legs and a tail, it doesn’t automatically mean it is a dog.
If anything you are saying it should be deprecated in favour of a real language. If people want to do more advanced things, why does the basic language of their shell not support it? Why should you not care about the correctness of your shell scripts, which can run important things? It seems a perfect place for a functional language. It’s high level, you can just declare what you want. I imagine your interpreter can add in all sorts of checks to help make your scripting more correct.
Powershell attempts to use a more scripting friendly language in the shell.
And it’s a horrible shell experience.
It’s great for scripting (but not as good as other scripting languages) but it’s a terrible experience on a day to day basis, with the only salvation being the existence of 2-3 letter aliases which make things slightly more manageable.
Bash is a great shell language whose scripts are useful because it allows you to trivially convert your manual shell actions into an executable script.
If you need to do anything complex (or rather, something you probably wouldn’t type into the shell manually if you had to do the thing on a one off basis), then you’re probably better off using a different scripting language.
But changing BASH with scripting as a priority would probably make it a worse shell language (which is not to say there are no improvements to be made…there are).
> So you can't define an array in a subshell and then use it outside the subshell, and you can't return an array (or anything that isn't a string) from a subshell
Huh. How does one return a value from a bash function? I've always only ever seen the "echo calling convention" used in scripts, or use of global variables.
A lot of the time you don't even have the luxury of assuming "bash" is present, you've got only "sh".
Since all valid "sh" syntax is valid "bash", you've got to restrict yourself to only sh-valid code in the event the host doesn't have bash.
I have gotten used to doing this -- these scripts usually wind up being run in Docker containers that don't have "bash" installed for the sake of minimalism.
I.e., you can't use [[ $predicate ]], but instead [ $predicate ], etc. Lots of subtle differences.
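For example, one such rewrite might look like this (hypothetical variable):

# bash-only pattern match:
if [[ $answer == y* ]]; then echo "yes"; fi
# POSIX sh equivalent (works in dash, busybox ash, etc.):
case $answer in
    y*) echo "yes" ;;
esac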
Not all sh syntax is valid bash — in bash [[ is reserved but not in sh. And if you want syntax that is valid but doesn’t do the same thing in sh vs bash there’s plenty of examples.
I would agree with this and use python instead, but for uuhh "python problems with strings". IF you are 'blue sky' open, making new things, yes, sure. no BASH. But for things I built 8 years ago that are non-trivial, and I don't prioritize an entire rewrite, BASH works very well. And guess what, every two years that 8-years-ago just comes along with it. BASH is here to stay.
Sometimes we want to write functions that have side effects. Subshell functions don't have side effects. Sometimes you don't want side effects, but sometimes you do, and it's silly to completely foreclose on even the possibility of using side effects.
The article didn't say you should always use subshells for functions, no exceptions. Just that most of the time it probably makes more sense to do so. And subshell functions can still have side effects like interacting with the filesystem, they just can't modify the parent shell's environment.
A “side-effect” in this case is the ability for a function you call to do something “outside” of itself. For example, to change the value of a global variable.
It’s generally considered best practice (in the functional programming community at least) to write “pure” functions (that is, functions without side effects) because it’s much easier to reason about what they are doing.
So good news, subshells can _only_ be pure (the only way to get anything back from them is if they write some string to stdout), but sometimes you do actually want to have some side-effects (imagine a function that reads a config file and then wants to set some of the global variables to the values found there).
Well, bad luck. If you use the subshell syntax you literally can’t do that.
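A sketch of that limitation (config path and variable name made up):

# brace body: runs in the current shell, so it can set globals
load_config()     { source ./app.conf; }      # suppose app.conf contains LOG_LEVEL=debug
# parenthesis body: runs in a subshell, so the assignment dies with it
load_config_sub() ( source ./app.conf )
load_config_sub; echo "${LOG_LEVEL:-unset}"   # prints "unset"
load_config;     echo "${LOG_LEVEL:-unset}"   # prints "debug"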
And what makes you think side effects aren't possible in the model described?
Think of grep as a way to control flow. If one side effect happens, send to stdout one thing. Another side effect, send something else. Then iterate over those outputs.
> Getting anything out of a subshell that isn't from STDOUT is impossible. So you can't define an array in a subshell and then use it outside the subshell, and you can't return an array (or anything that isn't a string) from a subshell. If you only use subshells and want to use any kind of data structure that isn't a string passed from STDOUT, you have to do it globally.
Yes, but that is no problem at all, because that is the way shell scripts work, and if you do it in a functional way, where the function has no state, it is great (with a few exceptions).
> And subshells are slow. So nobody uses subshells.
Shell scripts are slow in general; subshells are no exception, but they also don't have a large impact. And people do use subshells.
> If you use Bash for programming, you have to stop thinking in terms of the holier-than-thou software engineer, whose ego believes that a superior, "clean" design makes a superior program. You should embrace globals. You should switch between using or not using the enforcement of set variables or program exit status. You should stop using Bashisms and subtle, obscure language features unless you absolutely have to.
If you use Shell scripts, you should understand that this language has been designed decades ago and that professionals advise to use it just in short scripts to connect binaries.
> Bash is not a "real" programming language, so do not treat it as one. Do not look for hidden features, or try to do things in cute ways that nobody else uses. There is no superior method or hidden knowledge. Just write extremely simple code and understand the quirks of the shell.
That part is mostly okay, but "real" misses the point, as it is a real language, just one with many problems. However, given its strengths, we haven't managed to replace it yet...
Basically, yes. Most arguments expressed here apply to Bash as well as to other POSIX compatible shells. Even the trick from the initial blog post also works for POSIX shell AFAIK (didn't check). Definitely working is something like this (which is just a little more verbose):
myFunc() { (
#...
) }
So in this case I don't think it makes sense to make a distinction between the two. In essence, I think Bash and the other POSIX shells share most of their greatest strengths and weaknesses.
My understanding was that the initial argument was about Bash as a programming language, and most aspects discussed were ones where Bash is no different than POSIX:
1. Return values of subshells
2. Clean design
3. being a "real" programming language
The only Bash specific part was
> You should stop using Bashisms and subtle, obscure language features unless you absolutely have to.
But in fact, the usage of subshells to limit the scope and subshell return values have nothing to do with Bashisms.
> Well, because it simply doesn't work for them: returning from a function does not trigger the EXIT signal.
It doesn't trigger EXIT, but it does trigger RETURN. Just trap both:
#!/bin/bash
foo() {
trap "echo 'Cleanup!'" RETURN EXIT
#return
#exit
echo "Kill me with ^C or \"kill $$\""
while true ; do : ; done
}
foo # should print 'Cleanup!' on SIGTERM,
# returning, or calling exit
Here's an interesting bit of...abuse :) Since Bash is effectively stringly-typed, it can be used as a functional programming language, with pipes similar to function composition.
e.g.: wtfp.sh
#!/usr/bin/env bash
map() {
local fn="$1"
local input
while read -r input; do
"${fn}" "${input}"
done
}
reduce() {
local fn="$1"
local init="$2"
local input
local output="${init}"
while read -r input; do
output="$("${fn}" "${output}" "${input}")"
done
echo "${output}"
}
filter() {
local fn="$1"
local input
while read -r input; do
"${fn}" "${input}" && echo "${input}"
done
}
add() {
echo $(( $1 + $2 ))
}
increment() {
echo $(( $1 + 1 ))
}
square() {
echo $(( $1 * $1 ))
}
even() {
return $(( $1 % 2 ))
}
sum() {
reduce add 0
}
map increment | map square | filter even | sum
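Feeding it numbers on stdin, e.g. `seq 1 10 | ./wtfp.sh`, should print 220 (the sum of the even squares of 2..11).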
This is the pipe mill[1] pattern. As you say, you can use it to mimic function composition from a functional paradigm. It can make for some elegant solutions to dealing with streams of data.
The issue is that pipe mills are very slow.
$ mill() { while read -r line; do echo $line; done }
$ export FIVE_MEGS=$(( 5 * 1024 ** 2 ))
$ time yes | mill | pv -S -s "$FIVE_MEGS" > /dev/null
5.00MiB 0:00:23 [ 221KiB/s] [============>] 100%
real 0m23.084s
user 0m14.121s
sys 0m26.780s
$ time yes | pv -S -s "$FIVE_MEGS" > /dev/null
5.00MiB 0:00:00 [6.55GiB/s] [============>] 100%
real 0m0.005s
user 0m0.000s
sys 0m0.006s
Even Python loops are faster.
$ export PYLOOP="from sys import stdin
for line in stdin:
print(line)"
$ time yes | python3 -c "$PYLOOP" | pv -S -s "$FIVE_MEGS" > /dev/null
5.00MiB 0:00:00 [67.1MiB/s] [============>] 100%
real 0m0.082s
user 0m0.071s
sys 0m0.019s
Fun shell function fact: used to be if you `break` or `continue` in a function without a loop, bash would find the loop:
breaker_breaker() { break; }
foo() { breaker_breaker; }
while true; do
echo Loop
foo
done
bash dynamically crawled up the call stack until it hit a break-able loop. If you squint it almost looks like exception handling! Anyways this no longer works in bash, though it still does in zsh.
Well, bash's new behaviour (noisily complain and do nothing, reporting success) doesn't do anything useful, so implementing it gratuitously breaks old code to no gain.
Dynamic stack-crawling is more or less the most popular historical behaviour (though afaict the Bourne/Korn lineage just silently fails breaks outside the enclosing function); it's just generally consistent with everything else in the shell that's globally/dynamically scoped. Even in the shells where you can't break out of your enclosing function, `eval break` still breaks through loops in your current context, and that kind of looks like a function you called that called break if you squint.
(POSIX expressly leaves this case undefined with a carveout for break/continue inside the body of a function lexically contained within a loop.)
That's more pass-by-reference, right? Which can be used to set things in the calling context, to be sure, but seems meaningfully different from "run this code N frames up", partly because it is limited in what it can do and partly because it can only change variables that you are actually mentioning in what you pass in.
I usually associate "call-by-name" with laziness. Isn't this more pass-by-reference that may happen to be implemented using names under the hood? Alternatively, how would you distinguish them?
I'd argue that most of the work in a bash program is done by external programs like find, grep, etc., and that the time to fork is not all that relevant. We don't program the same kinds of things in bash that we might in C++.
Not quite. WSL2 uses a Linux kernel. WSL1 uses a Windows kernel and fork is much slower there. Also there's userspace variants like MSYS2, Cygwin, etc.
Assuming that fork is fast everywhere is how you end up with things like ffmpeg's configure script that runs in seconds on linux and _minutes_ on Windows.
if you want to fork your function call then you do it explicitly with $(my_function). I'm aware that people are always discovering things for the first time but there is literally decades of thought that has gone into why bash behaves the way it does. and there's a pretty good reason why the bash authors decided not to make function calls fork by default...
Sure it's slow on startup, but if you design your functions to pipe to each other as if you were writing purely-functional code, startup time is mostly irrelevant. Function startup happens once and from there they just feed down to the parallel-by-default pipe stream.
Everything is relative, a $(true) is on the order of 0.1 milliseconds. If this makes something ridiculously slow, you may be implementing a too big part of your work in bash!
You can also turn a Bash function into a script command with almost no effort, such as making a file called "run" and putting this in it:
#!/usr/bin/env bash
set -eo pipefail
function hey {
echo "Hey!"
}
TIMEFORMAT=$'\nTask completed in %3lR'
time "${@}"
Now after running `chmod +x run` you can run it with: ./run hey
Feel free to replace "time" with "eval" too if you don't want your command timed.
This is a really useful pattern because it means you can create a "run" script with a bunch of sub-commands (private or public), auto-render help menus and create project specific scripts without any boilerplate. Bash also supports having function names with ":" in the name so you can namespace your commands like "./run lint:docker" or "./run lint:frontend".
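For instance, a help menu can be sketched with compgen inside the same "run" file (assuming the convention that "private" helpers start with an underscore):

function help {
    printf "%s <task> [args]\n\nTasks:\n" "${0}"
    compgen -A function | grep -v '^_' | cat -n
}

Then `./run help` lists every function as an available task.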
One of my biggest pet peeves is people defining commands and utilities as shell functions and demanding I source some environment setup script instead of just making regular command scripts and running them as normal programs.
My shell is my computing environment. It's rude for a script author to make me change my computing environment when he could just as easily have made a script and created his own sandboxed environment that wouldn't interfere with mine.
People should never ship end user interfaces as bundles of shell functions that have to be sourced.
I remember when I first interacted with Ruby Gems and learned I needed to source some file to make it work, I thought, "Wow, Ruby is so weird and magical they can't even use the shell normally." The joke was on me when Python virtual envs became a thing.
If you don't want the timing, use `"${@}"`, not `eval "${@}"`. The latter will parse all of your arguments through the shell an additional, unwanted time.
For example, take the command `./run echo "Don't"`. When using `time`, this will print `Don't`. If you use just `"${@}"`, it will also print `Don't`.
But if you use `eval "$@"`, this will crash with an error like
eval: line 10: unexpected EOF while looking for matching `''
One thing I like about make is that you get an automated bash completion for targets, which I find helpful. Is there an equivalent sneakiness for doing this with this style of script?
If you check the first link in my previous reply it does include an auto-generated help menu that uses compgen to print a list of functions near the bottom. This way you can run `./run` or `./run help` to get a list of commands. I find this is helpful enough without needing completion, especially since every function ends up being a top level command.
People do underestimate the shell. Particularly I see people shoehorning collections of commands into a Makefile, when a shell script would work just fine.
On the other hand, with a lot of glue work I do, I eventually want to do something more complex (use lists and maps, complex string building and regexes, date handling) and while you /can/ do that in bash, I might as well start in Python and have everything in one language and take advantage of things like code sharing via modules. (And yes you can share code in shell, but again it’s not as nice.)
Might as well, yes, but I've found writing shell scripts in Python to be cumbersome because whatever flavor of os.system() I end up using just doesn't work well syntactically. I can run a command and pipe to a bunch more commands way easier in a shell because I'm already using a shell when interacting with the computer. Perl had this figured out, but proved unable to continue evolving (aka adding types, like Python/Ruby/JavaScript have managed to.)
If there's a modern library/workflow that makes this not the case, I'm all ears!
> writing shell scripts in Python to be cumbersome because whatever flavor of os.system() I end up using just doesn't work well syntactically.
This is exactly how I felt. Bash can't handle structured data well. Python (being general purpose programming language) can't handle calling external programs well because it's not the "focus" of the language. My shameless plug solution is a "real" programming language that can do both well (along with proper handling of exit codes and more goodies for "devops"y scripting).
> Bash can't handle structured data well. Python (being general purpose programming language) can't handle calling external programs well because it's not the "focus" of the language.
Perl and Ruby do both very well. Or at least they do the latter in a simpler way than Python, and they're no worse than it for the former.
NGS is programming-language-first while other modern shells are typically shell-first. Multiple dispatch would probably be the most prominent manifestation of this approach in NGS.
> NGS is programming-language-first while other modern shells are typically shell-first.
Not always no.
Even if your point were true, being programmer-first is not always a desirable feature. The vast majority of shell work is basic and repetitive. Most of the time people just want something that functions a little like Bash but less shitty for scripting. The fact that neither Powershell, LISP, nor Python has taken over the world for shell usage proves that a higher level language REPL generally makes for a shit daily shell. So usually you end up with just a small few purists using it. And that’s not really good enough.
Whereas Oil, Elvish, Murex and even Fish (to a less dramatic extent) are looking at what makes a good shell and then fixing shell scripting within that shell. Having gone through the REPL phase myself with a great many different languages and found them all painful for daily use, I’m inclined to agree with shell-first approach.
> Multiple dispatch would probably be the most prominent manifestation of this approach in NGS.
Again, NGS isn’t unique in that regard. Murex has methods, Powershell has methods. I’ve not seen anything new in NGSs “Multiple Dispatch” nor “MultiMethod” docs that other alt shells aren’t also doing.
Don’t take this the wrong way, it looks very impressive what you’re doing. But it’s not unique these days. And I say this having tried a great many options out there.
> being programmer-first is not always a desirable feature.
programming-language-first allows convenient scripting. I could not think of a way to make anything shell-first be convenient for scripting (anything beyond tiny scale).
> higher level language REPL generally makes for a shit daily shell.
I think it proves that general purpose languages, where using them as a shell was afterthought "makes for a shit daily shell".
NGS, on the other hand, has somewhat-bash-like (read: easily run external programs) syntax at the top level, which should be good for CLI. Want to use more advanced features that are typically associated with "real" programming languages? OK, pay the price, switch syntax with { ... } and have full blown programming language at your disposal, in the shell.
> NGS isn’t unique in that regard.
I do scan alternative shells from time to time. I could have missed but I didn't see multiple dispatch in any of them. Which ones have it? Just to clarify: In which shell you can define several methods with the same name and when called, the method to invoke is selected based on the types of the arguments?
There is a huge difference in having methods and multiple dispatch.
> not unique these days
NGS is a mixture of "borrowed" features and unique ones. Multiple dispatch is an old concept and has been implemented in other programming languages. Examples of things in NGS that I have not seen anywhere else: syntax for run-command-and-parse-output, proper handling of exit codes.
Regarding exit codes. Typical approach to exit codes varies. Python - "I don't care, the programmer should handle it". Some Python libraries and other places - "Non-zero is an error". That's simplistic and does not reflect the reality in which some utilities return 1 for "false". bash (and probably other shells too) is unable to handle in a straightforward manner a situation where an external command can return exit codes for "true", "false" and "error". It just doesn't fit in the "if" with two branches. NGS does handle it with "if" with two branches + possible exception thrown for "error" exit code.
Edit: and some other features around handling exit codes such as short syntax to provide expected exit code (any other exit code throws exception)
Edit: clarification - NGS knows that some external programs have exit code 1 which does not signify an error.
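To make the bash side concrete: grep, for example, exits 0 for "found", 1 for "not found" and 2 (or more) for a real error, so in plain bash you end up writing something like this instead of a simple "if":
    grep -q "$pattern" "$file"
    case $? in
        0) echo "found" ;;
        1) echo "not found" ;;                   # "false", not an error
        *) echo "grep itself failed" >&2; exit 1 ;;
    esac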
> I do scan alternative shells from time to time. I could have missed but I didn't see multiple dispatch in any of them. Which ones have it? Just to clarify: In which shell you can define several methods with the same name and when called, the method to invoke is selected based on the types of the arguments?
Murex does this but in a slightly different way. Rather than method overloading (which is a bad feature in any language in my opinion, but I get it has its fans), murex has APIs that allow for writing data type agnostic methods. In practice that means the same tools to query the length of an array, or grab an item in a map (etc) work irrespective of whether the data type is a JSON array or map, YAML, CSV, S-Expressions or even just ‘ps’ (etc) output… and so on and so forth. This expands out to all methods, meaning you basically have a ‘jq’ like toolset that works exactly the same - same commands etc - for most data formats (certainly the ones I use daily). Which in effect is the same end result as what you’re describing, except without the method overloading, instead allowing me as a shell script writer to worry less about the data type or format of the piped data.
There are a few specific methods that do allow for overloading, but that’s where functionality is a little more complex (these are usually event handlers; I don’t use them much so I can’t say how well they work).
> NGS does handle it with "if" with two branches + possible exception thrown for "error" exit code.
Murex does this too. I’m pretty sure I’ve seen other shells do that as well (possibly Elvish?)
Ironically it was actually the error handling that drew me to murex and the smart handling of data types within methods that kept me there. Basically all the same features you’re promoting in NGS. Which is why I’m saying there is a lot out there that does the same.
Anyhow, it’s been interesting reading about your work on NGS. Good luck
murex author here. Hopefully I can answer some questions:
> I do scan alternative shells from time to time. I could have missed but I didn't see multiple dispatch in any of them. Which ones have it? Just to clarify: In which shell you can define several methods with the same name and when called, the method to invoke is selected based on the types of the arguments?
I've not heard of the term "multiple dispatch" but reading the thread it sounds like you're describing function overloading. Powershell does support this with classes[0].
murex does its overloading at the API level[1]. The reason behind that decision is to keep the methods simple (eg a pipeline might contain JSON or CSV data): you as a shell user don't want to run into situations where you've written a function that supports one data type but not another; you just want it to work first time. So murex automatically abstracts that part away for you. In addition to the point the other poster mentioned about consistent `jq`-like methods that are data type agnostic, murex allows for easy iteration through objects agnostic of their data type (eg `open somedata -> foreach { do stuff }` -- where you don't need to think about data types, file formats, etc, murex does the heavy lifting for you in a consistent and predictable way).
There are some specific builtins that support "overloading" of sorts but instead of overloading the function they have handlers that call user defined functions[2][3]. This removes the mystery behind function overloading because you can easily trace which handlers exist for which data types.
It's also worth adding that function overloading is supported in the same way that it's also supported in Bash, where you might have a function but if there is an alias of the same name that will take priority. There are also private[4] functions (which are namespaced) so you have additional controls to avoid accidental overloading when writing modules too. And the interactive shell expands out any command (eg displays a hint text stating if a command is a function, alias, external command, etc) so there's a reduced risk of being unaware if `ls` is an alias, function, `/bin/ls`, or even a symlink to something else.
Named parameters are optional because neither Windows nor POSIX have any understanding of named arguments in their processes. I did consider abstracting that into murex's functions regardless, like Python et al would, but I couldn't design a way that wasn't jarring nor cumbersome to write in a hurry and that worked transparently with Windows and POSIX ARGS[]. So I've come up with an optional builtin called `args`[5] which allows you to define flags, which are effectively the same thing as named parameters except they're supported natively by Windows and POSIX ARGS[]. That way you can write a murex script as a native shell script without leaving the user to write another abstraction layer themselves interpreting indexed arguments into named arguments. Thus giving you the flexibility to write really quick Bash-like functions or more verbose scripting style functions depending on your need.
> Examples of things in NGS that I have not seen anywhere else: syntax for run-command-and-parse-output, proper handling of exit codes.
Both of these are baked into murex as well. The run-command-and-parse-output is the API stuff mentioned above. It's where the overloading happens.
As for error handling: any command is considered a failure if there is either a non-zero exit code, STDERR contains a failed message (eg "false", "failed", etc) or STDERR is > STDOUT[6]. This has covered every use case I've come across.
Additionally STDERR is highlighted red by default and when a process fails you're given a mini-stack trace showing where in the script the error happened. You have try/catch blocks, saner `if` syntax etc that all use the same API for detecting if a process has failed too. Keeping everything consistent.
The other thing murex has to help catch bugs and errors is a testing framework baked into the shell language. The docs for test[7] need expanding but in essence:
- you can write proper unit tests
- you can intercept STDOUT (even if it's mid pipeline) and test that output. This works around instances where (1) you can't have full on unit tests due to side effects in a function that can't be easily mocked (2) you want to add your own debugging routines (rather than just printing values to STDOUT or running commands manually to see their output -- like one would normally have to do with shell scripting)
- you can add state watches. Except in murex the watches are Turing complete so you're not just adding noise to your terminal output but rather putting meaningful debug messages and scripts.
And all of the debugging stuff can be written straight into your normal shell routines and cause no additional execution overhead unless you've purposely enabled test mode.
So it's fair to say I've spent a significant amount of time designing smarter ways of handling failures than your average shell scripting language. As I'm sure you have too.
> Regarding exit codes. Typical approach to exit codes varies. Python - "I don't care, the programmer should handle it". Some Python libraries and other places - "Non-zero is an error". That's simplistic and does not reflect the reality in which some utilities return 1 for "false". bash (and probably other shells too) is unable to handle in a straightforward manner a situation where an external command can return exit codes for "true", "false" and "error". It just doesn't fit in the "if" with two branches. NGS does handle it with "if" with two branches + possible exception thrown for "error" exit code.
If you're describing that as a "two branch" approach then technically murex could be argued as having "three branches" because it checks exit code, STDERR contents, and payload size too. Personally I just describe it as "error handling" because in my view this is how people's expectations are rather than the reality of handling forked executables.
I guess where NGS and murex really differ is NGS likes to expose its smart features whereas in murex they're abstracted away a little (they can still be altered, customised, etc) to keep the daily mundane shell usage as KISS (keep it simple stupid) as possible. eg you can overload function calls if you really wanted but that can often cause unforeseen complications or other unexpected annoyances right when you least want it to. So murex keeps that stuff around for when you need it but finds ways to avoid people needing to rely on it and furthermore unwraps the covers of what a routine does in the interactive terminal. It's all about offering abstractions but removing surprises.
> NGS knows that some external programs have exit code 1 which does not signify an error.
Having different behaviours for different executables hard coded into the shell is one behaviour I purposely avoided. I do completely understand the incentive behind wanting to do this and wouldn't criticise others for doing that but given external programs can change without the shell being aware, they can be overloaded with aliases, functions, etc, and they might even just differ between Linux, BSD, Windows, etc -- well it just seemed like hard coding executable behaviour causes more potential issues than it solves. You also then run into problems where users expect the shell to understand all external executables but there are some you haven't anticipated thus breaking expectation / assumptions. Ultimately it creates a kind of special magic that is unpredictable and outside the control of the shell itself. So instead I've relied on having foundational logic that is consistent. It's the one side of shell programming where I've placed the responsibility onto the developer to get right rather than "automagically" doing what I think they are expecting. This of course does mean I have to focus even harder on ensuring all other aspects of the shell are predictable and low maintenance so that the developer can cover any specific edge cases with ease. Which goes a long way to explaining why I've chosen the path I've chosen (ie reducing the amount of syntax sugar, overloading, etc needed for daily use that one might rely on in a more traditional programming language).
Somewhat similar. Since methods do not live in classes in NGS, I would argue the mechanism in NGS is simpler (more elegant?). You can just define your_method(c1:Class1, c2:Class2).
> murex does its overloading at the API level[1].
mmm. I see the support but looking at documentation at https://murex.rocks/docs/apis/Unmarshal.html , I see it's not exposed into the Murex language, it's in Go. Is this correct?
> where you don't need to think about data types, file formats, etc, murex does the heavy lifting for you in a consistent and predictable way
That is good and what I would expect.
( sorry, running out of time, to be continued :) )
Learn something new every day. Seen (and used) this methodology before but wasn't aware it was called that. I'd always heard of it as "overloading", which is conceptually similar but not exactly the same.
> mmm. I see the support but looking at documentation at https://murex.rocks/docs/apis/Unmarshal.html , I see it's not exposed into the Murex language, it's in Go. Is this correct?
It's APIs written in Go (for performance and convenience -- you wouldn't want to write a YAML marshaller in a shell scripting language). So the methods are exposed as builtins. However you can add methods written in murex if you want.
The idea being get 99.9% right by default but leave some flexibility for the user to customise if they want.
Murex builtins are also written in a way that they're all optional includes into the core project. Thus you can easily write your own builtins, marshallers, etc in Go if you want performance and then call them from your shell script. (Or you could write them in Python, Java, Perl, etc and run them as an external executable, but if you do that you lose access to murex's typed pipelines, which is where the really interesting stuff happens. I haven't yet figured out a non-shitty way to send typed data over POSIX pipes to external executables.)
> sorry, running out of time, to be continued :)
Any time :) I'm finding this conversation really interesting and educational
I should have added (outside of edit time now) that I do like what you've done with NGS. Nothing I've posted above is intended to suggest I disagree with your approach. We're covering mostly the same problems but we've just looked at them from different angles. Which is good -- the more options out there the better I say.
In NGS the common case for multiple dispatch would be to define your own type and then add methods to an existing multimethod to handle that type. Additionally, I've simplified the dispatch algorithm exactly for this reason - to avoid an unclear answer to "what's actually going to be called?".
> function overloading is supported in the same way that it's also supported in Bash, where you might have a function but if there is an alias of the same name that will take priority.
I don't think it is called overloading. That's very different from having two methods with the same name and the "right one" is called based on types of passed arguments.
> interpreting indexed arguments into named arguments.
In NGS it happens in exactly one place, when main() is called. At that point command line arguments are automatically parsed based on main() parameters and passed to main().
I don't think I understand your reasoning behind not including named parameters.
> NGS likes to expose its smart features whereas in murex they're abstracted away a little
Sounds about right. Power to the people!
> Having different behaviours for different executables hard coded into the shell is one behaviour I purposely avoided.
I can see why. NGS prefers to do the right thing in most cases and have shorter and cleaner code. Yes, that adds some risk, which I see as not that big - an exception when an exit code was actually "ok" for an unknown program (unknown external programs default to an exception on non-zero exit code).
I also dream about moving the hard coded information about programs into a separate schema, which could be re-used between different shells. That would be similar to externally provided TypeScript definitions for existing code, described at https://www.typescriptlang.org/docs/handbook/declaration-fil...
> differ between Linux, BSD, Windows,
"detect program variant" feature in the schema I mentioned above
> hard coding executable behaviour
Yep, smells a bit
> causes more potential issues than it solves.
I think in particular situation with exit codes the risk is low.
> users expect the shell to understand all external executables but there are some you haven't anticipated thus breaking expectation / assumptions
Yes. Downside. That's the kind of surprise that sucks. Need to make sure at least the docs clearly warn about this.
> So instead I've relied on having foundational logic that is consistent.
Totally understandable.
> STDERR contains a failed message (eg "false", "failed", etc) or STDERR is > STDOUT[6]. This has covered every use case I've come across.
Sounds also like potentially surprising, somewhat similar to handling exit codes. Yes, it's not per program and that's why it's better but still.
> Additionally STDERR is highlighted red by default
As it should! Yes!
> mini-stack trace
Yes, please!
> testing framework baked into the shell language.
> So it's fair to say I've spent a significant amount of time designing smarter ways of handling failures than your average shell scripting language. As I'm sure you have too.
I guess. Differently of course :)
> Personally I just describe it as "error handling" because in my view this is how people's expectations are rather than the reality of handling forked executables.
Sounds like I was not clear enough and made it seem as if something is not uniform in this regard in NGS. Exceptions are not only for external programs. It just happens that an external program can be called inside an "if" condition; an exception might occur there.
> > function overloading is supported in the same way that it's also supported in Bash, where you might have a function but if there is an alias of the same name that will take priority.
> I don't think it is called overloading. That's very different from having two methods with the same name and the "right one" is called based on types of passed arguments.
Yeah you're right, that's not overloading. I'm not really sure why I included that in my description.
> In NGS it happens in exactly one place, when main() is called. At that point command line arguments are automatically parsed based on main() parameters and passed to main().
> I don't think I understand your reasoning behind not including named parameters.
there isn't really an entry point in murex scripts so
function foobar {
    out foobar
}
foobar
is no different from
#!/usr/bin/env murex
out foobar
and no different from typing the following into the interactive terminal
$ out foobar
Which means every call to a builtin, function or external executable is parsed and executed as if it has been forked, at least with regards to how parameters are just an array (actually on Windows they're not even an array. Parameters are passed just as one long string, whitespace and quotation marks and all. Which is just another example of why Windows is a shit platform). This is a limitation of POSIX (and Windows) that I decided I didn't want to work around because otherwise some parts of the language would behave like POSIX (eg calling external programs, where named parameters are passed as `--key value` style flags) vs scripting functions, where named parameters would be positional arguments.
I wasn't really interested in creating a two tier language where parameters are conceptually different depending on what's being called so I decided to make flags easy instead:
function hippo {
    # Hungry hippo example function demonstrating the `args` builtin
    args: args {
        "Flags": {
            "--name": "str",
            "--hungry": "bool"
        }
    }
    $args[Flags] -> set flags
    if { $flags[--hungry] } then {
        out: "$flags[--name] is hungry"
    } else {
        out: "$flags[--name] is not hungry"
    }
}
That all said. I'm still not 100% happy with this design:
- `args` still contains more boilerplate code than I'm happy with
- murex cannot use `args` to automatically generate an autocomplete suggestion
So I expect this design will change again.
> Sounds also like potentially surprising, somewhat similar to handling exit codes. Yes, it's not per program and that's why it's better but still.
I don't see how that's surprising because it's literally the same thing you were describing with NGS (if I understood you correctly).
> Sounds like I was not clear enough and made it seem as if something is not uniform in this regard in NGS. Exceptions are not only for external programs. It just happens that an external program can be called inside an "if" condition; an exception might occur there.
It was me who was unclear as I assumed you meant it was the same error handling for builtins and functions too (as is also the case with murex). I just meant the "reality of handling forked executables" as an example rather than as a commentary about the scope of your error handling :)
> there isn't really an entry point in murex scripts
I have a nice trick in NGS for that. Under the idea that "small scripts should not suffer", a script runs top to bottom without an "entry point". However, if the script has defined a main() function, it is invoked (with command line arguments passed).
> `args` still contains more boilerplate code than I'm happy with
Is there anything preventing you from having exactly the same functionality but with syntactic sugar so that it looks like a parameter declaration? (Just to be clear, keeping all the ARGV machinery).
Something like (assuming local variables are supported; if not, it could still be $args[Flags] etc):
function hippo(name:str, hungry:bool) {
    if { $hungry } then {
        out: "$name is hungry"
    } else {
        out: "$name is not hungry"
    }
}
> I don't see how that's surprising because it's literally the same thing you were describing with NGS (if I understood you correctly).
Yes it is almost the same in NGS with regards to exit codes, which you preferred not to do in Murex. On the other hand, checking stderr looks very similar to knowing about exit codes and here you decided to go for it. I'm puzzled why. It is somewhat fragile, like knowing about exit codes.
> It was me who was unclear as I assumed you meant it was the same error handling for builtins and functions too (as is also the case with murex). I just meant the "reality of handling forked executables" as an example rather than as a commentary about the scope of your error handling :)
Here I lost you completely but I hope it's fine :)
In NGS, exception handling is used throughout the language, consistently, well at least I hope it is.
> Is there anything preventing you from having exactly the same functionality but with syntactic sugar so that it looks like a parameter declaration? (Just to be clear, keeping all the ARGV machinery).
> Something like (assuming local variables are supported; if not, it could still be $args[Flags] etc):
> example code
oooh I like that idea. Thank you for the suggestion.
> Yes it is almost the same in NGS with regards to exit codes, which you preferred not to do in Murex. On the other hand, checking stderr looks very similar to knowing about exit codes and here you decided to go for it. I'm puzzled why. It is somewhat fragile, like knowing about exit codes.
No, murex still checks for exit codes. Essentially a program is considered successful unless any of the following conditions are met:
+ exit code is > 0
+ STDERR in ('false', 'no', 'off', 'fail', 'failed')
+ or []byte(STDERR) > []byte(STDOUT)
The exit code is self explanatory
STDERR messages are useful for functions that return a message rather than exit code. Particularly inside `if` blocks, eg
= 1 == 2 -> set math
if { $math } then { out: $math } else { err: $math }
will return `false` to STDERR because $math == "false" (ie `= 1 == 2` will return "false")
As for the STDERR > STDOUT test, that covers an edge case where utilities don't return non-zero exit codes but do spam STDERR when there are problems. It's an uncommon edge case but the only time this condition is checked is when a command is evaluated in a way where you wouldn't normally want the STDOUT nor STDERR to be written to the terminal.
This also allows you to get clever and do stuff like
if { which: foobar } else { err: "foobar is not installed" }
where `if` is evaluating the result of `which` and you don't need to do more complex tests like you would in Bash to test the output of `which`.
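(For comparison, the usual Bash version of that check is something along these lines, where you have to remember to silence the output yourself:
    if command -v foobar >/dev/null 2>&1; then
        echo "foobar is installed"
    else
        echo "foobar is not installed" >&2
    fi
)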
As in not throwing exception or as in evaluates to true?
> exit code is > 0
You refused to get into specific exit codes for specific programs citing potentially surprising behavior (which I agree will sometimes be an issue).
> STDERR in ('false', 'no', 'off', 'fail', 'failed')
I correlated this with the following source code:
> if len(s) == 0 || s == "null" || s == "0" || s == "false" || s == "no" || s == "off" || s == "fail" || s == "failed" || s == "disabled" { return false
... which also has a potential for surprising behavior. (side note: appears to work on stdout, not stderr)
> or []byte(STDERR) > []byte(STDOUT)
Does that mean len(stderr) > len(stdout) ?
... also has a potential for surprising behavior. I can easily imagine a program with lots of debug/warnings/log on stderr and truthy/non-error output.
> STDERR messages are useful for functions that return a message rather than exit code.
Might be problematic because typically exit codes are used for that.
> where `if` is evaluating the result of `which` and you don't need to do more complex tests like you would in Bash to test the output of `which`.
Mmm.. Never did this. In which situation is looking just at the exit code of `which` not good enough?
> if { which: foobar } else { err: "foobar is not installed" }
Just to show off :) in NGS that would be:
if not(Program('foobar')) {
    error("foobar is not installed")
}
I use Make extensively for glue scripts or build scripts that call other tools. Make gives you four big advantages over a pure shell-script:
1. Tab completion. All major bash-completion packages know how to show Makefile targets in response to a TAB.
2. Parallel, serial, or single-target execution.
3. Automatic dependency resolution between tasks. Tasks that build files can also use timestamps to see what needs rebuilding.
4. Discoverability. Anybody who sees a Makefile will usually understand that something is supposed to be run from the Makefile's directory. Chances are good that they'll check the tab-completions too. There are conventions for standard targets like 'clean' and 'all'.
If you have a project with a build-process that has a bunch of small tasks that you might sometimes want to run piece-by-piece, Make is the perfect tool IMO.
Since make does not sanitize input or handle errors, I use it only for parallelism/dependency management and offload all build steps to shell scripts. I've found this to be way more maintainable.
Discoverability goes out the window the instant someone uses something like automake unfortunately. Then the makefile becomes an absolute mess of dummy targets and near gibberish.
I write a LOT of bash/shell scripts. And I don't like it, it's just part of what I have to do.
Learning a handful of bash idioms and best-practices has made a massive impact for me, and life much easier. The shell is something you cannot avoid if you're a programmer or other sort of code-wrangler.
You can interact with it + be (mostly) clueless and still get things done, but it's a huge return-on-investment to set up "shellcheck" and look up "bash'isms", etc.
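A typical example of the kind of thing shellcheck catches (SC2086, unquoted expansion):
    target=$1
    rm -rf $target      # shellcheck warns: unquoted, so it word-splits and glob-expands
    rm -rf "$target"    # the quoted form is almost always what you meant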
----
(Off-topic: I am convinced Ruby cannot be beaten for shell-scripting purposes. If I had a wish, it would be that every machine had a tiny Ruby interpreter on it so I could just use Ruby. I'm not even "a Ruby guy", it's just unreasonably good/easy for this sort of thing. And I keep my mind open for better alternatives constantly.)
I'm not sure how much closer to describing your exact intent in English a language can get than:
successfully_made_executable = system 'chmod +x /usr/local/bin/hasura'
abort 'Failed making CLI executable' unless successfully_made_executable
I have NOT written Perl (either old Perl, or Raku/Perl 6), but I do believe it may be roughly this semantic too.
EDIT: Looks like Perl/Raku is essentially the same as Ruby in this regard. So besides it being a whacky language, take that for what you will:
my $successfully_made_executable = shell "chmod +x /usr/local/bin/hasura";
die 'Failed making CLI executable' unless $successfully_made_executable.exitcode == 0;
Oh yeah, bash functions are great and absolutely abusable. Sometimes you need some grand hacks to get it to work well, but when it works well, it can do some magic. You can even export functions over ssh!
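The usual trick is to serialise the function text with `declare -f` and feed it to the remote shell, something like this (function name made up, and assuming the remote login shell is bash):
    greet() { echo "hello from $(hostname)"; }
    ssh some-host "$(declare -f greet); greet"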
I wrote this a few years back which ran on bunches of hosts and fed into an infrastructure network mapper based on each host's open network sockets to other known hosts. It wasn't really feasible to install a set of tools on random hosts, but I still had root ssh access across the board. So I needed something tool agnostic, short, auditable, and effectively guaranteed to work:
Exactly -- check out the script in the link, it shows how it would be used. I'm not sure why the first `remote_cmd` is called (probably local testing and I forgot to delete it), so ignore that.
Try this and you'll see how it returns a dramatic amount of bash as its output:
Ah, makes sense! Yeah, that's nice. You can do something similar with scripting languages by piping source code into the interpreter. Super useful back in the day for doing crap on machines in an LSF cluster when the NFS was down/slow.
I would imagine there would be performance implications of defining every bash function as a subshell, which is why it's not universally recommended to define functions this way?
It probably doesn't matter too much if you have only a handful of function invocations in your exec. But if you have a couple orders of magnitude more... RAM is going to be an issue too maybe.
Creating a new process for every single function invocation seems crazy to me, but might actually be just fine for many "ordinary" use cases? (Although it might not have been on computers of 20+ years ago, which might also be why it's not something advised; so much of bash tradition is decades old.)
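A quick and unscientific way to get a feel for the overhead on your own machine:
    f() { :; }      # ordinary function
    g() ( : )       # same body, but the body runs in a subshell

    time for i in {1..1000}; do f; done
    time for i in {1..1000}; do g; done   # typically far slower: one fork per call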
Of course the subshell is copy-on-write, so the RAM requirements shouldn't be huge. But since each process uses at least some stack, you are looking at 4k per call, which adds up fairly quickly.
If you care about performance, you'd better not write shell scripts. The typical task of a shell script is to start programs and connect them in a meaningful way. This is in itself a pretty expensive task performance-wise.
So arguing about the cost of sub-shells is somewhat beside the point.
Yes, and that's my reaction too. While I can see the rationale for always starting another process, in practice I haven't found the leakage to be a big problem.
Actually Oil functions don't use dynamic scope, but this is done in-process, not with another process:
Also, nested functions don't really add anything useful to shell. It's purely a matter of textual code organization and doesn't affect the semantics. I define all functions at the top level.
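For example, in bash a "nested" definition only takes effect when the outer function runs, and then it's just as global as any other function:
    outer() {
        inner() { echo "hi from inner"; }
    }

    outer
    inner   # works; the nesting changed nothing about scope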
If your hot path needs that level of micro-optimisation then you're far better off rewriting it in a language that compiles instead of one that interprets and forks. Even Ruby and Perl would run circles around Bash.
> Subshells are, as the name suggests, running in a subshell. They don't strictly have to be OS subprocesses
Are there really cases where subshells are invoked within the same process? In my experience, it has never been the case. That's why I've been trying to minimize the use of subshells because spawning a new process is a bit slow.
By creating a comma-delimited list of command-line arguments, the parsing logic and control flow can be influenced from a single location in the code base. The trick is to eval-uate "log=utile_log" when the "-V" command-line argument is provided (or assign ARG_ARCH to the user-supplied value after "-a"). Using the $log variable invokes the function, such as in:
preprocess() {
    $log "Preprocess"
}
If "-V" isn't supplied, then every invocation of $log simply returns without printing a message. The upshot is a reduction in conditional statements. Function pointers FTW. I wrote about this technique in my Typesetting Markdown series:
I'm quite happy to see that something Bash-related is on Hacker News! Unfortunately it seems that I don't really agree with much of what the author says...
While I do agree that it would be nice to be able to have 'local' functions and have inter-function cleanup work better, the logical conclusion for me was not to use function subshells. Since the use case is for larger programs (where different functions may want to have their own cleanup mechanisms), I'm opting to go for more of a library route. For example, I'm working on a Bash library that includes a function to allow different sources to add (and remove) functions to the same `TRAP`. A similar function may be useful, possibly involving the `RETURN` trap and the `-T` flag for the use case the author brings up. Obviously, using a package manager for _Bash_ of all languages brings in a lot of overhead, but I think it can be quite powerful, especially with a potential "Bundle" feature that makes scripts work without the package manager.
Concerning specifically the use of subshells: as other commenters have pointed out, it significantly reduces performance. I also disagree that dynamic scoping is necessarily bad for Bash. I find it quite useful when I need to use various common functions to manipulate a variable - since modifying and 'returning' variables from a function is usually either slow or verbose with Bash. Admittedly though, this feature is quite annoying at times - for example, most public functions in my Bash package manager[2] have their variables prefixed with two underscores - because they `source` all the shell scripts of all package dependencies - so I want to be extra certain nothing weird happens.
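The TRAP idea I mentioned above is roughly this shape (a sketch, not the actual library code):
    declare -a _exit_handlers=()

    add_exit_handler() {
        _exit_handlers+=("$1")
    }

    _run_exit_handlers() {
        local handler
        for handler in "${_exit_handlers[@]}"; do
            "$handler"
        done
    }

    trap _run_exit_handlers EXIT

    # two independent pieces of cleanup, registered from different files
    cleanup_tmpdir()       { echo "removing temp dir"; }
    stop_background_jobs() { echo "stopping background jobs"; }
    add_exit_handler cleanup_tmpdir
    add_exit_handler stop_background_jobs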
Bash is completely overlooked in technical interviews, at least the ones I've been involved with. But once in the role you'd find bash scripts keeping the lights on behind the scenes.
At one time, I did teach myself to write shell scripts. I even wrote this 3K-line monstrosity [0]
However, I would strongly advise mastering a proper programming language. I respect the article and the efforts of the author, but I feel that it is the past.
I mastered Python a bit, and the ability to just use things like dictionaries, proper parsing libraries and such, instead of kilometers of fragile pipes, is so much better.
I understand something like Python may feel like total overkill, but that 10 line shell script suddenly needs quite a bit of error handling and some other features, and before you know it, you wish you had started out with Python or something similar.
Interesting idea. I have thought about doing the trap cleanup but found it cumbersome to reason about when there are many functions so this is helpful. I would like to have seen a complete example at the end rather than just explaining why it's cool and then leaving it to the reader to imagine what it looks like.
Besides cleanup, one thing I would love to see is a good mechanism for logging. I have started to build a file of functions, and then other files source that as a library and call the functions as needed. I would love to be able to tell the library functions to log something if the parent file wants it, print to stderr or stdout by default, or be silent if the caller wants that instead.
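Something like this is what I have in mind (all the names here are made up, just to illustrate):
    # Caller sets MYLIB_LOG before sourcing the library: "stderr" (default),
    # "stdout", or "none".
    mylib_log() {
        case "${MYLIB_LOG:-stderr}" in
            stdout) printf '%s\n' "$*" ;;
            none)   : ;;
            *)      printf '%s\n' "$*" >&2 ;;
        esac
    }

    mylib_do_thing() {
        mylib_log "doing the thing"
        # ... actual work ...
    }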
I have used bash to write an OCR processor that called a python wrapper around tesseract, and then turned the pdf output into json to go into a solr search database by parsing the output with sed.
I recently discovered, similar to the author of the post for this thread, that local variables are dynamically scoped.
I have been writing a lot more shell scripts lately, using a "library" [1] of sorts I've been writing. When I was debugging one of my scripts that uses mycmd, I discovered that I had failed to declare some of my variables local and they were leaking out to the global scope.
I had recently added functionality to call a set of functions on script exit, so I added something that would output the defined variables, in hopes that I could write something that will output them at the beginning and then the end and show the difference. I was surprised when variables defined in my dispatch function [2] for those at exit functions were showing up, even though they were definitely defined as local. It was then that I dug around and discovered the dynamic scope of variables.
I've been trying to figure out how to accomplish what I desire but exclude those variables from calling functions. I haven't been able to find an obvious way to see if the variable is coming from a calling function. I might be able to use techniques like you've pointed out in your linked post to add the tracing that I want. Still need to think more on this.
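For anyone who hasn't run into it, the surprise boils down to this:
    outer() {
        local x="outer's local"
        inner
    }

    inner() {
        # x is not declared here, but dynamic scoping means inner sees
        # whatever its caller declared local
        echo "$x"
    }

    outer   # prints: outer's local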
Yep, do this OP... don't try to hack in some automated script that now has a race condition with device setup. udev has a ton of hooks for enabling, disabling and doing anything when device state changes.
Look into NOPASSWD in the sudoers manpage. You can just put the code in a script then give %wheel (or whomever) NOPASSWD access to run it. This can also be thrown in sudoers.d for ease of copying and managing config across machines.
We used to have very long shell scripts and recently I refactored most of them to use functions and shellcheck in our presubmit. This has greatly helped with catching bugs and improving readability.
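The shellcheck part of a presubmit can be as small as something like:
    # fail the check if any tracked shell script has shellcheck findings
    git ls-files -z -- '*.sh' | xargs -0 shellcheck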
Just don’t forget to install the correct Ruby version, of course keeping the system’s Ruby intact, so better use RVM for that, and of course you shouldn’t install random gems globally, so better do the whole thing via Bundler, and of course don’t forget to check in the Gemfile.lock too.
Actually, you can expect a recent version of Python 3 from most distros these days. The stdlib is quite powerful, and no harm would come from a small number of globally installed libs with stable APIs, like click and requests.
The key word there is "these days". There have been a lot of computers built and software set up in the last ~20 years that are still getting used today and will be for a long time. Not every environment is using $latest.
Despite this you can expect your bash scripts written today to work on any of these machines and their ancient OS installs. This is primarily because people writing in Bash care enough to not use new features, not the lack of new features in Bash.
Python 3 (3.6) is available in the base repos for Ubuntu 18.04 and CentOS 7. Ubuntu 16.04 was EOLed in 2021-04 and CentOS 6 is EOL since 2020-12. Using anything older is plain insecure (unless you are paying for ESR contracts beyond the EOL date of the LTS). I am definitely not talking about "latest".
Canonical offers free Ubuntu Advantage for 3 machines which gives you extended service maintenance for even older distros like 14.04 (which is what I am using to type this). Ubuntu is a great distro because it supports each LTS for a full 10 years. Constantly upgrading and breaking things and having to switch software and workflows is way more insecure imo.
Additionally, there are tons of devices out there that will never get updated that run even older unsupported linux distros and do their job just fine. And modern written bash will run just fine there too.
> Just don’t forget to install the correct Ruby version
This justification for bash has always been perplexing to me. If I’m operating in an environment where my ONLY reliable infra invariant is “bash will probably work”, cleaning up the org’s infrastructure clusterfuck is probably my #1 priority.
(Or I guess in a few cases the fleet are not boxen but little embedded/iot devices, in which case you probably don't want to be running any of these sorts of scripts for a whole host of reasons...)
> of course you shouldn’t install random gems globally…
Or you can just use bash and keep your sanity.
What are some scenarios where you’d need a gem in Ruby but Bash just works?
The only situations I can think of are cases where you’re using Bash to call out to some executable that does fancy things. Which (1) you can do from any scripting language anyways, and (2) means you’re just shifting dependency management from the package manager to the image/Dockerfile.
But actually, the combination of justifications here is particularly perplexing. If the org has a stable way of handling system images, then you'll know which version of $scripting_language should exist and where it's installed. The only way you end up with language version woes is if you don't have standardized infra. BUT... if you don't have standardized infra, and Bash can do things that Ruby can't without special Gems, then it stands to reason that you're in a situation where your Bash scripts depend on the magic state of individual unicorn boxes?! Which is particularly fragile and frightening and far worse than installing some local gems or whatever!
IDK. The purported benefits of Bash always sound like they flow out of environments where there are basically no fleet-wide infrastructure invariants.
"Just use $scripting_language" might be the best advice in this thread just as a sort of canary in the coalmine. I.e., if your org can't "Just use $scripting_language" because "which version?" then the team will probably benefit tremendously in an infinite variety of ways from an afternoon of infrastructure cleanup. Regardless of whether they use bash or a scripting language going forward :)
The advantages of bash are almost all related to it NOT being a "real" programming language. The terseness, ease of writing self-modifying code and anonymous functions, lack of typing, flexible syntax, easy interoperability with any other language and program through any available interface, are not really desirable for writing stable and maintainable code. They are hugely desirable for quickly hacking something together, testing and learning, and the numerous simple scripting tasks involved in system administration.
> They are hugely desirable for quickly hacking something together
Undeniably, yes. I was there in the 90s ;-)
But hacking things together in a way that's robust is difficult, and bash isn't a good match for that difficulty.
These days I mostly operate in the realm of "how can I enable others to hack things together without blowing tens of millions of our dollars and their very early career on a stupid mistake".
But it is so full of landmines that even those quick dirty hacks will fail.
Also, don’t even get me started on self-modifying code. We have one at a work project and it sometimes just fails and results in inserting the same echo statement at each run, so every bootstrap displays 2^n messages, depending on when I last cleaned it up…
> This justification for bash has always been perplexing to me. If I’m operating in an environment where my ONLY reliable infra invariant is “bash will probably work”, cleaning up the org’s infrastructure clusterfuck is probably my #1 priority.
That is not your job. Your job is to get that machine working using whatever is already installed. Adding a new package means going to the production committee with your proposal and justification and analysis of the increased threat surface.
>> This justification for bash has always been perplexing to me. If I’m operating in an environment where my ONLY reliable infra invariant is “bash will probably work”, cleaning up the org’s infrastructure clusterfuck is probably my #1 priority.
> That is not your job.
This sort of thing is definitely your job if you have the word "Principal" in your job title, and probably also if the word "Senior" is in there as well ;-)
And in any case everyone is responsible for excellence in the milieu in which their team operates. If a Senior or even a fresh-grad Jr. comes to me with a solid idea I'll champion it as if it were my own baby. And then recommend/fight for rapid promotion in the case of the Jr., or put in a good word for promotions for the Sr.
If you recommend your org have standardized images with well-documented info about language versions etc. and the answer you get from your management/tech leadership is "not your job", I recommend finding a new job.
> Adding a new package means going to the production committee with your proposal and justification and analysis of the increased threat surface.
The context of my quote was "knowing which version of ruby/perl/python is installed". There's almost certainly a version of one of those on your standard linux machine, and everyone pushing to prod should damn well be able to look up exactly which one.
> Adding a new package
The general debate here goes way beyond adding a new package. Good infra needs WAY better invariants than "definitely bash is installed in the usual place". If a concern is "IDK which version of Ruby is installed on the machines I'm targeting" then either you're fighting fires and need to keep every intervention really damn simple, or else your org has Real Issues. In either case, bash is the enemy.
> The defaults are everything.
Those defaults aren't handed down from Gods. Your org chooses them.
Installing ruby/predicting the version of ruby installed/writing ruby that can run on any version installed... is unfortunately non-trivial.
I think pretty much the only reason people write bash is because you have an incredibly high chance of bash being installed and a version of bash being installed that will run whatever bash you write just fine.
Perl is honestly almost as reliable to be there predictably and compatibly... but I guess people would rather write bash than Perl?
Is there any risk of Bash ever going away? It seems like it's the de facto shell. I remember considering whether or not I should learn Perl at one point. It didn't even feel like a choice with Bash. Trendy shells seem like they have no choice but to support it too.
I feel like what usually makes me reach for something beyond Bash is really a matter of wanting or needing some dependency that wasn't written in it for whatever reason. Usually this happens right at the point where the script/utility starts to turn into a library/program, so it's trivial to just transpose the control flow into whatever language is required at that point and go from there. This of course raises the type of concern you mentioned about Ruby, but at that point it's hopefully worth the trouble to address.
Bash might be consistent, but what about the programs you're calling out to? Even basic utilities have different options between BSD and GNU Coreutils. Something like git might not have options you're expecting due to differences between versions. Or if you need to download a file using HTTP, you will run into a problem when you run on a machine that has wget when your script was expecting cURL.
And yes, you have these sorts of problems with other languages, but my point here is that Bash doesn't free you from them.
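In practice you end up writing small shims for this anyway, e.g. something like:
    fetch() {
        # prefer curl, fall back to wget; these particular flags are widely supported
        if command -v curl >/dev/null 2>&1; then
            curl -fsSL "$1"
        elif command -v wget >/dev/null 2>&1; then
            wget -qO- "$1"
        else
            echo "need curl or wget" >&2
            return 1
        fi
    }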