I am hoping this appears at https://pubs.opengroup.org/onlinepubs/9699919799/ soon. This is the link I use most often to go through the specification. In fact, I owe a lot of my shell scripting skills to this online resource.
As a specific example, the seemingly simple matter of when the shell decides to split a string based on $IFS and when it does not was quite confusing to me until I went through the specification here: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...
For example, if
a="foo bar"
then
ls $a
will split the value into two fields (thus two arguments to ls). Of course we should surround $a with double-quotes to avoid the field splitting. However, the following is fine:
case $a in
No field splitting occurs here. However, to be kind to your code reviewer, you might want to double-quote this anyway for the sake of simplicity and consistency. Behaviour like this is specified in the sections "Field Splitting" and "Case Conditional Construct" of the aforementioned link. Specification documents like this were formative in my journey toward learning to write shell scripts confidently.
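A small sketch of both behaviours side by side (count_args is a hypothetical helper, not something from the spec):

```shell
# A simple command splits an unquoted expansion; a case word does not.
a="foo bar"

count_args() { echo $#; }   # hypothetical helper: prints its argument count

split=$(count_args $a)      # unquoted: split into two fields -> 2
unsplit=$(count_args "$a")  # quoted: stays one field -> 1

case $a in                  # no field splitting in the case word
  "foo bar") matched=yes ;;
  *)         matched=no ;;
esac

echo "$split $unsplit $matched"   # prints: 2 1 yes
```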
The note on the front page of the Austin Group's website says as much with regard to publication:
> June 14, 2024: IEEE Std 1003.1-2024 has been published by IEEE. The Open Group Base Specifications, Issue 8 has been published by The Open Group. At this stage only PDF is available. The HTML edition to follow soon.
I do the same, except for the "x" (that is, "set -euo pipefail"); depending on what you're doing, "set -x" might be helpful, might be too much noise, or it could even break things which were not expecting the extra output (and in the worst case, it might end up echoing secret tokens into your build logs).
I find it to be a good default when writing a script, but yes, it can get noisy and potentially leak stuff, if that is what your script deals with. That hasn't been a concern in the settings I usually use it, though.
I am not sure how it could break anything though, unless you are parsing stderr of your script in a subsequent step, which would seem unusual anyway.
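On the leaking point at least, here's a contrived sketch: everything `set -x` traces goes to stderr with variables already expanded, so a secret held in a variable shows up verbatim in the trace (the token value is made up):

```shell
# Capture the xtrace output and show that the expanded secret appears in it.
TOKEN=s3cret   # stand-in for a real credential
trace=$( { set -x; printf '%s' "$TOKEN" >/dev/null; } 2>&1 )
case $trace in
  *s3cret*) leaked=yes ;;
  *)        leaked=no ;;
esac
echo "$leaked"   # prints: yes
```

In a CI job, that stderr stream is exactly what ends up in the build log.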
Oh wow, I didn't know that. I am pretty sure I had a sh trip up on it just recently, so I thought it still was bash only. Yes, use it everywhere possible.
Emphasis on now - it's new in this version, so probably expect a little bit of delay before it's actually available everywhere. But yes, excellent addition and I'm happy to have it available more broadly now.
I think many people are even more surprised by this:
x=$a # not split! It means the same thing as x="$a"
They were taught that you have to quote everything, which is a reasonable rule to follow, but it's not true.
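A quick check of the assignment case:

```shell
# Assignment context: the shell wants one string, so no splitting happens.
a="foo   bar"   # multiple spaces: would split into two fields elsewhere
x=$a            # unquoted, but still assigned as a single string
if [ "$x" = "$a" ]; then same=yes; else same=no; fi
echo "$same"    # prints: yes
```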
---
I never wrote about this on the Oils blog (https://www.oilshell.org/ ), but the post would be titled:
Shell Has Context Sensitive Evaluation
Basically the two contexts you should think of are:
(1) EVAL WORD SEQUENCE
This occurs in 2 places in POSIX shell:
ls $x$y # simple command is a sequence of words
for i in $x$y; do echo $i; done # for loop
And 1 place in bash:
a=( $x$y ) # array literal
In these cases, the shell "wants" a sequence of strings, not a single one. So it does splitting.
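For instance, the for-loop context splits the concatenation just like a simple command would (a minimal sketch):

```shell
# $x$y expands to "a b c", which the for loop splits into three words.
x="a "
y="b c"
n=0
for i in $x$y; do
  n=$((n+1))
done
echo "$n"   # prints: 3
```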
---
(2) EVAL WORD TO STRING
But there are many other contexts where the shell does not "want" a sequence of strings.
It wants a SINGLE string. And conversely, it actually JOINS arrays of strings, rather than splitting.
Usually "$@" is an array / sequence of strings, while $@ or $* is a string, roughly speaking.
But the shell doesn't want sequences of strings in MANY cases, e.g.
a=$@ # I only want 1 string here, so I JOIN rather than splitting
echo hi > "$@" # redirect arg (not all shells agree though!)
case "$@" in ... esac # as you point out
So the bottom line is that variables aren't really strings OR arrays of strings. Whatever the shell wants, it converts it to.
And shells also DISAGREE on the specifics of those rules. POSIX shell has the array "$@", but arrays in general are not in POSIX.
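The join direction is easiest to see with $*, whose joining character (the first character of IFS) is at least well specified in POSIX; $@ in a string context is murkier across shells:

```shell
# $* in an assignment joins the positional parameters into one string,
# using the first character of IFS as the separator.
set -- foo bar baz
IFS=" "
joined=$*
IFS=","
joined2=$*
IFS=" "          # restore a sane IFS for what follows
echo "$joined / $joined2"   # prints: foo bar baz / foo,bar,baz
```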
---
And even worse, think about this case:
local x=$a
Does it behave like an assignment, which wants a single string?
Or does it behave like a simple command, which wants a sequence?
You can look at it both ways. The bottom line is that assignment builtins are special and they don't follow the normal rules of simple commands. Shells have differed, but POSIX decided on this a while ago.
---
This is all of course mind-numbing trivia that has no real reason for existing ... YSH fixes it, and it's now pure native C++, no more Python.
Sometimes when I'm being lazy, say when I have a variable holding some absolute folder/file path, I'll convert the slashes to spaces, word split there, and send the results into the argument stack (the positional parameters).
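That trick can be sketched with IFS and `set --`, which loads the split fields into the positional parameters (note the leading slash produces an empty first field):

```shell
# Split a path on "/" by temporarily making "/" the field separator.
path=/usr/local/bin
old_ifs=$IFS
IFS=/
set -- $path      # unquoted on purpose: split on "/"
IFS=$old_ifs
count=$#          # 4 fields: "", usr, local, bin
second=$2
echo "$count $second"   # prints: 4 usr
```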
That's also one of the more notable incompatibilities of zsh - by default it treats $a the same as "$a", and you're supposed to use arrays if you want multiple words. Although I'm not an expert - maybe ysh and zsh differ in the details here?
Yup, in terms of ls $a -- YSH happens to be like zsh. OSH is compatible with bash and does the POSIX word splitting, but YSH is not.
In general YSH is pretty different from zsh though -- it's more of a Python-/JS-like language with structured data, e.g.
ysh$ var a = ['list', 'of', 'strings']
ysh$ write -- @a
list
of
strings
In zsh I still think that's the pretty obscure "${a[@]}" rather than @a.
Arrays are also "flat" in zsh -- you can't have an array of arrays, because there's no garbage collector. But YSH has arbitrarily nested JSON-like data structures, and JSON serialization built in.
I agree! You can blame me for some of those proposals :-), my thanks to the POSIX team for getting this out the door.
The sed -E option makes it easy to portably use extended regular expressions.
find -print0, xargs -0, and read -d provide portable ways to securely process lists of files. They were already widely implemented, but now they're officially part of the spec and can be counted on to be present in many other places.
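A minimal sketch of the -print0/-0 pipeline (the directory and the filename are made up for illustration): NUL-delimited names survive whitespace intact, so a name containing a space still comes through as a single argument.

```shell
# Create a file whose name contains a space, then count how many
# arguments make it through a NUL-delimited find | xargs pipeline.
mkdir -p demo_dir
: > "demo_dir/has space.txt"
count=$(find demo_dir -type f -print0 | xargs -0 -n 1 printf '%s\n' | grep -c .)
rm -rf demo_dir
echo "$count"   # prints: 1  (one file, one argument, space and all)
```

With plain `find | xargs` the space would have split the name into two arguments.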
Also c17 -G to create shared objects, SIGWINCH and tcgetwinsize() to query the size of a terminal window, lots of new shell features, make is now somewhat useful, gettext() and associated commands are in, asprintf(), C17 support, strlcpy, strlcat, and many more all new and exciting features.
Finally! It always seemed very strange to me that posix said that shared objects were a thing and provided a rtld API for using them, but never specified how to create them.
I for one am overjoyed that I can now type readlink instead of invoking a shell script. I know newbies will also be overjoyed now that they can just read an LLONG_MAX-page manual containing the solution to all their problems
Is it mostly for shell scripts? Aren't people targeting bash or basic Bourne shell features instead of POSIX? Is shellcheck checking for best practices instead of POSIX compliance?
And for other applications (GUI, servers, etc) strict POSIX compliance might be too restrictive?
And with many things being Linux (or Linux-like like WSL) the need for this might be less?
Are Android and/or iOS fully POSIX compliant?
Any good blog or presentation describing the current state of POSIX?
It's as good as any other standard: as a baseline of agreed-upon "correct" behavior for interoperability. Anything you add from there is gravy. It's more for implementers than users, e.g. for someone writing a new shell rather than writing shell scripts. Having the standard handy is also pretty useful when writing foreign interfaces, like the posix modules in python and perl.
As for the current state of POSIX, well, you're looking at it. Might find a blog or two of someone on the POSIX committees, but the organizations aren't the kind that keep blogs. Probably best to just dive into the Wikipedia article on POSIX and start following the references on the bottom. You'll probably want to look into SUS, the Single Unix Specification, as well: it's identical to POSIX (plus curses for some reason) but it's the label that OS vendors may use rather than POSIX. macOS and some Linux distributions claim to be fully SUS-compliant; Linux as a whole does not, because its official scope is limited to the kernel which only implements a subset of POSIX.
Fun fact: the name "POSIX" was coined by Richard Stallman.
One big factor is for performance reasons in shell scripts. At the time, the switch decreased boot times for Debian by 7.5%. Bourne shell features add a lot of overhead and that's not always an acceptable tradeoff.
Also, if you're using bash features in a script, you can always just add #!/bin/bash to the top of your file instead of #!/bin/sh to force a bash compatible shell.
> One big factor is for performance reasons in shell scripts. At the time, the switch decreased boot times for Debian by 7.5%.
And that should no longer be a relevant factor, since most of the boot process is now implemented directly in C (within systemd), instead of a bunch of shell scripts.
Prefer `#!/usr/bin/env bash` instead, since /bin/bash isn't a standardized location for bash (even across Linux distros). The former causes $PATH to be searched for bash.
This is pretty common advice but I think it is fighting the previous war. This idea is useful for virtualenv-type tricks if you want to ensure use of your personal version of the interpreter on a shared system, but you have to boil an ocean of scripts. You don't know if you caught them all. Docker won instead - a quick filesystem namespace comprehensively catches everything. Just use #!/bin/bash.
EDIT: I was thinking about Linux, but I suppose macOS users are stuck with needing this for Homebrew-supplied bash?
#!/bin/bash won't work on, say, nixos, and as you noted, many non-linux platforms. It's a few more characters to do something (more) portable! You're right that docker (or something like nix flakes, which are lighter-weight/easier to introspect imo) are probably a better solution in the long run, though.
POSIX omits from standardisation whatever an application does not need in order to work, so that a wide range of systems can be supported. As for the shebang line, it can be rewritten by an installation script and is therefore considered a matter of system administration.
> One big factor is for performance reasons in shell scripts. At the time, the switch decreased boot times for Debian by 7.5%. Bourne shell features add a lot of overhead and that's not always an acceptable tradeoff.
I seem to recall it was much smaller than that, something like 4% on a 2008 Eee PC or something like that, but I can't find any numbers on that right now.
The Debian startup scripts were already POSIX; it's not hard to get better performance out of zsh or bash by avoiding expensive process lookups.
Overall, I consider this to be mostly a myth, or at least extremely simplistic.
> Aren't people targeting bash or basic bourne shell features instead of posix?
I know many banks still have AIX systems with shells like ksh89, ksh93, etc. as the default shell. So if a shell script is written to work with a POSIX shell (instead of a particular shell), it has a better chance of running on such systems.
Also, on Debian, the default non-interactive shell is dash [1]. This is the Debian Almquist Shell (dash). It is a POSIX-compliant shell derived from ash. So again, if we write system scripts for Debian and want them to run on Debian without any hassle, it makes sense to write the system scripts to conform to POSIX shell. Although shellcheck cannot perform a full POSIX compliance check at this time, it is still a pretty good tool that can help with checking compliance with dash in particular.
> So again, if we write system scripts for Debian and want it to run on Debian without any hassle, it makes sense to write the scripts to conform to POSIX shell.
Or explicitly use bash in your shebang.
One of the problems with Bash is that it insists on doing bash-y things even when you tell it to act like sh.
People ask why you should write (or at least test) code to be multi-platform (even the basics of running it on BSD or macOS): it's because it forces you to be honest. Things change and initial assumptions may not be the same forever.
Indeed, Bash is always available on Debian! After all, Debian uses Bash as the default interactive shell. But that's not the point of writing system scripts for dash. They are written for dash because that's still the default non-interactive shell. And it is so because dash is leaner and faster. Quoting from the link I posted in my previous comment:
> Since it executes scripts faster than bash, and has fewer library dependencies (making it more robust against software or hardware failures), it is used as the default system shell on Debian systems.
My view is that shell scripting feels like the wild-west, so I try to conform to POSIX to maintain some level of sanity, though it feels restrictive at times. I rely on ShellCheck to help me write shell scripts that are POSIX compliant.
I’ve wondered the same. The most common standard I’ve seen in everyday work for a long time now is “runs on my Mac and the Linux server we’re deploying to”.
I’m not talking about shops that ship software that customers receive and install on prem on their HPUX or whatever. That’s still a thing and people have to take that into account. I’m grateful I’m no longer among them.
Every now and again, I get annoyed by those cases where Linux has some API/utility/etc but macOS doesn't. Getting that API/utility into POSIX greatly increases the odds that Apple will end up implementing it. (Whether just by copying it from FreeBSD, or by writing it themselves.)
Embedded shell scripts typically end up targeting the common denominator of bash/BusyBox/coreutils but the POSIX standards are a pretty good reference too. People often don't realize this, but standards are useful even if they aren't implemented 100% fully and correctly.
I’m probably going to be downvoted for this, but anyway: the Single UNIX Specification has always been the only place where I could find C header documentation organized by header file.
Meaning: I don’t want to know all the 6000 functions glibc supports; I want to know what (for example) net/inet.h will bring to my code (ideally with documentation).
For some reason that doesn’t seem to be a thing. Not for glibc, for sure. But the SUS does that. And I like it.
I keep getting surprised at how little the tech industry as a whole seems to care about documentation. Way too many authors just upload the new PDF to some website - of course overwriting the old one.
As an implementer I'm often more interested in the exact changes than in the current wording. My product is already supporting the old spec, what do I need to change to support the new one? A redlined version is more valuable than the full PDF. Bonus points if it actually comes with the reasoning behind it so I don't have to guess why some seemingly-arbitrary change was made.
My dream documentation is a simple Markdown file (or similar) stored in a git repository. It allows me to see the current version, the old version, the diff, and the commit messages can even store the reasoning.
Though with a sufficiently-gnarly release history, even the Git repo might not tell the full story. I've recently been tracing the history of one particular library that's been around since the early 2000s, and hardly any of the versioned Git tags correspond exactly to the files in the released tarballs. Finding all the releases was quite tedious: the tarballs were published on multiple websites, some tarballs were updated in place (without changing the version number), one website (which held some versions exclusive to it) routinely deleted very old versions, and that website also no longer exists outside the Internet Archive. Overall, some of the releases have been totally lost to time, and the Git repo is of no help in reconstructing them.
Not an official one, as far as I'm aware. Maybe the Wikipedia article is the best resource for that right now: https://en.wikipedia.org/wiki/POSIX#Versions (but doesn't seem to have info on POSIX.1-2024 yet)
i guess the parts of the ecosystem low enough to care about things like POSIX compliance are mostly attached to some foundation or other, so maybe those foundations will purchase copies for their core maintainers? but that's a pretty counter-intuitive thing. i wonder if there are large closed-source POSIX implementors out there that this is aimed at, but are there really enough closed-source implementations out there for any of them to care about compatibility with each other?
if command -v command >/dev/null 2>&1
then command -v local >/dev/null 2>&1 || alias local=typeset
fi
eval "__fn=;__fn(){ local __fn=leak;};__fn || :;"
if test -n "$__fn"
then echo local leaks here!
fi
It will not crash on posh because I'm being tricky with eval and command. posh supports local variable scope.
The only shell partially missing local support is ksh, and it has a gotcha: locals only work if the function is declared with the `function` keyword. All you have to do is use ksh's own tools to redeclare all functions automatically:
__list=$(typeset +f)
IFS="$__eol" # __eol should contain a line break
for __decl in $__list
do
    __name=${__decl%" #"*}
    __name=${__name%"()"}
    __body="$(typeset -f "$__name" || :)"
    eval "function $__name ${__body#"$__decl"}"
done
IFS=" "
Of course, for this to work, all functions must be loaded before running and any declared after the fix will not be local, which is a good idea anyway. I always put the alias polyfill on the header of my library and the eval/for polyfill just before invoking my main function.
There is still an inconsistency with default local values. To get the same behavior everywhere, always initialize local variables:
THIS IS FINE:
local foo=;
local foo=bar;
THIS CAN INHERIT WEIRD STUFF:
local foo;
Done, you have portable bourne sh scope everywhere imaginable.
Those aren't part of Wasm. WebRTC and WebSockets both predate Wasm by 6 years, and neither requires Wasm to work. postMessage is part of Web Workers, also separate from Wasm.
I'll ask again, what would it mean for POSIX to "support" Wasm?
Your question fundamentally makes no sense; POSIX is not a protocol, it is an operating system interface definition. It describes the OS foundations on top of which you can build WebSockets, IRC, a web browser, a calculator, compilers, local login prompts, {python,ruby,...} interpreters, etc. Your question is analogous to "why can't {Windows,Android} work on the web?"