Posix.1-2024 is published (ieee.org)
173 points by phoebos 3 months ago | 84 comments



I am hoping this appears at https://pubs.opengroup.org/onlinepubs/9699919799/ soon. This is the link I use most often to go through the specification. In fact, I owe a lot of my shell scripting skills to this online resource.

As a specific example, the seemingly simple matter of when the shell decides to split a string based on $IFS and when it does not was quite confusing to me until I went through the specification here: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...

For example, if

  a="foo bar"
then

  ls $a
will split the value into two fields (thus two arguments to ls). Of course, we should surround $a with double-quotes to avoid the field splitting. However, the following is fine:

  case $a in
No field splitting occurs here. However, to be kind to your code reviewer, you might want to double-quote this anyway for the sake of simplicity and consistency. Behaviour like this is specified in sections "Field Splitting" and "Case Conditional Construct" of the aforementioned link. Specification documents like this were formative in my journey toward learning to write shell scripts confidently.
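To see it concretely, here is a minimal sketch (plain POSIX sh; printf is used only to make the argument boundaries visible):

    a="foo bar"

    # Unquoted expansion in a simple command: field splitting applies,
    # so printf receives two arguments.
    printf '[%s]\n' $a        # prints [foo] then [bar]

    # Quoted expansion: a single argument.
    printf '[%s]\n' "$a"      # prints [foo bar]

    # The word of a case statement is not field split, even unquoted.
    case $a in
        "foo bar") echo "matched as one string" ;;
    esac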


The note on the front page of the Austin Group's website says as much with regards to publication:

> June 14, 2024: IEEE Std 1003.1-2024 has been published by IEEE. The Open Group Base Specifications, Issue 8 has been published by The Open Group. At this stage only PDF is available. The HTML edition to follow soon.

https://www.opengroup.org/austin/


I make a habit of always quoting strings as "${a?}". That way a mistyped variable name won't blindly go ahead and do the wrong thing.
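For example (a small sketch; greeting is just a made-up variable name):

    unset greeting
    echo "${greeting?}"    # the shell aborts with a "parameter not set" style error

    greeting=hello
    echo "${greeting?}"    # prints: hello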


I can recommend starting every bash script with

    set -euxo pipefail
The "u" has basically the same effect as the question mark, but for every variable usage.


I do the same, except for the "x" (that is, "set -euo pipefail"); depending on what you're doing, "set -x" might be helpful, might be too much noise, or it could even break things which were not expecting the extra output (and in the worst case, it might end up echoing secret tokens into your build logs).


I find it to be a good default when writing a script, but yes, it can get noisy and potentially leak stuff, if that is what your script deals with. That hasn't been a concern in the settings where I usually use it, though.

I am not sure how it could break anything though, unless you are parsing stderr of your script in a subsequent step, which would seem unusual anyway.


set -o pipefail is now POSIX compliant. Use it in all POSIX shell scripts, not just bash scripts.
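A quick illustration of what it changes (a sketch, using false and true as stand-ins for real commands):

    false | true
    echo $?          # 0 -- by default a pipeline's status is that of its last command

    set -o pipefail

    false | true
    echo $?          # 1 -- a failure anywhere in the pipeline is now reported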


Oh wow, I didn't know that. I am pretty sure I had a sh trip up on it just recently, so I thought it still was bash only. Yes, use it everywhere possible.


Emphasis on now - it's new in this version, so probably expect a little bit of delay before it's actually available everywhere. But yes, excellent addition and I'm happy to have it available more broadly now.


Oh, it seems set -u was fixed with regards to arrays in 2011 or so: https://stackoverflow.com/questions/7577052/unbound-variable...

Maybe I can start using it again. (I think I noticed that issue while I was at Google, and they used an older version of Bash.)


I prefer set -Eeuxo pipefail, so my ERR trap is inherited in functions.
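Roughly like this (bash; the function name here is just for illustration):

    #!/usr/bin/env bash
    set -Eeuo pipefail
    trap 'echo "error near line $LINENO" >&2' ERR

    might_fail() {
        false    # without -E (errtrace), the ERR trap would not fire in here
    }
    might_fail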


I think many people are even more surprised by this:

   x=$a  # not split!  It means the same thing as x="$a"
They were taught that you have to quote everything, which is a reasonable rule to follow, but it's not true.

---

I never wrote about this on the Oils blog (https://www.oilshell.org/ ), but the post would be titled:

Shell Has Context Sensitive Evaluation

Basically the two contexts you should think of are:

(1) EVAL WORD SEQUENCE

This occurs in 2 places in POSIX shell:

   ls $x$y   # simple command is a sequence of words
 
   for i in $x$y; do echo $i; done  # for loop
And 1 place in bash:

   a=( $x$y )  # array literal
In these cases, the shell "wants" a sequence of strings, not a single one. So it does splitting.

---

(2) EVAL WORD TO STRING

But there are many other contexts where the shell does not "want" a sequence of strings.

It wants a SINGLE string. And conversely, it actually JOINS arrays of strings, rather than splitting.

Usually "$@" is an array / sequence of strings, while $@ or $* is a string, roughly speaking.

But the shell doesn't want sequences of strings in MANY cases, e.g.

    a=$@  # I only want 1 string here, so I JOIN rather than splitting

    echo hi > "$@"   # redirect arg (not all shells agree though!)

    case "$@" in ... esac  # as you point out

So the bottom line is that variables aren't really strings OR arrays of strings. The shell converts the value to whatever it wants in a given context.

And shells also DISAGREE on the specifics of those rules. POSIX shell has the array "$@", but arrays in general are not in POSIX.

---

And even worse, think about this case:

   local x=$a
Does it behave like an assignment, which wants a single string?

Or does it behave like a simple command, which wants a sequence?

You can look at it both ways. The bottom line is that assignment builtins are special and they don't follow the normal rules of simple commands. Shells have differed, but POSIX decided on this a while ago.

---

This is all of course mind-numbing trivia that has no real reason for existing ... YSH fixes it, and it's now pure native C++, no more Python.

YSH Doesn't Require Quoting Everywhere - https://www.oilshell.org/blog/2021/04/simple-word-eval.html (Oil was renamed to YSH since this blog post was written)

Simple Word Evaluation in Unix Shell - https://www.oilshell.org/release/latest/doc/simple-word-eval...

In YSH you can tell just by looking whether it's a single string or an array.

    ls $a  # identical to ls "$a"

    ls @myarray  # splice an array
It never "molests" your variables. There's no auto-conversion, and you can upgrade to those rules with

    shopt --set ysh:upgrade


Sometimes when I'm being lazy, say for example I have a variable with some folder/file absolute path... I'll convert the slashes to spaces, and word split there, sending the results into the stack.

    $ readarray -t -d '/' < <(printf '%s' "${PWD:1}")
    $ echo "${MAPFILE[@]}"
and you get a nice list of folders you can push/pop as you wish...


That's also one of the more notable incompatibilities of zsh - by default it treats $a the same as "$a", and you're supposed to use arrays if you want multiple words. Although I'm not an expert - maybe ysh and zsh differ in the details here?


Yup, in terms of ls $a -- YSH happens to be like zsh. OSH is compatible with bash and does the POSIX word splitting, but YSH is not.

In general YSH is pretty different from zsh though -- it's more of a Python/JS-like language with structured data, e.g.

    ysh$ var a = ['list', 'of', 'strings']
    ysh$ write -- @a
    list
    of
    strings
In zsh I still think that's the pretty obscure "${a[@]}" rather than @a.

Arrays are also "flat" in zsh -- you can't have an array of arrays, because there's no garbage collector. But YSH has arbitrarily nested JSON-like data structures, and JSON serialization built in.

I need to put some code examples on the home page, but for now - https://www.oilshell.org/release/latest/doc/ysh-tour.html


You can use $=a to get word splitting in zsh.

While incompatible, the zsh behaviour makes a lot more sense.

You can use "setopt sh_word_split" to get the POSIX behaviour.

Or split explicitly with ${(s: :)a}, ${(s:SPLIT-ON-THIS:)a}, etc. (not compatible with anything but zsh).


Just always quote. No need to remember different behavior in different situations.


This is old hat though; I think it was already specified that way in 1990 POSIX.


> However, the following is fine:
>
>     case $a in
>
> No field splitting occurs here

This kind of bullshit is how I made a career rewriting people's buggy shell scripts in Python


Wow, no kidding. I’ve been writing little shell scripts to do random things for literally decades and this is the first time I heard about it.

I also rewrite my stuff in Python as soon as it becomes nontrivial.


Some goodies for POSIX sh programmers:

* readlink/realpath (https://austingroupbugs.net/view.php?id=1457)

* find -print0, xargs -0 and read -d (https://austingroupbugs.net/view.php?id=243)

* find -iname (https://austingroupbugs.net/view.php?id=1031)

* sed -E (https://austingroupbugs.net/view.php?id=528)

* set -o pipefail (https://austingroupbugs.net/view.php?id=789)


I agree! You can blame me for some of those proposals :-), my thanks to the POSIX team for getting this out the door.

The sed -E option makes it easy to portably use extended regular expressions.
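For example (a tiny sketch):

    # The same ERE flag now works across implementations, instead of
    # guessing between GNU sed -r and BSD sed -E:
    printf 'foo123bar\n' | sed -E 's/[0-9]+/_/'    # prints foo_bar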

The find -print0, xargs -0, and read -d options provide portable ways to securely process lists of files. They were already widely implemented, but now they're officially part of the spec and can be counted on to be present in many other places.
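For instance, something along these lines should now be portable (a sketch; as I understand the new text, and as in bash, an empty -d argument to read means the NUL byte):

    # Filenames containing spaces or newlines survive the trip through the pipe.
    find . -name '*.log' -print0 | xargs -0 rm -f --

    # Or handle them one at a time in the shell:
    find . -name '*.log' -print0 |
    while IFS= read -r -d '' file; do
        printf 'found: %s\n' "$file"
    done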


Thanks, those will indeed be useful. Looks like `pipefail` is already in Dash https://salsa.debian.org/debian/dash/-/blame/debian/unstable...


Wow, I just realized you're the same dwheeler from the work on diverse double-compilation!

Thanks for the improvements on POSIX, I've read many issues and discussions raised by you in the past couple of years.

In fact, I think it was one of your comments on make(1)'s dynamic dependency graph that reassured me I had a correct grasp on its execution model!


Thanks so much, that means a lot.


Also c17 -G to create shared objects, SIGWINCH and tcgetwinsize() to query the size of a terminal window, lots of new shell features, make is now somewhat useful, gettext() and associated commands are in, asprintf(), C17 support, strlcpy, strlcat, and many more new and exciting features.


> c17 -G to create shared objects

Finally! It always seemed very strange to me that posix said that shared objects were a thing and provided an rtld API for using them, but never specified how to create them.


asprintf is a pretty cool one. https://frippery.org/make/2024.html details some of the make changes.


asprintf is a nice toy, but it should really take a context and a "realloc" function pointer to be useful, in general. Here's hoping for 2040!


How many other posix functions take allocators?


I'm most excited about getentropy().



But `find -iname` always worked; why was it proposed again?


???

It wasn't standardized before, so it didn't "always work" on any implementation.


I for one am overjoyed that I can now type readlink instead of invoking a shell script. I know newbies will also be overjoyed now that they can just read a LLONG_MAX-page manual containing the solution to all their problems.


Where is POSIX actually useful today?

Is it mostly for shell scripts? Aren't people targeting bash or basic Bourne shell features instead of POSIX? Is shellcheck checking for best practices instead of POSIX compliance?

And for other applications (GUI, servers, etc) strict POSIX compliance might be too restrictive?

And with many things being Linux (or Linux-like like WSL) the need for this might be less?

Are Android and/or iOS fully POSIX compliant?

Any good blog or presentation describing the current state of POSIX?


It's as good as any other standard: as a baseline of agreed-upon "correct" behavior for interoperability. Anything you add from there is gravy. It's more for implementers than users, e.g. for someone writing a new shell rather than writing shell scripts. Having the standard handy is also pretty useful when writing foreign interfaces, like the posix modules in python and perl.

As for the current state of POSIX, well, you're looking at it. Might find a blog or two of someone on the POSIX committees, but the organizations aren't the kind that keep blogs. Probably best to just dive into the Wikipedia article on POSIX and start following the references on the bottom. You'll probably want to look into SUS, the Single Unix Specification, as well: it's identical to POSIX (plus curses for some reason) but it's the label that OS vendors may use rather than POSIX. macOS and some Linux distributions claim to be fully SUS-compliant; Linux as a whole does not, because its official scope is limited to the kernel which only implements a subset of POSIX.

Fun fact: the name "POSIX" was coined by Richard Stallman.


The original discussion on why Debian switched from bash to dash for /bin/sh is insightful.

https://lwn.net/Articles/343924/

One big factor is performance of shell scripts. At the time, the switch decreased boot times for Debian by 7.5%. Bash's extra features add a lot of overhead, and that's not always an acceptable tradeoff.

Also, if you're using bash features in a script, you can always just add #!/bin/bash to the top of your file instead of #!/bin/sh to force a bash-compatible shell.


> One big factor is performance of shell scripts. At the time, the switch decreased boot times for Debian by 7.5%.

And that should no longer be a relevant factor, since most of the boot process is now implemented directly in C (within systemd), instead of a bunch of shell scripts.


Prefer `#!/usr/bin/env bash` instead, since /bin/bash isn't a standardized location for bash (even across Linux distros). The former causes $PATH to be searched for bash.


This is pretty common advice but I think it is fighting the previous war. This idea is useful for virtualenv-type tricks if you want to ensure use of your personal version of the interpreter on a shared system, but you have to boil an ocean of scripts. You don't know if you caught them all. Docker won instead - a quick filesystem namespace comprehensively catches everything. Just use #!/bin/bash.

EDIT: I was thinking about Linux, but I suppose macOS users are stuck with needing this for Homebrew-supplied bash?


#!/bin/bash won't work on, say, nixos, and as you noted, many non-linux platforms. It's a few more characters to do something (more) portable! You're right that docker (or something like nix flakes, which are lighter-weight/easier to introspect imo) are probably a better solution in the long run, though.


Is /usr/bin/env a standardized location?


I don't know, but it's been more reliable for me than /bin/bash. I think env is part of POSIX.


No. No new locations added in POSIX.1-2024.

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1...

POSIX omits from standardisation what is not necessary for an application to work, so that a wide range of systems can be supported. As for the shebang line, it can be rewritten by the installation script and is therefore considered an area of system administration.


> One big factor is performance of shell scripts. At the time, the switch decreased boot times for Debian by 7.5%. Bash's extra features add a lot of overhead, and that's not always an acceptable tradeoff.

I seem to recall it was much smaller than that, something like 4% on a 2008 Eee PC, but I can't find any numbers on that right now.

The Debian startup scripts were already POSIX; it's not hard to get better performance out of zsh or bash by avoiding expensive process lookups.

Overall, I consider this to be mostly a myth, or at least extremely simplistic.


> Aren't people targeting bash or basic Bourne shell features instead of POSIX?

I know many banks still have AIX systems with shells like ksh89, ksh93, etc. as the default shell. So if a shell script is written to work with a POSIX shell (instead of a particular shell), it has a better chance of running on such systems.

Also, on Debian, the default non-interactive shell is dash [1], the Debian Almquist Shell, a POSIX-compliant shell derived from ash. So again, if we write system scripts for Debian and want them to run on Debian without any hassle, it makes sense to write them to conform to POSIX shell. Although shellcheck cannot perform a full POSIX compliance check at this time, it is still a pretty good tool that can help with checking compliance with dash in particular.

[1]: https://packages.debian.org/stable/dash
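For instance (deploy.sh is just a hypothetical script name):

    # Check against dash semantics; use -s sh to target generic POSIX sh instead.
    shellcheck --shell=dash deploy.sh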


> So again, if we write system scripts for Debian and want them to run on Debian without any hassle, it makes sense to write them to conform to POSIX shell.

Or explicitly use bash in your shebang.

One of the problems with Bash is that it insists on doing bash-y things even when you tell it to act like sh.

People ask why you should write (or at least test) code to be multi-platform (even the basics of running it on BSD or macOS): it's because it forces you to be honest. Things change and initial assumptions may not be the same forever.

* https://wiki.debian.org/Shell

* https://archlinux.org/packages/?name=checkbashisms

* https://wiki.ubuntu.com/DashAsBinSh


> One of the problems with Bash is that it insists on doing bash-y things even when you tell it to act like sh.

That behaviour is common to all sh implementations, not just Bash.


Bash has `Priority: required` and is marked as "essential", it's available on every Debian system.


Indeed, Bash is always available on Debian! After all, Debian uses Bash as the default interactive shell. But that's not the point of writing system scripts for dash. They are written for dash because that's still the default non-interactive shell. And it is so because dash is leaner and faster. Quoting from the link I posted in my previous comment:

> Since it executes scripts faster than bash, and has fewer library dependencies (making it more robust against software or hardware failures), it is used as the default system shell on Debian systems.


> Where is POSIX actually useful today?

Defining a stable API to code against?

> And with many things being Linux (or Linux-like like WSL) the need for this might be less?

Define "being Linux". RHEL? Ubuntu? Other? Is /bin/sh linked to Bash or something?

* https://mywiki.wooledge.org/Bashism

* https://linux.die.net/man/1/checkbashisms

> Are Android and/or iOS fully POSIX compliant?

UNIX® Certified Products include macOS:

* https://www.opengroup.org/openbrand/register/

POSIX:

* https://posix.opengroup.org/register.html


My view is that shell scripting feels like the wild-west, so I try to conform to POSIX to maintain some level of sanity, though it feels restrictive at times. I rely on ShellCheck to help me write shell scripts that are POSIX compliant.


I’ve wondered the same. The most common standard I’ve seen in everyday work for a long time now is “runs on my Mac and the Linux server we’re deploying to”.

I’m not talking about shops that ship software that customers receive and install on prem on their HPUX or whatever. That’s still a thing and people have to take that into account. I’m grateful I’m no longer among them.


> Where is POSIX actually useful today?

Every now and again, I get annoyed by those cases where Linux has some API/utility/etc but macOS doesn't. Getting that API/utility into POSIX greatly increases the odds that Apple will end up implementing it. (Whether just by copying it from FreeBSD, or by writing it themselves.)


Embedded shell scripts typically end up targeting the common denominator of bash/BusyBox/coreutils but the POSIX standards are a pretty good reference too. People often don't realize this, but standards are useful even if they aren't implemented 100% fully and correctly.


I’m probably going to be downvoted for this, but anyway: the Single Unix Specification has always been the only place where I could find C header documentation organized by header file.

Meaning: I don’t want to know all 6000 functions that glibc supports; I want to know what (for example) net/inet.h will bring to my code (ideally with documentation).

For some reason that doesn’t seem to be a thing. Not for glibc, for sure. But the SUS does that. And I like it.


I wonder how long till https://pubs.opengroup.org/onlinepubs/9699919799/ has the new revision.


Is there a changelog for these standards?


I keep getting surprised at how little the tech industry as a whole seems to care about documentation. Way too many authors just upload the new PDF to some website - of course overwriting the old one.

As an implementer I'm often more interested in the exact changes than in the current wording. My product is already supporting the old spec, what do I need to change to support the new one? A redlined version is more valuable than the full PDF. Bonus points if it actually comes with the reasoning behind it so I don't have to guess why some seemingly-arbitrary change was made.

My dream documentation is a simple Markdown file (or similar) stored in a git repository. It allows me to see the current version, the old version, the diff, and the commit messages can even store the reasoning.


Though with a sufficiently-gnarly release history, even the Git repo might not tell the full story. I've recently been tracing the history of one particular library that's been around since the early 2000s, and hardly any of the versioned Git tags correspond exactly to the files in the released tarballs. Finding all the releases was quite tedious: the tarballs were published on multiple websites, some tarballs were updated in place (without changing the version number), one website (which held some versions exclusive to it) routinely deleted very old versions, and that website also no longer exists outside the Internet Archive. Overall, some of the releases have been totally lost to time, and the Git repo is of no help in reconstructing them.


POSIX folks "have no plans to move to a public git repository for managing the development of the standard."

https://lore.kernel.org/linux-man/04801FEA-3560-4BA5-93EF-76...


Not an official one, as far as I'm aware. Maybe the Wikipedia article is the best resource for that right now: https://en.wikipedia.org/wiki/POSIX#Versions (but doesn't seem to have info on POSIX.1-2024 yet)


The PDF is behind a login wall :(


Leak when?


Dead on arrival as far as I'm concerned.

I guess the parts of the ecosystem low enough to care about things like POSIX compliance are mostly attached to some foundation or other, so maybe those foundations will purchase copies for their core maintainers? But that's a pretty counter-intuitive thing. I wonder if there are large closed-source POSIX implementors out there that this is aimed at, but are there really enough closed-source implementations out there for any of them to care about compatibility with each other?


Did we get `local`?


Sadly no. That's one of the few things that can't be done without, but semantics can subtly vary between implementations. Which is why I do a runtime check for them: https://git.sr.ht/~q3cpma/scripts/tree/b4b3c62f6a77828d0c445...


Just use:

        if command -v command >/dev/null 2>&1
        then command -v local >/dev/null 2>&1 || alias local=typeset
        fi

        eval "__fn=;__fn(){ local __fn=leak;};__fn || :;"
        if test -n "$__fn"
        then echo local leaks here!
        fi
It will not crash on posh because I'm being tricky with eval and command. posh supports local variable scope.

The only shell partially missing local support is ksh, but it has a gotcha. It works if the function is declared with the `function` keyword. All you have to do is use ksh's own tools to redeclare all functions automatically:

        __list=$(typeset +f)

        IFS="$__eol" # __eol should have a line break
        for __decl in $__list
        do
            __name=${__decl%" #"*}
            __name=${__name%"()"}
            __body="$(typeset -f "$__name" || :)"
            eval "function $__name ${__body#"$__decl"}"
        done
        IFS=" "

Of course, for this to work, all functions must be loaded before running (which is a good idea anyway), and any declared after the fix will not be local. I always put the alias polyfill in the header of my library and the eval/for polyfill just before invoking my main function.

There is still an inconsistency with default local values. To get the same behavior everywhere, always initialize local variables

THIS IS FINE:

        local foo=;
        local foo=bar;
THIS CAN INHERIT WEIRD STUFF:

        local foo;
Done, you have portable bourne sh scope everywhere imaginable.



Wonder how long before it appears in https://pubs.opengroup.org/onlinepubs/


Still no arrays in the standard? GTFO.


So, what's new?


wasm support?


What would it even mean for an interface description to "support" WASM which is a (virtual) machine target for implementations?


webrtc, ws, postmessage?


Those aren't part of Wasm. WebRTC and WebSockets both predate Wasm by 6 years, and neither requires Wasm to work. postMessage is part of Web Workers, also separate from Wasm.

I'll ask again, what would it mean for POSIX to "support" Wasm?


A standard document defining how it could work on the web. Wasm is literally just Javascript.


ircv3 did it


With all due respect, you don't seem to understand what POSIX is.


I appreciate your respect. Can you contrast POSIX with IRC and explain why IRC can work on the web but POSIX can't?

https://ircv3.net/specs/extensions/websocket


Your question fundamentally makes no sense; POSIX is not a protocol, it is an operating system interface definition. It describes the OS foundations on top of which you can then build websockets, IRC, a web browser, a calculator, compilers, local login prompts, {python,ruby,...} interpreters, etc. Your question is analogous to "why can't {Windows,Android} work on the web?"


You are probably looking for something like WASI preview1 or WASIX:

https://wasix.org/docs


That is probable. I am not.



