The shell and its crappy handling of whitespace (plover.com)
155 points by JNRowe on July 31, 2023 | 172 comments



I think whitespace handling is a problem, but not the only one. Shell data structures are awful and confusing, so best avoided. Error handling is also subpar, requiring boilerplate for every reasonable error situation. And there’s a constant need to be careful about stdout vs stderr and how exactly a function returns data.

I find moving to even Python to be an inadequate answer, because the shell does two crucial things very very well - it manipulates the filesystem and runs commands in their most native format. And even Python is very cumbersome at those two tasks.

But sometimes you need good data structures/execution flow control, and good filesystem/command control, at the same time.

Intuitive, consistent, predictable whitespace handling would fix a lot of shell scripting problems, though.

(I haven’t given Powershell a serious shot, maybe I should.)


This is the #1 reason I enjoy using the plan 9 rc shell[1] for scripts. There's exactly one place where word splitting happens: at the point where command output is evaluated. And there, it's trivial to use any character you want:

    x = `{echo hi there: $user}  # evaluates to list ('hi' 'there:' 'ori')
    y = `:{echo hi there: $user} # evaluates to list ('hi there' ' ori')
There's no other word splitting, so:

    args = ('a' 'b c' 'd e f')
    echo $#args
    echo $args(3)
will print:

    3
    d e f
The shell itself is pleasantly simple; there's not much to learn[2]. And while it's not fun for interactive use on unix because it offloads too much of the interactive pleasantness to the plan 9 window system (rio), it's still great for scripting.

[1] http://shithub.us/cinap_lenrek/rc/HEAD/info.html

[2] http://man.9front.org/1/rc


> I find moving to even Python to be an inadequate answer, because the shell does two crucial things very very well - it manipulates the filesystem and runs commands in their most native format. And even Python is very cumbersome at those two tasks.

Thankfully Perl exists.


Perl is better than Python at being an improved shell.

In Perl, you can use backticks to execute commands for example


Ruby too. And there's the FileUtils lib in ruby that mimics common Unix command like mv, cp, ls, mkdir


I find myself using `Pathname` a lot more often than I use `FileUtils`


Same for me with Dir vs FileUtils.ls

The globbing support Dir has is wonderful


Powershell is worth a deeper look if you have time. It can be odd "at first glance", especially trying to do things you know from other shells, but it does have a lot more data structures available for use in the shell, a different mostly predictable approach to error handling (including try { } catch { } just like a "real" language), a standardized arguments parsing model for its own commands and cmdlets (though you'll still likely be using lots of DOS or Unix commands with their own bespoke argument parsing, as usual), and now that Powershell is open source and cross-platform it is more useful than ever.
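
For instance, the try/catch error handling looks like this; a minimal sketch:

    try {
        Get-Item 'C:\no\such\file' -ErrorAction Stop
    } catch {
        Write-Warning "caught: $($_.Exception.Message)"
    }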


Powershell is good, but one downside is that you have to explicitly check the exit code of each external program called and stop the script yourself on non-zero exit codes, if that's what you want to do. I.e., there's no "set -e".

You can make a helper function or commandlet of course, but it obfuscates the code and is a pain to distribute around always where needed. They argue that non-zero exit codes for errors in external programs is a convention that doesn't exist consistently where Powershell is used. OK, but all the programs I use do follow that convention and so I find that very painful.
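
A minimal sketch of that boilerplate (`mytool` stands in for any external program):

    mytool --do-thing
    if ($LASTEXITCODE -ne 0) {
        throw "mytool failed with exit code $LASTEXITCODE"
    }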


Yeah, DOS/Windows applications have for many years used non-zero return codes for a lot more complicated things than just "error", but also "type of success". The documentation on Robocopy is an interesting example.

PowerShell's default handling for native commands is treating anything written to stderr as signaling an error, which is somewhat more reliable for old DOS/Windows commands (but not necessarily reliable for some Unix commands that use stderr as a secondary logger for various reasons including because they expect stdout to pipe redirected more often than not). Though you need to set $ErrorActionPreference [0] to something other than "Continue" to notice this behavior.

However, since Powershell 7.3 there is also a setting you can temporarily change to treat non-zero returns as errors: $PSNativeCommandUseErrorActionPreference [1]. Note that as its name suggests you need to set $ErrorActionPreference to something useful for it to work as you intend.
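
Putting the two preference variables together, as a sketch (PowerShell 7.3+):

    $ErrorActionPreference = 'Stop'                    # make errors terminating
    $PSNativeCommandUseErrorActionPreference = $true   # 7.3+: non-zero exit -> error
    git clone https://example.invalid/repo.git         # a failing native command now stops the script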

[0] https://learn.microsoft.com/en-us/powershell/module/microsof...

[1] https://learn.microsoft.com/en-us/powershell/module/microsof...


Take a look at Oil Shell?

https://www.oilshell.org/


Given your comments you need to try xonsh - this is a python shell.

You can call pure Python, plus it has the simple running of commands like other shells.
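
A tiny sketch of what that mix looks like (untested, from memory of xonsh's syntax):

    host = $(uname -n).strip()      # capture command output as a Python str
    for i in range(3):              # plain Python
        echo @(f"{host}: run {i}")  # @(...) splices Python values into a command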


Perhaps I need to give that a serious try at last. I don’t dare touch a shell other than bash or zsh because you’ll have to come back to those anyway, for example for debugging containers. Then another shell is another thing to learn, configure but most importantly go wrong.

I currently use zsh and quickly drop into IPython when needed.


I use declare in a lot of my bash scripts for associative arrays and some other stuff. It can make scripts easier to read/reason about IMO. Something useful to learn if you’ve never heard of it.

https://linuxhint.com/bash_declare_command/
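
For example:

    declare -A color               # associative array (bash 4+)
    color[apple]=red
    color[plum]=purple
    echo "${color[plum]}"          # purple
    for k in "${!color[@]}"; do    # iterate the keys
        printf '%s -> %s\n' "$k" "${color[$k]}"
    done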


Like the || and && operators having the same precedence... and other inconsistencies, e.g. how to pass "a*" down as a parameter...
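
For example, equal precedence with left association means `a && b || c` parses as `(a && b) || c`, so c also runs when b fails, unlike C's precedence rules:

    true && false || echo fallback   # prints "fallback"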

btw csh/tcsh "" quoting rules are atrocious, avoid it..


> I think what bugs me most about this problem in the shell is that it's so uncharacteristic of the Bell Labs people to have made such an unforced error. They got so many things right, why not this?

All it takes to understand it is that it took a while for anyone to consider that spaces in filenames was (or might be) a thing.

I don't know that this is true, but having started on Unix about the same time as the author of TFA, I know that I found it quite disorienting when I first started interacting with Windows/macOS users who regularly used spaces in their filenames and thought nothing of it.

I suspect this wasn't an "error" as much as just a failure to grok the pretty basic idea that "bite me.jpg" is an entirely valid filename, and maybe more useful than "biteme.jpg" or "bite_me.jpg"


and steve's & edy's taxes.xls and 10" deck.doc and Surprise!.mov and... all the other shell active characters which are otherwise just normal characters that normal people will want to use.

If it's on the keyboard, they expect to be able to use it.


I disagree. There's also a Return key, but you don't want CR in file names. A file name is not a document's title.


> There's also a Return key, but you don't want CR in file names.

I don't want CR in file names, but someone somewhere will want it for some strange formatting reason (I've seen things).

> A file name is not a document's title.

You say that, other people have other ideas. When you let non-technical people use your too technical system, it will crash.

---

A software tester walks into a bar.

Runs into a bar.

Crawls into a bar.

Dances into a bar.

Flies into a bar.

Jumps into a bar.

And orders:

a beer.

2 beers.

0 beers.

99999999 beers.

a lizard in a beer glass.

-1 beer.

"qwertyuiop" beers.

Testing complete.

A real customer walks into the bar and asks where the bathroom is.

The bar goes up in flames.


1. Return is not a glyph.

2. Nevertheless, people do actually expect to use even that.

And they're not even wrong. It's not their fault that it breaks things.

In Windows they get away with it because, for most users, there is no such thing as a command line, which is actually at least 2 little programming languages combined with special syntax (the shell itself and readline; maybe the tty line discipline and stty settings count as yet a 3rd little DSL in there, and maybe the terminal emulation counts as yet another!).

All input, including entering filenames, even when you have to type them, goes through fully managed input widgets that allow almost everything, and they catch and reject the few things they must right on input.

I've written things that take user input and url-encode it for the matching filename myself, so that the user can do whatever they want, yet the filename remains convenient and safe, and still directly human-legible & meaningful, unlike base64 or random hashes.

That interface, with I guess no less than 4 different layers of special syntax, which is a shell, is useful, but it's really not something to expect anyone but a programmer or technician to have to deal with. All the cases where a "bad" filename breaks something are just examples of the application's internals leaking out and being exposed to the user, which was always wrong, since that's also the basis of all security exploits.

It's like taking that urlencoded data I mentioned and dropping it directly into printf, segfaulting, and then complaining that the user should know better than to include %d in a text input.


I don't understand if you agree or disagree with me. When you say that you have written things to encode user input so that no "bad" characters (among which I assume you include CR) end up in the filename, you seem to agree that it's not enough for CR to be in the keyboard for it to be automatically allowed in filenames. Which was my point.


Yeah, the big mistake was thinking spaces in file names were a good idea. It's not; it's a terrible, awful idea.

Of course, not having a hard separation between paths and file names is also not good.

  ./folder_what/folder_x/folder_zee:file_name.whatever
The above would be better


I mean, I don't see why I should work around legacy limitations of 1970s software today, whenever I use a computer. Otherwise we might as well decide that file names should have no capitalization and be eight characters long, or that paths should be limited to 127 characters, etc.


How would it be better? Lots of names for people & things include spaces, so file names related to those people or things naturally would also include spaces. John Smith's name is John Smith, not John_Smith or JohnSmith or johnsmith or any other mangling.


Yeah, and still you shouldn't break your encoding in such a gross way just because of that.

Also, please for the love of god do not use people's names as file names, because they are not unique and they change more often than you think.


Not every usage of files is going to be at a scale where we have to worry about collisions, and anyway it's tangential.

Whatever the concern, the ship has long since sailed on spaces in filenames, it's not going to be undone, and we are obligated to handle them.


> I know that I found it quite disorienting when I first started interacting with Windows/macOS users who regularly used spaces in their filenames and thought nothing of it.

At least most Windows users are "trained" by Windows that asterisk, question mark, vertical bar, double quote, and prefix and suffix spaces "aren't valid characters for files" (in a weird way, it's a Windows limitation, not an NTFS one). I expect only the worst (case-insensitive stuff) naming schemes when the files come from a macOS user.


The author claims to have 35 years of shell usage (that, I believe), but these are the arguments he uses? I'll summarize for anyone who doesn't want to waste their time: "Quote your sh variables and expansions". That's the first thing I learned and the first thing I teach about shell. Using shell wrong and then complaining about it supposedly not working is a weak argument.

Let's see what he says in this article:

- "$* is literally useless": no, it's used to covert arguments to a string separated by the first char in IFS. Useful pretty much only for logging, but sometimes joining strings. $@ is POSIX and should be used in all other cases. A niche feature that generally shouldn't be used isn't an issue in shell

- $* and $@ are the same except when $@ is quoted: true, except in more modern shells like bash, where $@ is an array, not a specially-treated variable. I don't know who told him to use $* for 35 years, but the author should be mad at them instead

- To make it work properly you have to say `for i in *; do cp "$i" /tmp; done`: wrong. cp can't distinguish arguments and flags. You must either use `./*` or `cp -t /tmp -- "$i"` (or some variation). It's correct in the glob/wordsplit sense, but saying this is proper is incorrect

- "And the shell doesn't have this behavior for any other sort of special character": He doesnt even mention why spaces are magic. Hint: they aren't. It's entirely controlled by IFS, which you can change. The default is space, tab, and newline. He also doesn't mention globbing, which can arguably be more disasterous.

An article about wordsplitting and he doesn't even mention it once? This is at best a rant
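
To illustrate the $* and IFS points above (bash or any POSIX shell):

    # "$*" joins the positional parameters with the first character of IFS;
    # unquoted expansions are split on IFS characters (default: space/tab/newline).
    set -- a 'b c' d
    IFS=,
    echo "$*"              # a,b c,d
    v='x,y z'
    printf '<%s>\n' $v     # <x> then <y z>: splitting now happens on ',' only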


I know all the quoting and escaping rules there are to know, and still: considering that it is a shell's job to work with text, the way it is designed is just a major pain.

I am grateful for all those who historically came up with the concepts and put in the work, but if anybody were to design a text interface to computers today, where things can be piped into other commands etc., they could heavily improve the usability of the thing and save thousands of collective hours being wasted. And probably, like when trying to create a successor to email, nobody would use it.


The way I see it, variables are designed to be quoted. I think everyone assumes they aren't for some reason, which is a mistake. Just because you can sometimes get away with not quoting doesn't mean it isn't incorrect. I can think of exactly one situation where you don't want to quote a bash variable, yet people do it constantly. It's frustrating.

Thank god for shellcheck. It can yell at everyone for me


To be clear, I think shell has problems too. But this article is poorly written. I don't think it makes sense to incorrectly use a tool and then complain about how bad it is. And to qualify your article with 35 years of experience? This just reflects that the author didn't take time to learn shell for 35 years

Do yourself a favor and read Greg's entire wiki: https://mywiki.wooledge.org/BashFAQ and just learn how to use it properly, and then you can complain about how painful it is to learn or how easy it is to use incorrectly rather than how bad it is if you use it wrong.


Zsh does not split words by default, so you don't need to quote everything. This is the main reason I switch to Zsh instead of Bash when I need a bit more than the base shell.


Ooooh I did not know that. I’m so used to quoting everything I won’t stop anyway, but it’s good to know.


I can't exactly remember but I think there are a few rare circumstances where you still have to quote things. Inside ${} maybe?


According to this, it does not happen when you use variables (unless you explicitly ask for it), but it does happen on the output of the command substitution:

> A command enclosed in parentheses preceded by a dollar sign, like `$(...)', or quoted with grave accents, like ``...`', is replaced with its standard output, with any trailing newlines deleted. If the substitution is not enclosed in double quotes, the output is broken into words using the IFS parameter. https://www.csse.uwa.edu.au/programming/linux/zsh-doc/zsh_15...


I don't think there is an exception. Zsh still applies word splitting on process substitution (I suppose this is more natural here).


The way you put it, it does sound like an exception. But from what I read, it does not apply on process (or any other) substitution, but on the output of command substitution (see under your sibling comment).


For filesystem operations like batch renames at least, I am usually happy with `vidir` (part of `moreutils`: https://joeyh.name/code/moreutils/).

`vidir [path]` will open an editor with the given directory as buffer contents. Editing and saving will translate to a sequence of `mv` invocations.


Allowing spaces in filenames where command invocations are pure text streams⁰ is the problem IMO, though one made several distinct times¹, and one that would be made again if all past occurrences were somehow removed from history, so suggesting we fix things that way is pointless as a way forward.

Requiring filenames to be quoted if they contain spaces, or optionally otherwise, would help – similar to CSV values with commas in. Though this opens up other issues: what about filenames containing quotes? And when nesting calls², the question of which part does the unpacking becomes complex without explicit structure. And to be honest, we have enough trouble with clients sending us malformed CSV with unquoted commas that I can confidently state this wouldn't work either.

And you can't trust users to follow instructions, even internal users, where the limitation might be an issue. If you put a simple helper tool together and say “it won't work with spaces in filenames”, either it becomes a high-priority ticket that the thing you jammed together as a favour doesn't work with spaces, or you get so sick of responding to problem reports that turn out to be due to the matter that you spend time fixing it up anyway. </rant>

--

[0] for systems that pass structured data, such as powershell, spaces are handled fine, though that would be a lot of work to retrofit into shells & utilities with a generally useful level of conformance

[1] like the convergent evolution of wings in pterodactyls/birds/bats or crab-like features all over the place

[2] for instance ssh to run a complex command line involving sudo


Note: this is an issue with sh/bash/fish/zsh, not shells in general. rc, from plan9 (http://doc.cat-v.org/plan_9/4th_edition/papers/rc), correctly has lists as a first-class data structure and none of the problems in the article happen; a list is a list, not a string split on whitespaces.

See https://drewdevault.com/2023/07/31/The-rc-shell-and-whitespa... to see the examples work out of the box as you would naturally write them.


> Note: this is an issue with sh/bash/fish/zsh, not shells in general. rc (...) has lists as a first-class data structure and none of the problems in the article happen; a list is a list, not a string split on whitespaces.

Not really, bash, fish and zsh all have arrays as "first-class data structures"; it's only really a problem if you really want to limit yourself to POSIX shell syntax.
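
For example, the rc snippet upthread translates into bash almost one-to-one:

    args=('a' 'b c' 'd e f')
    echo "${#args[@]}"    # 3
    echo "${args[2]}"     # d e f  (bash arrays are zero-indexed)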


Powershell solves these problems by having structured data and types and not just streams of bytes


This isn’t related to pipes. This is related to how command lines and variables are tokenised and expanded. Zsh doesn’t have this issue, nor do most, if not all, other shells written in the last 10+ years.


It's related to pipes in the sense that if you write a pipe like `gci | % { foo $_.Name }` or `gci | % Name | bar` you won't have to worry about quoting either, because the `.Name` is a single string object and is not word split. But overall, yes the fundamental reason is that PS also does not do word-splitting on string values.


It’s not related to pipes. I’ve written a shell that has typed pipes, and variables handle whitespace correctly. Those two are distinctly separate concerns. It just so happens that many modern shells solve both of those problems, but it doesn’t mean they’re related.


I recommend reading the comment you respond to before you respond to it.


The problem is that both Windows and POSIX pass parameters as strings. So unless you’re never planning on interfacing with any external executables, you still need to convert that variable to a string. That’s where the concern is, not whether pipes are typed.


> because the `.Name` is a single string object

File path names are bytestrings on most OSes, so you have undefined behaviour if your pipeline can't handle that.


That's irrelevant to the overall point regarding word-splitting. dotnet represents filenames as String, so there can be nothing in the pipeline that couldn't handle them. Files whose names can't be represented in String are a dotnet problem, not a PS problem.


I am pretty sure that if something can't be handled by a program because its underlying language can't handle it, that still counts as "the program can't handle the problem".

"just don't mangle my filenames" is a pretty basic requirement for a universal use shell.


>> Powershell solves these problems by having structured data and types and not just streams of bytes

Powershell's structured data can cause issues if you expect it to be similar to Unix-like shells.

For example, Powershell mangles piped binary data by default:

https://brianreiter.org/2010/01/29/powershells-object-pipeli...

Correct handling of piped binary data requires additional flags:

https://stackoverflow.com/questions/54086430/how-to-pipe-bin...
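
A common manifestation, plus the usual workaround, sketched here (details vary by version; recent PowerShell releases have improved byte-stream handling):

    # Redirecting a native command's binary output through PowerShell can re-encode it:
    curl.exe -s https://example.invalid/blob > blob.bin   # may arrive mangled
    # Safer: let the native tool write the file itself:
    curl.exe -s -o blob.bin https://example.invalid/blob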


Isn't this exactly what the parent was saying? "Stream of bytes" is not a meaningful thing that Powershell deals with by default. Using an operation like "pipe to file" on a text stream (like the Success output stream) will treat the stream as text, regardless of whether you managed to get raw binary data into the stream. This is kind of the whole point of Powershell: there is no such thing as raw binary data, and you have to work harder if you want to deal with something as bytes. It's certainly a culture shock coming from "everything is a stream of bytes" Unix-land.


>> This is kind of the whole point of Powershell: there is no such thing as raw binary data, and you have to work harder if you want to deal with something as bytes. It's certainly a culture shock coming from "everything is a stream of bytes" Unix-land.

Exactly. It means that system administrators need to be aware of the differences when working with Powershell versus other shells to avoid silent corruption of binary data.


Yes, and? A Python developer needs to learn a different language if they become a Java developer. A sysadmin needs to learn Windows if they're going to administer Windows.


>> A sysadmin needs to learn Windows if they're going to administer Windows.

Powershell is cross-platform and runs on Linux and MacOS as well as Windows:

https://github.com/PowerShell/PowerShell

You can discover differences between Powershell and traditional shells as a Linux admin.


Ok, similarly, you'll need to learn Powershell if you're going to use it as a tool as a system admin


> if you expect it to be similar to Unix-like shells

And Python can cause issues if you expect it to be similar to Java. They're 2 different tools, so if you expect it to behave similarly to bash, that's on you


Not if Java tried to call itself JavaShell but didn't act like a shell.


Uh, why are sh and derivatives the final word on how a shell must behave? Or at least it appears that you are saying PowerShell doesn't "act like a shell".

Is this a No True Scotsman?


Why bother have words that mean things then? Call anything you want anything you want. Sounds useful.


Since when did the definition of a shell include "deals with everything as an unstructured stream of bytes"?


Agreed but at what cost?

Get-Process | Out-Host -Paging | Format-List

Case-sensitive, hyphenated, compound-word commands?


At the cost of having to press tab to auto-complete a statement you've written 10 billion times before. God forbid a software engineer has to type something

> Case sensitive

they're not

Also your command is equivalent to:

ps | oh -paging | fl


They're not case-sensitive.


Well I'm sensitive!


I don’t think it’s actually case sensitive. And there is autocomplete anyways.


Powershell is rather famously case insensitive.

It is idiomatic to capitalize liberally though.


Idiomatic in scripts (and documentation and training). REPL idioms are much more loosey-goosey.

That's actually something I have come to appreciate about Powershell: the idiomatic distinction that scripts are more likely to be read by other people (including Future You, with no memory of having written the script in the first place), so write "full sentences" for an audience, but feel free to be fast and loose in the REPL all you want.


Totally agree! I am a bash fanboi with a fish fetish, but recently started in a windows shop. I've actually grown quite fond of the flexibility powershell offers, and how usable it is without requiring esoteric knowledge.


Corollary: if you're a developer, purposely root your CI system in a path with spaces in it to pick up these sorts of problems at build time.


I don't think I've ever had this kind of stupid error in a script that Shellcheck accepts, by the way, which is a more principled way to pick up the problems.


this is a great idea thank you


`Τĥιs ñåmè įß ą váĺîδ/ POSIX paτĥ/` is one I like using. If your scripts pass shellcheck, they'll handle those two directories without issue. Every byte (octet) other than 0x00 is valid in a path. Every byte other than 0x00 or / is valid in a file name. If you can't handle that, your script isn't compatible with all valid POSIX paths. That may be acceptable (e.g. you're generating all the paths in question), but isn't always.


> Every byte (octet) other than 0x00 is valid in a path.

I'd counter that there are a couple of differing levels of validity:

1. Printable ASCII alphanumerics, period, hyphen, underscore. Guaranteed to be portable anywhere and completely safe.

2. Safe ASCII punctuation (space, ~, $, etc). Usable on virtually all systems, likely to cause weird behavior in poorly written shell scripts.

3. Unsafe ASCII punctuation (< > : " \ | ? *). Disallowed on Windows, also likely to cause weird behavior when allowed.

4. Invalid UTF-8 sequences. Disallowed on macOS, not representable at all on Windows (which uses UTF-16 for filenames!), and will cause havoc in some programming languages like Python which assume that the filesystem is Unicode.

5. ASCII control characters (0x01 - 0x1f). Also disallowed on Windows, and you'd have to be nuts to use them even if they are supported.


I am pretty sure (but not 100% sure) that Windows uses UCS-2 for filenames, not UTF-16, and so it is worse than that, as you can have unpaired surrogates.


You're entirely correct; I was oversimplifying the (fairly horrifying) situation.


You forgot filenames like "CON" or "aux" or "NUL", which are disallowed on Windows entirely. Also having filenames that differ in capitalization only. And hard links, don't use those. And paths that are longer than whatever the limit is on NTFS (it's not very long).

IMO it's better to just avoid file systems altogether if you can and put everything in a database.


Windows doesn't have a POSIX shell. Windows has subsystems to provide one, but shell scripts aren't portable to Windows in its default configuration.

POSIX is not the same as "portable", certainly not "universally portable". POSIX shell scripts should be written to handle the capabilities of POSIX systems. Likewise scripts for POSIX-compliant shells, like Bash scripts. Non-POSIX shells may impose whatever restrictions they like, e.g. Windows disallows the file names CON, PRN, AUX, NUL, COM0 through COM9, and LPT0 through LPT9. Those are in your category 1 but are definitely not safe on Windows.


You should throw in a few newlines in your path for good measure.


I suggest you add a trailing - (hyphen) to detect scripts unable to treat filenames separately from options.


It is a reason to “just use Python” for scripting, or Java or C for that matter.


I was once asked in an Amazon interview a famous problem of theirs: find all phone numbers on their website. This was a real-world problem they ran into. The first engineer there went about it in Java; it took days and pages of code (and still, various corner cases were issues, it was slow, etc.). Another came along, threw down a few lines of shell, and was done.

There are some things shell does really well.


I am trying to be charitable to the Java implementation, but I am struggling to understand the complexity there. Java can iterate over files, it can iterate over strings, the rules to permissively identify phone numbers are not great, but both implementations would suffer from that complexity. Unless it was full-on FactoryFactoryPPIExtractor<PhoneNumber> style coding.


"find all phone numbers on their website" sounds like it had to act like a web crawler, not parse a bunch of files, so perhaps the complexity was because of that.


I like how y'all are thinking the complexity was in the web crawling.

It was actually just a simple act of crawling over source files! Amazon was looking to remove their customer support phone numbers from all web pages (to save money, they wanted people to stop calling them; back in the early 2010s)

I share the wonder that it took several days (perhaps that is the anecdote taking on a life of its own; perhaps it was actually several hours. Or maybe it was several days interspersed with on-call support, meetings, and other work running in parallel).

Though.. the file-parsing libraries available in Java back in the early 2010s were somewhat basic, probably using Java 5. Which does make me wonder: how long would that exercise take in Java 8 or later, compared to Java 5?


Steve Yegge discusses this in his "The Five Essential Phone-Screen Questions" blog post[0] from 2004 and states that his team-member (at Amazon) produced the list of phone-numbers in an hour:

Here are some facts for you to ponder:

1. Our Contact Reduction team really did have exactly this problem in 2003. This isn't a made-up example.

2. Someone on our team produced the list within an hour, and the list supported more than just the 2 formats above.

3. About 25% to 35% of all software development engineer candidates, independent of experience level, cannot solve this problem, even given the entire interview hour and lots of hints.

[0] https://sites.google.com/site/steveyegge2/five-essential-pho...


I was considering that, but web crawling via shell definitely crosses the line of things I would not want to attempt. Although, I guess you could outsource that complexity with `wget --mirror`, bringing you right back to file-crawling.


My take is this.

There is a very simple way to write a breadth-first webcrawler that works in stages and saves the frontier in an ordinary text file, this kind of webcrawler can be implemented in almost any programming language, particularly shell. That is, you write a script that loops over the urls in the frontier, fetches them, then writes new urls it find in the next frontier, you can use sort, uniq and such to manage the frontier.
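
A rough sketch of one such pass (the file names are mine, and the URL regex is deliberately crude):

    # frontier.txt: URLs to fetch this pass; seen.txt: URLs already fetched.
    touch seen.txt
    sort -u frontier.txt > todo.txt
    : > next.txt
    while read -r url; do
        grep -qxF "$url" seen.txt && continue        # skip already-crawled URLs
        curl -fsL "$url" | grep -oE 'https?://[^"<> ]+' >> next.txt
        echo "$url" >> seen.txt
    done < todo.txt
    sort -u next.txt > frontier.txt                  # becomes the next pass's frontier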

That simple breadth-first webcrawler has the big advantage that it does not get stuck in web traps (say, a calendar that has a button to go to the next year, which could easily scan up to the year 9595, if man is still alive). Or rather, after a certain number of passes (maybe N=10-15) all the real URLs have been crawled and the remaining URLs are web traps.

In a language like Java it is tempting to write a much more complex crawl manager that has the potential to crawl faster with threads but takes more care to not crash target sites, get caught in web traps, etc. The first web crawler I wrote (1999) had a highly complex crawl control system, future web crawlers I wrote got simpler and simpler because the simple breadth first crawler works very well for small projects.


This reminds me of the story in which Don Knuth wrote a literate program in WEB (reportedly over 10 pages) to do some text processing and Doug McIlroy did the same thing in a six-line shell script:

http://www.leancrew.com/all-this/2011/12/more-shell-less-egg...


Knuth was asked to demonstrate his new literate programming and probably thought an 8-page mini-paper was about the right "length" to do so:

  https://buttondown.email/hillelwayne/archive/donald-knuth-was-framed/
McIlroy was asked to show off productivity in the newly popular Unix system, and he did that.

While often framed as "a contest", the two approaches are really solving different enough problems to be apples and oranges. So the only meaningful judgement of a contest is "who better achieved their narrow goals?"

I've found in discussions with people that they seem unable to separate their judgement of "a winner" from their own personal weight on the value of the two narrow goals ("detailed articulation mixing code & text" vs. "just getting shit done").

There is probably some broader lesson about people in that (and indeed probably a specific term in the science of psychology) - those who really like apples or really like oranges being inclined to force a non-contest into one of their favored "Eat oranges!" rhetorical moves.

EDIT: and it's actually even more subtle - the very same person can side with either depending on broader context or framing. There are those who would praise detailed documentation over write-only hacks requiring detailed memory of tool innards in one situation yet decry it as a waste of time in another. (All part of why survey results are not to be accepted naively.)


They have powershell on those unix computers nowadays. Just sayin'.


I-Don’t-Want-All-My-Commands-To-Be-This-Long


There's aliases, and you can just hit tab to autocomplete the commands and their argument names. I find Powershell to be far more readable than bash because of its verbosity


Aliases aren’t portable


I don’t think I’ve ever seen a portable POSIX shell script. They all rely on bash-isms and GNU binary behavior


Personally I don't write portable shell scripts. They exist only for my own usage on my desktop. If we rely on any kind of script for deployment purposes at work, we use the scripting lang in the build system or a python script


Interestingly, I actually do.

When I write bash scripts which will go to git and probably stay there for a while, I use the long form of the switches, like --long-option-name instead of -lon. They make the script more readable when, after a year, someone wants to modify it.


I use long options always, even going through the pain of typing them out when no tab completion is available. Exceptions are truly trivial things like ls.

It allows things like history | fzf to work very well.


fzf comes with some bash completions (https://wiki.archlinux.org/title/fzf), including Ctrl+r for basically `history | fzf`.


It is nice for maintaining scripts though. While short names and options are typing-savers interactively (though sometimes problematic, when a single-character typo can radically change the meaning of a command in a way that is still a valid command, so it will be executed), in scripts I want long names so people after me (including future me) can understand without reference to man pages or similar. OK, the same could be achieved with copious comments, but then you have lots of pairs of bits of text to keep in sync.


Get-Alias


If only all UI applications replaced whitespace with underscores when making files. I know... that will never happen and won't help with existing files.


No, please no! Let's first figure out what it means to work with whitespace properly, then update our tools to do it. Behind-the-scenes mangling to work around flawed tools seems to be the macOS-favored solution, with .DS_Store files dropped silently everywhere, "case-insensitive but case-respecting" by-default file systems, and increasingly byzantine approaches to file management, so that it's harder and harder to figure out where anything is actually stored, which defeats the whole point of a Unix underlayer…


Well, .DS_Stores are just used by Finder, but they do show the limitations of Unix filesystems. Where should the data they hold be stored?

The macOS API is really Foundation - a layer above POSIX - and if you use that, the filesystem is not that bad. It gives another solution to metadata like .DS_Store, which is bundles. Bundles are an object in the Foundation API which you manipulate as one; on the filesystem they are a directory. Think of a bundle as a zipfile that you can look at using POSIX tools.

The files being in odd places is for similar reasons to XDG's attempt at cleaning up Unix. I think the MacOS/NeXT structure predates XDG.


>> Let's first figure out what it means to work with whitespace properly

What does it mean to work with whitespace properly?

When commands are entered at the command line and a shell is parsing, how do you distinguish between whitespace between commands, command parameters and switches, filenames that include whitespace characters, paths that include whitespace characters, etc.?

It is a difficult question that does not have good answers.


Not word-splitting $var on expansion would be a huge improvement; shells like rc and zsh have been doing it since the early 90s, so it's hardly a new innovation. Bash refuses to implement this even as an option because ... reasons? Significant chunks of bash are still written in K&R C today, so it's not exactly cutting edge...

Autocompletion works well enough for doing things like "ls file with space": either just as "ls file<Tab>" or "ls 'file<Tab>" (in zsh it picks up on a starting ' or " so you don't end up with loads of backslashes).
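
For comparison:

    # zsh (without SH_WORD_SPLIT set): $var expands to a single word
    var='file with spaces'
    ls $var    # one argument in zsh; three arguments in bash/POSIX sh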


On the other hand, not splitting words at spaces during variable expansion would make things like this impossible:

  OPTS="-a -b -c"
  foo $OPTS


Use $=opts in zsh for explicit splitting. Better would be to use an array though: 'opts=(-a -b -c)' and then just 'foo $opts'.

I'm not sure about rc on this as I'm not too familiar with rc, but I'd be surprised if there wasn't a way to do this.


Just use an array. POSIX sh is a useless target, all serious shells support arbitrary arrays.


> Let's first figure out what it means to work with whitespace properly

We know how: using structure beyond a line of text. The problem with that is that your shell and all the commands you run through it then need to agree on that structure and implement it.

PowerShell works this way, though it has a few of its own gotchas (mangling binary data if you aren't careful being one of them). A couple of shells for unix-alike systems do too, but it falls apart when you start calling de-facto standard tools that have not been retrofitted to accept and output that structure.

Once you have decided on the “proper way”, you'll find that was the easy part; getting that way implemented in enough places that it is useful is a somewhat Sisyphean task.

IMO the only way to do spaces in filenames properly without an impractically radical change to all your tooling, is to consider spaces in filenames to be a bug. But good luck getting the world to accept that!


> Let's first figure out what it means to work with whitespace properly

A problem with that: What is 'properly'? The current solution is one take on how to do it properly.


> "case-insensitive but case-respecting" by default file systems

That is actually the best policy. Case sensitivity is unnatural and weird; we don’t think by default of readme, Readme, and README as being 3 different things. But then, when the user chooses a file’s name, it’s better if the file actually has that name, and keeps it consistently regardless of how the user interacts with the file system (hence the case-preserving). The case-preserving aspect has no effect on anything outside the kernel. For all intents and purposes, it is case-insensitive for any userspace code.


> Case sensitivity is unnatural and weird;

Case sensitivity is naive and simple; case-insensitive systems are complex and require Case Mappings [1] and Normalization [2], and those are ugly, unless you want a half-assed system that is case-insensitive only when you use ASCII.

[1] https://unicode-org.github.io/icu/userguide/transforms/casem...

> There are different case mappings for different locales. For instance, unlike English, the character Latin small letter ‘i’ in Turkish has an equivalent Latin capital letter ‘I’ with dot above ( \u0130 ‘İ’).

> In general:

> case mapping can change the number of code points and/or code units of a string,

> is language-sensitive (results may differ depending on language), and

> is context-sensitive (a character in the input string may map differently depending on surrounding characters).

[2] https://unicode-org.github.io/icu/userguide/transforms/norma...


I disagree. It's really weird that "I" and "i" are somehow the same character (but the former is used at the start of a sentence or when it's the entire word only), or that "מ" (U+05DE) and "ם" (U+05DD) are the same character but the latter is only used at the end of a word. They're visibly distinct! Only a minority of languages have multiple cases, most scripts are unicameral (having only one case).


I am not sure I follow. Are you advocating for doing away with capitals entirely?


No, I'm saying that case insensitivity is almost always a bad idea, especially as a default. It's useful (sometimes) for search, but even then not always what you want. Case often carries semantic information, it's not purely syntactic.


i support this. capitals are basically useless anyway!


> Case sensitivity is unnatural and weird; we don’t think by default of readme, Readme, and README as being 3 different things.

I believe you don't, but I certainly do.


I believe most normal people don’t. I am pretty sure that is the case, because the default behaviour of both Windows and macOS is to consider the three as the same name, it has been that way since the beginning, and nobody seems to have any problem with that. What would be weird to most people would be if the case changed for unknown reasons, which is why both are case-preserving.


That’s a typical software engineer solution, though. In the real world, putting spaces in file names is much more natural than avoiding it at all costs. We find it distasteful only because we have PTSD from using such dumb tools.

The real solution is something that does not split a variable on spaces, or that offers some control about how it splits things. These exist, there is no excuse to stick to bash in this day and age.

We really don’t need another layer of obfuscation between GUIs and the underlying file system.


You could make the same argument about putting slashes in filenames (or, I think, even colons on old versions of Mac OS), and yet we accept that limitation generally without question. FWIW, I generally make sure most of my shell scripts support spaces in filenames--and am in fact more pedantic about it than almost anyone I have ever met--but for a developer tool as part of a build system or whatnot? No: developers should know better and I have no sympathy if they insist on spaces in filenames as it just isn't worth the productivity loss to everyone across the entire stack.


As a MacOS user, I have long wished that Apple would make it possible to configure a custom file naming policy for the the dialog boxes presented when saving files.

In my dream world it would enable changing "The shell and its crappy handling of whitespace _ Hacker News.html" to "The-shell-and-its-crappy-handling-of-whitespace_Hacker-News.html"


> Even if [parsing command arguments via whitespace boundaries] was a simple or reasonable choice to make in the beginning, at some point around 1979 Steve Bourne had a clear opportunity to realize he had made a mistake. He introduced $* and must shortly thereafter have discovered that it wasn't useful. This should have gotten him thinking.

First off, no, that's not a bug. The alternative is some variant of "here's a data structure to hold the list of arguments you need to manipulate manually". And that's how program execution works in "real programming language" environments (even things like PowerShell).

And it sucks, which is why we all continue to use the Bourne shell after four decades to solve these problems. What we really want is "here's a string to execute just like I typed it; go!". Sure, that's not what we think we want. And we might convince ourselves that mishandling based on that metaphor is a bad thing and write too-smart-for-our-own-good blog posts about it to show how smart we are.

But no, we don't want that. We want shell. If we didn't want shell, we wouldn't write shell scripts. And yet, here we are.

Again, four decades outweighs any amount of pontification in a blog post. If you aren't willing to start from a perspective of why this system remains so successful, you're probably not bringing as much insight to the problem as you think you are.


>Again, four decades outweighs any amount of pontification in a blog post.

Many more people used MS-DOS/Windows during that time.

>And it sucks, which is why we all continue to use the Bourne shell after four decades to solve these problems.

>But no, we don't want that. We want shell.

People continue to use Linux and Unix and the interactive shell for good reasons. Shell as a scripting language is there simply because it's available; a form of Stockholm Syndrome, I guess.


> Many more people used MS-DOS/Windows during that time.

But not since. No one writes new win16 or DOS or batch files. People still write shell scripts. Every day. Despite the existence of a deluge of supposedly better solutions. And if your explanation for why shell is so terrible doesn't have a way to explain that, I argue it's an incorrect explanation.

"We all have Stockholm Syndrome" is just ridiculous, btw.


> But not since. No one writes new win16 or DOS or batch files.

More than you'd think (CMD files are still supported). Fortunately the syntax is too basic for it to be used too much. However, Excel+VBScript runs the financial world. Is VBScript a good language? Shell and VBScript have some half-decent insights for their time, but they have remained stuck while programming languages have improved. The reason people write fewer CMD files is because Microsoft improved things with Powershell.


Plenty of us use Linux with something better than bash (zsh, oil, fish, whatever). I know I have for quite a long time now.


Linux with other shells is fine. It's just that Bourne is a poor scripting language with way too many footguns. Unfortunately there are plenty of bash shell scripts we are likely running because some people don't know there are better options, or because they are worried about dependencies and write to the nigh-lowest denominator.


> because they are worried about dependencies and write to nigh-lowest denominator.

Yeah, there’s a strong culture of writing ultraportable scripts in case our precious script ends up running on a SunOS box somehow. There are cases where it’s important, but not for the dozens of small-ish scripts we end up relying on daily.

I entirely agree about the rest of your comment.


Or they stick to bash to keep their skills honed for when they have to understand some third-party script, or you ssh into some system and can't just quickly install fish first.


>Many more people used MS-DOS/Windows during that time.

Most of them suffer for a decade, then (if they stay under Windows) they mostly find Python or Autohotkey.


Ah, double-quote handling can be another nightmare as well; can't make this work:

    function backup {
        local SOURCE=$1
        local DESTINATION=$2
        local ID_RSA=$3

        # VALID_SSH is hard-coded for testing.
        local VALID_SSH="yes"
        local REMOTE_SHELL=""

        if [[ "${VALID_SSH}" == "yes" ]]; then
            REMOTE_SHELL='-e "ssh -i '${ID_RSA}'"'
        fi

        rsync -av \
            --no-perms \
            --links \
            $(if [[ "${VALID_SSH}" == "yes" ]]; then echo "${REMOTE_SHELL}"; fi) \
            "${SOURCE}" "${DESTINATION}"
    }
As it is when calling the function:

    backup "." "backup@redstar.local:" "id_rsa"
rsync complains with:

    Missing trailing-" in remote-shell command.
    rsync error: syntax or usage error (code 1) at main.c(438) [sender=3.1.3]
But if I echo the `rsync` command instead of calling it, its output is perfectly valid and actually runs as expected:

    rsync -av --no-perms --links -e "ssh -i id_rsa" . backup@redstar.local:
Been fiddling yesterday and today with no luck, it must be something very subtle. Tried a bunch of different things from SO suggestions but none seem to work. Meh.


As a rule, `printf %q` or `${var@Q}` are very useful when building up quoted command strings.
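
For example (bash; `${var@Q}` needs 4.4+):

    f='bite me.jpg'
    printf '%q\n' "$f"    # bite\ me.jpg
    echo "${f@Q}"         # 'bite me.jpg'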

But your main problem is that `REMOTE_SHELL` is a string, when you need an array. Though I suppose you could make it a string if you used `eval` around its whole use.


Have you tried

    REMOTE_SHELL="-e \"ssh -i ${ID_RSA}\""
already?


Yup, same result. I've tried many small variations around the escaping of the double quotes, all with the same result. I haven't kept all of them but definitely tried your approach.


Oh it's the

    $(if [[ "${VALID_SSH}" == "yes" ]]; then echo "${REMOTE_SHELL}"; fi) \
Change that to

    $(if [[ "${VALID_SSH}" == "yes" ]]; then echo ${REMOTE_SHELL}; fi) \
and have

    REMOTE_SHELL="-e \"ssh -i '${ID_RSA}'\""


Update, this seems to work:

    function backup {
        local SOURCE=$1
        local DESTINATION=$2
        local ID_RSA=$3

        local VALID_SSH="yes"
        local RSYNC_OPTIONS=(-av --no-perms --links)

        if [[ "${VALID_SSH}" == "yes" ]]; then
            RSYNC_OPTIONS+=(-e "ssh -i ${ID_RSA}")
        fi

        rsync "${RSYNC_OPTIONS[@]}" "${SOURCE}" "${DESTINATION}"
    }
I got lucky and found this information with an explanation at: https://www.mail-archive.com/search?l=bug-bash@gnu.org&q=sub...

So, is it good advice to always use bash arrays to construct arbitrary options for commands?


Yes. I read this thread in shock and with quite some dismay that you weren't using an array :(... I am very glad you came across the correct way to code that :).


No, it is not :-). With that approach the "-e" gets eaten, producing:

    rsync -av --no-perms --links "ssh -i id_rsa" . backup@redstar.local:
But, you can re-insert it in the if with:

    $(if [[ "${VALID_SSH}" == "yes" ]]; then echo -e "\055e ${REMOTE_SHELL}"; fi) \
and removing the "-e" from the REMOTE_SHELL var. Then... Same error as before!


Clever. Stopping to think about this, I think setting

    REMOTE_SHELL=""
when it doesn't want to get used, and then inserting it every time, instead of using the conditional

    rsync ... ${REMOTE_SHELL} ...
should also work. If it's empty, then it won't add anything, but if there's something then it'll get passed in.


Auto-splitting variables is for passing arguments to a command you'll be using multiple times, or that you need to change under some condition, for example:

  RSYNC_ARGS='--progress --human-readable'
  if some_condition; then RSYNC_ARGS="$RSYNC_ARGS --exclude-from=some_file"; fi
  rsync $RSYNC_ARGS foo bar
I can't think of a use for $* though and would guess it probably existed before $@ and is still there just for backwards compatibility.


That's equally problematic... '--progress=yes please' with whitespace runs into problems.

In Bash you can use arrays though:

    args+=('--progress=yes please')


See last point about backwards compatibility. Arrays did come later.

The array style probably should always be used nowadays (don't forget you also need to insert it into the command as "${args[@]}", with the quotes), but I learned before it was common (possibly before it was a thing; I'm having trouble finding exactly when basic arrays were added to bash), so I still use the string style when I know I won't need it.


shellcheck can remind you of most pitfalls


Every time one of these "shell sucks" articles is posted I read the examples and think "shellcheck would catch that". Shellcheck is the main reason I have not migrated to zsh, or more generally it's because zsh doesn't have a syntax linter.

More and more I find that confidence is one of the best goals you can have in every software and system project, and any tool or process that can increase your confidence is going to make your life easier. Shellcheck gives you that confidence when working with the shell. Never write shell code without it.


It should be said that many of the worst footguns shellcheck warns about simply don't apply to zsh – you don't need shellcheck to the same degree that you do with POSIX sh or bash.

The first "wrong" example in this article for example (cp $i /tmp) is perfectly fine in zsh, and the complex 'mv "$i" "$(suf "$i")".jpg' is just 'mv $i $i:r.jpg' in zsh.

$* and $@ are identical in zsh, and both are also arrays. There is no way to use it wrong really, and no weird '"$@" with quotes is a special case that's handled differently from everything else'.

Most "shell problems" people experience are "bash problems" (or "POSIX sh problems", but for many they're more or less synonymous); certainly pretty much all the problems listed in this article are.

This always proves to be a controversial opinion, but I will forever keep insisting: just don't use POSIX sh or bash if it can at all be helped. You most likely don't need the portability, and installing zsh (or fish, or oil, or whatever) is easy enough in most cases because you're the one controlling the environment.


That’s why it’s called bash: you gotta bash your head against the desk real hard sometimes before the damn thing does what you want.


Yes but you get messy code


Turns out that you get at most one of "readable" and "correct" with shell scripts!


this reminds me that zsh auto-escaping \$ and \& on basically all my shells messes up pasting curl commands and other payloads... or doesn't and still looks them up in weird subshells.

no such problems in bash.


> zsh auto-escaping \$ and \& on basically all my shells messes up pasting curl commands

This must be something you configured, or added in a plugin or something, because it's not the default behaviour, and AFAIK it's also not a simple setting.


As far as I remember, if you put quote(s) first, then paste into them, this doesn’t happen. That would then be strictly better, as zsh catches the unquoted special characters, preventing disaster, but doesn’t interfere if quoting is in place.


Making the assumption that I'm not intentionally pasting a command is wrong. So if it doesn't do that by default, that's good.


That’s very weird. I paste commands all the time without any issue like that. Is it a setting you changed, or some strange behaviour of your terminal emulator?


This article is obvious bait. It has to be.

You choose to expand a variable and then complain about not wanting the expansion?

Or about the behaviour of clearly documented special variables doing exactly what they're supposed to, instead of something another special variable does? What are they unhappy about, exactly? That they don't like the symbol that was chosen??

These are all things you find out within minutes of reading the manpage. (one of the best written manpages out there, in fact).

Heck, this isn't even a "can't handle whitespace" problem. It's expanded vs unexpanded tokens. Which is a feature of the language. Because it's text/stream based (not "despite"). 35 years of shell experience? I call bait.


Just because something is documented doesn't mean it is not crappy. I agree with the author: just like | isn't interpreted in `echo $myvar`, neither should spaces be.

I also agree that `mv "$i" "$(suf "$i")".jpg` is unnecessarily cumbersome.


Indeed. Which is why you should use the built-in parameter expansion provided by bash for easily removing a suffix, instead of an external program.
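
For instance, for the article's .jpeg-to-.jpg rename:

    i='bite me.jpeg'
    mv -- "$i" "${i%.jpeg}.jpg"    # ${i%pattern} strips the shortest trailing match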

The thing about spaces (and expansion more generally), as I understand it, is this. At some point the bash designers had a choice of how to represent expanded, unexpanded, and verbatim strings. One could argue that verbatim should have been the default, with expanded using single quotes and unexpanded using double quotes, such that you would only get expansion if you really wanted it. This might have been my choice if I had designed it.

I'm guessing, however, that the designers quite reasonably went for the opposite logic, which was: if you use a $ in your string, you're probably after expansion rather than a verbatim $ symbol (and you can always escape it in the unlikely scenario that you did want a verbatim one). So presumably they chose expanded as the default because it would be the most common, leaving single quotes for verbatim instead.


Funny, I was just discussing migrating Plex files from Windows to Linux, and whether it requires persisting whitespace in the filenames. My position was that "Linux" handles spaces fine, but if I can avoid dealing with escapes in filenames, that would be nice.

Curious to read this article when it recovers from the HN load.


Or you can use powershell .

Joking aside, as a relative newbie to the *nix world and liking it: shell scripting in Linux/Unix is broken.

The object-handling approach that Powershell provides just seems to be a better approach.


> I think what bugs me most about this problem in the shell is that it's so uncharacteristic of the Bell Labs people to have made such an unforced error.

Bell Labs fixed it with the shell called “rc”. They can’t be blamed that you ignored it.

https://www.scs.stanford.edu/nyu/04fa/sched/readings/rc.pdf


I didn't miss it. I was an rc user for many years.


How about using non-break spaces in filenames? It worked when I last tried it (https://boston.conman.org/2018/02/28.2).


I would expect `suf foo.html` to return `html`


PowerShell is worth looking into (and can be a viable shell on Linux).


> for i in *.jpeg; do

> mv "$i" $(suf "$i").jpg # two sets of quotes

> done

Doesn't this need another set of quotes, `mv "$i" "$(suf "$i")".jpg`, or else it'll just fail in a slightly different way when asked to `mv "bite me" bite me.jpg`?


Yep.

  $ echo $(echo "a     b")
  a b
  $ echo "$(echo "a     b")"
  a     b


read one more sentence of the article


The article says that in the very next sentence.


So it does, thanks and sorry. I got so puzzled wondering what special case I was missing that would handle this correctly that I couldn't move on until I realized why what I was sure was wrong wasn't, even though, as you say, moving on would have let me realize that what I was sure was wrong was.


BTW, regarding this: https://www.reddit.com/r/math/comments/chhtx/comment/c0smh0c...

* That article was #4 in a series. I did add a correction to article 3 the same day that 4 was published, and very soon afterward (same day, perhaps) I published a detailed followup. A few months later I added cross-links between all the articles. (See https://blog.plover.com/math/i-5.html )

* I'm sorry I didn't reply to your email at the time. I'm not sure that I received it. But I just checked, and I did get at least two messages on the topic that I didn't reply to, one from someone named Briggs and one from someone named Barlotti. If either of these is you, I apologize. I do try to answer blog-related email.


> * I'm sorry I didn't reply to your email at the time. I'm not sure that I received it. But I just checked, and I did get at least two messages on the topic that I didn't reply to, one from someone named Briggs and one from someone named Barlotti. If either of these is you, I apologize. I do try to answer blog-related email.

Neither is me, but it's no problem! You have often responded to my blog-related e-mails (my pessimistic Reddit comment that I didn't expect a reply was because it was probably my first time writing to you, and most bloggers without comments don't respond), and I have no proof that I sent this, so I might well have thought I did but left it sitting in draft, or something.

> * That article was #4 in a series. I did add a correction to article 3 the same day that 4 was published, and very soon afterward (same day, perhaps) I published a detailed followup. A few months later I added cross-links between all the articles. (See https://blog.plover.com/math/i-5.html )

I do see you mentioning in part 3 that there are many more additive-group automorphisms of ℝ, even continuous group automorphisms, than you had originally expected, and also mentioning (there and explicitly in part 5) that continuity is needed to say that there aren't even more. But I think that I probably did not make my point very clear, or else I am still misunderstanding.

I think (but can't say for sure—I refer to “the linked article”, because I missed that it was just a link to the math tag, so that I was just seeing whatever article was current at the time) that I was referring to https://blog.plover.com/math/i-4.html, which still seems to say:

> … the only automorphisms of the complex numbers are the identity function and the function a + bi → a - bi.

Here the context makes clear that "automorphism" means (at least) "field automorphism". It is this statement that I was disputing. You mention a streamlined argument that 1 is sent to 1 and −1 to −1, hence i to ±i; but that doesn't allow us to conclude—it only says that f(a + bi) equals f(a) ± f(b)i, and we are left to wonder what f is doing on ℝ. (The same issue crops up in the same way in part 3, which again seems more or less to conclude with the observation that f(i) = ±i.) Without continuity, we need not have that f restricts to the ‘standard’ embedding of ℝ in ℂ, and so have more than two field automorphisms of ℂ; but the construction of discontinuous automorphisms here is even scarier, using not just a Hamel basis of ℝ over ℚ but a transcendence basis (https://en.wikipedia.org/wiki/Transcendental_extension#Trans...) for ℂ over the algebraic closure of ℚ. A cute consequence that, to me, is proof enough that transcendence bases are a worthwhile concept: the field ℂ(X) of rational functions in one variable over ℂ admits a field embedding into ℂ; or, more precisely and to some people cuter, the algebraic closure of ℂ(X) is isomorphic to ℂ.

It is certainly true that, all possible fanciness aside, there are only two ℝ-algebra endomorphisms of ℂ, where the definition of an algebra map requires it to take multiplicative identity to multiplicative identity. In particular, we find that every ℝ-algebra endomorphism of ℂ is an automorphism, which sounds fancy when you say it with that many syllables.
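
To restate the gap compactly (my summary, in LaTeX, not a quote from the article):

    % For a field endomorphism f of C: f(1) = 1 forces f to fix Q, and
    %   f(i)^2 = f(i^2) = f(-1) = -1  \implies  f(i) = \pm i,
    % so  f(a + bi) = f(a) \pm f(b)\,i  for all real a, b.
    % This pins f down only once f restricted to R is the identity, which
    % requires continuity (or R-linearity); without it, a transcendence
    % basis of C over the algebraic closure of Q yields discontinuous
    % automorphisms beyond the identity and conjugation.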


Hey, I do that too for the same reason.

You have my sympathy, it is super embarrassing when it happens.



