Import – A Simple and Fast Module System for Bash and Other Unix Shells (import.pw)
129 points by goranmoomin on July 16, 2020 | 52 comments



  # The URL is downloaded once, cached forever, and then sourced
  import "https://git.io/fAWiz"
Call me old and cranky, but I thought this naive fad of piping the unchecked contents of URLs to the shell would die off eventually once people got clued in to how dangerous it is. At a bare minimum the contents of any shell script coming from a remote source should be manually reviewed the first time, and then verified to have not changed by comparing against a locally stored cryptographic hash.
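Something along those lines is only a few lines of shell (the file names and digest file below are placeholders, not anything this project provides):

    # review fAWiz.sh once, record its digest in deps.sha256, then on later runs:
    curl -fsSL -o fAWiz.sh https://git.io/fAWiz
    sha256sum -c deps.sha256 || exit 1   # deps.sha256 holds "<expected-digest>  fAWiz.sh"
    . ./fAWiz.sh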

This is no better than curl with some automatic caching. At the end of the day, why not store your dependencies alongside your scripts and then not worry about them changing unless you change them yourself?


Piping a URL into bash is no more or less risky than downloading a Windows binary, installing a snap, installing an npm/composer/whatever package with post-install hooks, or downloading a binary for your Mac.

It's impossible to read every line of code we execute (much of which is closed source anyway).

Downloading a large script and reading it before running it in bash is also hardly a good security measure.

It's much more practical to teach people how to recognize trustworthy sources, and how HTTPS works. Or you can switch to a platform like iOS where there's complete control, sandboxes and reviews.


The risks involved with executing code are legion, and wanting to promote cultural change in the direction of better safety isn't invalidated by pointing out failings in other areas.

> Downloading a large script and reading it before running it in bash is also hardly a good security measure.

> It's much more practical to teach people how to recognize trustworthy sources, and how HTTPS works. Or you can switch to a platform like iOS where there's complete control, sandboxes and reviews.

Downloading bash scripts and inspecting them before running is definitely a good security measure, as is running untrusted software in a VM or container where possible, and generally promoting caution when running anything acquired from the Internet. Even if the source is considered trustworthy there is still the possibility that it has been compromised. npm, pypi, and even the Linux kernel source tree are all examples of this.

In this particular situation the use case is extremely weak. The problem is solved by incorporating dependencies locally, as I pointed out above.


This is all fair, it just sometimes feels like people are a little too focused on running bash scripts, when something like `npm i create-react-app` downloads 200K lines of code, all of which can autorun. 'Read everything you download' is just not incredibly helpful. Of all the code we trust and don't read, this is just a drop in the ocean.


I totally agree, and I never understood the focus on curl'd bash scripts. People seem to be OK with running 'git clone; make; sudo make install' on something, but hesitate to pipe a remote bash script that comes from the same author and often is hosted in the exact same repo... At least, nobody ever specifically pointed out how dangerous the make scenario is.

There are a few minor things one should do when writing such a curl'd bash script, but overall I personally don't worry too much when I encounter them; I just do my usual security assessment and risk mitigation (which usually does not include reading the code -- that's just not practical).


I think it's the programmer's equivalent of "get off my lawn you kids" or "if you make that face for too long, it'll get stuck like that forever".


> Piping a URL into bash is no more or less risky than downloading a Windows binary, installing a snap, installing an npm/composer/whatever package with post-install hooks, or downloading a binary for your Mac.

I agree. All of these are risky operations where I vet the source before running the code. (and for that reason I prefer to stick to software in my distro's repos, when possible).

Sometimes I am able to vet the source directly, like with bash scripts. I prefer this. Other times, I have to make do with verifying the reputation of the distributor.


Still, downloading and executing code can be guarded by trusted signatures, as used by deb etc. You can protect such an "import" statement by restricting it to signed files, at the very least. That allows you to "trust" a source without reading every line of it. Not infallible, but much better than "download and run this thing from some obfuscated URL and assume it hasn't been hijacked"
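A minimal sketch of what that could look like, not this project's actual interface; it assumes the author publishes a detached .asc signature next to each script and that you already trust their key in your keyring:

    import_signed() {
      url=$1
      f=$(mktemp) && sig=$(mktemp) || return 1
      curl -fsSL -o "$f" "$url" || return 1
      curl -fsSL -o "$sig" "$url.asc" || return 1
      gpg --verify "$sig" "$f" || { echo "bad signature: $url" >&2; return 1; }
      . "$f"
    }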


> Downloading a large script and reading it before running it in bash is also hardly a good security measure.

I don't know how large you mean, but things written in the shell are usually very conducive to a quick scan. You're operating at a high level of abstraction, so underhanded code is forced to be more obvious. It's actually extremely easy to find code that is at least suspicious enough not to run (though not necessarily malicious), so there's still plenty of reason to do it rather than take a leap of faith.

I guess if it's several thousand lines then it's not worth reading, but what script that's worth running is?

The other thing this misses is that you're running unsigned software; despite some protesting I've seen here, signing is certainly better than blind trust. That's what you get with most binaries on your system and the other packages being pulled down. Piping a URL into bash is always riskier than running something signed.

HTTPS is also a pretty poor integrity enforcement tool. Sites are breached all the time, making it trivial to change the code being served. You might not have anyone changing it on the way, but if you're drinking bad water straight out of the tap, it doesn't matter.


> Piping a URL into bash is no more or less risky than downloading a Windows binary, installing a snap, installing an npm/composer/whatever package with post-install hooks, or downloading a binary for your Mac.

Wrong. Even Windows binaries are signed nowadays. When you use a package manager, you cannot be individually targeted (assuming the package manager uses HTTPS), so you get at least some safety from sheer numbers. A URL, otoh, can easily be personalized. I could easily serve you a specific malware whereas everyone else and GitHub just sees the benign code.


> I could easily serve you a specific malware whereas everyone else and GitHub just sees the benign code.

Could you, though? For a GitHub URL?


Obviously I could not change the content of a GitHub URL. Well, at least not at a reasonable cost and effort. But I can easily create a "curl this and execute it" line in a README.md and have it serve malicious code just to you.


As far as I can tell, the URLs don't have to be GitHub URLs, you can provide any URL for people to "import"


That's true, but I read the parent as specifically talking about GitHub. In general, if it's on a server I control, it's more doable. Still not easy, I guess – how do you tell me apart from everyone else who fetches that same dependency? But it may be doable.


The dhall language does https imports, but with two important differences:

1. The language is _pure_ and _total_, so it will _eventually_ terminate and cannot perform side-effects.

2. You can specify the SHA256 of the referenced resources inline, next to the URL. This gets you caching and breakage warnings.


   # The URL is downloaded once, cached forever, and then sourced
   import "https://git.io/fAWiz"
Where is this program declaring the above content's expected SHA256 to check that the expected content is being received, and not some altered version?

Think about it. You write the program with the above #import, and at the time, it fetches a certain piece of code. You see in your cache that it's the right one.

Then you deploy the script to a fellow user. They run it, and the import fetches something different from the same URL. The script breaks. You're scratching your head: hey, it works on my machine, why not yours? Or worse: the replacement code is malicious, not simply broken. The user's files are gone, or whatever.


Don't get me wrong, that would be a nice feature, but I think it's a little optimistic to expect people to actually use it and ask for more security beyond TLS. I mean, the most popular (n)vim plug-in managers just clone arbitrary git repos with no other verification.

Code signing or hashing is vanishingly rare and a huge burden on end users, who unfortunately simply do not care in the slightest. The "good enough" root of trust is that the server hosting the code hasn't been compromised.


Far from this being rare, every major GNU/Linux distro checks the hashes of downloaded material it depends on. Failing to do this threatens integrity/stability of the system and provides an open door for malware.

Vim package managers pulling git repos can easily check out a specific hash; if they are just blindly following branch heads by name, that's just dumb and nobody should emulate their behavior.

TLS is neither here nor there because there isn't much of a privacy concern. Plaintext HTTP works fine for fetching some piece of FOSS, if you check the digest to verify that you're getting what you're expecting to be getting.

On the contrary, the assurance that the HTTPS host key has not changed is not relevant. It does not amount to (is something other than) the relevant assurance that the URL content has not changed.

Moreover, there may be numerous hosts which provide that exact same content. If you have the digest, and the original host stops providing the content reliably, you can confidently switch to a different mirror, which could be a HTTPS host you have never contacted, or even a HTTP one.

The trust that the server hosting the code has not been compromised is misplaced and foolish. The person in charge of that content could be someone who thinks it is a good idea to change the content of a software release (to sneak in one more bugfix) without changing anything in the URL. Thus two of your own users who think they are using the exact same version of your software will fetch the same foolib-1.2.tar.gz URL a day apart, and get different files, which breaks the integrity of your own release. Without hashes, this goes undetected.

There is no "huge burden" here. All the module tool has to do is make it mandatory for the fetching command to have a digest argument. If the digest doesn't match the content, all it has to do is fail that command and issue a diagnostic. The diagnostic can include the actual hash that would work, and point to the file and line number where the failing import is located.
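A sketch of what that digest-mandatory interface could look like (illustrative only, not this tool's actual API; the file/line diagnostic is omitted for brevity):

    import_pinned() {
      url=$1 expected=$2
      f=$HOME/.cache/import-pinned/$expected
      mkdir -p "${f%/*}"
      [ -f "$f" ] || curl -fsSL -o "$f" "$url" || return 1
      actual=$(sha256sum "$f" | cut -d' ' -f1)
      if [ "$actual" != "$expected" ]; then
        echo "import: $url: digest mismatch (expected $expected, got $actual)" >&2
        rm -f "$f"
        return 1
      fi
      . "$f"
    }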

The user relies on hundreds of imports that they didn't write themselves, and which require no attention at all in the happy case that they fetch what they are supposed to fetch.


I don't think this detracts from your point, but despite being technically true, it doesn't imply what you think. Linux package managers really verify the signature of the hash, which is downloaded from the same server, and then compare it against the package. Code signing is the real root of trust here, not hashing.

You can't escape the total insecurity of git cloning because a malicious actor will just create a branch with the name of the desired hash and every client will happily clone it. You are pretty much always trusting the server unless you clone and then verify the content. Plus, pretty much all "language" package managers and brew do the "dumb" thing, since you usually only specify a name and maybe a version in your dep file.

I'm not saying this isn't a good idea, but there's a long, long list of extremely popular code managers/downloaders that simply don't care about this.


> Linux package managers really verify the signature of the hash

> Code signing is the real root of trust here, not hashing.

Hashing is the root of that specific trust which says "this is the same thing it was 17 months ago when we first fetched it".

(It is not deductively implied; it is predicated on the difficulty of producing a rogue file that has the same hash.)

Maybe it was a fake then, loaded with malware; that's a related, though different concern.

The fact is that I can inspect something, like a module of shell script code, to my satisfaction to be free of problems, and debug all my stuff with it. The hash then gives a lot of assurance that it's the same thing I validated before.

> You can't escape the total insecurity of git cloning because a malicious actor will just create a branch with the name of the desired hash and every client will happily clone it

If a branch or tag can be created with a name indistinguishable from the syntax of a hash, that's a disturbing ambiguity in git that should arguably be patched away. It is tangential to this topic.

I believe, perhaps mistakenly, that after the repository is cloned and checked out (by any means: hash or branch/release tag), a validation can be made that HEAD points to the expected commit, e.g. using

   git rev-parse --verify HEAD^{commit}
That should defeat the ruse.
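Concretely, with the commit pinned in the script (the value below is a placeholder):

    expected="<expected-commit-sha>"
    actual=$(git rev-parse --verify 'HEAD^{commit}') || exit 1
    [ "$actual" = "$expected" ] || { echo "unexpected commit: $actual" >&2; exit 1; }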

Language managers are based on a vested interest in popularizing a language, by maximizing the amount of activity occurring in its "ecosystem". If every dependency were pinned down by a hash, such that the upstream developers have to beg everyone to update to a new hash every time they want to publish some change, that would put a huge damper on the party. Language package managers are not motivated by stability, reliability and security. I'm not in favor of language package managers, and none of my work or side projects depend on them.


It's not so much about security as about reproducibility. What if this is being used in scientific research?


What if it's being used by your bank, government site, or embedded system in your TCP/IP router, or automobile?


Dhall does this for import caching.


It can't just be something that works inside the cache. When an item is newly introduced, such that the cache has no traces of it, the very first fetch has to already validate that it's the right article.


Impressive…

How people manage to poison even old-school shell scripts with useless bloat.

If you want to source remote scripts that badly (despite this being against policies of almost every company), better keep it simple:

    telesource() { f=~/.cache/${1//\//_} ; [ -f "$f" ] || curl -SsLo "$f" "$1" && source "$f" ; }
So you can invoke it like that:

    # The URL is downloaded once, cached forever, and then sourced
    telesource https://git.io/fAWiz
What does the name ``import`` even mean? Does it relate in any way to sourcing? Or to shell scripting at all?

Now I'm waiting for a full JIT ECMAScript VM in pure portable Bash. Impress me even more ;)

Nonetheless, that's what it looks like to me. I'm just not fond of spilling idioms from one technology onto everything else its users touch. That's much like our IT equivalent of the Midas touch.


> What does the name ``import`` even mean?

import name-clashes with ImageMagick's import, which comes preinstalled in lots of Linux distros.

Issue (by me): https://github.com/importpw/import/issues/35


Color me old, but wtf, the JS script-kiddie obsession with never writing a line of code that can be packaged in a module has now come to shell scripting?!


Hello old, nice to meet you. I am skeptical.


I think this is a bad idea: it increases the number of non-working scripts (what if the URL stops working?). It also increases malware risk (what if the URL's content is changed to a malicious script?).

Golang's package management is very good, but this isn't.

Why can't they just paste the required script into the main script itself? What's the need for import here?


It will never change under your feet, unless your machines are ephemeral and constantly recreated, and thus don't have a permanent cache.

Do not use this in production folks.


The bare minimum for saying "never change" involves a cryptographically secure checksum of the content, and even then it's not guaranteed.


I did a similar thingy a while back:

https://github.com/alganet/coral/blob/master/doc/spec/requir...

It also does references (`require` something from inside modules) and some other stuff. It is missing the download component, though.

I've also worked on https://github.com/Mosai/workshop, a previous attempt at shell tooling (please do not use it!)

Also, there was a popular https://github.com/pixelomer/bashpm at some point.

Most stable thing would probably be: https://github.com/shellfire-dev

Here are my thoughts after working on this "hobby" for the last 6 years:

- External tools are bad for shell portability and performance. It costs more to invoke "cut" than to do a loop and a couple of parameter expansions (see the sketch after this list). Tools like cut, sed, and similar are not suited for small strings; they are for large files.

- ksh93 and ksh2020 support is possible, see: https://gist.github.com/bk2204/9dd24df0e4499a02a300578ebdca4... These kshs are fast, I want them as supported engines.

- Pipes are bad for performance. Local variable buffer is faster in all shells. There is a proof here somewhere: https://github.com/alganet/shell-proto

- We need to create a shell "dialect". Leave a lot of stuff out to simplify its syntax, so we can write a shell parser in pure shell. This is a crucial point towards a better shell ecosystem.
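For example, a rough illustration of the first point (the path is arbitrary):

    path=/usr/local/bin/tool

    # external tool: the command substitution forks a subshell, the pipe adds
    # another process, and cut is exec'd on top of that
    dir=$(printf '%s' "$path" | cut -d/ -f1-4)

    # pure shell: the same result via parameter expansion, no extra processes
    dir=${path%/*}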

I have some research related to these issues; I'd love to collaborate on something. Standards, maybe? I think the shell community needs to talk as much as the early JS community did in the past.


I hate the way a lot of tools nowadays assume it's okay to just create a new directory in the user's home. For instance, this tool creates ~/.import-cache. Why doesn't it use a subdirectory in ~/.cache? What's the next step? ~/.import-config? And then ~/import-tmp? This is just getting out of hand. There are standard places where your tool can store config and cache data; use them.
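For reference, respecting the XDG base directories is only a few lines (a sketch, assuming the tool adopted the spec):

    # honor XDG base directories, falling back to the conventional defaults
    cache_dir="${XDG_CACHE_HOME:-$HOME/.cache}/import"
    config_dir="${XDG_CONFIG_HOME:-$HOME/.config}/import"
    mkdir -p "$cache_dir" "$config_dir"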



Exciting idea!

For me the application of this that I'm most interested in personally is for dotfiles, configuring the interactive shell. I have a fair amount of useful stuff in my own bashrc that I've developed over the years, which people who see me work often want to use too (and I happily share). I've wished for a good way to share each logically-independent customization in its own separate chunk -- so that people can pick and choose, and also combine what they like from me with their own existing config and chunks they've picked up elsewhere.

For zsh, oh-my-zsh seems to have been extremely successful, and IIUC this describes a key technical contribution at the core of how it's worked so well.

So I'm hoping that this thing can play part of the role of oh-my-zsh, for bash! (And for other shells too, sure.)

I'll definitely be interested to read more of the details of how it works, and to play around with it.


You might be able to make your own "chunk modules" system by sourcing from .profile something like ".shell_modules/*/*.sh", and then using GNU Stow [0] to selectively install, from a centralized directory, whatever modules you're interested in.

Each module would have to go to its own subdir (the '*' part in my example), but that's a good thing as it means that each module could contain several files.
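A rough sketch of that layout (directory and module names below are made up):

    # ~/dotfiles-modules/prompt/.shell_modules/prompt/prompt.sh
    # ~/dotfiles-modules/git-aliases/.shell_modules/git-aliases/aliases.sh

    # selectively "install" modules by symlinking them into $HOME
    cd ~/dotfiles-modules && stow --target="$HOME" prompt git-aliases

    # then, in ~/.profile, source whatever is installed
    for f in "$HOME"/.shell_modules/*/*.sh; do
      [ -r "$f" ] && . "$f"
    done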

[0]: https://www.gnu.org/software/stow/


(Before using this for real, I'd certainly want to be able to track the exact version I'm getting of a dependency, with a hash, in a "lockfile" sort of file that I can keep in version control. That's not mentioned in the demo on the front page - but if it doesn't already have that feature, I hope it will grow it.)


As https://news.ycombinator.com/item?id=23865244 mentioned, the link should be able to point to anything, including a specific commit in version control.


As mentioned elsewhere, Dhall has a similar import-from-URL feature (with cryptographic hashes), and it is/was popular to use IPFS URLs, since they're content-addressed (i.e. changing the content will change the address, so each address will always point to the same data)


If you're pinning a shell script to a specific hash, why not just keep the script itself in version control?


I'm thinking especially of the use case of sharing for people who have not been accumulating dotfiles forever and haven't set up some kind of way to manage their dotfiles in version control. (Heck, I don't myself have a system that I love for that, though it meets my basic requirements in that I can always ask Git exactly what changed.)

So "add this one line to get my prompt" is then a much lower-friction suggestion than "copy this whole little script".


Nice!

Who says that you have to download from arbitrary URLs? If you have to use public modules you are free to copy the source to some internal server first. It's just a single file!

It looks like modules can't import other modules so this shouldn't get too out of hand.


This makes me uncomfortable...

Reasons:

1. I don't believe package systems help bash. We already have Ansible for systems... if this goes the way of npm, bash will get bloat and chaos.

2. Shells are system processes that are usually very privileged... I see a lot of abuse potential, since the shell interprets everything you source.

3. See 1, and think about performance on medium-sized scripts, or when an expensive "module" is loaded.

4. Versioning chaos, killing old scripts... in the Unix and Linux space, some decades-old scripts are still in use.

This would expose weaknesses... but also destabilise a lot.

Edit: 3, 4. Also, thank you regardless of my opinion; your code and execution look and feel great.


Seems to be failing at the moment: https://imgur.com/a/UEbJBKA


It was the same for me the first time I tried, but the "View on GitHub" link at the bottom still worked (and now the main page works too).


At my last employer (a large US bank you’ve definitely heard of), they had something like this internally. Tools/scripts were packaged up as modules and you could/would “module load foo”. It knew what OS you were running, so Windows users could do the same thing for cross-platform tools.

It was very handy and I’m looking forward to trying this as a substitute.


This would be perfect if it also included a bundler/linker to build single-file executables.

I also find it amusing each time the argument against using URLs as import specs comes up. Go already does this, we do it on the web with <script> tags, and now Deno is doing it. It's proven to work as long as you can reasonably trust the URL.


Your scientists were so preoccupied with whether or not they could &c


what's wrong with

    source <file_path>


Not webscale /s


[flagged]


...huh. I would have expected to pin to a commit (by hash), but that shortener link has nothing like enough address space to be doing that, so yeah that's a poor arrangement. But you could point to a commit, which I would think is quite good.


To be fair, it is an immutable shortened url, but it could still point to a random site that then redirects to a git url (when it's being benevolent). Definitely better to point directly to a git commit hash, though still a bit sketchy regardless.



