
I think one of the larger issues with curl | sh is what could happen in the event of a network outage or the connection terminating early.

For example, if you're downloading a script that has a line like this:

  rm -rf ~/.tmp/foo/bar
But if the HTTP connection is lost before the entire file is downloaded, and `rm -rf ~` ends up at the end of one packet with `/.tmp/foo/bar` in the next (lost) packet, you're screwed. The incomplete script will still be piped to sh and it's game over.
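
To make the failure mode concrete, here's a rough, harmless simulation: a hypothetical three-line stand-in for the downloaded installer, with `head -c` playing the role of the dropped connection (the file name and byte count are made up, chosen so the cut lands mid-path).

  # Build the stand-in installer; the second echo plays the role of the rm -rf line.
  printf '%s\n' \
    'echo pretend setup work' \
    'echo removing ~/.tmp/foo/bar' \
    'echo pretend install' > demo.sh

  # Simulate the connection dropping mid-line: keep only the first 44 bytes
  # and pipe the surviving fragment into sh, just as curl would.
  head -c 44 demo.sh | sh

Both surviving lines execute; the second prints a truncated path rather than the one the author wrote. Note that the truncated last line has no trailing newline, and sh runs it anyway.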



I had something like that happen once, but it worked out in my favor. I had misconfigured BIND, which I had intended to run just as a caching name server for my network, and it was listening for outside connections too. Some variant of the Lion worm found it and used a BIND bug to get onto my system.

It sent my password file to someone in China, started a scanner to look for other systems to infect, downloaded a .tar.gz file that contained a root kit to hide itself, unpacked the .tar.gz, and ran the install script contained therein to install the root kit.

Or rather, it tried to. I had ISDN at the time, and had noticed the modem lights heavily blinking even though I was not doing any internet activity. This confused me, and I pulled the plug on the modem. Turns out I pulled the plug while it was downloading the .tar.gz. It got most of the file, but not quite all of it. It lost the last file inside the archive--which happened to be the install script!

Without the install script, it could not install the root kit, and that made getting rid of the worm a heck of a lot easier.


This can be mitigated by wrapping your whole script in parens, since the shell has to read all the way to the closing paren before it executes anything inside, as in:

  #!/usr/bin/env bash
  (
    # real work
  )


Why don't shells refuse to execute a command if there's no trailing EOL character, to mitigate this very problem?


Why should shells encourage this behaviour (piping unknown stuff to them)?

You should at least download, verify the size and checksum (if available), take a peek at it, and only then run it.


Because you might be piping known stuff to them. Receiving a partial input stream isn't necessarily the result of a network problem.


OK, but detecting a partial vs. complete script isn't as simple as 'does the last line have an EOL'. There are various builtins that require an end token, like case/esac, if/fi, etc. Do these work properly when truncated at an arbitrary line?


Those control structures do work properly - the shell reads ahead until it finds the end token, and fails if it's absent.

It is true that there remains the problem of potential truncation exactly at the end of a top-level line, but I contend that "it stops running here" is a much easier thing to reason about (and, strictly, could always happen if hit with a SIGKILL anyway) than "does the meaning of this line change if we cut it off in a weird place".
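
A quick way to see both behaviours (a rough sketch; exact error wording varies between shells):

  # Truncation inside a compound command: the shell reads ahead for the
  # closing fi, hits EOF first, reports a syntax error, and runs nothing
  # inside the block.
  printf 'if true; then\n  echo this never runs\n' | bash

  # Truncation at a top-level line boundary, by contrast, just stops there:
  # both complete lines run, and nothing signals that more was lost.
  printf 'echo first line ran\necho second line ran\n' | bash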


In at least some of those cases, I would expect curl to exit with an error and the pipeline to abort.


All bash sees is bytes coming in on stdin, and eventually an EOF. It neither knows nor cares what caused the EOF.


Sure, but if the server response is not pipelined (as is probably often the case), then bash should never see anything.


HTTP pipelining is about sending multiple requests over one TCP connection without waiting for each response. It doesn't influence when curl outputs data and wouldn't apply here anyway. I don't think there's any mode which would cause curl to buffer the entire response before writing any of it.


Yes, I'm aware of how HTTP pipelining works. It was a poor choice of terminology. My point is that by default curl does buffer some of the response. And if the connection was terminated before the first buffer was output, then I would expect this to result in an error which would abort the shell pipeline.


Yes, in some cases curl won't produce any output, like if the web server is down, or the connection fails before anything is returned. And yes, it would also happen if curl buffers some of the response and then dies. I don't really see why that's interesting.


The default buffer size is usually the page size, typically 4096 bytes. I would expect a large number of these scripts to be less than 4096 bytes, meaning curl would output nothing before producing an error and the partial script would never be evaluated.


That's the default buffer size for pipes, which won't matter here. When curl terminates, whatever's buffered in the pipe will be flushed. The only thing that could prevent downloaded data from being received by the shell would be internal buffering in curl, if it does any.


Good point. curl doesn't do any internal buffering. I was thinking that the pipeline should be aborted if curl exits with a non-zero status, but of course this is not the case.


Yeah, it would be nice if there were a way for a part of the pipeline to signal that something bad happened and everything should stop. Ideally, some sort of transaction system so the script is guaranteed to run fully or not at all. But instead we have this crazy thing.
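
Short of a real transaction, one common approximation is to download to a temporary file, check curl's exit status, and only then hand the complete file to the shell. A sketch, with a placeholder URL:

  #!/usr/bin/env bash
  set -euo pipefail

  url="https://example.com/install.sh"   # placeholder
  tmp="$(mktemp)"
  trap 'rm -f "$tmp"' EXIT

  # --fail turns HTTP errors into a non-zero exit status, and a dropped
  # connection does the same; either way, set -e stops us before the
  # script is ever executed.
  curl --fail --silent --show-error --location "$url" -o "$tmp"
  bash "$tmp"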


Some scripts also guard against this and are written so that no code executes unless the file has arrived completely. Of course, that's a minority.


Definitely. Including a checksum and validating it before executing would be ideal.
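
For example, something along these lines, assuming GNU coreutils' sha256sum, a digest published out of band, and an already-downloaded install.sh (the hash below is a placeholder):

  # The published digest for install.sh (placeholder value).
  expected_sha256="0000000000000000000000000000000000000000000000000000000000000000"

  # sha256sum --check exits non-zero on a mismatch, so && guards execution.
  echo "${expected_sha256}  install.sh" | sha256sum --check --quiet && bash install.sh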


To elaborate, this is quite easy: you wrap the entire contents of the script into a function definition, then call the function as the last line.
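
Roughly this shape (the function name and the steps inside it are made up):

  #!/usr/bin/env bash

  # The whole script lives in one function, so the shell must parse
  # everything up to the closing brace before any of it can run.
  main() {
    echo "step 1: fetch dependencies"
    echo "step 2: install files"
    echo "step 3: clean up"
  }

  # Execution only starts at this final line. If the download was cut off
  # anywhere above, this call is missing (or the brace is unmatched, which
  # is a syntax error), so nothing runs at all.
  main "$@"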



