I think one of the bigger issues with `curl | sh` is what could happen if there's a network outage or the connection is terminated early.
For example, suppose you're downloading a script that has a line like this:
rm -rf ~/.tmp/foo/bar
But the HTTP connection was lost before the entire file was downloaded and `rm -rf ~` was the end of one packet and `/.tmp/foo/bar` was the contents of the other (lost) packet, you're screwed. The incomplete script will still be piped to sh and it's game over.
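To make the failure mode concrete, here's a small sketch (demonstration only -- it echoes the truncated command rather than running it) showing how cutting the stream at byte 8 changes the command's meaning:

```shell
#!/bin/sh
# Demonstration only: we print the truncated command instead of executing it.
full='rm -rf ~/.tmp/foo/bar'

# Simulate the connection dropping after 8 bytes of this line.
truncated=$(printf '%s' "$full" | head -c 8)

echo "full:      $full"       # removes only ~/.tmp/foo/bar
echo "truncated: $truncated"  # "rm -rf ~" -- piped to sh, this deletes $HOME
```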
I had something like that happen once, but it worked out in my favor. I had misconfigured BIND, which I had intended to run just as a caching name server for my network, and it was listening for outside connections too. Some variant of the Lion worm found it and used a BIND bug to get onto my system.
It sent my password file to someone in China, started a scanner to look for other systems to infect, downloaded a .tar.gz file that contained a root kit to hide itself, unpacked the .tar.gz, and ran the install script contained therein to install the root kit.
Or rather, it tried to. I had ISDN at the time, and had noticed the modem lights blinking heavily even though I wasn't doing any internet activity. This confused me, and I pulled the plug on the modem. Turns out I pulled the plug while it was downloading the .tar.gz. It got most of the file, but not quite all of it. It lost the last file inside the archive--which happened to be the install script!
Without the install script, it could not install the root kit, and that made getting rid of the worm a heck of a lot easier.
OK, but isn't detecting a partial vs. complete script harder than just checking 'does the last line end with an EOL'?
There are various shell constructs that require an end token, like case/esac, if/fi, etc. Do these behave safely when the input is truncated at an arbitrary line?
Those control structures do work properly - the shell reads ahead until it finds the end token, and fails if it's absent.
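That read-ahead behavior is easy to check with any POSIX sh: feed it a truncated compound command and nothing inside the construct runs -- the shell rejects the whole thing as a syntax error.

```shell
#!/bin/sh
# An 'if' with no matching 'fi' is a parse error, not a partial execution:
# sh reads the whole compound command before running any of it.
printf 'if true; then\n  echo ran\nfi\n' | sh          # prints "ran"
printf 'if true; then\n  echo ran\n'    | sh 2>/dev/null
echo "truncated script exit status: $?"                # non-zero; "ran" never printed
```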
It is true that there remains the problem of potential truncation exactly at the end of a top-level line, but I contend that "it stops running here" is a much easier thing to reason about (and, strictly, could always happen if hit with a SIGKILL anyway) than "does the meaning of this line change if we cut it off in a weird place".
HTTP pipelining is about reusing a TCP connection for multiple requests. It doesn't influence when curl outputs data and wouldn't apply here anyway. I don't think there's any mode which would cause curl to buffer the entire response before writing any of it.
Yes, I'm aware of how HTTP pipelining works. It was a poor choice of terminology. My point is that by default curl does buffer some of the response. And if the connection was terminated before the first buffer was output, then I would expect this to result in an error which would abort the shell pipeline.
Yes, in some cases curl won't produce any output, like if the web server is down, or the connection fails before anything is returned. And yes, it would also happen if curl buffers some of the response and then dies. I don't really see why that's interesting.
The default buffer size is typically the page size, which is typically 4096 bytes. I would expect a large number of these scripts to be less than 4096 bytes, meaning curl would output nothing before producing an error and the partial script would never be evaluated.
That's the default buffer size for pipes, which won't matter here. When curl terminates, whatever's buffered in the pipe will be flushed. The only thing that could prevent downloaded data from being received by the shell would be internal buffering in curl, if it does any.
Good point. curl doesn't do any internal buffering. I was thinking that the pipeline should be aborted if the curl exits with a non-zero status, but of course this is not the case.
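A quick way to see this (the subshell stands in for a curl that emits partial output and then dies):

```shell
#!/bin/sh
# The producer exits non-zero after writing partial output, but the
# consumer still runs whatever it received, and the pipeline's exit
# status is that of the LAST command (sh), not the failed producer.
(echo 'echo partial output'; exit 1) | sh
echo "pipeline status: $?"   # 0 -- the producer's failure is invisible
```

bash's `set -o pipefail` changes the reported status, but only after the fact -- sh has already executed whatever arrived on the pipe.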
Yeah, it would be nice if there were a way for a part of the pipeline to signal that something bad happened and everything should stop. Ideally, some sort of transaction system so the script is guaranteed to run fully or not at all. But instead we have this crazy thing.
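One mitigation some install scripts use to approximate "fully or not at all" (a sketch with a hypothetical layout, not something from this thread): define everything inside a function and invoke it on the last line. A truncated download then either fails to parse (unclosed brace) or stops before the call, so no partial command ever executes.

```shell
#!/bin/sh
# Hypothetical install script layout. No command runs until the final
# line; if the download is cut anywhere above it, the unclosed function
# body is a syntax error and the shell executes nothing.
main() {
    echo "step 1: fetch"
    echo "step 2: install"
    echo "step 3: clean up"
}
main "$@"
```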