One of my worst was `source ~/.bash_history` via accidental tab completion. I was expecting .bash_profile. The nasty part was that I couldn't kill it because of all the mess (system instability) it was creating. A couple of 'cd ..' calls and a 'rm -rf *' ended up nuking some root directories.
Could you add 'rm -rf' to HISTIGNORE? I remember it works with patterns, but I can't remember if you can use it for specific commands. I'm not on my Linux box right now, so I can't test it.
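For what it's worth, a quick sketch (assuming bash): HISTIGNORE is a colon-separated list of glob patterns, and each pattern has to match the complete command line, so a per-command pattern works fine (the shutdown pattern is just an extra illustration):

    export HISTIGNORE='rm -rf *:shutdown *'   # lines matching either pattern are never saved
    rm -rf /tmp/scratch                       # does not show up in history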
My lazy way to do something similar with Gnome Terminal is to make an "SSH profile" with a different background color. This helps reduce my mistakes.
Usually I'll just do things like df on the wrong machine and wonder what's going on with my disk space, but I've made a few less happy errors in the past.
Unrelated to anything but: Your terminal colours seem a bit brighter than the standard ones in Snow Leopard. Are Terminal's colour capabilities improved in Lion, or are you using that SIMBL plug-in, or...?
That is indeed Lion running there. See the full-screen icon in the top-right and the smaller buttons in the top-left. I haven't played with it yet, but there is much more control over colors in Terminal.app in 10.7.
I find a prompt better than a status line for reminding one of where one is, much like including the CWD in the prompt.
I use a similar style:
[alfie] ~$
[rupert] ~$
etc. for my SSH connections, but not for local shells, even though I also use screen. I also use color-coded prompts for selecting particular PATH and other environment variable setups for different branches of our corporate dev tree, to remind myself which compiler etc. I'm using.
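In case it's useful to anyone, here's a rough sketch of getting that bracketed-hostname style only for SSH sessions (assuming bash and a .bashrc; the color and layout are just one way to do it):

    # Bracketed, colored hostname for remote shells; plain prompt locally.
    if [ -n "$SSH_CONNECTION" ]; then
        PS1='\[\e[1;33m\][\h]\[\e[0m\] \w\$ '
    else
        PS1='\w\$ '
    fi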
I really don't see how these are connected. You could easily have one screen window open and type a command into it absent-mindedly without realizing it's not your laptop's shell.
I always hated remote firewall changes.
I usually had two terminals open. In one, I'd run my "cya" script that slept for a minute, restored the old firewall rules, slept another minute, then did a reboot. In the other terminal, I'd run the new firewall rules. If I didn't screw up, I'd go back to the other terminal to stop the "cya" script. If I did screw up, the worst case was a few minutes of downtime that I'd hope nobody would notice.
The script was: sleep 1 minute, then restore rules, sleep 1 more minute, then reboot.
That first rule restore was so I could conceivably get things back in order before the reboot. I had the final reboot there in case the new firewall rules somehow killed my ssh session. This never happened, and I might have been able to ssh in again at that point, but I wasn't going to count on it.
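The shape of the script was roughly this (a reconstruction from the description above; the actual restore command and rules file aren't given, so iptables-restore and the path are guesses):

    #!/bin/sh
    sleep 60
    iptables-restore < /root/rules.known-good   # hypothetical saved ruleset
    sleep 60
    reboot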
In at least tcsh, "set noclobber" will help with this. When you try to overwrite a file that exists (usually from typing > instead of >>), you get an error that the file exists instead. If you really want to overwrite the file, you have to use ">!".
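In practice it looks like this (tcsh):

    set noclobber
    echo test > existing_file     # refused: existing_file: File exists.
    echo test >! existing_file    # overrides the protection when you mean it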
`cp -r folder backup`: it turned out `folder` was a symlink. Then I messed up my script and deleted all of the contents. The backup was destroyed along with the original, since I had copied the symlink instead of the directory. Luckily I had just set up a slave server and was able to copy 95% of the files from there.
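A small illustration of the trap (hypothetical paths; with GNU cp, -r on its own can copy the link rather than the data, which is exactly the failure described above):

    ln -s /data/real_folder folder
    cp -r  folder backup     # "backup" may just be another symlink to the same data
    cp -rL folder backup2    # -L dereferences, so backup2 is an independent copy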
Recently I did a `rm -rf /directory/` instead of a `rm -rf /directory/directory2`. Once again, luckily, I had real backups.
Every time I screw up, or a system has problems (stupid hard drives), it reinforces my belief that backups are the most important part of a system. It basically doesn't matter what you do wrong: if you have proper backups, you can recover.
The catch there is that no backup is truly a backup until it is tested.
Unfortunately, even that is not enough, in the long run.
You have to periodically retest your backups, and transfer them to new media as they age.
It's also a good idea to store backups off-site (preferably in multiple geographically-dispersed locations).
And, it almost goes without saying that the more frequently you do backups, the less data you'll lose when you actually have to restore from them.
Before long, it's a full-time job just to keep the backup system humming along smoothly, testing and retesting backups, and transferring them from old media to new.
Of course, this problem gets a lot harder and more time-consuming as the quantity of data you need to back up and restore grows.
I keep reading about the crazy amounts of data generated by projects like the LHC, and my mind boggles at what the challenges in doing backups of that amount of data must be like.
Here's a compendium of tips for avoiding mistakes like these, largely culled from other comments here, but some from the article.
1. Never `rm *`, especially `rm -rf *`. If you must, `rm -rf ../tmp/` or similar. You do not want this command to be in your history. Especially in the form `rm -rf * & cd; trn`.
2. Set up backups early on. Use `git` or similar for as much as possible.
3. `noclobber` (or not setting CLOBBER in zsh) may help. I found `noclobber` irritating because it also refused to append to nonexistent files.
4. Take three deep breaths after typing and before executing a `shutdown` or `rm -r` command, or typing a password at a password prompt. (Did you just give your intranet server password to whoever has backdoored your weekend-project VPS?)
5. Keep backups in a format that you can readily verify is usable. `rsync` with `--link-dest` is handy for this (see the sketch after this list), as is `git`.
6. When referring to a directory that you believe exists, don't say `foo/bar/baz`; say `foo/bar/baz/.` so that the command fails loudly instead of quietly creating something named `baz` when the directory isn't there.
7. Color-code your prompts or terminal windows by server.
8. Tab-complete to ensure that things exist.
9. Use `sudo` instead of root shells.
10. Keep your config files in source control. RCS is okay, but Git is better.
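On point 5, a rough sketch of what a `--link-dest` snapshot setup can look like (hypothetical paths; `previous` is a symlink to the last completed snapshot):

    rsync -a --delete --link-dest=/backup/previous  /home/  /backup/2011-08-20/
    ln -sfn /backup/2011-08-20 /backup/previous
    # Each snapshot is browsable as a full copy you can verify directly,
    # but unchanged files are hard links, so only the deltas cost disk.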
Nice idea. Usually you see blog posts with the latest success or the code you are proud of, but showing mistakes and errors is not that common. I dream of seeing academic papers where, instead of all the wonderful results, you'd see the trail of errors that led to the final result.
> Usually you see blog posts with the latest success or the code you are proud of
I'm observing the exact opposite. While the official project sites do indeed show only success stories, there's a growing trend of those sites adding a "blog" section where the developers talk freely about their experiences and mistakes.
> I dream of seeing academic papers where, instead of all the wonderful results, you'd see the trail of errors that led to the final result.
The worst I've done is unplug the internet connection of an ISP, which happened to be a competitor of the company I was working at. Not a good day at all...
First day at college the lecturer says "As soon as you've finished editing your programmes, type 'rmcobol < a.txt > a.out'".
So, having finished first, I type "rm cobol < a.txt > a.out", and I spend a while wondering what "cobol not found" means before I realise what I've done.
I once did `sudo mv . /var/www/` while in the root directory... I had been copying files to my webserver. Before I knew it, my connection had closed and I couldn't ping the server. After running to the colo, I found I had no backups: my rsync had been failing for the last couple of days and I had failed to check the logs. After pulling an old copy of the site from what I think was one of the developers' laptops, I was able to get the site running, old and without the latest db. After a while I mounted the drive and, to my surprise, everything was still there. Lesson learned: always check your current path. I always find myself just typing as fast as I can, and sometimes, while switching between ttys, I lose track of where I am...
My worst was uninstalling libc on a Linux machine. I had been going round and round trying to get the right packages and versions on the machine, ended up getting it into a bad state, and for some reason got it into my head that it'd be a good idea to uninstall libc and install a newer version. Note that when you do so (at least on Debian) you are not just given a y/n prompt; you have to type in some sentence such as "Yes, I know this is a very bad idea." at the prompt before proceeding. When I reached that point I figured, why the hell not, it'd be interesting to see what the result was. The machine became pretty much unusable after that, and I ended up just reinstalling from scratch, IIRC.
On Linux, the killall command kills processes by name (killall httpd). On Solaris, it kills all active processes.
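Roughly the difference, for anyone who hasn't been bitten yet (don't run the middle one on a box you care about):

    killall httpd    # Linux (psmisc): kill the processes named "httpd"
    killall          # Solaris: kill *all* active processes (it's a shutdown helper)
    pkill httpd      # matches by name on both, and is the safer habit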
As a young programmer on my first real programming job, at an online stock broker in 1999, I did this. I had been running Linux since '95 and was familiar with Solaris from college, but I had no idea about this. I was so ashamed, but I didn't face any dire consequences.
I will never forget my lesson (then again, I will probably not be managing Solaris anymore).
I have also done variations on "ifconfig eth1 down" (or messing around with iptables) on a remote computer.
I'll cop to the same - though thankfully it was my workstation and not a production server (and more than a decade ago).
I would imagine just about everyone from the same era who was newly exposed to administering both SunOS/Solaris and Linux probably did the same thing. (The longbeards who already knew SunOS/Solaris would already know better.)
But seriously... what a stupid command. What was the practical value of "killall" on Solaris? Seriously?
I got burnt the first time I used 'rsync'. It seems natural that when you are on one computer and you run "something another_computer", you are the 'client' and 'another_computer' is the 'server', like when you check out code locally from an svn server. rsync doesn't follow that convention: I added the --delete option to 'merge' files (so it didn't copy the ones I already had locally) and I ended up deleting files on the 'server'.
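The gotcha is that `--delete` prunes extra files on the destination, and the argument order decides which side that is (hypothetical paths):

    rsync -av --delete server:/srv/files/  ./files/    # extra *local* files get deleted
    rsync -av --delete ./files/  server:/srv/files/    # extra files *on the server* get deleted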
Heck, I've used RCS in the past five years because the server didn't even have CVS installed. It's not exactly like you need advanced branching/merging, atomic commits, and a distributed system for a one-line change to /etc/hosts.
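For that kind of change, the whole workflow is a couple of commands (plain RCS):

    ci -l /etc/hosts     # check in a new revision, keep the file locked and editable
    rcsdiff /etc/hosts   # diff the working file against the last checked-in revision
    rlog /etc/hosts      # revision history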
One of my worst errors was trying to find a function definition while coding on a C++ project. I managed to execute 'grep operator> *'. At first I found it strange that the grep didn't return anything. Then I started opening my files, and they were suddenly all empty. It took a few minutes before I realized exactly what I had done... (Of course, I hadn't committed anything in the previous 5 hours.)
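What happened, roughly: the unquoted '>' is parsed by the shell as a redirection rather than as part of the pattern, and depending on the shell the glob in the redirect target can expand to every file in the directory (zsh's MULTIOS does exactly that), truncating them all. Quoting the pattern avoids it:

    grep operator> *      # shell sees: grep for "operator", redirect output into the glob
    grep 'operator>' *    # the '>' stays inside the pattern, as intended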
I did almost the same thing, except someone had created a symlink named "*" (just an asterisk). I "ls -l"'d the current directory, saw the weird broken symlink, and without thinking did "rm *". The slow-motion double take immediately after hitting enter was like a shot from a movie. "NOOOOOO!".
You missed the part where they said they were on a Windows box trying to copy a file to their home directory. Instead of copying the file to the home directory, it made a file called ~.
It also means I miss out on the embarrassment of having my Solaris box blare a train whistle over the onboard speaker for the entire department to hear, regardless of what my volume settings are, or whether I have headphones plugged in...
On the systems I used to maintain, every single configuration file was generated by a script. If you make a mistake, you just re-edit and re-run the script.
Actually, it is there (Typing UNIX Commands on Wrong Box). If anything, it's a classic lesson to log out of the root shell as soon as you're done with the task that required it. Tellingly, that lesson is missing from that list.
I ended up restoring from backup.