Recovering deleted files using grep (atomicobject.com)
168 points by atomicobject on Aug 19, 2010 | 47 comments



Make sure your output file is on a different filesystem! Otherwise, it might be saved in the newly-freed blocks of the file you're trying to recover.
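For reference, the technique under discussion is roughly this (device name, search string, and destination path are assumptions, not from the post):

    # -a treats the raw device as text; -B/-A keep 100 lines of context
    # around each match; write the output to a different filesystem
    grep -a -B 100 -A 100 'some phrase from the lost file' /dev/sda1 > /mnt/usb/recovered.txt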


Better yet, mount the hard drive read-only (on a different computer if necessary).
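For example, assuming the disk shows up as /dev/sdb1 on the rescue machine:

    # mount read-only so nothing can touch the freed blocks
    sudo mkdir -p /mnt/rescue
    sudo mount -o ro /dev/sdb1 /mnt/rescue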


Similar option: grab a Helix CD and boot from that (and use a USB key etc. to copy files to)


The author's intent is that you write enough of the surrounding context to recover it on the first try.

Still, it raises two questions:

What about fragmentation?

Why don't we have a GNU safe-rm yet that moves files to the (freedesktop.org-specified) trash location to avoid this?


Because you told it to remove the file instead of moving the file? I don't see why we need a safe-rm on the command line.

File managers already implement a trash function as per the freedesktop.org spec: http://www.ramendik.ru/docs/trashspec.html


> I don't see why we need a safe-rm on the command line.

I think this is hilarious. :-) Throughout Unix/Linux/BSD history, there is a steady series of essays, lamentations, wails, and gnashings-of-teeth regarding the recovery, attempted recovery, or irretrievable loss of really important data that got somehow mistakenly rm'd by some admin.

...and, every single time, someone says, "Shouldn't this be made safer?", and every single time someone else says, "Nope, rm is doing exactly what it's supposed to! Just be more careful!"

As if the huge volume of arcane commands and various scripting languages disguised as configuration files weren't proof enough that the mass of Unix/Linux/BSD admins and developers all share a common streak of masochism, we also seem hell-bent on ensuring that we have tools which can -- and eventually will -- bite us in the ass.

For my part, I think that having some form of undelete option standard in every file system is as obvious as keeping backups.


The problem is not Unix as much as it is the work habits that Unix users have developed. The rm command is hard core, and yet everyone (including me) uses it regularly. It would be much smarter to create a command named "trash" or "del" or whatever to instead move files to a trash folder. Then "empty-trash" could actually use rm.
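A rough sketch of what that pair of commands could look like (script names are hypothetical; the trash location follows the freedesktop.org spec mentioned elsewhere in this thread, though a spec-compliant tool would also write the accompanying .trashinfo metadata):

    #!/bin/sh
    # trash -- move files into the trash folder instead of unlinking them
    TRASH="$HOME/.local/share/Trash/files"
    mkdir -p "$TRASH"
    mv -- "$@" "$TRASH/"

    #!/bin/sh
    # empty-trash -- the only place rm actually gets used
    rm -rf -- "$HOME/.local/share/Trash/files"/*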

Alternatively, just slow down a little bit before using rm, especially when operating as root. Understand that it's (intended to be) permanent. Use echo first when using rm with a splat in order to ensure you're actually deleting what you expect to delete.
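For example (the glob is made up):

    # see exactly what the splat expands to first
    echo rm -- *.log
    # then run it for real once the list looks right
    rm -- *.log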

The question, "Shouldn't this be made safer?" is irrelevant. At some level, you have to have an rm command. If users decide to use it regularly, then it's up to them to "Just be more careful!" The smarter thing would be to create a workflow that doesn't rely on using rm at all. Why whine and complain (not you, I mean users in general) about an operation that can be easily changed?


I never bother with "safe" deletion even on the desktop (Windows 7); I don't even have deletion prompts. For the once or twice a year that something gets mistakenly deleted - or more likely, overwritten - restore from backup does nicely.

Assuming you have backups, of course. Which you'd be insane not to.


If the file was on an ext3 filesystem, you can use ext3grep, written by Carlo Wood (http://www.xs4all.nl/~carlo17/howto/undelete_ext3.html)

(Grepping your hard drive for file fragments is suggested in the ext3 FAQ - http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html)


> To help prevent this problem from happening in the first place, many people elect to alias the rm command to a script which will move files to a temporary location, like a trash bin, instead of actually deleting them.

Whatever happened to backups?

To help prevent this problem...

KEEP A BACKUP.


The kinds of files I most often regret rm-ing are the temporary files I have created myself as a step in a process, then deleted after I had moved onto the next step, not realizing an error had crept into the processor and that I would have to run it again on the source files (which are now, conveniently, gone.) Backups don't solve this problem, because the files themselves are never more than an hour old. A "trash" folder, however, fixes this perfectly: the semantic is that the file no longer has any place it "belongs," and may be purged if you successfully complete the project, but may be needed again if the project must be "rewound" to that step.

However, you're right that making rm(1) express move semantics isn't the right solution. Maybe if the filesystem had a "BEGIN TRANSACTION" command that you could ROLLBACK...


Storage is cheap.... Why remove intermediate files at all? If you don't want to permanently remove files, then why use rm? Just create a "del" command that moves deleted files to a trash folder. You can then make part of your backup routine be to empty the trash after performing the backup (since that file would now be available in the backups).


You're right—it's more of a "these files are in the way, and I'm sure I'm done with them... so it shouldn't hurt to just type those two little letters and reclaim the storage..."

Actually, that sounds like exactly the cognitive dissonance people had when they first started using Gmail. Perhaps filesystems need an "Archive" folder as well? Not even a Trash folder—because people want to empty a Trash folder—but rather just an enforced (and shell-supported) directory where things go when you don't have any reason to keep them, and therefore have no place to put them?


Btrfs to the rescue!
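For example, btrfs snapshots give you something close to a rollback point, assuming the directory in question is a btrfs subvolume (paths here are made up):

    # take a read-only snapshot before a risky cleanup
    btrfs subvolume snapshot -r /home /home/.pre-cleanup
    # if something important got removed, copy it back out of the snapshot
    cp -a /home/.pre-cleanup/me/project /home/me/project
    # drop the snapshot once you're sure you're done with it
    btrfs subvolume delete /home/.pre-cleanup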


TxNTFS!


My backups don't run on a minute-to-minute basis (dunno about yours), so it's totally plausible that I can spend all day working on a particular file and then mistakenly nuke it somehow, and it won't be retrievable under most backup schemes.


I bought a TimeCapsule and pointed TimeMachine at it on my Mac, so I have hourly backups. Losing an hour's worth of work is annoying, but considerably less annoying than losing a day's worth of work.


Been there, done that. I rm -rf'd a bunch of important files once, and at the time grep was giving me "memory exhausted" errors. I was able to use strings to grab all of the text of the disk, and then wade through the results with vim.

I guess this is a pretty common problem. The blog post I wrote about it in 2005 continues to be the most searched-for entry point on my site: http://csummers.com/2005/12/20/undelete-text-files-on-linux-...
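Roughly what that looks like (device and destination are assumptions; as noted above, send the output to a different filesystem):

    # pull every printable run of text off the raw device
    strings -a /dev/sda1 > /mnt/usb/disk-strings.txt
    # then search or page through the result at leisure
    grep -n 'something you remember typing' /mnt/usb/disk-strings.txt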


    cat /dev/mem | strings | grep -i llama


Hmm... I'm getting an error on that one.

    cat: /dev/mem: Operation not permitted
Edit: even as root


If I recall correctly, that's a bug that is present in a particular kernel from 6-9 months ago.


It's not a bug:

x86: introduce /dev/mem restrictions with a config option http://lwn.net/Articles/267427/ "This patch introduces a restriction on /dev/mem: Only non-memory can be read or written unless the newly introduced config option is set."

Command-line access to /dev/mem in Ubuntu http://superuser.com/questions/39583/command-line-access-to-...


Oh cool. Thanks.


Sad :(

I was looking forward to catting for llamas.


Huh. Good to know.


My memory is full of llamas!


Where the author says conservative, he means liberal.

(From afar, I understand my Colonial cousins' struggle with these two words.)


Well, I appreciated your joke, anyway.


I've been using this method since I first learned about raw disk access (dev files) and grep.

I think it should be mentioned that this will work properly only if the file was not fragmented, which will usually be the case on ext3 unless you are using almost all of the space on the drive, but fragmentation may happen frequently if you are using a FAT filesystem (which is used a lot on USB disks).

Also, if you just deleted a binary file, this method will be problematic as well. In that case you can use a tool like photorec to scan the disk, and even limit it to only the free space on the drive, which reduces the time it takes to go over the disk; it can detect all kinds of binary file types (it uses the file's magic number to detect the type).
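The basic invocation is just the device; the free-space-only limit and file-type filters are picked in photorec's interactive menus (device name is an assumption):

    sudo photorec /dev/sdb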

Like other people mentioned here before, you should recover all the data to a different partition/disk than the one you are trying to recover a file from.

With that said, recovering data is a tedious and error-prone process, so if the data is worth enough (and for some silly reason you don't have a backup) you should:

A. Turn off the computer immediately after you've discovered the loss of data (to reduce the chances of overwriting anything important).

B. Give the computer/disk to a professional to recover (because you obviously aren't one, since you don't keep backups).


Fortunately point A on Linux can be substituted with mount -o remount,ro /


Or, if you want to really delete a file, use the shred command.

From man shred:

    NAME
           shred - overwrite a file to hide its contents, and optionally delete it

I especially like the -n option!
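For example (filename is made up):

    # overwrite the file 5 times, show progress, then truncate and unlink it
    shred -n 5 -v -u secrets.txt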


Except that shred is not guaranteed to work on many (most?) modern filesystems. From `man shred`:

       CAUTION: Note that shred relies on a very  important  assumption:  that
       the  file system overwrites data in place.  This is the traditional way
       to do things, but many modern file system designs do not  satisfy  this
       assumption.   The following are examples of file systems on which shred
       is not effective, or is not guaranteed to be effective in all file sys‐
       tem modes:

       * log-structured or journaled file systems, such as those supplied with
       AIX and Solaris (and JFS, ReiserFS, XFS, Ext3, etc.)

       * file systems that write redundant data and  carry  on  even  if  some
       writes fail, such as RAID-based file systems

       *  file  systems  that  make snapshots, such as Network Appliance's NFS
       server

       * file systems that cache in temporary locations, such as NFS version 3
       clients


It works fine on default ext3; the only thing journaled is metadata. You snipped that part out. More from man shred:

In the case of ext3 file systems, the above disclaimer applies (and shred is thus of limited effectiveness) only in data=journal mode, which journals file data in addition to just metadata.

In both the data=ordered (default) and data=writeback modes, shred works as usual.


Via `reiserfsck --rebuild-tree`, you can also do that for ReiserFS partitions. It has worked very reliably for me. The only problem is that it doesn't always recover the filename and/or the directory structure (depending on how long ago you deleted the file).


Just don't do this if you've at some stage backed up another reiserfs filesystem inside your reiserfs filesystem with 'dd'.

The rebuild-tree trick mistakenly sees entries in the dd'd copy as files in the parent filesystem, and then sprays them all over your drive.


Better than burying them off in the woods somewhere.


One of my most memorable cluster fucks was recovering a database using strings on the disk. The customer ran REPAIR TABLE and ended up with a very small table :) . It was tedious, but it felt awesome actually getting a large part of the data back.


I've also used this technique. I even wrote a script with progress bar to do it, which is linked to at the end of:

http://www.pixelbeat.org/docs/disk_grep.html


I used this method once... the file created gets pretty huge, but you can even manually sift through it for lost code if you know roughly where it ended up!


Excellent Linux hack. I hadn't ever heard this before.


It works on all systems where you have raw access to the disk. And it isn't really that fancy if you think about how it works and how file systems work.


Yup, it's not very fancy, but in the end it serves its purpose and can really save your work.

The last part, about using an alias for rm, is something I've never thought about, and now I'm going to use it on all my servers.


Actually, I think the really great hack here is to alias the rm command to a trashbin script (as suggested at the end of the article).


The danger of aliasing the command itself (the bare 'rm') is that you come to count on the safety of the alias. Then you work one day on a friend's or coworker's machine and...BOOM.

What I do instead is make a nearby (and simple) alias. For example:

    alias rmi='rm -i'


This can be achieved using trash-cli
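If I remember the trash-cli commands correctly, the basic workflow is (filename made up):

    trash-put old-notes.txt    # move to the trash instead of unlinking
    trash-list                 # show what's in the trash, with original paths
    trash-restore              # interactively put something back
    trash-empty                # actually delete everything in the trash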


Clever stuff, thanks for sharing.


frequent automatic backups and version control are your friend



