"While I know of some really good cloud providers, such as rsync.net and Tarsnap, I recommend that you never trust cloud providers blindly."
This is very good advice.
That being said, humans need heuristics and shortcuts to aid in decision-making. I hope the fact that rsync.net has been doing this work since 2001 is helpful in that regard.
...
"There exist some really cool open source backup solutions such as Borg, Restic and duplicity, but you should never rely solely on these "complex" solutions. These tools work really great, until they don't! In the past I have lost data to duplicity and other tools."
I think this is very good advice as well - and that is coming from someone who has whole-heartedly endorsed 'borg' as a backup tool and regularly recommends it. It is the "holy grail of backups"[1] after all ...
It's also a magic black box for most users, and failures would be difficult to diagnose.
The safest and (in my opinion) most useful workflow is to back up your data locally to some kind of NAS or fileserver using plain old rsync and then back up that fileserver to rsync.net (or whomever) using the fancy borg tool.
Now you have quick and simple local restores but still have a backup in the cloud that requires zero trust in the provider.
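In rough shell terms, that two-stage workflow looks something like this (the hostnames, paths and repo name below are placeholders, not a recommendation of specific settings):

    # stage 1: plain rsync from the laptop to the local NAS (simple, easy restores)
    rsync -a --delete ~/ nas:/backups/laptop/
    # stage 2: from the NAS, push an encrypted, deduplicated borg archive offsite
    borg create --stats ssh://user@backuphost.example.com/./laptop-repo::{now} /backups/laptop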
> That being said, humans need heuristics and shortcuts to aid in decision-making. I hope the fact that rsync.net has been doing this work since 2001 is helpful in that regard.
This is implied by the 'blindly' part. Searching "cloud storage provider", seeing rsync.net listed and picking it with a thrown dart would be blind. A quick search to see that it's been around for a while and doesn't have any crazy horror stories attached is part of becoming informed.
In case you didn't notice, GP is the founder of rsync.net (I'm not saying this to assign any ulterior motive to that comment).
> Everything can look really nice "on paper" but you don't know what goes on behind the scenes. I have worked with a lot of different people and I have seen too much crazy shit to fully trust anyone with my important data. A cloud provider may have the best of intentions, but sometimes all it takes is a single grumpy employee or even a minor mistake to do a lot of damage.
OneDrive and Google Drive are both pretty cheap. Is there anything wrong with keeping a backup of your important data in one of them? At a certain point you have to live your life and take a chance. "Sure, I never made it to Italy, but I had a 100% safe backup system for my files," said no one ever.
> Free Git hosting such as GitHub, GitLab and others can also be utilized for data that you don't mind storing in public. GitLab and other providers does provide free private repositories, just don't rely fully on that.
At this point, it's clear the author is looking for arguments to make. Of course you're not going to dump all your stuff into a GitLab repo in the cloud. You're going to clone it on multiple machines! My important work stuff is under version control and cloned on multiple machines in multiple locations. If that's not good enough, I'll live with the consequences.
From time to time I read about people who get randomly banned or locked out of their account by Google. I have 1.5TB of precious memories of my daughter, going all the way back to when she was born.
I do keep a local backup on my old Mac, basically a bunch of external hard drives, but it's a pain in the ass to manage these.
The odds of you losing the master data at the same time as being locked out of your account are astronomically low, and if they don't coincide, then you're not in any trouble (e.g. if you get locked out of your account, then you immediately make another backup e.g. on a USB, and if your main copy has something happen to it, you immediately restore).
I definitely agree that getting locked out of your Google account is a massive risk in general (as well as morally bankrupt on their part), but I don't think it's a problem for the specific case of backups.
I trust major cloud providers like Google or Microsoft to protect my data far more than I'd ever trust a bunch of retail hardware I plugged together and configured myself.
They have entire gigantic teams of employees dedicated to security and privacy and protection from threats. I couldn't replicate that even if I wanted to.
If someone wants to steal and leak your sensitive data, they'll have a much easier time getting into your home hardware (whether over a network or physically or both) than they will getting it out of Google Drive, provided you have 2FA and good passwords you keep memorized.
Yes, at home storage is more at risk for a targeted attack. However, cloud storage is more at risk for a general attack.
It's dead simple for a waiter to steal your CC#. Yet that's likely not going to happen, as they'd lose their job and run a major risk of getting caught by the police.
On the flip side, a big company like Target, even though they have a wealth of experts hired to prevent it, has lost millions of CC#s. That's because they are a nice juicy target.
It really just comes down to which risks are more or less likely for you the individual.
I trust my cluster of retail hardware because the effort for a hacker to pull data from it is a lot higher than the actual value of the data stored there.
I'd say home storage is more at risk even for a general attack if you're using retail hardware with default configurations, like a NAS -- it's easy to scan the entire internet for vulnerabilities. Plus it tends to be vulnerable even to non-traditional attacks like Bitcoin ransoms. Whereas cloud storage is custom and patched and up-to-date and monitored... your consumer hardware mostly isn't any of those.
And there's a huge distinction between retail corporations leaking CC#'s, vs cloud storage providers. Securing credit card numbers is not Target's core competency. While securing personal and corporate data is a core competency for major established cloud providers.
Nah, those general attacks require two steps: getting past the router and finding the NAS.
Sure, you could pull off both exploits, but it's not really likely.
The most vulnerable to those attacks likely aren't operating NASes in the first place. Very few people are (which decreases the likelihood of attack). I'd imagine most of us on HN are regularly patching all of our home hardware. That makes us far less likely to be susceptible to those sorts of general attacks.
If an attacker is looking for something juicy and general, they are far more likely to try and pull off a general attack against someone's laptop or phone. That's where the pool of users is much more broad.
They don't give a shit about your data though. There are lots of horror stories of people losing all of their photos from the past decade(s) because Google decided to block their account, with no way to get in touch with them and no way to restore anything.
Tech companies aren't your friend, they are faceless corporations with nice services. When they fail you though, they fail you big time, unless you have solid back-ups.
> They have entire gigantic teams of employees dedicated to security and privacy and protection from threats. I couldn't replicate that even if I wanted to.
That may be so, but they won't accept any liability for losing your data either. In the end, we're left to fend for ourselves.
How common an occurrence is that? How often is an unimportant, middle-class person's data at risk, really? Enough that you'd want to spin up your ZFS storage?
Hypothetically, let's say I had my entire life on Google. I have a unique password for it, backed up by 2FA, without the SMS/Authenticator fallback. What's the long term consequence? Google knows everything about me? They already do anyway. Someone can steal my printout of the backup codes?
I don't ask this to stir shit. I genuinely have these sorts of discussions with friends and family when I try to tell them that privacy is important, and I fail absolutely at convincing them of it.
Everyone has their own opinions on this and their own threat vectors for their own personal situation. The following is my opinion based on my own situation which I believe to apply to the average person:
I think it is safe to assume that Google does some sort of data mining on the data you upload. If that bothers you, self hosting everything isn't your only option -- you can also encrypt everything before uploading to your Drive. Duplicity is one such example that I use.
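As a rough illustration (the key ID and paths are placeholders; the target can be a folder your Drive client syncs, or any remote backend duplicity supports):

    duplicity --encrypt-key 0xDEADBEEF ~/Documents file:///mnt/drive-staging/documents
    # spot-check that the encrypted chain is actually restorable:
    duplicity restore file:///mnt/drive-staging/documents /tmp/restore-check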
Despite this, I still don't rely on Google Drive, not for privacy reasons but because of Google's history of disabling people's access. If your Google account is banned at no fault of your own, there is a possibility you could lose all access to those files. Even if you did nothing wrong, you will never in a million years get a human to review your case.
I have had this happen to me, but thankfully it didn't affect anything other than Google Pay. I used it twice for a family member to reimburse me grocery money, and Google decided that they were ceasing to do business with me, that they would mail me a check, and that I should not contact them again.
So, everything I have on Google Drive is synchronized to another paid storage service (mostly photos since I don't believe Google Photos has a very good open source self hosted alternative).
Google can simply decide to revoke your account and delete your data. I've seen a number of first-person accounts online of people who rubbed some tech giant the wrong way (or were just suspected of doing so, or were characterized as such by some ML model) and only afterward realized how much they stood to lose.
> How common an occurrence is that? How often is an unimportant, middle-class person's data at risk, really?
My online accounts have been compromised 16 times in the past 5 years, according to https://haveibeenpwned.com/ including sites like Android Forums and Linux Mint Forums. There are plenty of other better known platforms on there, too, so it's safe to assume that most of the data on said sites would have also been accessible to the attackers.
In contrast, my current self-hosted software accounts have been compromised 0 times in the past 4 years. Maybe 1 time, if you count a throwaway node's Docker socket being exposed to the network accidentally and a crypto miner getting launched on it.
Why is that? Because although many of the online platforms have dedicated security specialists (hopefully) and manage to fight off thousands (or more) attacks daily, all it takes is one good attack to compromise thousands (or more) users and their data in one large batch. Furthermore, those are far more of an interesting target to attackers, possibly due to financial incentives.
Unless easily automatable (like the aforementioned Docker crypto attack), attacking self-hosted software is far less lucrative. It would probably be far easier to hack John Doe's Nextcloud or ownCloud instance, yet the financial gain from that would likely be far lower than stealing a bunch of different users' data on a lesser known and less secure cloud platform of some sort, and selling it or doing something else.
To that end, I see two strategies for protecting one's data:
A) make your defenses good enough to be able to stand up to targeted attacks, which is truly feasible in large orgs and cloud platforms
B) make yourself a less lucrative target by self-hosting some software and making hacking you sufficiently hard, so that most automated attacks will fail: use key pairs for SSH, use fail2ban, use SSL/TLS with something like Let's Encrypt, and use Docker networks if you need Docker so that nothing apart from ports 80/443 on your ingress is actually exposed to the outside (or just use your firewall for the services that are not containerized, though then you also need to think more about user permissions etc.) - see the sketch below.
Oh, and use 2FA where possible (especially in regards to the online services) and use something like https://keepass.info/ for managing passwords - to have them be sufficiently long and different for every site or platform that you use.
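A rough sketch of the basics in option B (the hostnames, domains and ports here are illustrative, not a prescription):

    # SSH: keys only, no root login (in /etc/ssh/sshd_config):
    #   PasswordAuthentication no
    #   PermitRootLogin no
    sudo systemctl enable --now fail2ban

    # TLS from Let's Encrypt for the reverse proxy in front of your services:
    sudo certbot --nginx -d cloud.example.com

    # only expose the ingress ports; everything else stays on internal Docker networks:
    sudo ufw default deny incoming
    sudo ufw allow 22,80,443/tcp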
> My online accounts have been compromised 16 times in the past 5 years, according to https://haveibeenpwned.com/ including sites like Android Forums and Linux Mint Forums.
That's not the same as OneDrive - the entire budget of those organizations is probably a rounding error compared with Microsoft or Google's security spending. It was once reported that Microsoft spends over $1 billion a year on security.[1]
If my servers start mining crypto, I've been pwned by script kiddies. If my data becomes available online and is thus available on the previously mentioned site, I've been pwned by more sophisticated attackers. Whereas if we're thinking more along the lines of NSA or Mossad, they are already in my systems and I just have to hope they're in a good mood.
On that note, none of my self-hosted mail server related accounts seem to have been leaked so far, or at least haven't been made publicly available.
Apart from that, one can also set up alerts for every SSH login, should fail2ban fail for some reason. At the app level it becomes harder, to the point where it's often not worth the effort to introduce alerting. Maybe just blanket-ban IP ranges that you don't expect to use at the ingress level.
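One simple way to get such SSH alerts on a PAM-based system (the script path and mail address are placeholders):

    # /etc/pam.d/sshd
    #   session optional pam_exec.so /usr/local/bin/ssh-login-alert.sh

    # /usr/local/bin/ssh-login-alert.sh
    #!/bin/sh
    if [ "$PAM_TYPE" = "open_session" ]; then
        echo "SSH login: $PAM_USER from $PAM_RHOST on $(hostname)" \
            | mail -s "SSH login alert" admin@example.com
    fi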
Data stored in Google Drive etc should be encrypted and split into 50 MB chunks or something like that to hide metadata and mitigate the risk of leaks. Better backup tools have been offering this for a long time.
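If your backup tool doesn't do the chunking and encryption for you, the underlying idea is roughly this (sizes, names and paths are illustrative):

    tar -czf - ~/important | gpg --symmetric --cipher-algo AES256 -o backup.tar.gz.gpg
    split -b 50M backup.tar.gz.gpg backup.tar.gz.gpg.part-
    # upload the *.part-* chunks; to restore:
    #   cat backup.tar.gz.gpg.part-* | gpg -d | tar -xzf -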
Darn, I was hoping this would be an article about organizing one's files. I was really in the mood for reading about that, then spending the rest of the morning reworking my own system.
Same. I get lost in researching everyone else's file organization and other workflow methods. I always feel like mine is a dog's breakfast, scattered across many incongruent, un-coordinated drives, clouds, shares, etc.
Something tricky is how to bootstrap restoration of backup. If you have lost "everything", how do you get it back?
For example, if you use borg to backup remotely via ssh, you will need ssh keys as well as a passphrase for the encrypted backup. Where do you store those to make sure you have them if your computer is gone? What I did was create a self-extracting restoration script, which embedded everything needed. This is also encrypted, and synced to many places. The idea is, as long as I have the passphrase for that, it takes care of the rest.
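The general shape of such a bootstrap might look something like this (host, repo and file names are placeholders; the encrypted kit holds the SSH key and the borg passphrase):

    mkdir -p ~/.restore-kit && gpg -d restore-kit.tar.gpg | tar -xf - -C ~/.restore-kit
    export BORG_RSH='ssh -i ~/.restore-kit/id_ed25519'
    export BORG_PASSCOMMAND='cat ~/.restore-kit/borg-passphrase'
    borg list ssh://user@backuphost.example.com/./repo
    borg extract ssh://user@backuphost.example.com/./repo::name-of-the-latest-archive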
I keep the most important pieces on printout in a firesafe lockbox. I don’t have any eventuality for if the firesafe goes at the same time as online failsafes, but I feel like the best approach for those kinds of situations is nihilism.
Check with your bank. If you have enough accounts, or the right kind of accounts, they may give you one for free.
It's usually included in the long list of benefits that almost nobody ever reads.
Each year I get a bill from a major American bank in the amount of $0 for a safe deposit box I don't use because I have a certain amount of money in one account.
It's a good discussion. I've taken a different path but with the same ideals.
1. All my live, working data is in Dropbox. I've been using my (paid) Dropbox account for over a decade. This step isn't really intended to be a backup per se, but you get the backup behavior "for free" because this is how I keep my secondary machine in sync with my primary. My Dropbox folder is also replicated to a 3rd computer in my home as another level of redundancy.
2. Everything in my home dir, especially photography, is ALSO covered by Backblaze. Having some backups elsewhere is mandatory; not enough people really understand how important this is. Should my house burn down, I still have data.
3. My primary system is a Mac, so I use Time Machine. TM is the only backup on this list I've ever actually used as a backup. When our home was robbed a number of years ago in a quickie smash-window-and-grab affair, they got my laptop. I went out and bought a new computer, plugged it into the TM drive, and in an hour or so I was right back where I left off. Hard to beat that. Even my app windows were in the same place.
4. Periodically, I take a full clone of my main machine's drive using a drive imaging tool (Mac specific; I use SuperDuper) and **store that drive at a friend's house**. This probably only happens a few times a year at this point. I should do it more often.
That tertiary computer I mentioned in step 1 is also the home NAS server / home media server. It holds the photo archive in a large outboard disk. Backblaze covers that disk, and Time Machine on that computer keeps the outboard disk backed up as well. This data is mostly static, so the images I've taken of it and stored elsewhere don't need to be updated all that often (ie, just when I migrate prior year photo data to that drive).
I put everything in a Resilio Sync folder, and keep a full sync on at least two devices (a home NAS and a cloud seedbox). Resilio Sync handles pretty much everything. You instantly get hot backup, have files immediately available on every device you have, and if you have a phone you can download any file on demand, etc. Unlike other file synchronization methods such as `rsync --delete`, it keeps a version for every file modification and moves the file to Archive when it is deleted, so you can't lose data. Also, you get encryption without the headache by using the "encrypted folder key". This syncs an encrypted copy so other devices can sync from that device.
I use resilio sync together with a NAS (which is still large enough to hold all my files, I wonder how long that'll stay given how much 6k BRAW footage I'm shooting), and I also sync to a Google Cloud Storage bucket from my NAS (moving items to Archive storage class if they haven't been touched in a while, significantly saving cost).
> Not only does encryption during data recovery make everything much more difficult, but should you pass away, your family members might not have the skills required to access the data.
Terrible advice regarding encryption. Really if you have important data that your family needs at the time of your death you should have a plan for that as well. Not avoid encryption because it’s “too hard”.
Exactly. Encrypt with a simple utility, and write down the password and store in a safe place and let someone know. In my case, I have a friend who is very technical and I can trust to assist my spouse (who is not technical) with some items if need be. Part of my hand-written instructions are his phone number/email, and to call him if there are issues decrypting or restoring data.
> While you might consider doing a full encryption for both your personal laptop and/or desktop, in case one of these gets stolen, you should avoid encryption on backup and storage when it really isn't needed because encryption adds yet another layer of complexity.
Wrong. Not encrypting your backups when your daily use systems are encrypted makes no sense. Seriously. Why go through the hassle of figuring out a password when one can steal the unencrypted off-site backup?
You want simple encryption? Me too! Use dm-crypt/LUKS. It's been in every modern Linux distro for at least 10 years. If you can plug your encrypted external drive into a live/freshly installed Linux desktop and an encryption prompt comes up, you're in luck!
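For reference, the whole workflow for an external backup drive is a handful of commands (the device name is an example, and luksFormat destroys whatever is on it):

    sudo cryptsetup luksFormat /dev/sdX        # one-time setup, wipes the drive
    sudo cryptsetup open /dev/sdX backupdrive  # prompts for the passphrase
    sudo mkfs.ext4 /dev/mapper/backupdrive     # one-time setup
    sudo mount /dev/mapper/backupdrive /mnt/backup
    # ... run your backup ...
    sudo umount /mnt/backup && sudo cryptsetup close backupdrive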
I agree with your general sentiment, but I'll provide one redeeming counterpoint anyway: your laptop (that you carry around to places) is at higher risk of theft than an external hard drive that you keep in a storage locker.
I use borg backup to sync all our files (from ~5 machines) to an old PC, which synchronizes everything to the cloud. Borg has deduplication, compression, encryption, and saves several versions of the files, which I think is crucial. Also, all text-based data (software, papers, etc.) lives in Git.
Good point. I am indeed synchronizing the borg repository files. However, I have my script run borg check before it synchronizes to the cloud. It would be better to synchronize the source files, but the encryption and deduplication are what's keeping me from doing that.
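In outline, that flow might look like this (the host names, repo paths and the cloud remote are placeholders):

    # on each machine, nightly: push an encrypted, deduplicated, compressed archive to the old PC
    borg create --compression zstd ssh://oldpc/./borg-repo::{hostname}-{now} /home
    # on the old PC: verify repository consistency, then mirror the repo files to the cloud
    borg check /srv/borg-repo && rclone sync /srv/borg-repo remote:borg-repo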
I'm a digital nomad, and I decided I wanted to store all of my data as securely as possible without relying on Google/Amazon/Microsoft/Apple. Here's my setup, in case anyone finds it useful.
I store all of my data in Nextcloud hosted on a VPS. It's virtually impossible to guarantee the security of data on a running VPS, so my sensitive data is also encrypted with Cryptomator, so I don't have to trust my VPS host (but I also chose my VPS host carefully with data privacy and security as my main criteria). My host makes daily backups of the VPS off-site, and I also backup my Nextcloud data directory daily to Tardigrade/Storj DCS via their s3 API using Restic. An advantage of Storj DCS is that it's geographically distributed, so you're insulated from natural disasters. I also sync all of my Nextcloud data to both of my laptops, and use Restic to make another backup snapshot on an external USB SSD once a month. My Docker configs for Nextcloud are stored on GitHub, but I wouldn't have trouble recreating them if I somehow lost them.
For my own personal risk model, I think my sensitive data is pretty well protected from third parties, and I think all of my data is reliably backed up accounting for multiple types of failure. My biggest vulnerability is probably my password manager, where all of my encryption keys and passwords are stored.
If anyone has any suggestions on how I can improve my setup, or any potential problems you see, I'd love it if you share!
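For reference, the nightly Restic leg to Storj's S3-compatible gateway boils down to something like this (the bucket, endpoint and paths here are placeholders):

    export RESTIC_PASSWORD_FILE=/root/.restic-password   # repository encryption key
    export AWS_ACCESS_KEY_ID=...                         # gateway credentials
    export AWS_SECRET_ACCESS_KEY=...
    restic -r s3:https://gateway.storjshare.io/nextcloud-backups backup /srv/nextcloud/data
    restic -r s3:https://gateway.storjshare.io/nextcloud-backups forget \
        --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune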
Wondering something: if you are using Cryptomator and a VPS - so you don't own the underlying infrastructure anyway and are adding your own layer of encryption on top - why not use cloud storage services (e.g. Dropbox, iCloud, etc.)?
There are two reasons I don't use Dropbox, iCloud, etc.
First, I want to be able to use open source, audited clients that allow me to access my data in a transparent and flexible way on all of my devices. I don't want to use a closed source sync client that doesn't give me full control over how my data is synced.
Second, and more important, if you don't control your encryption keys, you don't control your data.
Storing my data on a VPS doesn't mean that I don't own it. I still have full control over it, and if the VPS disappears, I still have redundant copies.
I put my 1Password vault on Dropbox specifically because I can install both apps via iOS store on a new phone and bootstrap my setup with just 2 passwords.
iCloud is actually too hard to recover in case of hardware loss, I don't recommend it to anyone.
I feel like I have a fairly solid backup system for all of my important files. They are protected against elementary disasters and cyber attacks alike.
However, the only thing I struggle with is my phone photos. I recently caved in and started using iCloud Photos due to the convenience of having all my phone photos always available, searchable and tagged, even if the library size exceeds my phone's capacity.
Does anyone know a reliable and automatable way to back up this iCloud photo library on a self hosted server?
I'm also interested in this. I have iCloud for Windows installed on a VM with lots of attached storage, but it doesn't seem to persistently/reliably download photos unattended.
I suppose another way to do it would be with a Mac and then periodically backing up the local Photo Library, but that still leaves the photos tied up in Apple's proprietary library format. Plus you need a spare Mac just laying around and always on.
I spent extensive time evaluating this problem, and tried a whole bunch of different things:
Nextcloud, Resilio, iCloud, etc etc etc.
Honestly? Just using Google Photos is probably the best bet for most folks, whether you're on iOS or something else. I personally picked OneDrive for my use case since it is more performant for large data sync, and 6 TB + Office 365 on a family plan beat out the alternatives on pricing.
Eh... I'd definitely do a Google Takeout to verify everything's OK before you delete your originals.
You may think the downsampled image is sufficient for lossy backups, but older images stored in "high quality" in my personal Google Photos library were almost wholly stripped of metadata, including when the photo was taken.
(Many of my users have suffered from this as well, which is why I built tag value inference into PhotoStructure to try to help spackle over these metadata holes).
That’s good advice for those that care, and I should have mentioned it, especially for HN crowd. I definitely noticed this problem when I did Google Takeout on my photos dating back to the Picasa days.
I still think that for most people (and on HN, where we care more about fighting data entropy, there might be significantly lower overlap with the majority), Google Photos is the best option.
I do this with Syncthing (not just photos, but all phone files), a very customizable file sync tool; it can do a trash bin and versioning similar to cloud storage providers. Sadly, I've heard it doesn't work so well on iOS due to the platform's restrictions.
I'm in the self-hosted camp as well, with two Synology NAS boxes in two different countries that all of the family's computers synchronize to using Syncthing. Each of the boxes runs a local backup to an external drive nightly, and one of them also runs a backup of a subset of (really important) folders to Backblaze. This uses Synology's proprietary backup tool (HyperBackup), which I intend to replace with an open source solution (most likely rclone, but I'd be interested in suggestions). As an additional measure I rotate the external drives in one location weekly.
So far it has proved quite reliable when switching machines (in combination with git-managed dotfiles and stow) and accessing data on demand. I also make a full image of my laptop on an external drive more or less once a month to enable quick restore in case my OS gets hosed.
One problem I've still only semi-solved is synchronizing iPhones. Right now we just synchronize photos using Synology's DS File, and I do an iTunes backup using a Windows VM, which is clunky. High hopes for libimobiledevice here, but I've had no time to properly research it yet.
All of the above requires some work but it's fun more than a nuisance. Probably not that great of a solution when I'm no longer around though.
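If HyperBackup does get swapped for rclone, the nightly Backblaze job could be as small as this (the remote name, bucket and paths are made up):

    rclone sync /volume1/important b2:my-backup-bucket/current \
        --backup-dir b2:my-backup-bucket/archive/$(date +%F) \
        --log-file /var/log/rclone-backup.log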
For the past few years, I've created a photo book of the best family pictures from the past year. I store these photo books in a fireproof safe. I think that's the easiest way to pass down pictures, since paper doesn't require decryption or passwords.
This summer I forgot the PIN of my iPad because I hadn't used it at all during the summer. And it's only 6 digits. I had, however, made note of the PIN elsewhere, so I was able to unlock it. I have similarly forgotten PINs and passwords in the past that I didn't use for a while. I recommend unlocking your safe at a regular interval so that you remember the combination, if it uses a code lock. And even if it's a key lock, it's probably a good idea too, so you know where the key is.
>Not only does encryption during data recovery make everything much more difficult, but should you pass away, your family members might not have the skills required to access the data
For your browser history, sure. But I for one make sure I rsync stuff to a Raspberry Pi 4 with an external disk at my parents' place every now and then. In the event of my demise they can unplug it and have all our family pictures and videos. I'm really afraid of making things too complicated for the people that I leave behind.
I dabble in "life storage" and your comment made me think that some sort of executable shipped alongside backup locations to read the data, if in some deduplicated backup form, seems valuable.
E.g., Camlistore/Perkeep had the premise of using JSON to store data. However, some random person isn't going to write code to parse all that data, pull files out, etc. A lifeboat .exe might be interesting.
Though doing it in the most simple, least configurable, least breakable way seems necessary. Yeah, some baked-in UI would be cool, but more moving parts means it's less likely to work.
I have the same setup for the same reason as you. Only difference, I have my setup at home. As you say, in the event of my demise, I want my relatives to be able to grab the drive and have easy access to the pictures and videos.
I do pretty much what he does, but use rsync - I rsync my laptop twice daily to a ZFS fileserver; that fileserver is backed up to the cloud and rsync'ed to a networked hard drive located on the opposite side of the house from the fileserver. Every once in a while I swap out that hard drive with another one, so I have 2 copies of most of the data, with the most up-to-date data in the cloud.
I used to use rsync to make incremental backups, but now I count on the cloud backup to keep versions.
I use encfs to encrypt the backups (despite its security weaknesses, it's "good enough" to keep my data safe if a hard drive is lost/stolen).
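Concretely, the twice-daily job is nothing fancy; something along these lines (the host names, paths and schedule are placeholders):

    # crontab on the laptop: 0 8,20 * * * /usr/local/bin/push-backup.sh
    rsync -a --delete ~/ fileserver:/tank/backups/laptop/
    # on the fileserver, the second leg to the networked drive is just another rsync:
    rsync -a --delete /tank/backups/ /mnt/networked-drive/backups/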
I don't worry about this:
Not only does encryption during data recovery make everything much more difficult, but should you pass away, your family members might not have the skills required to access the data.
If I die, the only thing my family will care about is the pictures that are in a Google Photos album; no one is going to care about the RAW source images on the hard drive, and no one is going to want to look at the mail archive from my job 10 years ago. My wife is certainly not going to maintain a DLNA server to watch videos that I ripped from discs; she's just going to load the original Blu-ray/DVD. And not even I ever play any of the CDs that I spent hours ripping to MP3s - I just stream music from Spotify.
Back up everything of yours at least once daily. Use a cronjob so that it won't matter whether you remember or not. I don't trust the 'Cloud' after several disasters involving Dropbox while travelling out of the country.
I treat my data like I treat my gold: "If it isn't under your own personal control, you don't own it at all".
Store your files such that you are not backing up unnecessary stuff - such as the system itself, because that's what system reinstalls are for (apart from your various config files, which are included in the daily backups).
I divide my stuff into four main categories: config files and installed-package lists, databases, personal stuff, and "write once, keep forever" archives.
The config-files and installed-package lists allow you to reinstall your system itself from scratch in less than an hour, in most cases.
Databases are self-explanatory. Dump the databases at least once daily, and reload them when your system crashes.
Personal /home directory stuff is restricted to personal documents and stuff that I am currently programming. That is it. Nothing else. What else would you need? That small amount is generally held down to less than 10 gigs.
For all else, (ebooks, videos, photos, music, etc, etc, etc) there is the archive directory. That is stuff that needs to be kept, but isn't actually private and personal.
The first three categories are included in my main daily backup as a full snapshot.
The last category, the archive directory, is backed up as rsync changes. This goes to several separate backup media. On many days, there will be no changes at all and the rsync backup takes less than a couple of minutes.
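Put together, the daily job those categories imply is small; for example (the package manager, database and paths shown are just illustrations):

    # 1. config files + installed-package list
    dpkg --get-selections > /backup/daily/package-selections.txt
    tar -czf /backup/daily/etc-$(date +%F).tar.gz /etc
    # 2. database dumps
    pg_dumpall -U postgres > /backup/daily/databases.sql
    # 3. personal /home stuff, as a full snapshot
    tar -czf /backup/daily/home-$(date +%F).tar.gz /home/me
    # 4. the archive directory, as rsync changes to separate media
    rsync -a --delete /archive/ /mnt/backup-disk/archive/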
Thumb drives.
Financial stuff, copies of legal documents, large libraries of PDF books, reports, etc., etc.
And then I back the main drive up to FOUR separate thumb drives.
Reasons:
(1) USB drives are huge now.
(2) Convenient - can move from one PC to another easily.
(3) Convenient - I do NOT have time to mess with other solutions.
(4) Safety - valuable financial stuff almost always OFFLINE, reducing hacker vulnerability.
(5) All PCs eventually fail - I'm old and have a collection in my basement - plus I have two fairly new ones awaiting repair in my home office right now.
Only disadvantage - I do have to keep multiple legal copies of Microsoft Office on multiple PCs to make moving thumb drives around practical.
I'll add - use drives from different brands. I had a mirrored RAID 1 fail at one point (in the '00s, spinning magnetic hard drives): two identical drives, and they failed within days of each other. One drive died, I ordered a replacement, and before the replacement arrived the second drive died. The shock and disbelief I felt at my foolproof RAID solution going belly-up on me! And I still can't believe I didn't turn off the machine while I waited for the replacement to arrive.
Make sure you periodically refresh your backups that are stored on flash memory devices. Depending on the media, cell voltage drift can cause data corruption that overwhelms the built-in ECC in as little as 5-7 years.
Yes they do. I used pen drives for backups after I found one that was like 10 years old and still had my old CV on it. I stored my backups on two identical pen drives. I stopped doing this when I went to make a backup and discovered one of the pen drives was corrupt and had lost all its data.
They are super cheap and fairly reliable, but not quite reliable enough for backups.
That is my understanding. I've heard numbers around 10 years +/- 5, but I've never seen a study on it. I wonder if storing them zip-locked in a freezer would help (but I guess you could just use a magnetic hard drive :D )
That's the number I've personally experienced (I also replied to GP).
It's an interesting idea to deep-freeze your drives, but the thermal stress from modern frost-free freezers that automatically defrost periodically may negate the benefit of chilly bits...
If you don't have that much to store, and are worried about data durability, I'd suggest looking into M-disc instead.
> These tools work really great, until they don't!
Huh? What's the problem? How is ZFS any different because it's lower level?
> ZFS without ECC memory is no worse than any other file system without ECC memory.
You really do need ECC memory unless you are OK with your pool becoming corrupt every 6 months or so. I'm just saying this from my experience of running a ZFS server without ECC; the data wasn't critical, so I left it like that for 3 years before replacing the memory with ECC, and now it's fixed.
ZFS doesn't need ECC but it does benefit from it. If your pool was becoming corrupt every six months, look to a shitty drive controller or cabling first.
I've run ZFS for quite a few years now on laptops and lower-end machines that can't use ECC memory and I've never had corruption, unrecoverable files/pools, etc.
I stored 2tb of actively used data on zfs without ecc ram from freebsd 8 to 12. I had no noticeable corruption. I only post this because there is a weird assumption that not using ecc ram is a death sentence for your zfs data.
I use ecc ram now and I think it's the proper way to do zfs, but let's not pretend you forfeit your data by not using ecc.
I am happy that worked for you, but there is a huge difference between a 24 x 4TB pool and a single 2TB drive.
2TB is not a lot of data; you would need to fill the drive 4 times a day before you would get close to hitting the single-bit error rate, while a pool of my size is guaranteed to hit that error rate just because of the volume of data.
If we are talking about running ZFS on a desktop, then yes, you are right, you don't need ECC. But if you are running a storage server for data you want to keep, then you absolutely do.
I was so tempted to jump on this with my anecdata of successfully running ZFS without ECC for a time, but that was in the context of a dirt cheap home tinker box.
If you care about data integrity enough to use ZFS, just bite the bullet and use ECC. If you care about data integrity at all, use ECC. If you don’t, what are we even doing here?
The bit about ZFS was a really flawed argument, saying it's just as complex but in an unavoidable way so that's fine for some reason:
> One could argue that ZFS is complex as well, but that is on the filesystem level, a level on which you cannot avoid complexity no matter what you do.
I also have a ZFS NAS and a separate backup server for the NAS that only gets turned on once a week. I disagree about his recommendation to avoid encryption. I keep the NAS and backup server drives encrypted. It’s not a big deal to unlock the drives at boot with a password and I’ve never had any issues with it. If someone breaks in and steals one of the servers, at least they won’t have access to years of all of my documents backed up.
This author’s setup looks good, but I think most people would get burned out by the complexity unless you really like managing servers, automating things with scripts, and navigating the little quirks that pop up with the hardware and software involved.
A complex solution isn’t necessarily better if you find yourself not investing the time needed to maintain it. I think most people would be better off trusting one of the online backup services rather than spending thousands of dollars building and maintaining a pair of backup servers. If you’re worried about losing data, signing up for two separate backup services could still be cheaper than building and running two separate NAS servers.
Start with easy backup solutions first to get something going. Then consider the more complicated options later.
I have something simpler than yours, with a small Proxmox server as my NAS (no ZFS) and backups stored to a second disk. Occasionally I will back up the contents of that machine to external drives.
In the future I plan on doing something similar to what you have, with a powerful ZFS server and my current server accessing the main server using a read-only interface as a second backup. It does take work to maintain, and I don't look forward to the day when I have a full server failure and need to scramble to replace the unit, restore from backups, and all that. Ideally I would have a pair of servers to handle that eventuality... and down the rabbit hole we go.
Ultimately, I guess this all is a tradeoff - I know many engineers who swear on their cloud services, and they simply don't have the time and experience to maintain dedicated servers. While Synology devices and FreeNAS make things simpler, that's for a best-case scenario as you need the technical knowledge to deal with the issues that will eventually pop up. Honestly the cloud is what I recommend for all but the most technical folk, with the addition of external disks off-site for the most critical files.
> most people would get burned out by the complexity unless you really like managing servers, automating things with scripts, and navigating the little quirks that pop up with the hardware and software involved.
I didn't see the author state which OS they use for their NAS, but depending on what it is, yes. {Free, True}NAS are dead easy to use as long as you stay inside the confines of the GUI, and hugely reliable. Anything else is no man's land. Running ZoL and you did a routine kernel update? Enjoy your new read-only ZFS pool!
I have a setup very similar to the author's, with a bunch of Debian VMs on top of Proxmox. Can confirm that even with Packer and Ansible handling base images and configuration, weird edge cases pop up that take quite a while to track down.
> {Free, True}NAS are dead easy to use as long as you stay inside the confines of the GUI, and hugely reliable.
I've used FreeNAS (now TrueNAS). Great at first, but eventually I started hitting snags and bugs. I still use TrueNAS but I wouldn't say it's 100% reliable, especially across upgrades.
> Anything else is no man's land. Running ZoL and you did a routine kernel update? Enjoy your new read-only ZFS pool!
FWIW, I have a separate ZoL server that has been bulletproof, even with frequent upgrades. I did, however, run into a kernel issue with TrueNAS after an upgrade that caused some hardware issues about once a week. My options were either to downgrade to an older TrueNAS until they fixed it (still not fixed) or buy new hardware.
I thought the same about Free/True NAS until I lost a ton of stuff. Be careful with these. Also, for big backups the performance has been terrible. I carefully followed their guide and the write speed is just dismal.
I have a Synology NAS. All cloud files sync to it (Dropbox and Gdrive, the latter using "real" files, not just their .gdoc format), all backed up nightly to Synology C2 in the cloud, encrypted.
Photos, which are in the "absolutely irreplaceable" category: all of the above but also have Mylio running so have external drives and 2x desktops with this data on them.
Having copies in at least 2 physical locations is important, because RAID/on-site backups won't help you if there's a fire/flood/theft that destroys everything.
Back in school I used to have 2 Synology NAS, one in my dorm, one in my office, each running two drives with RAID 1. I had a gigabit link between my dorm and office (in 2008-2014!) thanks to MIT's awesome network.
Now I live in Silicon Valley where even in 2021 I can only have a shitty 15 megabit Comcrap uplink so that setup doesn't work anymore.
What I do now is a 4-bay Synology NAS.
- The first 2 bays are RAID 1, they are the "live" volume
- The 3rd bay is the "nightly backup" volume and syncs to the "live" volume every day late at night. That way if I accidentally delete something or mess up something I can look at yesterday's version at any time, very conveniently.
- The 4th bay is usually empty, but once every 2-4 months I bring a drive from a storage closet in a different town, plug it in, sync it, and put it back in storage. Car is still faster than the internet in Silicon Valley. And that also allows me to have an airgapped backup, just in case there is ever a ransomware attack on Synology that destroys all the online drives.
Good advice, thanks. I actually have an unused Synology NAS which I moved from because it was slow. I wonder if I can use HyperBackup or similar and put this one down in my garden office. That might add another layer of "distance security"...
HyperBackup does this weird nonstandard "hbk" bullshit and I don't understand it, and I'm afraid of anything brand-specific.
Between two Synology NAS I use Shared Folder Sync instead.
Within the same NAS, Shared Folder Sync sadly doesn't work (even though there's no reason it shouldn't), but you can set up an rsync task in the Task Scheduler instead.
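Such a task boils down to a single rsync line (the share names here are invented):

    rsync -a --delete /volume1/photos/ /volume1/photos_mirror/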
Being an Apple chap, I have found that Time Machine works really well. I use a Synology 5-disk NAS. It has a ton of capacity, and allows me to increase capacity, if needed.
My important stuff is in the cloud.
I have a nightly "hot backup," that is a CC clone of my main and dev drives. It used to be every 4 hours, but I found that to be overkill.
I almost never need anything more than the "hot backup."
Every good backup practitioner has an origin story. That one time they lost data to a clicking HDD, or to a scratched CD, or to a typo. The pain of never getting that data back. Galvanizing pain that catalyzed a resolution never to lose data again.
Never trust someone (including yourself) to truly look after your data unless they know this pain first hand.
I currently trust Restic with basically all of my long-term backups, which, according to the author, really isn't a thing I should do.
However, I'm still somewhat confident in my strategy as I back up all of my data to two entirely different repositories, one of them backed by Google Cloud, and another by the server sitting in my pantry. So one of these repositories could get irrecoverably corrupted and I still wouldn't lose any of my data. With cloud storage becoming so cheap, I've also thought about adding a third repo.
Of course this would not protect me from a hypothetical bug in Restic that corrupts all my repositories before I notice, so maybe I should also add another auto-backup solution into the mix.
Doing manual things like moving data to external storage seems like a robust strategy, but I really don't trust myself to do something like that nearly often enough for it to be useful.
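In practice the dual-repository part is just pointing restic at two different backends, something like this (the bucket and host names are stand-ins; credentials and the repo password come from the environment):

    restic -r gs:my-backup-bucket:/ backup ~/data
    restic -r sftp:pantry-server:/srv/restic backup ~/data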
I have used restic for quite a while. Once in a while, I test that I can restore my backups. That's an important step that lots of people miss.
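A periodic drill can be as simple as restoring to a scratch directory and spot-diffing it (the repo URL and paths are placeholders):

    restic -r sftp:backup-host:/srv/restic restore latest --target /tmp/restore-test
    diff -r /tmp/restore-test/home/me/Documents ~/Documents | head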
I had a client that asked me to set up their system. I set up the system, they got a tape drive, and I had them rotate tapes daily. There was a cronjob to tar everything to tape.
It was great until their hard drive failed and I found I had a typo: it was only backing up the current folder, not the whole drive. Needless to say, they found a new IT provider and I learned an important lesson. If you haven't tested your backups, you have no backups.
I'll agree with the author partly on software such as restic/duplicacy - they are not a good solution for long-term archiving, nor are they marketed as such. I keep a network share that I mirror the home folder on my computers to daily, but I take periodic snapshots of the share with duplicacy and copy the snapshots to Backblaze B2.
The comments seem to be conflating Google Drive type cloud storage with more general S3/B2 object storage. I wouldn't rely on Google Drive and friends for backups - typically those have pretty short version histories, as they are intended more for file synchronization, with backup being an afterthought.
> I'll agree with the author partly on software such as restic/duplicacy - they are not a good solution for long-term archiving, nor are they marketed as such.
I self host a Nextcloud server and I run duplicity each night to back this up to an offsite location. I use incremental backups because I have a data cap. From there each month I duplicate the most recent versions of the Duplicity directory onto a cold hard drive at the same location.
I was surprised to see a couple people in this thread say to not rely on Duplicity for this. What could I be doing better?
My understanding is that there are two issues. First, modern deduplicating backup software like borg/restic/duplicacy store data in a repository in unique chunks. This avoids the issue that incremental backup software like duplicity have where they can create long chains of incremental changes which is slow to restore and increases the likelihood of errors on restore. Second, both deduplicating and incremental backup solutions aren't suggested for long-term archiving as they chop your files into lots of little pieces and the chances of not being able to read the repositories 10 years down the road are high. For that reason it's good to have a local backup in a simple, standard format like tar/zip or just a folder. As an example, see criticism of the Perkeep software [1] which is marketed as long term storage, but uses chunking deduplication for no particularly good reason.
Got it. So perhaps after the Duplicity files have been incrementally uploaded to the remote datacenter, for the cold storage backup of those files rather than simply duplicating the Duplicity files I should unpack them and then rearchive them into a single flat encrypted archive.
I am hoping to do a Duplicity Full Backup rarely due to data caps. I am hoping I can unpack the incremental Duplicity archives and only if it fails verification will I do a Full Backup.
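One way to do that without pulling a fresh full backup over the capped link (the URLs and paths here are invented): check the chain's status, restore it to a scratch directory at the offsite location, then re-pack that flat for the cold drive.

    duplicity collection-status sftp://user@offsite.example.com/nextcloud
    duplicity restore sftp://user@offsite.example.com/nextcloud /tmp/nextcloud-flat
    tar -czf - /tmp/nextcloud-flat | gpg --symmetric -o /mnt/cold-drive/nextcloud-$(date +%F).tar.gz.gpg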
I'm near a very similar setup. Syncthing for file replication, storage/backup to ZFS TrueNAS, then a secondary backup to another ZFS system that's usually offline.
Can someone point out the flaws in my method? I simply use Google Drive.
My partner and I have multiple devices (tablet, iPhone, Android phone, multiple Mac and Windows laptops) and we just sync folders to our desktops. We store everything we don't want to lose in Drive. We share folders we both need. Photos we take on our phones are automagically backed up in the cloud, and music and movies are streamed.
My house can burn down overnight and we won't lose any valuable data.
The flaw is that you depend on Google. You can not trust them at all, for anything. You can use Drive as a redundant fifth backup either for data that you’ve encrypted yourself or for things that you don’t mind being public, but not as the only place that has your data. Things that could very well happen: you get locked out with no recourse; people at Google browse your data; your data gets mined; your data gets lost; Google messes up and other people get access to your data; Google gets a subpoena and those photos that you thought were perfectly normal are now evidence of a manufactured crime.
Makes you solely dependent on Google. That's not really a problem, but as we've seen many times on HN and elsewhere, there's always the random chance that Google's automated systems can decide that your account meets some unknown criterion to be locked, deleted, or otherwise preventing you access.
It's rather unlikely, but so is the chance of your house burning down.
That's generally the reason people recommend the 3-2-1 rule. By having multiple separate backup solutions, you're hedging your bets. But as is commonly the case in computing, you have to play the convenience vs security/reliability game here and decide where the line between your time and convenience lies against the reliability of your backups and what risks you're willing to take.
Although I'm not sure of the rate, people have reported getting banned from google services without notice for seemingly no apparent reason. And there is little recourse to unlocking their account.
It's something I personally don't worry about, but something to consider when relying on their services for backing up data.
I do the same with Dropbox. I've been doing this for years and it's saved me a number of times. I don't know if there are flaws with this but it's worked better than anything I've cobbled together myself.
I've been using btrfs and btrbk for a few months now and really enjoy it. I love that snapshots are easy to make, and btrbk is a fairly simple way to schedule taking snapshots and transferring them to another machine, as well as set up a retention strategy. I agree with the author that simpler is better -- if I lose data/delete a file I want back, it's as simple as mounting the snapshot and grabbing what I need.
I use Dropbox, and an rsync script to a portable hard drive around once a month. I travel a lot, so I keep hard drives at several different places, which I think spreads the risk fairly well.
The only reason I'm willing to sync my data to dropbox though is that it's all encrypted with cryfs. Works really well normally, although with an m1 chip it's been a bit of a pain.
I got two 1TB hard drives, a Western Digital and an Iomega. The WD scared me: I had all my family photos on it - childhood, my sister's marriage. The WD USB socket is sort of weak, and one time when I pulled the cable, the whole connector came with it. It was a scary situation. I finally got it repaired.
Moved it to the cloud. That's also scary. Anybody who knows your email address can get you locked out.
My approach is 6 years old: I use Time Capsule. The best ever solution for me. You set it up and you don't have to care anymore. It's Apple specific unfortunately. I don't know if there is a similar solution for Windows.
What's the best way to encrypt files - GnuPG, OpenSSL, ccrypt, some AES-256 command line utility, or can some backup tool do it automatically? I do want to encrypt part of my backup before uploading it to the cloud.
I really like this guy's web page source. Do a source view, it's all very neat and tidy, you could just read that and not feel you're losing much in terms of experience.
I currently have a Synology RAID6 and I use a cloud backup service. But I’ve been thinking about adding another layer in the form of off-site tapes. Any recommendations welcome.
I have tapes as my last line of defence (ZFS snapshots, backups to NAS, online are my earlier lines of defence).
The hardware itself is relatively simple, but (at least in my experience - I tried quite a few options and they were all missing at least one key feature) there's no good "set and forget" software solution out there in the same way as there is for online backups, so you're going to spend a long time scripting it all and tweaking it properly. You can build a solution on top of whichever other tools fit your needs that way. I have a set of scripts which create my backups as single large files on disk ready to be spooled out, then I use mbuffer to actually put the data onto tape.
In terms of hardware, 1 or 2 generation old LTO drives are often available for a decent price in good condition on eBay. You'll also need either an SAS or Fibre Channel interface depending on what drive you buy. As far as I can tell they're pretty much all supported by Linux these days.
Given the complexity of scripting and potential for user errors, testing that you can restore properly is pretty well essential, of course.
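For anyone considering the same route, the actual spooling step is short; something in this spirit (the device name, block size, buffer size and file name are illustrative):

    # keep the drive streaming by feeding it through a large RAM buffer
    mbuffer -i /srv/staging/backup.tar -s 256k -m 4G -o /dev/nst0
    mt -f /dev/nst0 rewind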
I took the deep dive into tech trying to fix the 2x CD drive someone gave me for my x86 PC. Part of the passion I have for tech is learning and digging deeper, and reading about topics I haven't learned yet, to the point where I get a bit of malaise when I'm not learning something new. I say this as I'm repurposing an old PC as a full-time NAS and learning about ZFS pools. There's always a simpler, faster, more expensive solution out there that takes the complexity out of the equation, but part of the fun is the complexity for me.
[1] https://www.stavros.io/posts/holy-grail-backups/