I personally run syncthing on several devices, and don't worry about the cloud. It's self-hosted, devices replicate files between themselves, and there's no real limit other than hard drive space. It runs on just about anything too; several of my backup systems are Raspberry Pis.
It can be a bit weird to set up initially, and it's a lot less magical in the interest of putting you in control for privacy reasons, but the added flexibility is pretty useful. I have a music folder that I sync to my phone without needing to pull the rest of my backups along with it, since they wouldn't fit anyway. Several of my larger folders aren't backed up on every single device for similar reasons, but some of my really important smaller folders (documents, photos, regular backups of my website's database) go on everything just because they can.
Unfortunately, in my experience, Syncthing's versioning mechanisms leave much to be desired compared to what I'm used to from Dropbox. AFAIK all of Syncthing's versioning schemes only keep versions of files that have been changed _on other devices_, not of files changed on the device itself. What I'm looking for is an option to keep a synchronized version history for all files on all devices, plus the ability to intuitively roll the state of any file back and forward to any revision without having to manually move and replace files while reading timestamps. (Better yet would be the ability to do so for entire directories, but I realize this would probably be very difficult to accomplish across devices in a decentralized manner.)
That's my primary use case for Amazon Drive. I have a robust rsync of the workstations and laptops to a NAS, and then to a second (incremental-only, no delete) NAS. Works great, but if the house burns down, or if someone breaks in and steals the computers, I want to ensure there's a copy somewhere.
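For reference, a minimal sketch of that kind of two-tier setup; the hostnames and paths here are made up, and the exact flags will depend on your environment:

    # Tier 1, run on each workstation: mirror to the primary NAS (deletions propagate).
    rsync -a --delete /home/user/ nas1:/backups/workstation/

    # Tier 2, run on the primary NAS: copy onward to the second NAS without
    # --delete, so nothing is ever removed there (incremental-only, no delete).
    rsync -a /backups/workstation/ nas2:/backups/workstation/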
If that is your main concern you could always put it on an external drive and put it in a bank safe deposit box. I've thought about doing that for at least the very important things, perhaps even printing some important pictures too.
A good scenario is building a backup server/NAS solution that you can put in a little cubby at your friend's place. There's trust involved that you're not using their internet to hack the government, and you have to be mindful of their bandwidth/power costs. So not a rackmount server or even a tower, but something much smaller and very appliance-looking: a NUC sitting atop a WD Passport or their "My Book".
If it provides them a benefit like an in-house Plex server, even better.
I've moved mostly to syncing through Syncthing for my devices too, but I'm curious what people use for sharing files with others and accessing files through a browser on machines you don't control?
There's Siacoin, a cryptocurrency/blockchain built around the idea of decentralized encrypted p2p storage.
Storage on it is dirt cheap as of now: the median contract price is $12/TB·mo, but network storage utilization is currently only 2%, so actual deals settle at about $2/TB·mo. The downside is that the exchange rate of their coin is highly volatile, or at least it was over the last month.
Do these decentralized storage networks provide any guarantees in terms of durability, redundancy and availability? I've been looking into Siacoin, Filecoin, Storj and the like, but the lack of clarity around some important concerns has so far prevented me from taking them seriously as a backup solution:
1. Performing a restore in a timely fashion on a large dataset seems like a tall order if these networks don't impose any minimums for the upstream bandwidth of the hosts.
2. Files can completely disappear from the network if the machines that are hosting them happen to go dark for whatever reason, which seems a much more likely occurrence for some random schlub hosting files for beer money than for traditional storage providers that have SLAs and reputations to uphold.
Maybe these concerns are unfounded, and some or all of these networks already have measures in place to address them? I'd appreciate it if someone more familiar with these networks could enlighten me if that's the case.
In addition to redundancy, Sia has the concept of collateral, which is basically money locked in a smart contract that says "I'm willing to bet this money that I'm not going to lose your files"; i.e., hosts lose the money if they fail to store your files.
Different hosts have different amounts of collateral, and it's both an important security measure and a market mechanism.
Also, Sia is completely decentralized (unlike Storj, for example), so no one can interfere with it in a way that might result in lost files.
Speaking as a Sia developer, I can address your concerns.
> these networks don't impose any minimums for the upstream bandwidth of the hosts.
Sia today primarily handles that through gross redundancy. If you are using the default installation, you're going to be putting your files on 50 hosts. A typical host selection is going to include at least a few sitting on large pipes. Downloads on Sia today typically run at about 80 Mbps (the graph is really spiky though; it'll spike between about 40 Mbps and 300 Mbps).
We have updates in the pipeline that will allow you to speedtest hosts before signing up with them, and will allow you to continually monitor their performance over time. If they cease to be fast enough for your specific needs, you'll drop them in favor of a new host. ETA on that is probably ~August.
> Files can completely disappear from the network if the machines that are hosting them happen to go dark for whatever reason
We take host quality very seriously, and it's one of the reasons that our network has 300 hosts while our competitors are reporting something like 20,000 hosts. To be a host on Sia, you have to put up your own money as collateral. You have to go through this long setup process, and there are several features that renters will check for to make sure that you are maintaining your host well and being serious about hosting. Someone who just sets Sia up out of their house and then doesn't maintain it is going to have a very poor score and isn't going to be selected as a host for the most part.
Every time someone puts data on your machine, you have to put up some of your own money as collateral. If you go dark, that money is forfeit. This scares away a lot of hosts, but that's absolutely fine with us. If you aren't that serious about hosting we don't want you on our network.
> but lack of clarity around some important concerns have so far prevented me from taking them seriously
We are in the middle of a re-branding that we hope introduces more clarity around this type of stuff as it relates to our network.
This is the one I've got my eye on - once the marketplace boots up on both sides, it's going to be hard to compete against it. I suspect some day even the big providers like Amazon and Google will sell into these kinds of marketplaces.
For data storage, you need error encoding. Sia does that, but you pay for it. So for 1TB of data, you upload 2TB to the network (that's how Sia is configured) and at the current $2.02/TB per month, that's $4.04/TB, which is more expensive than Glacier. Glacier charges funny for downloads but Sia charges for downloads too.
I assume that if you wanted to store ~2.5TB like we're talking about, you'd be paying more than $4/TB, because 2.5TB is 10% of the total of all data currently stored in Sia, currently 24.5 TB. (By comparison the major cloud providers are undoubtedly in the exabyte range of actual data stored. Or for another comparison, you could comfortably hold 24.5 TB of storage media in one hand.)
Sia promises to be cheap because you're using unused bytes in hard drives that people already bought, but that's exactly what Amazon, Google, and Microsoft are already doing, except their data centers are built in places where the electricity costs less than what you're paying. Plus they don't charge you extra for data redundancy.
In that case, Sia provides an avenue for a new company with access to cheap electricity to compete with Amazon, Google, and Microsoft without investing a cent in marketing or product. They can just plug in and start receiving payments, strengthening the network and lowering the price in the process.
Another cool thing is that Sia lets hosts set their own storage and bandwidth prices, so specialized hosts will likely pop up. For example, one host might use tape drives and set a cheap storage price but an expensive bandwidth price; SSD servers with good peering can do the opposite. Clients can prioritize as desired.
The real interesting part will be when you can create one-time-use URLs to pass out, which connect directly to the network - effectively turning it into a distributed CDN.
The $2 / TB / Mo we've traditionally advertised as our price included 3x redundancy. The math we've done on reliability suggests that really you only need about 1.5x redundancy once you are using 96 hosts for storage.
The network prices today are less friendly, though that's primarily due to market confusion. The siacoin price has doubled 6 times in 6 months, and there's no mechanism to automatically re-adjust host prices as the coin price moves around. So hosts are all currently advertising storage at hugely inflated rates, and newcomers to Sia don't realize that these aren't really competitive prices.
Though I will assert that even at our current prices, it's not price that's the primary barrier to adoption. It's some combination of usability and uncertainty. Sia is pretty hard to set up (it's around 8 steps, with two of those steps taking over an hour to complete), and a lot of people are not certain that Sia is truly stable enough to hold their data.
You can't compare to Glacier. S3 is a more comparable product. And obviously redundancy is already in the price, or did you think there's no redundancy?
From what I understand, your client does the error encoding and pays for raw data storage on the network, rather than trusting the network to do error encoding. You can configure the encoding to whatever you want, you just end up paying more for more redundant encodings.
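To make the redundancy arithmetic concrete, here's a rough back-of-the-envelope sketch; the 10-of-30 split and the $2.02 figure are illustrative numbers from this thread, not necessarily Sia's actual defaults:

    # Effective price scales with the erasure-coding expansion factor n/k
    # (shards stored divided by shards needed to reconstruct a file).
    raw_price=2.02   # $/TB/month of raw storage (figure quoted upthread)
    k=10; n=30       # hypothetical 10-of-30 encoding => 3x expansion
    echo "scale=2; $raw_price * $n / $k" | bc   # => 6.06 $/TB/month effective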
I used Backblaze for several years before closing out my account in 2012.
Initial backup took a long time. There was no easy way to prioritize, for example, my photos over system files. I ended up manually prioritizing by disallowing pretty much my entire filesystem, and gradually allowing folders to sync. First, photos, then documents, then music, etc.
Eventually it all got synced up and it was trouble-free... until I tried to get my data back out.
The short version of the story is that a power surge fried my local system. I bought a new one and had some stress when it appeared the BB client was going to sync my empty filesystem (processing it as a mass delete of my files). I managed to disable the sync in time.
Then I discovered there was no way to set the local BB client to pull my files back down. Instead, I had to use their web-based file manager to browse all my folders and mark what I wanted to download. BB would then zip-archive that stuff, which would then only be available as an HTTP download. There was no rsync, no torrent, no recovery if the download failed halfway, and no way to keep track of what I had recently downloaded. Also, IIRC, the zip files were limited to a couple of GB in size (which didn't matter, because at that time the download would always fail if the file was larger than __MB; I don't remember the exact number. 100MB? 300? I'm also hazy on the official zipfile size limit).
So I had to carefully chunk up my filesystem for download because the only other option BB offered was to buy a pre-filled harddrive from them (that they would ship to me).
I felt like Backblaze was going out of their way to make it hard for me in order to sell me that harddrive of my data. I felt angry about that and stubbornly downloaded my data one miserable zipfile at a time until I had everything.
Once I was reasonably sure I had everything I cared about, I closed my account and haven't looked back.
[Edit to add] This was at least 5 years ago. No doubt their service has improved since then.
I would think that for a full restore you might be better off with their restore by mail. Note that if you copy your data off the drive they ship and then send it back, they refund the charge for it.
I use Backblaze but haven't had to do a restore yet. It appears their current limit is 500 GB per zip file. They also have a "Backblaze Downloader" utility (Mac & Windows) that has the ability to resume interrupted downloads.
It looks like styx31 linked to B2 which is a separate service from their backup service that's closer to S3 or Google Cloud Storage. With that you can use rclone which should avoid the issues you encountered, though at higher cost if you have a lot of data (there are per-GB storage and download fees).
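For anyone curious, the rclone side of that is roughly as follows; the remote name and paths are made up, and you create the B2 remote interactively with "rclone config" first:

    # One-way sync of a NAS folder into a B2 bucket; --dry-run first to see
    # what would be transferred or deleted before committing to it.
    rclone sync --dry-run /srv/nas/photos b2remote:my-backup-bucket/photos
    rclone sync /srv/nas/photos b2remote:my-backup-bucket/photos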
My restore experience with Backblaze was also poor. The download speed was slow; if I had had to restore an entire drive, it would have taken me many days to download the entire thing.
I switched to Arq with Amazon Drive as the storage backend.
I feel like a luddite but I have three backups at home (PC HD, 2 rsync'd USB drives I bought several years ago) and one off-site backup (encrypted HD in locker at work). Far cheaper afaict than any cloud backup.
I think this is a good basic and relatively low-tech strategy.
Do you do versioning? As in what happens if your files are silently corrupted e.g. by accident or by malware? Rsync would overwrite your files, and you might even overwrite your off-site backup when you connect it.
My main reason for going beyond such a set-up though is that it takes time, effort and remembering to sync the off-site backup by taking it home, syncing and putting it back. And during that time all your data is in the same place. If something happens to your home during that time (break-in, flooding, fire...) you're out of luck. Unless your rsync'd drives are also encrypted and you just switch one of them with the off-site one for rotation.
I have a Raspberry Pi at my parents' home (with r/w access to the disk attached to my father's AirPort Extreme); it rsyncs every night with the server in my basement (which has all my data on 2 disks). It also syncs my parents' data back to me. It works well, but I still need to add a feature to email me if syncing somehow halts or errors out. I use "rsync -av" (over SSH), so nothing is ever deleted.
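In case it helps, a minimal sketch of that missing alert: wrap the nightly rsync in a script that mails the log on a non-zero exit (assumes a working mail/sendmail setup; the hostnames, paths and address are made up):

    #!/bin/sh
    # Run the nightly sync and email the output only if rsync fails.
    LOG=$(mktemp)
    if ! rsync -av -e ssh /srv/data/ backup-pi:/mnt/backup/data/ >"$LOG" 2>&1; then
        mail -s "nightly rsync FAILED on $(hostname)" me@example.com < "$LOG"
    fi
    rm -f "$LOG"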
It could be overwritten though. A good backup protects you from more than just destruction at the primary site. There are various relatively efficient ways to arrange snapshots when using rsync as your backup tool.
Also, remember to explicitly test your backups occasionally, preferably with some sort of automation because you will forget to do it manually, to detect unexpected problems (maybe the drive(s)/filesystem in the backup device are slowly going bad, but in a way that only affects older data and doesn't stop new changes being pushed in).
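On the snapshot point: one of the relatively efficient ways to arrange them with plain rsync is --link-dest, where each run produces a dated snapshot directory and unchanged files are hard-linked against the previous snapshot so they take almost no extra space. A sketch with hypothetical paths:

    #!/bin/sh
    # Snapshots end up as /mnt/backup/2017-06-01, /mnt/backup/2017-06-02, ...
    SRC=/home/user/
    DEST=/mnt/backup
    TODAY=$(date +%Y-%m-%d)
    LAST=$(ls -1d "$DEST"/20* 2>/dev/null | tail -n 1)
    if [ -n "$LAST" ]; then
        # Files unchanged since the last snapshot are hard-linked, not copied.
        rsync -a --delete --link-dest="$LAST" "$SRC" "$DEST/$TODAY/"
    else
        rsync -a --delete "$SRC" "$DEST/$TODAY/"
    fi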
Versioning backups seems like a must. Encrypting malware is a thing and has been for a while, just like rm -rf type mistakes which are subsequently propagated automatically to "backups".
Another thing that I do with my backups is making sure the main machine can't access the backups directly and vice-versa. It is slightly more faff to set up, adds points of failure (though automated testing is still possible), and is a little more expensive (you need one extra host), but not significantly so.
My "live" machines push data to an intermediate machine, the the backup locations pull data from there. This means that the is no one machine/account that can authenticate against everything. Sending information back for testing purposes (a recursive directory listing normally, a listing with full hashes once a month, which in each case gets compared to the live data and differences flagged for inspection) is the same in reverse.
This way a successful attack on my live machines can't be used to attack the backups, and vice-versa. To take everything, you'd need to hack into all three hosts separately.
Of course as with all security systems, safe+reliable+secure+convenient storage of credentials is the next problem...
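For what it's worth, the monthly hash listing can be as simple as this (hypothetical paths; the live host generates the same file and the two get diffed, with any difference flagged for inspection):

    # On the backup host: hash every file under the backup root, sorted by path.
    cd /srv/backup/data && find . -type f -print0 | sort -z | \
        xargs -0 sha256sum > /tmp/backup.sha256
    # After fetching the equivalent listing from the live machine:
    diff /tmp/live.sha256 /tmp/backup.sha256 && echo "backup matches live data"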
Crashplan isn't incompatible with NAS. You can either mount a share and run it from your workstation, or run it directly on the NAS itself. The core of the product is Java so it runs on just about any architecture to boot.
Coming from someone who tried to do this setup: it wasn't worth it. CrashPlan's client isn't something you generally want to run on your NAS; it takes memory proportionate to the amount of data on your disk (and a fair amount of RAM, at that), and unless you're running a GUI on your NAS it's impossible to configure without a huge headache.
You can run it from your workstation, but if you've got a reasonable amount of data on your NAS then the memory issues will bite you again. Something like Backblaze B2 is more expensive, but I'd rather pay $10/mo to backup the 2TB of data on my NAS (growing every day) and use CrashPlan to backup my computers only.
> CrashPlan's client isn't something you generally want to run on your NAS; it takes memory proportionate to the amount of data on your disk (and a fair amount of RAM, at that), and unless you're running a GUI on your NAS it's impossible to configure without a huge headache.
CrashPlan's client is able to attach to a headless instance [1], but the RAM requirement does mean that it's only really usable on NASes with expandable RAM.
I used Crashplan for 3 years on a Synology NAS. It's a disaster. Every time there was a Synology upgrade, the CP headless server would stop working, and you'd need to reinstall, re-set the keys, etc.
After 10 or 15 times doing this, I got rid of CrashPlan entirely, migrated my backups to Amazon Drive, and never looked back.
Given the lack of decent options, seems the best choice will really be to pony up the $180 for 3TB that Amazon will start charging next year...
If you were paying $60/year for 2-3T of cloud storage then Amazon was subsidizing you. Even Glacier would cost $120/year for 2.5T, and Glacier is so cheap that everyone's trying to figure out how they could possibly sell Glacier and still be making money.
Why is CrashPlan incompatible with NAS? I am running it on a headless Ubuntu server and it works just fine (you just need about 1GB of RAM for every TB of storage).
Sure, but B2's pricing isn't too expensive anyway. If I had all 7TB of usable space filled up on my NAS it'd cost me $35/mo - that's easily doable, even for a digital packrat like myself.
The Glacier storage class on S3 would probably be better if you like Amazon and are okay with Glacier's price. Backblaze's B2 is pretty cheap too, and has a nice API.
Google Cloud Nearline storage is rather cheap, doesn't have as many limitations as Glacier, and is AWS API compatible, so NAS backup software works with it.
Re Crashplan & NAS... I've managed to get NAS backup to work. Are you certain on this point? I am going to double-check my setup.
I have the MacOS CrashPlan client configured to back up a variety of NAS shares when the NAS is powered on and the share is mounted. Only about 4 shares, and I made a point to mount them and leave them mounted until the sync completed.
The shares are cold storage, so once synced, they stay virtually unchanged.
OK, Google Storage Nearline is still $10 a month for 1 TB. That's $120 a year vs $59.99 a year for Amazon Drive, not including Google bandwidth, which could be significant.
I feel like the comment about bandwidth got ignored: you only pay for egress bandwidth, which basically means you're only paying high bandwidth fees if you've lost all of your data and it's an emergency, at which point the fees seem pretty reasonable because you've just lost your house in a fire or something like that. Uploading is free (well, you pay your ISP).
Most of the time, people only need to restore a few files from backup because they were accidentally deleted. The bandwidth costs for a few GB here and there are pretty cheap.
I've been thinking that a p2p backup solution (encrypted storage, a storage cryptocurrency, occasional random requests to make sure hosts are still around) could work. I guess these guys are doing it: https://storj.io/ ($15/TB for storage, $50/TB for bandwidth). And competitors: https://news.ycombinator.com/item?id=13723722
Make a tool that turns data files into photo files. In its simplest form you just need to write the appropriate headers (added benefit: additional metadata can be easily encoded). Because with Amazon Prime, photo storage is free and unlimited.
I was always amused when I warned people on /r/datahoarder against abusing the service because Amazon would inevitably put an end to it. I was always told that I had no idea what I was talking about and was given many rationalizations about why Amazon wouldn't care about users storing dozens or hundreds of TB of files on the service.
Indeed, and it's within their rights to stop offering that when the period you paid for ends.
It's understandable from their point of view: they offered unlimited storage to be awesome, but didn't expect this kind of usage, which isn't sustainable. So they made a mistake and are correcting it.
It's hard to see it as a deliberate strategy to pull in users and then charge them more once they're "locked in".
Possibly because there actually wasn't any limit. Maybe if a handful of people exceed $LOTS TB, they don't care, but if 60% of users exceed $LOTS TB, the service becomes unsustainable. In this case, the service really is unlimited (there genuinely is no limit that you're not allowed to go over), and if you wanted that effect, advertising a limit would be a net negative: a high limit would encourage the "too many users use a lot" case and lead to the same result we get now where the plan has to be canceled for unsustainability, and a low limit would defeat the purpose.
> At the same time, I don't get, why would you encrypt your "Linux ISO's"? Let the AWS dedup do its job, don't abuse it, and everyone is happy.
Because if you are a self-proclaimed data hoarder, do you have the time to sort through and selectively classify your hoard to "encrypt this ISO don't encrypt that tarball" on a file-by-file basis across many terabytes?
How much would be saved by deduping anyway? Unless people are deliberately storing redundant, easily-deduped data, even getting 300TB down to 100TB or so doesn't fundamentally change the economics of "unlimited."
I store a bit of data at home (only ~20TB). Really easy to sort; there are plenty of apps that do it for you: files with this extension or those keywords in the filename go to this directory, others go to other dirs.
I only have my pictures and personal data in the AWS cloud, encrypted. The way I set it up? Point rclone to the relevant directories and skip the rest.
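Roughly, for anyone who wants the same setup: rclone has a crypt remote type that wraps another remote and encrypts file contents and names, and you only point it at the directories you care about. The remote names and paths below are made up; the crypt remote gets created with "rclone config":

    # 'secret:' is a crypt remote layered on top of the Amazon Drive remote.
    rclone sync ~/Pictures secret:Pictures
    rclone sync ~/Documents/personal secret:personal
    # Everything else simply never gets pointed at the cloud.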
As someone completely unfamiliar with this space, this prompted me to do some reading into this rclone issue. I'll record it here for anyone else similarly curious.
It seems that as of a few months ago, two popular (unofficial) command line clients for ACD (Amazon Cloud drive) were acd-cli[1] and rclone[2], both of which are open source. Importantly the ACD API is OAuth based, and these two programs took different approaches to managing their OAuth app credentials. acd-cli's author provided an app on GCE that managed the app credentials and performed the auth. rclone on the other hand embedded the credentials into their source, and did the oauth dance through a local server.
On April 15th someone reported an issue on acd-cli titled "Not my file"[3], in which a user alleged that they had received someone else's file from using the tool. The author referred them to Amazon support. The issue was updated again on May 13th by another user who had the same problem, this time with better documentation. That user reached out to security@amazon.com to report the issue.
Amazon's security team determined that their system was not at fault, but pointed out a race condition in the source for the acd-cli auth server (sharing the auth state in a global variable between requests...) and disabled the acd-cli app access to protect customers.[4]
In response to this banning, one user suggested that a workaround to get acd-cli working again would be to use the developer option for local oauth dance, and use rclone's credentials (from the public rclone source).[5] This got rclone's credentials banned as well,[6] presumably when the amazon team noticed that they were publicly available.
To top this all off, the ACD team also closed down API registration for new apps around this time (which seems to have already been a strenuous process). I suppose the moral of the story is that OAuth is hard.
I hope this (and the many more examples like it) puts a stop to this "unlimited" BS. You can't say people were abusing a service that throws that keyword around for marketing reasons.
That is very selective of them. While their marketing materials said "unlimited", people chose to ignore the ToS which stated that they wouldn't tolerate abuse and that abuse was basically whatever they determined it to be.
Yes.. but them not having an upper limit doomed "the rest of you" from the beginning. Is anyone surprised some would do that? Is Amazon? Should they be? Of course not..
Corporations see "complicity in an illegal act" as a negative utility far larger than the ultimate lifetime value of any single customer. So, when you do something illegal (even if for dumb reasons) and use a corporate service to do so, you've got to expect that said corporation will immediately try to distance themselves from complicity in that act by terminating your account with them. This is one of those "inherent in the structure of the free market" things.
So, first of all I think you're focusing on the wrong thing.
The whole point of an unlimited tier is to attract large numbers of outsiders who don't want the cognitive burden of figuring out $/GB/month and estimating how many GB photos they'll need to store.
What we're talking about here is that they got some customers like that, but they also got a small number of customers taking them for a ride. Call them 'power users': the kind of customers who (as we see elsewhere in these comments) won't stick around if the price changes.
There's nothing wrong with these power users storing huge amounts of data at a subsidised price, just like there's nothing wrong with Amazon changing the pricing. They just decided to stop subsidising that behaviour and probably take a slight hit on a conversion rate somewhere.
As for your question about 'private' storage, it's a grey area. Privacy isn't absolute, especially in cases where a company is, by inaction, helping you break the law (whether you agree with the law or not). Companies work very hard to distance themselves from responsibility for their customers' actions and don't want to jeopardise that by letting it get out of hand.
> Privacy isn't absolute, especially in cases where a company is, by inaction, helping you break the law (whether you agree with the law or not). Companies work very hard to distance themselves from responsibility for their customers' actions and don't want to jeopardise that by letting it get out of hand.
How does this work with Google Play Music (you can upload up to 50k songs for free and listen to it "on the cloud")?
I think you are focusing on the wrong thing. Corporations don't care about the law any more than individuals do. Laws and regulations are just guidelines if you are determined enough to get your way. Look at all the Uber stories. Pretty sure people here still like Travis for his tenacity no matter what you say about his morality.
I think we often forget that humans wrote the laws we have today. They didn't come to us on stone tablets from the mountaintop. At the end of the day, these laws don't matter; they are not written in stone, so to speak. We should always strive to do better. Intellectual property is a sham. I mean, think about it. I do think there is legitimate intellectual property: the trademark.
I think it is wrong for me to sell "Microsoft Windows" (even if I wasn't charging any money) if I had modified the software and added malware into it. But me watching a movie or reading a book without paying royalties does not hurt anyone.
Please think about it. Just because something is legal does not make it right and just because something is illegal does not make it wrong. We need to calibrate our laws based on our image and not the other way round. We write the laws. The laws don't write us.
> Corporations don't care about the law any more than individuals do.
I'm struggling to find a connection between the points that I made in my comment and the points in your reply. Suspect we have some miscommunication here... my own comment wasn't spectacularly well filtered.
I'll bite on these though:
> Laws and regulations are just guidelines if you are determined enough to get your way. Look at all the Uber stories.
Don't conflate civil or criminal law with the work of regulatory bodies, who in my experience with the FCA and OFT are very open and collaborative without any need for "tenacity".
Uber work very hard on marketing and competition, but they are allowed to succeed by regulators who WANT them to succeed despite their amoral hustle, not because of it. Regulators understand that markets move on and that regulations sometimes lead and sometimes follow.
> Please think about it. Just because something is legal does not make it right...
So, I'm assuming from this comment that you're quite young. Just for your information: I suspect most folks on HN are already aware of the delta between legality and morality.
I'd also recommend thinking about the subjective nature of morality, and the causes and malleable nature of it.