Tarsnap – Online backups for the truly paranoid (tarsnap.com)
147 points by tete on May 25, 2013 | 132 comments



Tarsnap works really, really well. Just make a consistent snapshot of your data (I'm using UFS snapshots), point Tarsnap at it, and you're good to go.

The documentation is thorough, and Colin (the owner/operator/author) responds quickly to emails.

Finally, compression and deduplication is amazing:

  [nick@home ~]$ sudo tarsnap --print-stats
                                         Total size  Compressed size
  All archives                               348 GB            76 GB
    (unique data)                             34 GB           6.3 GB
Yep, I've backed up 350GB of data, but since most of it is duplicated, I pay for storing 6.3GB. Win.

One word of caution though - this isn't a mainstream consumer backup service. If you lose your keys you lose your data. No chance of recovery. So make sure you back those up properly too, ideally in a different geography.


> Just make a consistent snapshot of your data (I'm using UFS snapshots), point Tarsnap at it, and you're good to go.

You're using the --snaptime option, right? It's necessary when you're backing up a filesystem snapshot in order to work around a race condition with them -- if a file is modified, the filesystem snapshot is created, and then the file is modified again, all within a single time quantum, it can trick Tarsnap into thinking that the file hasn't been modified later (which triggers an optimization of "this must be the same blocks as it was last time" in place of the usual "read the file and split it into blocks" behaviour).
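A toy model of that race, for anyone who wants to see it happen. This is a sketch under stated assumptions, not Tarsnap's code: the `Backup` class, its "same mtime and size means unchanged" cache, and the one-second timestamp granularity are all made up for illustration. The `snaptime` argument only mimics the idea behind `--snaptime`, namely that a file modified at or after the reference time is re-read instead of trusted.

```python
from dataclasses import dataclass, field


@dataclass
class File:
    content: bytes
    mtime: int  # whole seconds: the "time quantum" (assumed granularity)


@dataclass
class Backup:
    """Hypothetical backup tool that skips files whose (mtime, size) match."""
    cache: dict = field(default_factory=dict)   # name -> (mtime, size)
    stored: dict = field(default_factory=dict)  # name -> content last read

    def run(self, snapshot, snaptime=None):
        for name, f in snapshot.items():
            key = (f.mtime, len(f.content))
            # Without a snaptime every timestamp is trusted.
            trusted = snaptime is None or f.mtime < snaptime
            if trusted and self.cache.get(name) == key:
                continue  # optimization: assume same blocks as last time
            self.stored[name] = f.content  # slow path: actually read the file
            if trusted:
                self.cache[name] = key
            else:
                # The timestamp could hide a later write, so do not cache it.
                self.cache.pop(name, None)


def scenario(use_snaptime):
    tool = Backup()
    # t=100: file written, snapshot 1 taken, file rewritten, all in one second.
    snap1 = {"notes": File(b"AAAA", mtime=100)}
    live = File(b"BBBB", mtime=100)  # same size, same mtime, new content
    tool.run(snap1, snaptime=100 if use_snaptime else None)
    # t=200: snapshot 2 contains the rewritten file; its mtime is still 100.
    snap2 = {"notes": live}
    tool.run(snap2, snaptime=200 if use_snaptime else None)
    return tool.stored["notes"]


print(scenario(use_snaptime=False))  # the stale content is kept
print(scenario(use_snaptime=True))   # the rewritten content is picked up
```

In the first run the second backup never reads the file, because nothing about its timestamp or size changed.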

> Finally, compression and deduplication is amazing:

Well, if we're going to be posting statistics here...

                                         Total size  Compressed size
  All archives                               269 TB           121 TB
    (unique data)                            177 GB            72 GB
That's 269 TB of data backed up from my laptop, deduplicated and compressed down to 72 GB. This is what I get for taking a backup of my entire home directory every hour...


> You're using the --snaptime option, right?

Yep, emailed you about this in fact, and appreciated your detailed response.


Ah, that was you -- I remembered sending an email about snaptime recently but couldn't remember who it was to (and HN user names don't always correlate anyway...)


Excuse my ignorance, but what does duplicated in this context mean? Multiple copies of the same file?


Multiple copies of a file, or parts of a file. These can be spatial (your archive contains an SVN checkout, so there's an extra copy of every file in the .svn/pristine directory) or temporal (you take regular backups, and many of the files haven't changed very much from one backup to the next).


Ah okay, thanks. Another question: Why wouldn't compression take care of that? Isn't the point of compression to compact as many repeating sequences as possible?


Deduplication is a form of compression, yes. Most forms of compression are "local" however -- looking to match data against bits from within the past few MB -- so they won't detect duplicated data spread across entire archives.
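A quick way to see the "local" limitation is to put two identical blocks far apart and compare zlib with a simple block-hash deduplicator. This is only a sketch: the 64 KiB fixed-size blocks and the 1 MiB gap are arbitrary choices for the demo, and fixed-size blocks are a simplification rather than how any particular backup tool splits data. The one hard fact relied on is that DEFLATE (zlib) can only refer back 32 KiB.

```python
import hashlib
import os
import zlib

SIZE = 64 * 1024                    # block size chosen for the demo
block = os.urandom(SIZE)            # 64 KiB of incompressible data
filler = os.urandom(16 * SIZE)      # 1 MiB separating the two copies
data = block + filler + block

# DEFLATE only looks back 32 KiB, so the second copy is too far away to match.
compressed = zlib.compress(data, 9)

# Block-level deduplication: keep each distinct block exactly once.
unique = {hashlib.sha256(data[i:i + SIZE]).digest()
          for i in range(0, len(data), SIZE)}
deduped = len(unique) * SIZE

print("raw:", len(data), "zlib:", len(compressed), "deduplicated:", deduped)
```

The deduplicated size drops by one full block, while zlib cannot shrink the data at all.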


For compression to catch duplicates across files, the archive must be 'solid', which means that to add or remove something from the archive, you must reprocess the entire archive. This isn't a very good model for backups, especially when you have lots of them. (As an aside, .zip files aren't 'solid', so two copies of a file in one .zip won't compress against each other. This is also why most archives on Linux are made with tar, which creates a single stream of data that is then compressed.)

Tarsnap uses variable blocks (in such a way that inserting into the middle of a file creates minimal differences). If a new block is detected during a backup, it only needs to send that block, as the rest are already stored on the server. It also means that new archives can refer to the old blocks stored, allowing each archive to be independent, and unused blocks removed when the last archive using it is deleted. Before sending, blocks are compressed then encrypted, so compression can't really help since you might not have all the old data locally.
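To make the "variable blocks" idea concrete, here is a minimal content-defined chunking sketch. It is not Tarsnap's actual algorithm: the 16-byte window, the CRC32-based boundary test, and the roughly 256-byte average chunk size are all assumptions picked to keep the example short. The point it shows is that boundaries depend on nearby content rather than on file offsets, so an insertion in the middle only disturbs the chunks around it.

```python
import hashlib
import random
import zlib

WINDOW = 16   # bytes of context that decide a boundary (assumed value)
MASK = 0xFF   # cut when the low 8 bits are zero: ~256-byte average chunks


def chunks(data: bytes):
    """Split data at positions chosen by the content itself."""
    out, start = [], 0
    for i in range(WINDOW, len(data) + 1):
        # Hash only the last WINDOW bytes, so a boundary depends on local
        # content and not on the absolute offset within the file.
        if (zlib.crc32(data[i - WINDOW:i]) & MASK) == 0:
            out.append(data[start:i])
            start = i
    if start < len(data):
        out.append(data[start:])
    return out


def digests(data: bytes):
    """Identify each chunk by its SHA-256, as a stand-in for a block index."""
    return {hashlib.sha256(c).digest() for c in chunks(data)}


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(50_000))
edited = original[:25_000] + b"inserted text" + original[25_000:]

old, new = digests(original), digests(edited)
print(len(old), "chunks before;", len(new - old), "new chunks to upload")
```

With fixed-size blocks, the same 13-byte insertion would shift every later block and make all of them look new.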

You can also deal with backups using a master and incremental diffs. This doesn't work well with the tarsnap model, as archives are no longer independent.


why Tarsnap pricing is defined in terms of picodollars per byte rather than dollars per gigabyte: Tarsnap's author is a geek. Applying SI prefixes to non-SI units is a geeky thing to do.

I find that so amazingly annoying. To me it says "yeah, I know many people might find it hard to get their head around the units I defined, but I don't really care about that because I find it cool." We have standard units for a reason, because people can immediately get the scale of something in their mind. With this, you can't. I went to their site open to what they were selling, but I'm very turned off by this.


Note that "they" are one person, Colin Percival. That's all Tarsnap is; one person, some backup code he wrote, and data stored on Amazon S3. As the sole owner and only person working on a product aimed at a niche audience, I don't think it's unreasonable that he has a little fun with the way he runs it. If this kind of thing bothers you, this service probably isn't for you.

But also, you picked out only one of the three reasons he listed (http://www.tarsnap.com/picoUSD-why.html); the other two are also important:

If prices were listed in dollars per GB instead of picodollars per byte, it would be harder to avoid the what-is-a-GB confusion (a GB is 10^9 bytes, but some people don't understand SI prefixes). Picodollars are perfectly clear — nobody is going to think that a picodollar is 2^(-40) dollars.

Specifying prices in picodollars reinforces the point that if you have very small backups, you can pay very small amounts. Unlike some people, I don't believe in rounding up to $0.01 — the Tarsnap accounting code keeps track of everything in attodollars and when it internally converts storage prices from picodollars per month to attodollars per day it rounds the prices down.
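The rounding described in that quote is plain integer arithmetic. The sketch below is an illustration, not the Tarsnap accounting code: the function name is made up, and treating the month as having 31 (or 30) days is an assumption, since the quote does not say how a month is turned into days.

```python
ATTO_PER_PICO = 10**6  # 1 picodollar = 1,000,000 attodollars


def daily_rate_atto(pico_per_byte_month: int, days_in_month: int) -> int:
    """Convert a monthly price to a daily one in attodollars, rounding down."""
    return pico_per_byte_month * ATTO_PER_PICO // days_in_month


rate = daily_rate_atto(300, 31)
print(rate, "attodollars per byte-day")
# Whatever is lost to rounding down is lost in the customer's favour.
print(300 * ATTO_PER_PICO - rate * 31, "attodollars per byte-month not charged")
```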

And finally, the price in dollars per GB is also prominently displayed, right after the price in picodollars per byte. So really, you're just being bothered that he's having a little bit of geeky fun even though it has absolutely no effect on you.


Well,

    Storage:    300 picodollars / byte-month  ($0.30 / GB-month)
    Bandwidth:  300 picodollars / byte        ($0.30 / GB)
right on the home page.


I get how it could seem too self-indulgent, but I think that was mostly meant tongue-in-cheek. The real reasons are right below:

If prices were listed in dollars per GB instead of picodollars per byte, it would be harder to avoid the what-is-a-GB confusion (a GB is 10^9 bytes, but some people don't understand SI prefixes). Picodollars are perfectly clear — nobody is going to think that a picodollar is 2^(-40) dollars.

Specifying prices in picodollars reinforces the point that if you have very small backups, you can pay very small amounts. Unlike some people, I don't believe in rounding up to $0.01 — the Tarsnap accounting code keeps track of everything in attodollars and when it internally converts storage prices from picodollars per month to attodollars per day it rounds the prices down.

Plus, as others have pointed out, prices are listed in standard units (dollars per GB) just below the oddball ones.


Then don't use it if you find that a simple conversion is "annoying". The author is catering to an audience that, bar none, wants transparency and privacy in their backups. He's done a phenomenal job of this. It's no loss if he's out a user such as yourself.


But the price is also given in $/GB-month, so you can at least figure it out using units that people will probably find more convenient. I couldn't see the quoted comment on the opening page, which is probably lucky, because it annoys me more than simply giving the prices in picodollars/byte. (Pricing it like that at least reveals that the pricing is actually per byte, rather than being, say, $0.30/GByte with some kind of rounding.)

More generally, I've always found it to be a good idea to be at least somewhat circumspect if you're going to have some kind of an asocial relationship with somebody (e.g., getting them to give you money). It's impossible to foretell what people will get annoyed by, so you might as well give them as few things to get annoyed by as possible. (I suppose this is the "better to keep quiet and be thought a fool..." principle, in a way, though obviously foolishness is not the precise issue here. Oh well. I don't claim to be original.)


Yeah but on the homepage it says in brackets the cost per GB. $0.30/GB.


Interesting: in my browser, this comment is both greyed out and upvoted to the top of the list. Does this mean that comment placing is determined by upvotes, and comment greying is determined by downvotes, but those two processes are independent of each other?


Comment placement is by both new-ness and upvotes. So a post that is downvoted immediately sinks faster, but doesn't immediately go to the bottom.


Also average comment score of poster.


As an alternative, I use Arq continuously on all my computers and I highly recommend it (Sorry I'm on my iPhone and won't be able to give a link). It lets you use your own AWS credentials for backup and you can encrypt the data before it is sent to AWS.

The issue I have with Tarsnap is that the data is still in the hands of a small operation, as far as I can tell, and honestly I'm afraid we won't get our data if something happens to the guy. This is fine of course for many services, but data backup is inherently as mission critical as it gets. The whole reason for it is reliability, assurance and redundancy. It is not a nice-to-have; for many people it is the only place they fully trust to keep their data forever.

I wish Tarsnap had an innovation that made it possible to use it with one's (or an organization's) own AWS credentials. An on-site mode, if you will. Otherwise it has always seemed to me like a great piece of software.


Have you discussed your reservations with cperciva (the guy behind Tarsnap)? I'm sure he'd be happy to address them if you just ask.


Haven't thought of that tbh. I only have a single reservation that I mentioned in my above post. Not to put any words in his mouth, but I'm not sure there can be a solution to it given his current architecture of the system. It is inherently a multi-tenant SaaS backup service...


Well, there's only one way to find out if there's a solution -- contact cperciva and ask him :)


I've just sent an email to Colin about this. Will edit my comment as soon as I have a response.

EDIT: Wow, got a response in less than 5 minutes:

> It's not something I'm looking at doing right now. The way the Tarsnap server side is designed, in order to keep costs low (and performance high), data is aggregated between multiple Tarsnap users and stored in S3 as large chunks; keeping each user's data segregated would add a lot of additional complexity and cost.


I have the same thought. I'm using Backblaze at the moment but am actively looking to move to either Arq or Tarsnap. I like that Arq is in your own account and the file format is open so you can work on it yourself. Also, storing to Glacier means it's dirt cheap. It's reassuring to hear someone having a positive experience with it. Backblaze has been a mixed bag for me.


Have you considered Crashplan? I've had positive experiences with it. The only downsides are that the client program uses a ton of RAM and there's no API.


Arq is fantastic! :)


I just started using Arq myself, and it's perfect for what I need. The Glacier backup is the killer feature. Tarsnap is really nice, but it would bankrupt me. I have 300+ gigs of photos and video (I have children and I'm a total tool with my camera, I know). That's $90 a month for Tarsnap vs $3 with Arq using Glacier. For $90/month it would probably be cheaper to rent a machine somewhere and just use rsync.


http://www.hashbackup.com has dedup, compression, encryption, and lets you use your own storage: AWS or compatibles, rsync, ssh, ftp, imap, local dir, mounted remote dir. Disclosure: I'm the author.


All the data is encrypted before it ever leaves your machine. Not even cperciva should be able to read it.

You can also create a write-only key. If you run tarsnap from a server which gets pwned, the attackers can't touch the existing backups. Don't be the next Astalavista[1].

[1] http://joncraton.org/blog/49/analyzing-the-astalavista-hack


With Crashplan, all data is encrypted before it leaves your machine if you use a private key. The pricing is MUCH better and it's not a single man operation.

If you're paranoid about it being closed source, you can make a quick script to encrypt sensitive data, copy it to another folder, then sync that encrypted folder online. I do something similar with a small % of my data.

As far as server backups, it's trivial to script a copy to your local machine then let Crashplan sync that.


The thing is that Colin Percival has done genuinely novel computer science, real heavy lifting, to make both strong encryption and smart de-duplication possible in the same service.

So far as I know, nobody else has done that.

In practice tarsnap is cheaper than everything else because of the dedupe.


Why do you say so? Crashplan lets you have your own clientside keys that never leave your computer.

Crashplan does dedupe and compression but has UNLIMITED storage/bandwidth for one price.


Well you have me there. I'll fall back on the fact that Colin's code is available and that he's published papers covering all the maths and computer science that leads up to being able to dedupe without sending stuff to the server or decrypting on the server side.


I would have trusted that more if the tarsnap client had source code available.


The source code is available; it's available under a "shared source" license rather than free software/open source (you can look at it, but not modify it), but it is available for review. https://www.tarsnap.com/download.html

He also has a bug bounty http://www.tarsnap.com/bugbounty.html, and several substantial security bugs have been found and fixed due to the bug bounty (http://www.tarsnap.com/bounty-winners.html). In fact, the first of those, the AES CTR nonce bug, was found before he had offered the bounty program; the bounty program was inspired by that bug, and has since led to the discovery of several other more minor issues.

So, the source is available, and there's a bounty out for discovering bugs ranging from cosmetic issues to major security issues. Feel free to review it and submit any bugs you find!


"At the present time, pre-built binaries are not available for Tarsnap — it must be compiled from the source code." https://www.tarsnap.com/download.html



I believe Tarsnap's only flaw is that it hasn't yet solved the cperciva-gets-hit-by-a-bus problem. Or perhaps I am mistaken?


That is indeed a problem which has yet to be solved. Or a potential problem, rather... I'm rather hoping it will never actually happen. ;-)

Seriously though, it is on my list of issues which needs to be addressed. Bringing in someone else and getting them up to speed on how to run everything is an expensive prospect, though.


Have you considered some sort of "enterprise" variant where large organizations use their own storage backend? Just 1-4 serious enterprise-sized customers would cover the salary and overhead of 1+ good engineers. There's a lot of opportunity in that direction. For any medium-sized or larger company that needs to comply with education or medical privacy regulation, your tech is a great backup solution, and if carefully done it doesn't increase your overheads much, if at all.

Otoh, I am just speculating :-)


Setting up tarsnap to use non-AWS infrastructure would be a significant amount of work. Setting up a "private" Tarsnap (but still on AWS) is something I could do for a company needing to store a large amount of data (say, 10+ TB).


doesn't AWS have some special clouds for companies that have compliance needs?

Either way, it might be worth looking into even just the "private" Tarsnap on AWS business direction as a way of growing revenue in a way that isn't tied strictly to data storage volume.

One way to go about this is to ask some of your larger business users if they would be interested in such a "private for them" self hosted Tarsnap variant. I think many of them would love a way to help you have revenues sufficient to support having an additional engineer (or two) working with you, which isn't possible for them to do with your current usage based revenue model.

Point being, there's probably an "enterprise" business model that stays true to your quality goals, but gives you more ahead-of-time revenue by a substantial amount. For some of your customers, there might be more value in supporting you being able to hire some engineers than there is in the cost savings element of the current revenue model. This can be an ancillary product that isn't the core one, but which still helps you have more resources to make the core better.

Talk with your larger customers, they're probably happy to chat with you given the chance.


You can jerry rig your own using ddar (http://www.synctus.com/ddar/), it's basically the de-duplicating part of tarsnap by itself.


I'm not requesting it for myself; I'm just suggesting ways to boost revenue so he can comfortably hire.


I'm pretty sure you could set up an arrangement where you don't have to pay someone a full-time wage, but they step in if something happens to you.

Pretty morbid (but necessary) talking about what will happen if you're rendered inoperative.


Charge more.


OP here. I found them looking for a good backup solution.

They look amazing. Bug bounties for everything (including cosmetic stuff), completely transparent architecture, data deduplication and compression on the fly, they will be up even if two of Amazon's data centers fail, one pays per byte (traffic/store) and for all that they are pretty cheap.


Tarsnap is brilliant. cperciva is a well known and respected HN user too.

(https://news.ycombinator.com/user?id=cperciva)


> cperciva is a well known and respected HN user too.

Irrelevant.


But maybe the fact that he has been a Security Officer of the FreeBSD project for many years is relevant (for those concerned about privacy/security): http://www.nux.ro/archive/2012/07/Colin_Percival_no_longer_S...


Tarsnap is by a prolific HNer (cperciva). It's been on HN, literally, hundreds of times: https://www.hnsearch.com/search#request/all&q=tarsnap


I've used tarsnap for a few years. It's good.

Tip: forget everything you knew about scheduling full and incremental backups, because you don't have to. Tarsnap provides logical snapshots and does all the diff magic for you.

(Hence tar ... snap)


Interesting. Tarsnap and rsync.net seem to alternate coverage on HN, and for the longest time I kept forgetting they were different, even though I had a vague sense of confusion.

This one is Colin Percival's project.


We hold tarsnap in high esteem and wish Colin the best of luck. Further, we appreciate all of the good work Colin has done for the FreeBSD project.

Glad to see you on the front page.


Nope, I'm not rsync.net, but they provide a great service and contributions to FreeBSD, so they're obviously awesome people.


I understand data is encrypted before it ever leaves your machine, but I certainly wouldn't want encrypted data at rest being exposed. Which gives me concern about Tarsnap's terms: "I may provide information concerning your account and your use of the service to 3rd parties, at my sole discretion, if ... It is requested by law enforcement authorities ..." Note: no requirement for a court order or subpoena.

https://www.tarsnap.com/legal-why.html#PRIVACYLAW


Note the last paragraph of that: However, I'm serious about saying "at my sole discretion" — if a law enforcement agency wants information, they'd better have a good reason for asking for it... and I don't consider the NSA saying "we want to have all the information you have, just because we feel like it and someone somewhere might be a terrorist" to be a good reason. Also note that unlike the situation with certain illegal wiretaps, I can't give your data to anyone, because it's all encrypted such that I can't read it.

This situation has never arisen, but if I'm confronted by a police officer and enough evidence that I'm sure they could get a court order, I'd rather be cooperative than force them to go through the courts. This doesn't mean that I'd give them any more data than they would get from a court order -- in fact, quite the opposite, since police tend to err on the side of requesting more than they need when going through the courts, and cooperating could change "seize a server" into "get a copy of the required data".


Why would you comply with foreign government agencies?


If the Libyan, or Iranian, or Chinese, or Russian police come knocking, I probably wouldn't.

Beyond that, it's a judgement call. Lots of countries have agreements to assist each others' police forces in obtaining evidence.


They would all be channeled through your local law enforcement though.

If Swedish police come to you directly, you don't have to comply, but if they go through the proper channels, the request to you comes from Canadian police.


A secure online backup service for Minix -- FINALLY!!


I'd like to support GNU Hurd too, but they make some unusual (but still POSIX compliant!) choices and I haven't had the time to work around them.


I'm thinking of using Tarsnap. Can I absolutely, positively, definitely trust that everything on Tarsnap's end is encrypted to best practice standards and that there is no reasonable way to get to my data (outside of the usual contract provided by encryption I mean)?

I don't have the option to know for sure by analyzing the source code myself so I'll have to trust the popular opinion of Very Smart People here on HN (well, I suppose I could if I spent a non-trivial chunk of the coming year reading up on crypto stuff).


The encryption happens on your end, not Tarsnap's.

The bar you're setting, though, is impossibly high. Can you absolutely, positively, definitely trust that your machine is not rooted and some nefarious entity isn't quietly collecting your every keystroke and snickering in the dark while stroking a white cat?

At that level of paranoia, you're probably best off using a device personally soldered together with hand-selected transistors that XORs all your backups with the white noise collected from your tv (while disconnected from cable, of course).



Well, the guy who wrote Tarsnap is himself a Very Smart Person here on HN, so I'd say the answer is yes, by definition ;)


"a Very Smart Person"

Did he win the Putnam?

spoilers: https://news.ycombinator.com/item?id=35083


Cyphertite may also be of interest: https://www.cyphertite.com/

Client-side encryption and deduplication, with source code. 8GB free, $10/mo for personal unlimited use, 10c/GB/month for business/enterprise. My main reservations are they seem to be based in one datacenter, and don't seem to have support for multiple keyfiles with separate read/write/delete/machine restrictions. Also not in FreeBSD ports :P


Tarsnap has been working really well for us, but one huge downside that we've noticed is how slow it is to restore data from say a 1TB archive.

Sometimes it takes more than 3 hours to restore a customer's 40MB directory.

If we were to have a full HD failure and had to restore the whole 1TB, that would probably take days. Days of downtime for us.

So depending on your situation, this might not be ideal.

I contacted Colin about this a few months ago and he mentioned that he is working on a faster version.


The "legal" section of the site is confusing.

"1. You may only access the service using unmodified Tarsnap client code which I have distributed" -- really? no API and no custom clients?


I imagine that if you want an API you could ask nicely... :-)


One of the things I love about Tarsnap is the bug bounties, which range from $2000 for being able to decrypt user data right down to $1 for cosmetic issues.


quick question here: is there a delay in Recent Activity?

I just signed up and used it on two servers like 30 minutes ago, but I don't see anything in the account activity except the payment info. I'm quite sure my servers sent stuff because I monitored b/w usage


The accounting data updates at midnight UTC -- so yes, there is a delay.


roger, thanks!


Does anyone have experience with Duplicity? http://duplicity.nongnu.org/


I use it to save backups of my desktop and laptop home directories to a home NAS mounted with NFS, which later gets synced to another NAS.

It's a decent tool. I encrypt with a separate gpg key, do mostly incremental backups, and a full one every few months. Incremental backups take under a minute on my desktop (100K files, 11G). Full ones are kind of slow (which is why I set it to only do it every few months).

I haven't used the S3 support.


I use it to keep incremental backups of my VPS. It lets you use S3 as your backup destination and uses GnuPG to encrypt all files.


Have used Duplicity for a couple of years now, works very well although tricky to get it to store data other than in the US-East region. Currently backing up about 8 servers with it, mix of Ubuntu & Amazon Linux.


Colin - I dig what you're doing, but every time I go to the Tarsnap website, I'm turned off from using it for all of the reasons that have been discussed here ad nauseam since 2009. I'd love to see you succeed more; I think you deserve it, and I wish you'd just grab it.

see https://news.ycombinator.com/item?id=820705 and https://news.ycombinator.com/item?id=1639277, e.g.


I'm still not sure whether I can trust somebody else with my data, but I'm growing more and more concerned about hardware failure of my own backups. Might try Tarsnap one of these days.


It allegedly encrypts all your data on the local machine, before sending it to the server.

Now, of course, if you are truly paranoid, you'll want to review their code first. I don't get why I can't simply mount a volume with encryption and write there. Using code that is already on my machine (in the kernel, no less) would make it a much simpler decision.


> I don't get why I can't simply mount a volume with encryption and write there.

You can (and you could even use tarsnap to back up the encrypted filesystem image if you want), but writing your data to an encrypted filesystem tends to expand the amount of data changing -- in the extreme case, if you create a copy of a file you'll write that many blocks of new encrypted data which needs to be backed up, whereas tarsnap would just say "hey, I recognize all these blocks, it's those ones I backed up earlier" -- so Tarsnap's encrypted backups of a filesystem tend to be many times more efficient than backups of an encrypted filesystem.
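A toy demonstration of that difference. Everything here is a stand-in: `toy_encrypt_sector` is a hypothetical, insecure keystream cipher that only imitates the way disk encryption ties ciphertext to the sector number, and hashing plaintext blocks is a simplification of how a deduplicating backup tool recognizes data it has already stored. It is not how Tarsnap or any real encrypted filesystem is implemented.

```python
import hashlib
import os

BLOCK = 4096
KEY = b"demo key, not a real secret"


def toy_encrypt_sector(plain: bytes, sector: int) -> bytes:
    """Stand-in for disk encryption: the keystream depends on the sector.

    Illustration only; this is NOT a secure cipher.
    """
    stream = hashlib.shake_256(KEY + sector.to_bytes(8, "big")).digest(len(plain))
    return bytes(p ^ s for p, s in zip(plain, stream))


def block_ids(blocks):
    """Identify blocks by hash, the way a deduplicating store would."""
    return {hashlib.sha256(b).digest() for b in blocks}


file_blocks = [os.urandom(BLOCK) for _ in range(8)]

# First backup: the file occupies sectors 0-7 of the encrypted volume.
seen_plain = block_ids(file_blocks)
seen_cipher = block_ids(toy_encrypt_sector(b, i) for i, b in enumerate(file_blocks))

# The user copies the file; the copy lands in sectors 8-15.
copy_cipher = block_ids(toy_encrypt_sector(b, 8 + i) for i, b in enumerate(file_blocks))

# Backing up the encrypted image: every block of the copy looks new.
new_if_backing_up_image = len(copy_cipher - seen_cipher)
# Backing up the files and encrypting afterwards: nothing new to store.
new_if_backing_up_files = len(block_ids(file_blocks) - seen_plain)
print(new_if_backing_up_image, new_if_backing_up_files)
```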


Did you just assume the source code wasn't available? It's not linked from the front page of the site, but if you go to the 'Download' page, it's right there for your review:

https://www.tarsnap.com/download.html


> I don't get why I can't simply mount a volume with encryption and write there.

If you want to mount an encrypted filesystem stored on S3, you might want to try ObjectiveFS. https://objectivefs.com


I would love to have something like the Backblaze client but working with Tarsnap as a backend: you install it and you forget about it. The sensible default configuration is good enough for average joe but you can tweak it if you want.



Well, it is not using Tarsnap as a backend and it seems that you have to add folders on the first launch, that is definitely not what I am looking for :)

Time Machine and Backblaze know how Mac OS is laid out and back up everything except useless files (mainly logs).


+1 for ARQ. Best S3 backup I have found on OSX. Sorry TARSNAP, I will only backup to an account I control.


This is absurdly expensive. If you have a 400GB laptop, fully backed-up and with negligible deltas, you are paying $1440 a year.

(400GB is not an absurd amount, either. I personally would ideally have about that much in my off-site backup.)
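For reference, the arithmetic behind that figure, using the storage price quoted from the home page elsewhere in this thread (300 picodollars per byte-month). This is a rough sketch: it counts storage only, ignores the separate bandwidth charge, and assumes nothing deduplicates or compresses, which is the worst case. The function name is made up for the example.

```python
PICODOLLARS_PER_BYTE_MONTH = 300  # storage price quoted in this thread
PICO = 10**12                     # picodollars per dollar
GB = 10**9                        # SI gigabyte


def monthly_storage_cost(stored_bytes: int) -> float:
    """Dollars per month for the given number of stored (post-dedup) bytes."""
    return stored_bytes * PICODOLLARS_PER_BYTE_MONTH / PICO


for label, size in [("400 GB", 400 * GB), ("300 GB", 300 * GB)]:
    month = monthly_storage_cost(size)
    print(f"{label}: ${month:.2f}/month, ${12 * month:.2f}/year")
```

What matters for the bill is the size after deduplication and compression, so the real question is how much of the 400 GB is actually unique.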


You're missing the value-added component here where tarsnap compresses and deduplicates data.


That's not exactly rocket science these days (see bup). What you're paying for with tarsnap is making it totally rock solid and as usual the last 90% of the work is also 90% of the cost.


The point that suxnoll is making, though, is that the cost is not nearly that high to begin with, because the data is deduplicated and compressed. You're justifying an issue that doesn't quite exist.

The query that suxnoll responded to supposed that you have 400 GB of data with small deltas. But that's only possible if you're filling your harddrive with files created from random noise from /dev/random and updating all your files monthly by more random noise.


> That's not exactly rocket science these days (see bup)

Actually, combining crypto and dedupe in such a way that the server can never tell what's on the client computer but the client can still reliably pick what's changed and dedupe it?

That's honest-to-goodness computer science. And Colin is the guy who invented this stuff.


Crashplan does that to save you bandwidth and them space but you get unlimited space for the same price.


I did not miss that. If a lot of the data in question is in the form of sparsebundles, mp3s, and/or video (for me, it would be) then good luck meaningfully de-duplicating and compressing that.


Yeah. I have 800GB backed up with BackBlaze, and I pay just $5 per month. It's also possible to use your own private encryption key in their client.


I'm a little cautious with BackBlaze now (looking at switching to Arq [0]). I have about 700GB with them but a while ago my backup metadata became corrupted in the storage on their side. I worked through it with them to try to diagnose the issue. I even went to the extreme length of buying a whole new Mac mini in the hopes it would fix it. No such luck, so I had to reupload the full 700GB to them again. It's not an isolated case either - it happened to a friend of mine too.

More recently we'd been trying to get to the bottom of some unusably slow macs (mountain lion). Turns out that the BackBlaze filelist service (that watches for changes to files) is very poorly behaved. Initially we discovered that it fights with apple's mds. Even once we'd fixed that (by stopping mds from watching a load of folders) it still ignores the scheduled backup times so it runs all day. BackBlaze support acknowledged the issue but the only workaround we've found is to have a cron job unload the BackBlaze daemon during the day to stop it destroying performance.

[0] http://www.haystacksoftware.com/arq/


> Turns out that the BackBlaze filelist service (that watches for changes to files) is very poorly behaved. Initially we discovered that it fights with apple's mds

That's really disturbing, does it misbehave and interact with /dev/fsevents directly instead of using the public fsevents api? If so somebody needs to get flogged over that choice.


That's not really a question I have the knowledge to answer (though with some more guidance I'd be happy to look into it). I'm aware of fsevents, and I know for example that dropbox have their own dbfsevents so they can filter out what they need closer to the kernel. My understanding is that it gives you an api to be notified of changes to the fs. Maybe the BackBlaze case is complicated because they backup the whole filesystem by default?

The behaviour we saw was a constant scanning of all the metadata of all the files on the filesystem - not just things that were changing. It seemed that instead of being notified about changes it was relying on comparing to its cache by polling. There were huge folders with years of photos in that hadn't been touched in months being scanned again and again all day.

To be honest, my business partner contacted BB about it and they didn't seem as concerned as we were. It was at that point that for the first time in a couple of years I found myself with the desire to investigate backup services again. It's a shame because BB has been good when I've needed them. A drive died recently with a lot of important data on it (years of photos and music) and they had a replacement to me in a matter of days.

I'm still running their backups at the moment but I also started Arq running last night. It seems to fit my use-case perfectly.

Edit: I just ran it again to check. The bzfilelist process does an lstat on every file in the system one by one.
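
For anyone who wants to picture the pattern, here is a minimal sketch of a poll-and-compare scanner. It is an assumption about the general technique, not BackBlaze's actual code: the function names (scan, changed_paths, polling_demo) and the choice of (mtime, size) as the cached metadata are mine. The point it illustrates is that every pass pays one lstat per file, whether or not anything changed.

```python
import os
import tempfile


def scan(root):
    """Walk root and lstat every file, returning {path: (mtime_ns, size)}."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # one syscall per file, changed or not
            snapshot[path] = (st.st_mtime_ns, st.st_size)
    return snapshot


def changed_paths(cache, current):
    """Paths that are new, or whose cached (mtime, size) no longer matches."""
    return sorted(p for p, meta in current.items() if cache.get(p) != meta)


def polling_demo():
    """Create three files, modify one, and report (files scanned, changed names)."""
    with tempfile.TemporaryDirectory() as root:
        for i in range(3):
            with open(os.path.join(root, f"photo{i}.jpg"), "wb") as f:
                f.write(b"x" * 10)
        cache = scan(root)
        # Append to a single file so its size differs from the cached value.
        with open(os.path.join(root, "photo1.jpg"), "ab") as f:
            f.write(b"more")
        current = scan(root)  # the second pass still touches all three files
        changed = [os.path.basename(p) for p in changed_paths(cache, current)]
        return len(current), changed
```

Only one file changed, but the second pass still had to stat all of them. A notification-based design would hear about that one file directly and keep a full scan like this as an occasional safety net.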


> Edit: I just ran it again to check. The bzfilelist process does an lstat on every file in the system one by one.

Wow, so it doesn't even listen for fsevents? What a terrible design, I understand running a full scan every now and then to ensure you haven't missed anything while the filesystem has been offline (in case it's mounted on another machine), but holy crap.


I've been using http://labs.bittorrent.com/experiments/sync.html for a while and it is as reliable as you need.


Sync is not backup. For backups you also need the ability to look back at previous versions of your files. For example, in your sync solution, if a file gets corrupted, the corrupted bytes will be sync'd to every copy of the file. With a backup solution you'll be able to roll back to a previous, non-corrupted version.
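
A toy illustration of the difference, as a sketch only: Mirror and VersionedBackup are hypothetical in-memory classes I made up for this comment, not the behaviour of any particular product.

```python
class Mirror:
    """Sync-style store: keeps only the latest content of each path."""

    def __init__(self):
        self.files = {}

    def put(self, path, data):
        self.files[path] = data  # overwrites whatever was there before

    def get(self, path):
        return self.files[path]


class VersionedBackup:
    """Backup-style store: keeps every version of each path, oldest first."""

    def __init__(self):
        self.history = {}

    def put(self, path, data):
        self.history.setdefault(path, []).append(data)

    def get(self, path, version=-1):
        return self.history[path][version]


def sync_vs_backup_demo():
    """Write a good version, then a corrupted one, to both stores."""
    mirror, backup = Mirror(), VersionedBackup()
    for data in (b"good bytes", b"c0rrupt3d"):
        mirror.put("report.txt", data)
        backup.put("report.txt", data)
    # The mirror can only return the corrupted copy; the backup can go back.
    return mirror.get("report.txt"), backup.get("report.txt", version=0)
```

After the corrupted write, the mirror has nothing but the bad bytes, while the versioned store still holds the original at version 0.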


i've been poking round the site and i couldn't see an answer to this question - why do you need to encrypt the communication if the data themselves are encrypted? maybe i am misunderstanding, but it seems like each block is encrypted and the pipe between client and server is encrypted. is it because there are additional interesting metadata (if so, what)? or have i misunderstood?


There is metadata; whether you consider it interesting is up to you. The tarsnap client has to say "I'm machine X, and I want to store a block of data with tag Y"; and when you extract an archive, "I'm machine X, and I want to retrieve the block of data with tag Y". This could allow someone to figure out (a) that it's the same machine, and (b) that you're extracting an archive which contains data which was stored at a particular point in time.

Paranoia means encrypting everything which might be sensitive, even if you can't see any way for it to be abused.


thanks. (i understand the point about metadata; to be honest i was more worried i had misunderstood).


$0.30/GB/month is pretty steep :) You can use your own private keys with Crashplan, $60 a year for unlimited storage/bandwidth.


Crashplan also does dedupe and compression. The only downside is the client program uses a lot of RAM.


I concur. I've been very happy with the price and features, but the ever increasing amount of RAM (the larger your backup, the larger its RAM use) is worrying.


How does this compare to dump | aespipe | s3cmd?


If none of those commands perform dedupe and compression then tarsnap is much cheaper (like 20x cheaper).


It's safe to assume a bz2cat can be thrown in there. And I'm not sure how deduplication can work on encrypted dumps/files?


Tarsnap deduplicates first, then it encrypts.
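
To see why the order matters, here is a rough sketch. Everything in it is an assumption for illustration and not Tarsnap's actual design: the fixed 4-byte blocks, the HMAC-SHA256 block tags, and especially toy_encrypt, which is a stand-in for a real cipher and is NOT secure.

```python
import hashlib
import hmac
import os

BLOCK_SIZE = 4  # tiny fixed-size blocks, purely to keep the example readable


def toy_encrypt(key, block):
    """Toy stand-in for a real cipher (NOT secure): XOR the block with a
    keystream derived from the key and a fresh random nonce."""
    nonce = os.urandom(16)
    stream = hashlib.sha256(key + nonce).digest()  # 32 bytes, enough for one block
    return nonce + bytes(b ^ s for b, s in zip(block, stream))


def split(data):
    """Cut data into consecutive BLOCK_SIZE-byte blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def dedupe_then_encrypt(key, data):
    """Tag each block by its plaintext, store each distinct block once, encrypted."""
    store = {}
    for block in split(data):
        tag = hmac.new(key, block, hashlib.sha256).hexdigest()
        if tag not in store:
            store[tag] = toy_encrypt(key, block)
    return store


def encrypt_then_dedupe(key, data):
    """Encrypt first, then try to dedupe by hashing the ciphertext."""
    store = {}
    for block in split(data):
        ciphertext = toy_encrypt(key, block)
        store[hashlib.sha256(ciphertext).hexdigest()] = ciphertext
    return store


def dedupe_demo():
    """Return (blocks stored when deduping first, blocks stored when encrypting first)."""
    key = b"hypothetical key"
    data = b"AAAA" * 5 + b"BBBB"  # six blocks, only two distinct
    return len(dedupe_then_encrypt(key, data)), len(encrypt_then_dedupe(key, data))
```

Because each encryption uses a fresh random nonce, identical plaintext blocks produce different ciphertexts, so deduplicating after encryption finds nothing to merge. Deduplicating on the plaintext side stores the two distinct blocks once each.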


I see, that makes sense. Any word on how it compares to simply shipping encrypted dumps to S3?


Depends how much tarsnap manages to deduplicate, really.
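
As a back-of-the-envelope sketch using numbers already in this thread: the $0.30/GB/month price quoted above, and the stats posted near the top (76 GB compressed across all archives, 6.3 GB compressed after deduplication). The assumptions are that shipping dumps means storing every compressed archive in full, that only storage is counted (no bandwidth or request charges), and that the function names are just mine.

```python
TARSNAP_PRICE_PER_GB = 0.30    # USD per GB per month, as quoted in this thread
COMPRESSED_ALL_ARCHIVES = 76   # GB, compressed but not deduplicated
COMPRESSED_UNIQUE = 6.3        # GB, compressed and deduplicated


def monthly_storage_cost(stored_gb, price_per_gb):
    """Storage-only monthly cost in USD."""
    return stored_gb * price_per_gb


def break_even_price(raw_gb, deduped_gb, deduped_price_per_gb):
    """Per-GB price at which storing raw_gb without deduplication costs the
    same as storing deduped_gb at deduped_price_per_gb."""
    return deduped_gb * deduped_price_per_gb / raw_gb


tarsnap_cost = monthly_storage_cost(COMPRESSED_UNIQUE, TARSNAP_PRICE_PER_GB)
threshold = break_even_price(
    COMPRESSED_ALL_ARCHIVES, COMPRESSED_UNIQUE, TARSNAP_PRICE_PER_GB
)
print(f"Tarsnap storage: ${tarsnap_cost:.2f}/month")
print(f"Plain dumps win only below ${threshold:.4f}/GB/month")
```

With less duplication between archives the threshold rises, which is the "depends" part: plug in your own sizes and whatever your storage provider actually charges.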


Truly paranoids are/will/should use Bitcoin or Litecoin. I don't get why pricing is USD$ only, it just seems that cryptocurrencies are perfect for this kind of service.


It's how your data is actually stored that protects it, not whether you've paid for the storage in tinfoil shekels or picodollars.


The "truly paranoids" probably have plenty of bitcoin burning a hole in their digital wallets.


As others have said, this depends on your definition of paranoia. If you're paranoid right here and right now, you can always go to your local 7-11/Walgreens/etc and get a prepaid credit card to pay for this.


A true paranoid would go to a store as far as possible!


... except that serial killers have been tracked on the basis of which areas they avoid (see http://en.wikipedia.org/wiki/Geographic_profiling).

If your opponent uses geographic profiling, you want to pick a point far away from you and then go to stores far away from that point -- then they'll identify that far-away point as being your origin. If your opponent is a game theoretician, on the other hand, you want to pick stores to visit completely at random, in order to avoid providing any information.


I'm considering taking BTC in the future. If coinbase were easier to integrate I would have done it already, in fact.


Perfectly serious snarkless question - how would that work, financially? You're reselling S3 storage with your value-added service on top - obviously you set your prices in a way that pays for your costs, time and generates some sort of profit. Short of adjusting your prices on a daily (or hourly!) basis, how would you be able to accept payment in a currency of such extreme volatility?


This is why I don't accept BTC yet -- unless the exchange rate settles down I'm only going to be able to do it via a service which lets me say "I want X USD; make it happen".

Coinbase comes close -- the only thing they don't do is allow me to specify at run-time how many USD I want; for some odd reason they need you to "create a button" with an API call before displaying that button on your site, which is a pain to deal with when you have variable payment amounts.


Why not add a simple public page to the Tarsnap website that allows the visitor to gift x dollars to the foo@example.com account? That way, bitcoiners can use bitspend.net etc. as a proxy payment provider for the time being.

Not leaking metadata on credit card statements that would allow someone to infer that valuable data is stored at Tarsnap (and approximately how much) would be a practical increase in security: such a leak might prompt an attacker to allocate more resources towards compromising the Tarsnap user's client computer.


IIRC you can say "I want users to pay $5 for this" and have Coinbase figure out how many BTC that is.


And I don't get why they should use Bitcoin. You're trying to secure your data, not obfuscate your financial transactions.


I'm guessing it suggests that those who are truly paranoid about others snooping into their tarsnap-saved data may want to make their tarsnap payments anonymously as well.



Well, the BC PST is stupid. I find it amazing that during the HST referendum campaign a large number of people said they thought the HST was good, but they were going to vote against it anyway to punish the BC Liberal government... and now we don't have the HST, and the government they wanted to punish has gotten re-elected.


try crashplan with boxcryptor



