I use rclone's B2 driver (https://rclone.org/b2/) as an rsync-style backup solution for about 1TB of my pictures and other media. It's also encrypted with my own local key using rclone's crypt module (https://rclone.org/crypt/).
rclone supports multithreaded upload, and even has experimental support for FUSE mounting. However, the sync command gets you Dropbox-like behavior and can be cronned: https://rclone.org/commands/rclone_sync/
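For reference, the sync itself is a one-liner that cron runs nightly - something like this (remote and path names here are placeholders, not my real config):

    # "b2crypt" is a crypt remote layered on a B2 remote in rclone.conf (placeholder name)
    # sync is one-way: make the encrypted bucket path mirror the local folder
    rclone sync /data/pictures b2crypt:pictures --transfers 8 --log-file /var/log/rclone.log

    # crontab entry to run the same command every night at 03:00
    0 3 * * * /usr/bin/rclone sync /data/pictures b2crypt:pictures --transfers 8 --log-file /var/log/rclone.log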
I really like the price of B2, I hope it stays low :-)
I'm also backing up my photo archive on B2 via rclone and it works great, it costs like $1 for 200 GB which is awesome. But I'm not encrypting the files.
Encryption raises the barrier for both third parties and family. In case something happens to me, I want the technical barrier to be low enough for my family to discover the backup. Another reason is that in my experience encrypted data is more sensitive to bit rot and bugs than unencrypted data. I'm backing up important work stuff with Arq Backup, for example, and I've had my archive corrupted once. Not sure if it was the software's fault or the storage.
My rule of thumb is ... if the data should be discovered by my family in case I'm not around, then I won't encrypt it. Photos are not worth encrypting anyway, since a lot of them end up being shared on Facebook, Flickr, Instagram, etc, as photos are meant to be shared, at least with your family.
That said I still expect Backblaze or Dropbox to keep my data private. Not secret, but private and there is a difference.
I've thought about that a fair amount recently myself, and came to a different conclusion, but I like yours as a counterpoint. What I've been considering is going analog: writing down how to get to the few places I've kept my secrets, employing something like a YubiKey to access an admin account on my computer, and leaving paper instructions in an envelope in the family safety deposit box. It's not as complex as a two-person scheme I read about once, but it seems like a nice compromise where I can still feel like I'm protected, but the nuclear keys exist if someone ever needed them.
Where do you draw the line around data that needs to be discovered? I'm thinking about instructions to access things like bank accounts or such, which they may or may not already have access to, where I'd want them encrypted but accessible. Not that I've got secret Cayman accounts or anything, but financials are usually things I want heavily encrypted, but do want family access to in case of the worst.
I normally think of financials as things that don't need to be heavily encrypted, because we have laws limiting liability in case financial information is stolen and misused. What makes you feel it needs encryption?
Not in all cases, and even if there are legal remedies, it takes months to clear it up, and while you are clearing up the effects of the fraud, the person who has stolen your identity is committing new fraud until they get caught.
I know people that were arrested for fraud because their identity was stolen and someone else was committing financial crimes in their name. They always have to carry official police reports with them just in case they get arrested again.
Honestly, I don't know, I've just always held banking info as something that must be treated as such. That said, it's not like I don't use credit cards on the Internet, but something about someone getting my bank account numbers does worry me. Plenty of people who know me well know my mother's maiden name, for example, and those two pieces of info together could spell trouble. I'm also in the camp of unless it needs to be shared, may as well treat it as private, and that includes encryption.
That's an interesting viewpoint to strike a balance between privacy and secrecy.
Can you shed some light on how you share the photos with non-technical family and friends, given that B2 has no app as such?
I have some experience with AWS/Azure and both of them do not support folders, and the workaround is to have slashes in the filename to create a virtual directory. Is it the same with B2?
I'm only using B2 for backup and it's doing a fine job, since from what I understand it also does unlimited history of the files, so in case files get corrupted, theoretically I still have previous versions around.
I keep my photos on Dropbox too, which is how I share them with family, besides sending files over WhatsApp, which is popular these days. But they only provide the history of changes for 1 month, or 3 months for Pro. As has been said before, solutions like Dropbox are not reliable for doing backups without specialized software like Rclone or Arq Backup, that can keep a version history.
My archive is currently less than 150 GB, so B2 is really cheap. I also have an offline backup on a portable hard drive. The idea with backups is that if you have data you care about, then it's a good idea to have at least 2 backups in different locations, made via different software.
> I have some experience with AWS/Azure and both of them do not support folders, and the workaround is to have slashes in the filename to create a virtual directory. Is it the same with B2?
B2 has folders; you can navigate them in the online interface. That said, the service doesn't have polished apps available, being a platform like S3: there are no official desktop or mobile apps currently. Although if it survives, given its price, I'm sure apps will happen at some point.
The online interface simply assumes that a slash in the filename should be represented as a folder, and they encourage apps to do the same. I believe they also enforce a max distance between slashes that is smaller than the max filename length.
What this means is that there is no way to, for instance, query what the root directories are, short of listing all files.
If you have a directory, you can list its contents using a prefix search (although the prefix need not be a directory, and this will not just list the top-level elements).
>there is no way to, for instance, query what the root directories are, short of listing all files
This is not true! Try this from the b2.py command line:
b2.py ls <bucketName>
That would list all the top level folders. The APIs are designed to support two things: 1) listing all files, or 2) navigating and listing the contents of each folder.
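If you want to do the same thing directly against the API, it's roughly a b2_list_file_names call with a delimiter - something like this (bucket ID and tokens are placeholders):

    # assumes you already called b2_authorize_account and have $API_URL and $AUTH_TOKEN
    # delimiter "/" collapses everything below the first slash into "folder" entries
    curl -s -H "Authorization: $AUTH_TOKEN" \
        -d '{"bucketId": "BUCKET_ID", "delimiter": "/"}' \
        "$API_URL/b2api/v1/b2_list_file_names"
    # add a "prefix" such as "photos/" to the JSON body to descend into one folder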
Yes, but you're paying for that storage. If you sync 100GB of photos then locally make a small EXIF data change to all of them and sync again, you're now paying for 200GB of storage. B2 has Lifecycle Rules [0] to help keep versions from getting out of control and the API has methods for handling versions for clients like rclone [1] to use.
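If it helps, the rule itself is just a small JSON blob attached to the bucket - roughly something like this via the API (field names from memory, double-check against the docs in [0] before relying on it):

    # keep hidden (old) versions for 30 days, then delete them; applies to the whole bucket
    # $API_URL and $AUTH_TOKEN come from b2_authorize_account
    curl -s -H "Authorization: $AUTH_TOKEN" \
        -d '{"accountId": "ACCOUNT_ID", "bucketId": "BUCKET_ID",
             "lifecycleRules": [{"fileNamePrefix": "",
                                 "daysFromHidingToDeleting": 30,
                                 "daysFromUploadingToHiding": null}]}' \
        "$API_URL/b2api/v1/b2_update_bucket"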
B2 doesn't have its own desktop app, but 3rd party desktop apps like Cyberduck use the API to work with B2.
I would like to point out that, given [1], in the US online services can't really deliver that balance. Anything that is hosted is accessible to law enforcement, and potentially to anyone that has a legal disagreement with you.
+1 for restic. I switched to restic(+B2) from duply+duplicity(+S3) (another backup tool supporting dumb remote storage, encryption, incremental backups, and snapshots) and life is so much better. Duplicity needs the duply front-end for too many basics, it regularly needs full backups to be made and stored, it's not built at all for random access (it takes eons in order to list or fetch a specific file from a specific incremental backup), and it needs a barely documented command line switch in order to not bug out with S3 if you have too many files (why is that option not default? Its default S3 configuration has a max limit on file size. Duplicity splits the archives containing your files, but it has one un-split archive listing filenames or some kind of metadata that it doesn't support splitting, so if that file gets too big, then your backups just start failing). Restic is nice.
rclone is an excellent piece of code and has solved a large part of the problem for my own project. I maintain a slightly modified fork of rclone [1] and integrated that into daptin [2] as a server-side piece, so I can seamlessly work with any cloud storage for assets/uploads through daptin.
I wrote up a walk-through some time back. The changes basically amount to replacing all the "fatal logs" with "error logs". I keep merging upstream back in regularly.
I'm currently backing up about 1.7TB of pictures to B2 from my Qnap NAS. Qnap has a backup app called Hybrid Backup Sync.
The problem is, while doing the one-way upload sync, the Qnap app downloaded a lot of data as well. I was confused about why I was seeing a lot of 'b2_download_file_by_name' API calls on the Backblaze reports page (a 600 GB upload resulted in 700 GB worth of download calls).
Contacted Qnap support and they said a little bit of download is normal but this looks abnormal. Logs are all fine on the Qnap so they suggested I contact Backblaze.
It's a "cheap, easy, reliable: pick any two" situation.
Cheap and easy: buy a 2 TB drive and keep it at home. If some disaster affects your home -- flood, fire, burglary -- it can take out your data and its backup.
Cheap and reliable: buy a 2 TB hard drive and keep it somewhere else. Keeping the backup up-to-date means regularly bringing the drive home, updating it, and putting it back.
Easy and reliable: pay for a service like Backblaze that automatically backs up all your files to a remote server.
There are other benefits to services like B2 especially: being able to access your backed-up files from any device or location, or being able to link people to your files on a high-speed server.
You put the 2 TB drive somewhere else (at a relative's) and keep it updated regularly via network.
That's my set up (but with a bigger drive).
At home, I have the master copy of the data on my file server.
Then I have backup #1 that is in the same location and backup #2 that is in a different location.
Both #1 and #2 get updated at night with a "timemachine-like" backup system based on rsnapshot. The network traffic goes over ssh.
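For anyone wanting to replicate this, the rsnapshot side is only a few lines of config plus cron - roughly something like this (hostnames and paths are examples, not my real setup):

    # /etc/rsnapshot.conf on the backup box (fields must be tab-separated)
    snapshot_root   /srv/backups/
    cmd_ssh         /usr/bin/ssh
    retain  daily   7
    retain  weekly  4
    # pull the master copy from the file server over ssh
    backup  backupuser@fileserver:/data/   fileserver/

    # crontab entries
    30 2 * * *    /usr/bin/rsnapshot daily
    0  4 * * 0    /usr/bin/rsnapshot weekly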
Remote backup system #2 cost a UPS, a Raspberry Pi and an 8 TB drive, which is about $250-$300 total.
The initial sync is best done locally of course, but deltas can generally easily go over network at night.
Cheap, reliable, and (relatively) easy (if you're a geek, that is).
I remember there being software back in the day that did exactly this.
The name escapes me right now, but basically you had to add "friends" in the software, then dedicate a certain amount of HDD space to it. It would then back up your files to your friends' computers, and theirs to yours. Backups were encrypted so your friends wouldn't be able to see your files.
It was a super neat idea, I wish I could remember what it was called so I could see if they're still around...
There's an open source one called Tahoe: https://tahoe-lafs.org/trac/tahoe-lafs
There used to be a company with a more usable similar product called allmydata, but it seems to be defunct.
Nothing bad, if you use "bup" or "borg"; the latter has better delete support, so it is a better choice for rolling backups if you sometimes delete data. "bup" has the advantage that its repo format is git, which makes it easier to hack.
Both use rsync-style deltas to only send changes, but they use a content-addressable scheme like git, so renames are just a small metadata change.
Also, both offer ftp and fuse interfaces if you need to access an older backup.
Bad things! rsync isn't smart enough (AFAIK) to know that files have been renamed or moved: it just sees files disappearing on one side and appearing on the other side, so the daily delta can get big.
DRBD may be a good solution to this problem, although I haven't spent the time to see what it would take to replicate over ssh, or what kind of traffic is incurred versus the changes at the origin.
Online/offsite backup is a different use case. They are paying $60/year so that, if their house burns down, gets flooded, disk gets fried by lightning, they still have their family pictures.
Local backup is cheap and fast, and you should do it too. But it doesn't provide geographic redundancy.
Another $60 buys a water- and fireproof safe for the hard drive. I assume it also helps against lightning :-) I honestly think remote backup is overkill for personal needs, and the new risks you take on by placing your valuable data on someone else's hard drive are not always internalized.
And then you take your non-redundantly stored hard drive out in 5 years and you can't retrieve the data on it.
But realistically, isn’t it worth $60 a year to have a constantly backed up hard drive? The alternative is to take the hard drive out of the safe every so often and do a backup and put it back.
I've recently recovered data from a pair of 15GB and 20GB drives I last used in 2001 (and which were stored in an ordinary closet of a house that experienced inside temperatures ranging from 5C to 30C over that time, and great humidity/dryness fluctuations over those 17 years). There were 16 bad 512-byte sectors on one of the drives, but otherwise everything worked.
Modern higher-density drives are probably less resilient, and who knows how flash drives will fare after 17 years in the closet - but my experience so far is that HDDs trump backup tapes on every measure including cost, except at extreme sizes (at this point in time, into the petabytes).
> I've had a few hard drives fail in the past decades but I've always been able to retrieve the data
Care to share what kind of procedures you use?
I've recovered some data for friends and employers who want it back but aren't prepared to pay > USD1000 for it but if I cannot connect to the disk I'm lost.
(My tricks: tilting the disk, freezing the disk, leaving ssds powered on but sata unconnected, and even before that photorec and ddrescue etc.)
Note: don't do any of the above if data needs to be recovered at any cost, in that case just contact a data recovery company.
By putting your drive in a safe, you now have to unlock it, take it out, plug it in, sync, unplug it, and lock it away. Which is a hassle if you want to stick to regular (hourly? daily?) backups.
It is proven that if you introduce friction into a process, over time that process will be followed less.
True, the standard Backblaze offering will erase your backup if you’re not online for six months. If you have an external drive that’s offline for 30 days, they will erase your backup.
A B2 based backup solution costs more but you don’t have those limitations.
According to Backblaze, the retention period is thirty days for the personal plan [1]. Where do you see six months? If a file is deleted on the source device, and the deletion is synchronized to the BB repository, the recovery window is 30 days.
If you have an externally attached drive on your computer and it isn’t connected in 30 days and your computer is online, they erase that backup of the external drive.
If you reconnect your computer after a month and you don’t have the external drive connected, they erase your backup.
I perform hourly backups of my VPS and personal computers, storing it all into a giant repo on OVH and B2. If my house goes up in flames, I have to redo, at worst, 1 hour of work.
Additionally I won't have to deal with expanding to a 4TB drive eventually.
I use Backblaze's standard service. I look at it as cheap belt-and-suspenders insurance. I do onsite backups using Time Machine, along with the occasional sync to another drive. Backblaze is offsite and it's a completely independent backup mechanism. $50/year or so is essentially not worth worrying about in this context.
WRT your other comment: Yes, there's some small level of incremental security risk but there's so little that's genuinely sensitive in my storage, I'm willing to take that risk. And, yes, it's probably overkill but for the cost, there are a lot of things I spend money on that are probably unnecessary :-)
> I do onsite backups using Time Machine, and also Backblaze.
You are doing everything correctly. You are following the 3-2-1 backup philosophy, which is: "3 copies of the data, 2 copies locally, 1 copy remote". Here is a blog post we wrote about it: https://www.backblaze.com/blog/the-3-2-1-backup-strategy/
Hard drives fail, and you need physical access to them to get at the data. Online backups let you access your data from anywhere and are generally much more reliable than a single, local disk.
Am I reading this right? Google/B2/... might send your data to another URL you didn't expect.
Not sure why that matters, or why it's an attack. Since they have your data anyway, as that's the whole point of the service, to store your data on their hard drives. Why go through the trouble of sending it elsewhere? To play games with your data for giggles?
No, the API can tell your software to send some private LAN files, e.g. from some IP-filtered secret NFS store, to a URL of its choosing (so to itself, or to your competitor).
This is bad unless you heavily jail and firewall the software to prevent it from ever accessing anything it shouldn't (need to).
I quickly skimmed, but this entire attack is assuming that the attacker has successfully MITMed the API. At that point everything is already nuked, so of course you can fabricate any number of attacks. Did I miss something important?
Cool setup. I'm not a Backblaze customer, so I'm curious. How is your setup better than using their client for personal backup? The webpage mentions threading and encryption.
Backblaze's backup product doesn't support backing up a NAS. It is for backing up a single Windows or Mac computer, and priced as such. They state this policy is to avoid abuse. Fair enough.
Backblaze's object storage product, B2, is priced per GB-month, so you pay for what you use. Fair enough. Because it is charged this way, it is open for whatever creative use developers can come up with.
I use B2 because I'm locked out of using Backblaze Online Backup - and that's fine with me, because it's the right product for the job.
It seems like nobody has mentioned it yet: another great product is https://www.rsync.net/ and it just works. There are no bad surprises. You can overshoot your backup limits, and they will send you an email asking you to fix it, but you still have your backup.
Your interface is rsync/scp/ssh.
They give you ZFS snapshots, and you can use s3cmd from their machines, so you can delegate uploads to S3 via rsync.net.
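In practice that means the backup job can be as simple as a cron'd rsync over ssh, something like this (the hostname is a placeholder for whatever server they assign you):

    # push a local folder to rsync.net over ssh; incremental after the first run
    rsync -az --delete -e ssh /home/me/documents/ user@usw-s001.rsync.net:backups/documents/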
Our prior backup setup was duplicity with GPG hitting S3, and this sometimes was flaky for listing the current keys.
Glad I read HN; that's how I heard about rsync.net. They even have/had an HN discount. You should use the search functionality to find other threads.
For those that don't know, borg is a backup utility[1] that has been called the "holy grail of backups"[2].
It takes your plaintext files and directories, chops them into gpg-encrypted chunks with encrypted, random filenames, and will upload (and maintain) them, with an efficient, changes-only update, to any SFTP/SSH capable server.
My understanding is that the reason people are using borg instead of duplicity is that duplicity forces you to re-upload your entire backup set every month or two or three, depending on how often you update ... and borg just lets you keep updating the remote copy forever.
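If it helps anyone evaluating it, day-to-day borg usage against an SSH host looks roughly like this (host and repo names are placeholders):

    # one-time: create an encrypted repo on the remote host
    borg init --encryption=repokey user@backuphost:archives

    # each run: a deduplicated, changes-only archive
    borg create --stats --compression lz4 user@backuphost:archives::'{hostname}-{now}' ~/Documents ~/Pictures

    # thin out old archives instead of ever re-uploading a full backup
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 user@backuphost:archives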
The Borg/Attic/HN discounted price is a quarter of the regular price IIRC. Well worth it IMHO. They're reliable, answer emails very fast, and are happy to provide technical help should you need it to configure your system.
Our current headline price is 4c per GB per month and the borg accounts are 2c - so it is half-priced. (We're in the middle of a price drop this month - that page still had the old 3c rate on it ...)
The ZFS-created snapshots of your filesystem are disabled - it is assumed that you will handle your retention/point-in-times with the borg tool itself (we don't like doing snapshots of snapshots ...) Also, while you get full technical support for the use of rsync.net in general we offer no technical support for your use of borg.
The assumption is that borg users know what they are doing - and that assumption has proved to be correct.
I use their Borg discount as well, and am extremely happy with it. I do wish it were cheaper, but I get 150 GB for $50/yr, which is enough for me with careful rationing.
I wish I had a TB for $50 so I didn't have to be so judicious with my photos, but the ability to use Borg is so fantastic that I can't complain.
You can get a 500GB storage server for about that price, and a bit more for a TB. I run an Ubuntu instance and use borg to back up my home systems to this server.
my affiliate link: https://billing.time4vps.eu/?affid=1881
It's almost like they try to be a self-sustaining, viable business instead of burning through VC money with irresponsible pricing that kills fair competitors.
The snark is really not necessary or contributing to the conversation.
AWS Glacier, hardly a VC-backed startup, charges a literal tenth of the cost. Given that most people are going to be holding on to their backups rather than retrieving them regularly, the pricing math works out better even though it's a bit more complicated.
Say you push 2TB up to Rsync, AWS Glacier, and Backblaze B2, and you need that data back a year later.
Rsync will cost you $80x12: $960, bottom line.
Glacier will cost you $8.00x12: $96 for the storage, plus $.01 for a thousand retrieve requests, plus $0.01 per gigabyte retrieval, plus $0.09 per gigabyte transfer.
$96 + $.01 + $20 + $180 = $296.01
Backblaze B2: $10x12 = $120 for the storage, plus 0.01 per gigabyte retrieved:
$120 + $20 = $140
I'm guessing the "startup" dig was directed at Backblaze, but they're actually charging more for the plain storage than AWS, where you're paying more for the bandwidth!
> I'm guessing the "startup" dig was directed at Backblaze
And ironically, Backblaze is 99% self-funded: it doesn't have VC funding and has no deep pockets. We're profitable, the only way to stay in business without VC funding.
(Note: we did have a tiny "friends and family" round in 2009 which was 9 years ago. Plus we sold a small percentage of the company to a silent investor who didn't even get a board seat, no votes, no control. 100% of the board of directors are founders of Backblaze.)
"AWS Glacier ... charges a literal tenth of the cost."
Amazon Glacier and Google Nearline are not comparable products. What we offer at rsync.net is a live, online, random access filesystem - so the appropriate comparison is with Amazon S3.
I believe our current pricing is reasonably comparable to S3 - and at larger quantities is actually cheaper. Also, the borg pricing (2 cents) is cheaper at any quantity.
Fine... but your marketing is literally all about backups. The front page of rsync.net is "cloud storage for offsite backups".
If you hadn't told me this, or if I didn't call a human at the phone number (why? this is an immediate turnoff) listed on your cloud storage page, or go read the "open platform" page (which sounds less like a tech page and more like a marketing page), I'd never know about it.
Speaking of ZFS, I use B2 with zfsbackup-go. Sanoid is making snapshots and zfsbackup-go is uploading them.
[zfsbackup-go](https://github.com/someone1/zfsbackup-go)
Hetzner Storage Box[1] is an interesting alternative to Backblaze B2. It's not cloud-based, but provides free automated snapshots, free 1 Gbps bandwidth, and supports FTP, FTPS, SFTP, SCP, rsync and BorgBackup[2].
Be very careful with those - they do not use ECC memory and thus suffer from the many potential security attacks and your software needs to handle the odd memory error.
> Be very careful with those - they do not use ECC memory and thus suffer from the many potential security attacks
ECC hyperbole much?
When you've decided to put your personal data somewhere in a cloud on the other side of the internet, this kind of stuff should probably be absolutely on the bottom of the list of things you need to worry about.
tl;dr - it shifted my view of the hosting market by exposing me to cheap dedicated servers.
I had previously thought that dedicated servers were doomed to be too expensive/heavy weight for me. I also felt like most VPS providers charged too much (especially true in the case of AWS -- $10/mo for a t2.micro is ridiculous).
I first found INIZ (http://iniz.com/) and was super happy with them, then someone introduced me to the Hetzner Robot Marketplace and I was blown away by the affordable prices (+/- setup fee) and have had one ever since. Hetzner also has a cloud offering that is pretty great - slight limits on operating system choice and some other features, but you can have very competitively priced machines in a more cloud-friendly fire-up-and-go format.
Now I have a ~6 Core (12 vCore/hyper-thread) 24GB RAM monster that I can run experiments with for a decent monthly price.
If you go to other providers like Packet, OVH or Amazon, you're going to see way higher prices - I don't have too many requirements, so Hetzner worked for me.
Yes, very usual for EU companies. Make sure to blank anything sensitive. If this seems weird, consider how many places require your Social Security number in the US.
Hetzner is a bare-metal company that has been operating since 1997 and hasn't released a cloud-based storage product yet.
"Your files on Storage Boxes are safeguarded with a RAID configuration which can withstand several drive failures. Therefore, there is a relatively small chance of data being lost. Please note, however, that you are responsible for your data and there is no guarantee from Hetzner against potential loss of data. The data is not mirrored onto other servers."
Thank you - so it's just less redundant than e.g. Backblaze (I assume). That's an important distinction. See, I'm not a fan of using buzzwords instead of describing anything in detail. "Cloud", "AI", "Big Data", "NoSQL" etc. are (sometimes) fine to get non-technical people interested, but useless for saying anything meaningful about a system IMO.
"Backblaze will make commercially reasonable efforts to ensure that B2 Cloud Storage is available and able to successfully process requests during at minimum 99.9% of each calendar month.
"
Can I ask, what makes that setup "not cloud" vs Amazon S3? Amazon doesn't make public their hardware setup, merely that they offer various "9s" of reliability against data loss.
What if, for argument's sake, Amazon's secret setup is exactly the same as Hetzner's hardware-wise, with Amazon merely putting a number against the reliability that setup offers?
It's colocated hardware, with a thin service layer on top so they set it up for you as a service. The service and hardware are quite reliable, but you can (and will) still lose all your data in case the hardware fails. You have to create your own processes and layering to get to an adequate number of 9s for whatever kind of reliability you're looking for.
Right, I meant to say it's "like colocated hardware". Hetzner own the hardware, but the service guarantees are similar to where you own the hardware. If the hardware fails, tough luck.
Indeed. It's just never been publicized and personally I think the implementation could be better. Although with so many stars on this repo now I plan on maintaining it for the foreseeable future. Start adding feature requests everyone! :)
PS: They also don't even use their own library in their code examples so I don't think they meant it to be used in that fashion.
That would be great – it's always good to have some competition ;)
Regarding feature requests I'd love to see a well-maintained B2 Django Storage. I'm currently using an existing implementation, but it's not that well maintained:
I saw that one. I'm not a particular fan of Django but integrating my library apart from Django's storage library wouldn't be difficult. Neither would be building a django library on top of mine. Any takers? :)
I use Backblaze now, and once I get my NAS, I'll probably end up using a B2-based backup. But let's make an honest comparison. Backblaze does not replicate your data across data centers. The standard S3 storage class does ($0.023/GB). The comparable storage class for S3 is One Zone Infrequent Access ($0.01/GB). B2 still comes out ahead, but I wouldn't use either one for primary storage. For their suggested "3-2-1" backup strategy, sure.
Then again, just for backup, I could use S3 Glacier for $.004/GB. That's cheaper than B2 and I get multiple-AZ storage. The data charges would be higher - but it's backup. If catastrophe struck and I lost my primary and my local backups, getting my data back fast is the last thing I would worry about.
> Then again, just for backup, I could use S3 glacier for $.004/gb
Having done that in the past, I have to say that's just a million times less practical than basic S3-like storage. And if you want to automate that setup, Glacier is even worse.
I could see using something like rsync + Cloudberry (maps S3 and make it look like a network drive). Set it up to use one zone infrequent access, and then after x days use a lifecycle policy to move it to Glacier.
My use case for backups is solely movies and music. For source code I use hosted git repos, for pictures Google Photos, and regular office documents are either on Google Docs or OneDrive.
Last time I used Glacier, it was a separate product from S3 and had its own API.
You had to upload pre-prepared "tapes" for backups. You couldn't mutate an existing backup, you had to create a new one. And frequently fetching and/or deleting existing "tapes" (backups) would cost you money (more so than the original cost of the backup).
That meant you couldn't just ZIP it all up, back up the latest version and then delete the previous one to avoid being doubly charged for storage either.
Basically at time of archiving you needed to determine what was already archived and create a new bundle with only what's new, and archive that only. In the same spirit, restore meant piecing together multiple such tapes into a full restore-set.
Absolutely terrible. It was like having traditional backup-software constraints, but none of the software-support.
If Amazon has improved on that now, good for them, but I figured they probably had to if they wanted any users at all.
Honestly, I’ve never used the Glacier api directly. I’ve only used it as part of a lifecycle policy where objects were stored in S3 and then using the console to have AWS migrate data after a certain amount of time.
My offsite backup would only be accessed in the case of catastrophic failure - my primary and local backup data is unavailable. Data transfer does cost more but if I had that type of catastrophe, worrying about getting my movies back for my Plex server would be of little concern. Everything that I would care about - source code, photos, documents etc are stored other places.
That’s another strike against Backblaze backups (not B2 based backups). When we were in between residences last year - we left our apartment when the lease was up and stayed in an extended stay waiting for our house to be built, my main computer was offline for 5 months. One more month and my Backblaze backup would have been deleted. I forgot about it and I restarted my computer before I reconnected my external drive - so my backup from my external drive was erased from Backblaze as soon as I came back online. It wasn’t catastrophic but irritating. Luckily I have gigabit upload.
C14 is really not at all an object store. Getting data in and out is a huge pain, even compared to other cold stores like AWS or OVH. We evaluated them and passed.
S3 - not much to say, fast, durable, expensive...the gold standard. Given limitations of below, we use for rotating nightly backups despite cost.
Glacier - great for cold storage/archive, but has 90 day minimum
OVH hot - OpenStack-based, cheaper than S3 but not absurdly cheap; charges for egress even intra-DC, which is absurd and kills many use cases. They have crippled OpenStack permission management (i.e. no write-only keys with lifetime management per bucket, which is necessary for doing backups securely)
OVH cold - charges for ingress but then storage is crazy cheap, and egress not as bad as Glacier. This is our preferred archival option.
C14 - not object storage, more like a "cold" ftp dump
B2 - pricing is epic; the S3-incompatibility is a pain, as is the lack of Backblaze-sponsored libraries (the library in the python b2 cli is not a proper API)...we've been working on adding B2 to WAL-E. However, their permission/user management doesn't cut it.
Wasabi - S3 compatible, great pricing if not for 90 day minimum, which they hide in the fine print
> B2 - their permission/user management doesn't cut it
Have you seen the new "Multiple Application Keys" APIs we have published docs for (and the release coming in a week or two)? I'm curious if they satisfy your permission needs. The docs are here: https://www.backblaze.com/b2/docs/application_keys.html
A screenshot of the web GUI to these keys is here: https://i.imgur.com/RdlgdAs.jpg
(NOTE: the web GUI does not expose the full power of the multiple application keys; it is meant to be easy to use and hopefully satisfy 95% of customers' needs.)
That looks great for me at least! I'd been using B2 a bit personally, but had written off using it for any serious projects because of the inability to make extra restricted per-project(bucket) API keys.
> Q: How do safe-deposit boxes work?
> A: The safe-deposit box is a free temporary storage space that lets you upload your files before creating an archive.
> The safe-deposit box can be accessed for free using Rsync, FTP, SFTP, SCP protocols for a period of 7 days and supports up to 40TB.
> After 7 days or when you archive your safe-deposit box, your data are permanently stored on C14.
> When unarchiving, your data are delivered untouched, including file metadata.
Is there a fundamental reason why B2 is (and will remain) cheaper than S3, or is it just because they need to compete with AWS and once successful the prices will be the same (or higher)?
From my understanding, they've put a lot of work into lowering the cost of storage. I know at one point they were using arrays of consumer-grade drives, and they've done a bunch of analysis on the cost and reliability of drives on the market. They also created the "Storage Pod" [1] to maximize storage density.
Every cloud company does that. Google, Amazon, MS - all use consumer-grade drives with software on top to reduce costs and increase reliability at a fraction of the cost of traditional enterprise storage solutions. Scality even provides a proprietary solution to do the same on-premise.
Objects on S3 are replicated on 3 AZs (datacenters) by default and I can't find any info on B2 if they also replicate the data on multiple DCs. That can definitely change the cost per GB for them if that's the case.
Also, AWS costs a lot just in traffic. A lot of people store things on S3 and then make that publicly available.
AWS is "da cloud" for a lot of people. So they ride that wave high and mighty, charging a lot for everything they can easily measure. People will just pay it and will try to [post-]rationalize how it's cheaper than other providers, because AWS is better.
I posted a longer version earlier. But the correct B2 vs S3 comparison is 1 zone infrequent access. B2 still comes out cheaper especially when you consider transfer costs but not by as much.
Are you able to try multithreaded uploads too? I found that single stream uploads were too slow (< 10 megabytes per second) but I could get ~35 megabytes per second from packet.net to B2 by using 4 threads.
> If you use the large_file API (needed for multithreaded uploads)
We recommend for small files that you use multi-threaded where each thread sends a totally separate file. So if you have to upload both cat.jpg and dog.jpg, you upload cat.jpg in one thread and dog.jpg in another thread.
Based on the Backblaze architecture, that means cat.jpg will be sent to one "vault" in the Backblaze datacenter with one thread, and dog.jpg will be sent to a totally different "vault" in the Backblaze datacenter with another thread. This scales incredibly well, in that it should be twice as fast for two files, and 20 times as fast for 20 files if you do it correctly.
Source: I wrote a lot of the Backblaze Personal Backup client, which uses this philosophy.
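To make that concrete, even from a shell you can get the "one file per thread" behavior by just running the uploads in parallel, e.g. something like this with the b2 CLI (bucket and file names are just examples; rclone's --transfers flag does the equivalent for a whole tree):

    # each upload gets its own process, and therefore its own upload URL / vault
    b2 upload-file my-bucket cat.jpg photos/cat.jpg &
    b2 upload-file my-bucket dog.jpg photos/dog.jpg &
    wait   # block until both uploads finish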
Okay, I'll have to go back and have a look at some of the client libraries I tried - it may have been that the machine I was using wasn't quick enough to hash ~30GB in a reasonable amount of time.
I haven't added multi-threaded stuff yet because I wanted it to be compatible with single-threaded web servers like Flask and Django. I can and will add it if you want to add an issue.
This whole library wouldn't be necessary if Backblaze implemented an S3-compatible API. They give reasons like being able to load balance on the client side for their API (which I do not think is a good reason), but ultimately they just push work from their end onto a lot of applications and developers.
Maybe it also has a strategic advantage? Now every product has to announce they support B2 whereas nobody has to announce they support Wasabi, because they support any S3 compatible storage such as AWS S3, Google Cloud Storage or Wasabi.
Meh, I can see why they didn't. They aren't really in the same business. It makes sense for followers to implement APIs compatible with market leaders. Riak CS is API-compatible with S3, which is nice. But it's literally intended to be an open source version of S3 that you can host and scale yourself.
Backblaze is in a different market. They may be finding out that there's overlap and allowing that use. But they are not the same and probably aren't prepared for developers to start using b2 en masse.
I think it makes business sense. You want to save some money? Do a little extra work for the cheaper product. Want to save even more money? Roll your own with Riak CS. Cloud services all work along the same spectrum where you pay more for convenience and ease of use, and you pay less up front if you're willing to pay in developer or devops or infrastructure costs. I think this fits in nicely on that spectrum.
Object stores unfortunately innovate on their APIs instead of their implementations. I wrote S3Proxy to bridge the gap between S3 applications and a variety of object stores including B2:
Hah, I hadn't heard about Backblaze in a while, and I was even thinking about creating an Ask HN asking if anyone has been using Backblaze for a longer while and can say something about them (speed, data reliability). Now I'll take my chance:
Have you used Backblaze B2? How was your experience?
It's very cheap and effective for archival storage without having to deal with time/cost issues when you actually need to retrieve something. I use it for all my media so I can store terabytes and download in minutes for viewing.
Bandwidth is limited since they aren't connected like the major clouds, but it's workable if you don't need gigabit speeds. Single API key for permissions and lacks all the other features like events, object lifecycle, etc. Basic reporting but shows bucket size in real-time which is nice.
API can be annoying because it requires a request to "start" an upload (to get the address of where to upload), then doing the actual upload itself, but this can be automated away. Only single region for now (with multiple datacenters that aren't visible to you) so no global replication for extra durability or locality.
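For anyone who hasn't used the API, the dance looks roughly like this with curl (v1 endpoint names from memory; keys, IDs and tokens are placeholders):

    # 1. authorize: returns apiUrl and an account-level authorizationToken
    curl -s -u "KEY_ID:APPLICATION_KEY" https://api.backblazeb2.com/b2api/v1/b2_authorize_account

    # 2. ask where to upload: returns uploadUrl plus an upload-specific token
    curl -s -H "Authorization: $AUTH_TOKEN" \
        -d '{"bucketId": "BUCKET_ID"}' "$API_URL/b2api/v1/b2_get_upload_url"

    # 3. the actual upload, POSTed to the uploadUrl from step 2
    curl -s -H "Authorization: $UPLOAD_TOKEN" \
        -H "X-Bz-File-Name: photos/cat.jpg" \
        -H "Content-Type: b2/x-auto" \
        -H "X-Bz-Content-Sha1: $(sha1sum cat.jpg | cut -d' ' -f1)" \
        --data-binary @cat.jpg "$UPLOAD_URL"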
They have a partnership with https://www.packet.net (cloud bare metal) for free interconnect between their servers and B2 so you can do processing on your data without the public internet bottleneck and fees. Allows for an interesting data lake/warehouse option.
Use Cyberduck for a decent GUI client. If you just need personal computer backup, then use their actual backup offering which is unlimited storage and has auto-uploading background app.
Perhaps. Personally I use Arq with B2 for a back end, and this usually costs me less than $1/mo (their regular backup starts at $5). In addition, using B2 with something like duplicity is the better approach for backing up Linux or NAS boxes, where their official client is not supported.
I've used B2 for some internal backup handling (several 100s of GB but millions of files) and have largely found it inexpensive and performant.
A few considerations: their web UI cannot handle large numbers of files (support said that after a few million the file browser will not work). Sometimes when making a large number of deletions at once, the API may serve 500 errors and the web UI give Java Servlet errors (this only happened a few times and resolved itself in an hour or two). As another user noted, the per file/fragment upload speed isn't fantastic, but I could max out my gigabit fiber with many concurrent downloads/uploads. The API has no concept of folders, only file path strings (which is mildly annoying to work with). Lastly, I think all the data is currently housed in one geographic area, but they are working on a DC in Phoenix.
Overall a pretty smooth experience but I was mostly using it for cold data.
Cool, you guys should send out mailings to your customers when new offerings come online. I asked about s3 at the end of last year and was told it is currently unavailable for new customers. I haven't thought to check if it was available until now and may have gone to a more expensive competitor.
I can max out the upload of my residential internet connection (about 1 megabyte/sec) and that's enough for me.
I use it to back up my NAS, where all my other computers are backed up. I set it up with duplicacy-cli, rate-limited the upload to 700KByte/sec (the internet stays usable that way), and the script that launches duplicacy checks that it is not already running.
Since I never upload more than 60GB per day on average to my NAS, I don't have any issues.
We are opening a European datacenter in 2018, so stay tuned!
For now, we recommend you use multiple threads and you should be able to saturate any network connection, including yours in Europe. However, we do realize not all programmers or applications are capable of using threads and it would be more convenient to have lower latencies, thus the European datacenter in 2018. :-)
I tried it from the UK a couple of years ago, and had the same experience. I have ~1TB of data to backup, and it was going to take months to upload vs a few days for Azure or AWS hosted storage.
Maybe I misunderstand his point, but isn't transferring data from Australia to the US or reverse always going to be slower due to the speed of light? What he's said doesn't negate the fact that he's only got one POP.
Speed of light affects latency, not throughput. TCP works badly with large delays, which is why it's recommended to use several TCP connections to saturate the link.
I'm not sure how many POPs they have, so I do agree with that. I'm not shilling for B2 and have no affiliation with them. I've had good performance from them, but I am US-based.
I'm trying to use it as an S3 replacement for audio content delivery - seems slow and laggy unfortunately. Uploads also fail frequently enough. I don't know if CDNs would make a big difference. (Europe)
Edit: lack of webhooks or something similar for doing follow-up after successful uploads is also irritating.
This feels like a handy tool! The first thing I read when opening new code is the test suite - it's worth getting that right at the start. Would you consider deeper unit testing? S3 (and AWS) have the indispensable `moto` boto mocks; I think something similar would be dead handy here.
Yes, on my TODO list is much deeper unit testing. I made this in four days and was just testing that it worked. It already has about 92% code coverage but I want to cover that fields are returned properly and such. Some help would be appreciated if people would like to, including mocking it up.
I just wish Backblaze would fix their snapshots. You still have no way to tag a snapshot, put in any notes, anything. You can literally make two snapshots 5 minutes apart and the only thing that differentiates them is the timestamp. Unforgivable.
> You still have no way to tag a snapshot, put in any notes, anything.
We totally agree, and the project is fully spec'ed, just waiting for an available engineer to implement it! On a side note, we also have open reqs for engineers. :-)
It's fairly standard for network clients to assume potential malicious control of the server they are connecting to.
It helps reduce the blast radius of a compromised server.
In the case where the server is operated by a third party (as is the case with the B2 API server), there can be many compliance implications if that third-party-operated server has access to an internal network.
We don't accept SSH clients or web browsers having the ability to do things they shouldn't based on instructions sent by the server they connect to.
Why would we suddenly have lower expectations of our file storage API clients? (or any other network/HTTP clients for that matter)
Ah, I see what you're getting at. It'd be better if the URL for get_upload_url (I think that's what the API was called) could be calculated client side.
At the moment, you're probably still more at risk of downloading a malicious library from PyPi or npm but this is sure to turn up in a CTF at some point - even curl is technically vulnerable.
Have you talked to anyone from Backblaze about this?
Yes, client-side URL calculation and/or a whitelist of acceptable URLs would be a significant improvement.
Thankfully command-line curl won't follow redirects unless you pass it a special flag, though if you do need it to follow redirects, I'm not sure what the best way is to restrict the range of redirects that it will follow.
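For the record, curl's knobs here can cap the hop count and the protocols, though as far as I know not the destination hosts - something like:

    curl -L --max-redirs 2 --proto-redir =https https://example.com/some/file
    # -L             follow redirects at all (off by default)
    # --max-redirs   cap how many hops it will follow
    # --proto-redir  limit which protocols a redirect may switch to (https only here)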
This issue was part of a broader coordinated disclosure and was only published today. I've gotten in touch with B2 support & I'm hoping my support ticket will make it to the correct people.
We are big users of OVH, AWS and B2 object storage. OVH charges for ingress and egress even if local. AWS Glacier has a 90-day minimum storage time. For most use cases, B2 is much, much cheaper.
EDIT: Response to your edit, Wasabi also has 90 day minimum storage policy.
Wasabi has a 90 day minimum storage period, just like AWS Glacier. This means it's pretty unusable for things like nightly backups if your retention is less than 90 days.
If your backup process involves downloading the backup artifacts in a different region (say for true off site DR + validation) then it's still a net win as the 90-day storage costs are less than the insanely high $.09/GB AWS charges for outbound bandwidth.
I'm loving B2 for my Linux desktop backup. I used Crashplan for many years until they pulled the plug recently. Now I'm using B2 via Duplicati and I'm actually saving money (I have about 500GB of backup).
Borg was a runner up, but Duplicati had built in B2 support, provided scheduling, and a web interface which makes navigating for specific files in a tree easy when needed.
Crashplan had a nice Linux client, but it was a black box / closed source, so problems came up from time to time that were hard to debug. So it's nice to have more control of my data as well.
> uploading large amounts of data to [Backblaze B2] is very slow
All the reports we (Backblaze) hear are that if you only use one thread, Backblaze B2 is slightly slower than S3 (like maybe 90% of the performance). If somebody has better numbers, I would LOVE to see them!
If clients use multiple threads, this issue goes entirely away. Using 500 threads can provably be 500 times as fast with Backblaze. This is because the Backblaze B2 architecture means there are no "choke points" like Amazon S3 has. Each thread will most likely be talking to a completely different "Backblaze Vault" maybe even in a completely separate Backblaze datacenter. Since they don't share any network switches or load balancers in common, there is no way they will slow down.
But again, I would love any measurements or reproducible tests showing differences so we can chase them down and improve Backblaze B2!