> fortunately it is fairly straightforward to cap your data retrieval rates.
Amazon has done a great job with this feature. By doing a poor job implementing something for an extremely narrow use case, in a technology that is outdated, and then providing the most complicated pricing structure surrounding every aspect of the product, one can't help but use the feature: any other provider or service.
Like, wtf would be the use case for Amazon Glacier in 2016? I don't think I would put hundreds of petabytes of data into 20-year cold storage, and the author of this post certainly wouldn't use it again. The fact that I need to read 2 pages of pricing docs, and then the 2 pages you linked to control them because I can't estimate them myself, is a sure sign this is absurd.
SOX compliance, legal requirements to save communications, etc. There are a lot of places where there are needs to maintain a huge amount of information that you're probably never going to need again.
Not all products are for all people. If you foresee a need to recover a large amount of data all at once, then Glacier's not for you. If you might occasionally need a filing from 6 years ago, then Glacier would be great.
It's not about recovering a large amount of data; it's about recovering a large % of your own stored data.
Amazon starts to charge you extra anytime you exceed restoring 5% of the data.
If, for example, you save all your tax-related documents in Glacier and you are then audited, the accounts department or the government will want all the information. Not 5% of it. Not 10% of it. Everything. At that point Amazon will have you over a barrel, because getting the data out in a reasonable time frame will cost exponentially more than dripping out the data over the course of 20 months.
> Amazon starts to charge you extra anytime you exceed restoring 5% of the data.
Isn't one way to get past this... increasing your data usage by 20x? If OP used less than $1 a month, then if he uploaded $20 of junk data, he could get the original data (now 5% of the total) back "for free". Sure, it's $20, but it beats $150+.
It looks like it is even more complicated than that. You can get 5% out per month at no charge, but that is only if you spread it out across the entire month. The extra charges happen the first time you exceed 5%/30 in a single day.
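To put rough numbers on the 5%/30 rule as described in this thread, here is a minimal sketch. The function names are made up for illustration, the 60 GB figure is the retrieval size mentioned elsewhere in the thread, and the model ignores everything else in Amazon's actual pricing docs, so treat it as a back-of-the-envelope check rather than a billing calculator.

```python
# Assumption: the free allowance is 5% of stored data per month,
# prorated evenly over 30 days, as stated in the comments above.
FREE_FRACTION_PER_MONTH = 0.05
DAYS_PER_MONTH = 30


def daily_free_allowance_gb(stored_gb: float) -> float:
    """GB that can be retrieved in one day before extra charges apply."""
    return stored_gb * FREE_FRACTION_PER_MONTH / DAYS_PER_MONTH


def storage_needed_gb(retrieve_gb: float, days: int) -> float:
    """Total GB that must be stored so that retrieve_gb, spread evenly
    over `days` days, stays within the daily free allowance."""
    per_day = retrieve_gb / days
    return per_day / (FREE_FRACTION_PER_MONTH / DAYS_PER_MONTH)


print(f"{daily_free_allowance_gb(60):.2f}")   # 0.10 (GB per day with 60 GB stored)
print(f"{storage_needed_gb(60, 30):.0f}")     # 1200 (the "20x" trick, spread over a month)
print(f"{storage_needed_gb(60, 1):.0f}")      # 36000 (600x if you want it all in one day)
```

Under these assumptions the "upload 20x junk" idea only works if the retrieval is still spread over the whole month; getting everything back in a single day would need 600x the data.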
A better way would be if you had 20+ categories of data that are totally unrelated, like your tax stuff, your code, your diary from 1995-2010, ... Since these are very unrelated, you are not likely to need every one of them at the same time, ASAP.
Though it's hard for me to imagine having so many categories of unrelated, useful and important data.
> If, for example, you save all your tax-related documents in Glacier and you are then audited, the accounts department or the government will want all the information. Not 5% of it. Not 10% of it. Everything.
Are you sure about that? I haven't worked with tax litigation specifically, but I've worked with e-discovery w.r.t. e-mail and I can assure you that no one ever asks for all the e-mails sent by a particular company over all time. It's always a matter of asking for the e-mails sent by or received by a select group of people, over a fairly discrete time period. For something like this, a Glacier store might make sense, if it was coupled with an online metadata cache stored in e.g. S3.
With tax litigation the issue is that you have to prove you didn't simply shift money and accounting briefs around, and the only way to realistically prove it is to show all the statements in the time period that you're required to keep them (I think that's the last 6 years).
The government basically comes to you and says they think you owe X, and you have to prove that false to their satisfaction. The more data you give your CPA to work with, the better.
Lots of businesses have data retention requirements, and it can be difficult and time consuming to make sure this data is backed up in a way that is secure and can survive a catastrophe.
The author's use case (and most other personal use cases) might not be a good fit for Glacier, but he's not the target market.
Tape storage is still the best form of long-term storage. If you need to store things for an exceptionally long time, such as financial data, scientific data, etc., then you're going to get the most bang for your buck on tape.
The post states he paid $150 to retrieve 60 GB of data. For $150 you can buy a 5 TB hard drive.
In what use case could the price delta make it sensible to have a 4-hour feedback loop and all of your important data locked in someone else's data center?
The use case where your data is so properly massive that this makes sense && you don't have the storage infrastructure in place is so narrow that it doesn't make sense.
It is basically one research student's crawl data.
Edit: also, S3 is pretty cheap. So again, I don't really see the use case here. How much room is in the market between your own physical or digital system and Amazon S3 or an equivalent? You would have to have a massive amount of data you don't care about and be very price sensitive.
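For scale, the two $150 figures quoted above can be reduced to a per-GB comparison. This is only a sketch of that arithmetic: it assumes decimal units (1 TB = 1000 GB), uses hypothetical variable names, and deliberately ignores monthly storage fees, drive failure and redundancy, so it says nothing about total cost of ownership.

```python
# Figures taken from the comment above; everything else is an assumption.
RETRIEVAL_BILL_USD = 150.0   # what the post's author reportedly paid
RETRIEVED_GB = 60.0          # amount of data retrieved
DRIVE_PRICE_USD = 150.0      # quoted price of a 5 TB hard drive
DRIVE_CAPACITY_GB = 5 * 1000.0  # assumes 1 TB = 1000 GB

retrieval_per_gb = RETRIEVAL_BILL_USD / RETRIEVED_GB
drive_per_gb = DRIVE_PRICE_USD / DRIVE_CAPACITY_GB
ratio = retrieval_per_gb / drive_per_gb

print(f"{retrieval_per_gb:.2f}")  # 2.50 (USD per GB for that one retrieval)
print(f"{drive_per_gb:.3f}")      # 0.030 (USD per GB of raw drive capacity)
print(f"{ratio:.0f}")             # 83 (retrieval cost vs. drive cost, per GB)
```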
You don't have to pay $150 for retrieval of 60 GB. And you don't do long-term storage for X TB / 5 TB * $150. You might have to rent space in someone else's datacenter to put your own external backup... or you could pay Amazon for Glacier and not deal with maintenance etc. Might be worth it even if you have infrastructure for all data that isn't Glacier-cold.
The data I care about is already backed up on two different multi-TB drives at home, and another one at work.
Glacier is the contingency for "something took out the original data and all three backups in two different locations 7 or 8km apart - if I'm still alive after whatever just happened, I'll consider whether or not to pay Amazon a grand or so to retrieve it quickly from Glacier, or wait ~20 months to get it all for single-digit-dollars".
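The "~20 months" figure follows directly from the 5%-per-month free allowance mentioned in this thread. A minimal sketch of that arithmetic, assuming the allowance stays constant (i.e. the data remains stored while it is being dripped out) and using a hypothetical helper name:

```python
import math


def months_to_retrieve_all(free_percent_per_month: int = 5) -> int:
    """Months needed to pull out 100% of the stored data when only
    `free_percent_per_month` percent can be retrieved free each month."""
    return math.ceil(100 / free_percent_per_month)


months = months_to_retrieve_all()  # 5% per month, as quoted in the thread
days = months * 30                 # assumes the 30-day proration from the thread

print(months)  # 20
print(days)    # 600
```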
Right, the services cost money but retrieval is free. Going by the cost of the hard drives amortized out, it'll probably be the same or less. You get far more durability and less complexity with universal web access.
I believe there are other similar services for Linux, or you can just use a browser to upload files with Amazon.
Depends. Having an in-house key and shipping everything to Amazon encrypted means that you have all the infrastructure there and waiting, and not capable of being read. Additionally, that tape library would need to be stored, and periodically tested so that tapes can be rotated out as needed. Sending that data to a service like Glacier means you've shown due diligence but, at the same time, don't need to maintain a schedule of testing every disk every year.
This was what I meant, thanks. You can buy a 5 TB hard drive for ~$138.00. You could likely buy several of them at a discount to get started. As you go forward in time, these will become much cheaper, allowing you to continue purchasing them on demand from the market for much less money.
This allows you to trivially share, copy, move and retrieve that data quickly as well as fully control who has access and when.
I am sure there are use cases for this, but in a situation where you have petabyte-scale data, it is often the case that you also have the infrastructure to save it. How many places would need to store >5 TB of data a week that
* don't have this capability in house
* will almost never need to access it again.
* will not need it in a timely manner, if they do need to access it again.
* don't have the money to implement their dedicated server and storage on site for this purpose.
I am not saying that this rules everyone out, but the prices are so low, and tape must be so annoying, that I can't imagine why they keep offering this. Obviously, some people must be using it, but in 2016, with storage prices being so low already, I don't know how many places have this amount of data and meet the above requirements.