Myspace lost all the music its users uploaded between 2003 and 2015

mellow-lake-day · on March 18, 2019

>Someday, this will happen to Facebook, Instagram, Tumblr, etc. Don't trust the platforms to archive your data.

This also goes for Google Drive, Dropbox, and many other websites (if not all)

Examples:

https://medium.com/@jancurn/how-bug-in-dropbox-permanently-d...

https://motherboard.vice.com/en_us/article/9kgwnp/porn-on-go...

https://www.zdnet.com/article/dropbox-under-fire-for-dmca-ta...

zxcvbn4038 · on March 18, 2019

I used to work at Tumblr, the entirety of their user content is stored in a single multi-petabyte AWS S3 bucket, in a single AWS account, no backup, no MFA delete, no object versioning. It is all one fat finger away from oblivion.

leowoo91 · on March 18, 2019

I guess your statement is a bit beyond NDA, but thank you for sharing.

dev_dull · on March 18, 2019

Borderline whistleblowing.

SmellyGeekBoy · on March 18, 2019

It's not covered by NDA if it's made up.

heinstrom · on March 19, 2019

Indeed, wild what people will say on the inter webs.

ummonk · on March 18, 2019

What the hell. It is so easy to configure multi-region Glacier backups, MFA delete, etc. for a single S3 bucket. Took me like a couple hours to set up versioning and backups, and a few days to set up MFA for admin actions. Why would they not set this stuff up?

gregrata · on March 18, 2019

The key words you probably need to look at are "multi-petabyte". Not saying they shouldn't be doing something but it all costs - and at multi-petabytes, it cooooosts

1 Petabyte (and they have multiple) S3 - $30,000 a month, $360,000 a year

S3 - reduced redundancy - $24,000 a month, $288,000 a year

S3 - infrequent access - $13,100 a month, $157,000 a year

Glacier - $7340 a month - $88,000 a year

zxcvbn4038 · on March 18, 2019

Add in transit and CDN and Tumblr’s AWS bill was seven figures a month. A bunch of us wanted to build something like Facebook’s Haystack and do away with S3 altogether, but the idea kept getting killed because of concerns over all the places the S3 URLs were hard coded and also breaking 3rd party links to content in the bucket (for years you could link to the bucket directly - still can for content more than a couple years old)

PostOnce · on March 18, 2019

Well, the business was acquired for $500,000,000 and a single employee probably costs what backing up two petabytes of data for a year (on glacier) does.

They could also always use tapes, for something as critical as the data that is the blood of your business.

Imagine if Facebook lost everyone's contact lists, how bad would that be for their business? Backups are cheap insurance.

FussyZeus · on March 18, 2019

Backups are still a hard sell for management, though. No matter how many companies die a quick and painful death when they lose too much business critical data, the bossmen just can't wrap their heads around spending $100k for what they perceive as no benefit.

Same problems with buying things like antivirus software or even IT management utilities; when they're doing their job, there's no perceivable difference. It's only when shit goes sideways that the value is demonstrated.

Hell you could take this a step further for IT as a whole; if IT is doing their job well, they're invisible. Then they can the entire department, outsource to offsite support, and the business starts hemorrhaging employees and revenue because nobody can get anything done.

magduf · on March 18, 2019

>No matter how many companies die a quick and painful death when they lose too much business critical data, the bossmen just can't wrap their heads around spending $100k for what they perceive as no benefit.

Yeah, but what exactly IS the benefit? The business doesn't die if something really bad happens? Is that really important though?

Consider the two alternatives:

1) The business spends $x00k/year on backups. IF something happens, they're saved, and business continues as normal. However, this money comes out of their bottom line, making them less profitable.

2) The business doesn't bother with backups, and has more profit. The management can get bigger bonuses. But IF something bad happens, the company goes under, but then what happens to the managers who made these decisions? They just go on to another job at another company, right?

I'm not sure I see the benefit of backups here.

FussyZeus · on March 18, 2019

> Yeah, but what exactly IS the benefit? The business doesn't die if something really bad happens? Is that really important though?

I mean the way management gets on me when we have outages, you'd think that was a significant priority?

magduf · on March 18, 2019

They can get more money in the short term by pushing you harder, and there's zero cost to them to go yell at you. If they could get a bigger bonus by ignoring outages, they'd do that, but instead, they can get a bigger bonus by pushing you to reduce outages without any additional resources.

meko · on March 20, 2019

You'd be absolutely right but it's still a sad state of affairs.

softawre · on March 18, 2019

The managers that make these decisions need to have equity.

magduf · on March 18, 2019

Seems like they do just fine with big golden parachutes. Why tie their compensation to the company's performance when they can just have a big payout whenever they leave under any circumstances?

ConceptJunkie · on March 18, 2019

I worked at a place that lost their entire CVS repository. The only reason they were able to restore it at all was because I made daily backups of the code myself. Sure, a lot of context data was still lost, but at least there was some history preserved.

antt · on March 18, 2019

They are expensive until the business goes bankrupt.

Bartweiss · on March 19, 2019

I wouldn't be surprised if this was actually the rationale for not having backups.

Tumblr is apparently fragile and tech-debt laden on engineering side, stagnant on users, and unprofitable. At a certain point, it's a coherent decision to just say "a few days of downtime would seal our fate, the business can only be saved if everything goes right", and not spend any money on mitigation.

ummonk · on March 18, 2019

88k per year per petabyte is a small price to pay to protect your entire business from being wiped out.

OscarTheGrinch · on March 18, 2019

Devil's advocate: it depends on how many petabytes you have. This cloud of uncertainty over your uploads could be seen as the hidden cost of using a free platform.

dotancohen · on March 18, 2019

> cloud of uncertainty

So far as Myspace (or Tumblr apparently) is concerned, it is "somebody else's computer of uncertainty".

pmlnr · on March 18, 2019

There are Supermicro chassis' out there with 106x14TB drives in 4u, super deep racks.

1PB is nothing today.

idlewords · on March 19, 2019

Building such a storage behemoth is not the challenging part. Filling it with data, backing it up, and keeping the RAID rebuild time under load on such monster drives below the average drive failure time is the challenging part.

crest · on March 19, 2019

At that scale it makes sense to start thinking about alternatives to RAID e.g. an object storage with erasure coding should work well for a code base already using the S3 API. In theory even minio should be enough, but I never had enough spare hardware to perform a load test of that scale.

bufferoverflow · on March 18, 2019

Or they can just have their own backup solution for a lot cheaper. 8TB = $140 on Amazon.

1 petabyte = 125 drives = $17,500 (one-time cost).

It will probably cost more to connect all these drives to some sort of a server. Though 125 is within the realm of what a simple USB should be able to handle (127 devices per controller).
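For anyone who wants to rerun the drive arithmetic with other numbers, here is a minimal Python sketch. The function names and the `copies` parameter are made up for illustration, and it only counts raw drive cost: no servers, power, redundancy overhead beyond whole copies, or replacement drives.

```python
import math


def drives_needed(capacity_tb, drive_tb, copies=1):
    """Whole drives needed to hold `copies` full copies of capacity_tb."""
    return math.ceil(capacity_tb * copies / drive_tb)


def raw_drive_cost(capacity_tb, drive_tb, price_per_drive, copies=1):
    """One-time cost of the drives alone, in the same currency as the price."""
    return drives_needed(capacity_tb, drive_tb, copies) * price_per_drive


PETABYTE_TB = 1000  # decimal petabyte, matching the figures in the comment

print(drives_needed(PETABYTE_TB, 8))                  # 125
print(raw_drive_cost(PETABYTE_TB, 8, 140))            # 17500
print(raw_drive_cost(PETABYTE_TB, 8, 140, copies=2))  # 35000
```

Keeping two full copies doubles the figure, which is still small next to the monthly cloud prices quoted upthread.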

whoami_whereami · on March 18, 2019

And how many days of downtime are you willing to tolerate while you are restoring that petabyte of data from your contraption? Let's say you have a 10Gbps internet connection (not cheap) all the way through to the Amazon data center, the data transfer will only take about 12 days per petabyte then.

Getting petabytes of storage isn't the problem, transferring the data back and forth is.
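A rough sketch of the transfer-time arithmetic, in Python. The `efficiency` parameter is an assumption added here to account for protocol overhead and imperfect link utilization; it is not a figure from the comment.

```python
def transfer_days(data_bytes, link_gbps, efficiency=1.0):
    """Days needed to move data_bytes over a link of link_gbps gigabits/s.

    efficiency is the assumed fraction of the nominal line rate that is
    actually achieved (1.0 means the link is perfectly saturated).
    """
    bits = data_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400


PETABYTE = 10**15  # decimal petabyte

print(round(transfer_days(PETABYTE, 10), 1))        # 9.3
print(round(transfer_days(PETABYTE, 10, 0.77), 1))  # 12.0
```

At a fully saturated 10 Gbps the result is a little over 9 days per petabyte; the "about 12 days" figure is consistent with roughly three quarters of line rate being usable in practice.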

Bartweiss · on March 19, 2019

This is all true, but it sort of presupposes competence.

Taking a full month to recover a downed social media platform isn't really acceptable, but it's still better than being literally unable to recover it at all. Spending a small fortune to ship hardware to an AWS datacenter and convincing/paying them to load it directly would probably also be worthwhile, when we're talking about simply losing a $500M company. If the claim here about "no backup" is true, it's so profoundly stupid that everything I know about best practices sort of goes out the window. Approaches that any sensible person would consider unacceptably slow and unreliable are still a step up from a completely blank playbook.

(I guess the theory might be that Tumblr is such a trashfire it can't be restored, or would lose so much value in days/weeks of downtime that there's no point in even planning for that. Again, I don't really know how you run cost-benefit analyses when it's not entirely clear the project has benefits.)

yayr · on March 19, 2019

you can just colocate that server

whoami_whereami · on March 19, 2019

And where does Amazon offer colo services? What they offer is Direct Connect at certain (non-Amazon) data centers. That costs about 20k per year for a 10Gb port, ON TOP of the colocation and cross connect fees you are paying at the data center where you want to establish the connection. If you want to bring the restore time down to 12 hours, you need 24 connections (and you need at least as many servers, no single server can handle 240Gb of traffic), so we are now at about 480k+X (large X!) per year per petabyte just for the connections you need in case you have a catastrophic failure (establishing such a connection takes days or even weeks, even if ports are available immediately, so you can't establish the connections "on demand").

That's not even talking about availability, as you are now getting into the realm where it starts to get questionable whether even Amazon has enough backhaul capacity available at those locations so that you can actually max out 50+ 10Gb connections simultaneously.
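The link count and port cost above can be reproduced with a short Python sketch. It takes the 12 days per petabyte on a single 10Gb link and the roughly 20k per port per year as given from the comments; the function names are hypothetical, and colocation and cross connect fees (the "large X") are deliberately left out.

```python
import math


def links_for_restore(single_link_days, target_hours):
    """Parallel links needed to shrink a single-link restore to target_hours."""
    return math.ceil(single_link_days * 24 / target_hours)


def annual_port_cost(links, cost_per_port_per_year):
    """Yearly cost of keeping that many ports provisioned."""
    return links * cost_per_port_per_year


links = links_for_restore(single_link_days=12, target_hours=12)
print(links)                            # 24
print(annual_port_cost(links, 20_000))  # 480000
```

This assumes the restore parallelizes perfectly across links, which is the optimistic case.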

QuinnyPig · on March 19, 2019

At this scale there is no “just.”

lugg · on March 18, 2019

That's like a developer or two..

Wth?

zwily · on March 18, 2019

MFA delete at least doesn’t cost any extra.

quotemstr · on March 18, 2019

So, roughly the cost of one or two good engineers? Not having backups is penny wise and pound foolish.

ConceptJunkie · on March 18, 2019

"Penny wise and pound foolish" is the universal motto of management everywhere.

de_watcher · on March 18, 2019

They'll lose it as soon as they try to configure that.

johnvanommen · on March 18, 2019

> I used to work at Tumblr, the entirety of their user content is stored in a single multi-petabyte AWS S3 bucket, in a single AWS account, no backup, no MFA delete, no object versioning. It is all one fat finger away from oblivion.

Remember when Microsoft lost all of the data for their Sidekick users? Basically they were upgrading their SAN and things went badly.

labster · on March 18, 2019

I'm surprisingly okay with this. Well, I guess I'd miss McMansion Hell.

kevinmchugh · on March 18, 2019

McMansion Hell is now archived by the Library of Congress. Don't be too concerned.

http://mcmansionhell.com/post/181936133241/what-level-of-pos...

ivm · on March 18, 2019

Thousands of skilled artists use Tumblr as their main publishing platform.

lostlogin · on March 18, 2019

Picasso (supposedly) drew on a napkin, and Banksy draws on derelict walls or sticks his work through a shredder. The medium doesn’t need to be lasting. Edit: The potentially short-lived medium was chosen by the above artists. Tumblr users may not be too happy if work is lost.

buboard · on March 18, 2019

banksy's walls are sold though; and he is still kind of the exception because of his art format. Not everything needs to be lasting but 100% temporary art is not common.

whoami_whereami · on March 18, 2019

Oh, it's more common than you think, only it being highly valued is rare. That doodle you drew while having a conversation on the phone? That's throwaway art, even if you don't consider yourself an artist.

benjaminikuta · on March 21, 2019

Hi, I'm new here.

Why is your name green?

criddell · on March 18, 2019

How many do you think they would be willing to pay some small monthly fee? I'm guessing most of them think their work is worth at least $5/month, right? Maybe Tumblr should become a paid service and ditch the advertising model. That way they could be more relaxed about what types of content they are willing to host.

kirillzubovsky · on March 18, 2019

I’ve heard from an Amazon friend that AWS as a whole is like that, one click away from a total meltdown. Probably true.

stone-monkey · on March 18, 2019

That's basically what happened with S3 a couple years back. Mistyped command caused an outage for large parts of the internet in the US. Now, I dunno if they could make a big enough mistake that would bring down the whole company, but certainly it's been proven that a single mistake can affect major portions of the internet.

antt · on March 18, 2019

I always find it funny how I'm designing with best practices in mind on top of infrastructure someone out of university built as their first project.

nostrebored · on March 18, 2019

This is not the case with S3 and not the case with that incident.

lugg · on March 18, 2019

Pretty sure there are first year grads who have worked on S3 as their first project.

stavros · on March 18, 2019

So what? You're saying it as if they gave them root access to the servers and went "go nuts".

lugg · on March 18, 2019

Bugs in code happen. You don't need write access to cause irreparable damage when the app you're working on has it.

stavros · on March 18, 2019

This applies to everyone, juniors and seniors, and that's why we have code reviews, tests and tooling.

karlkatzke · on March 18, 2019

Yeah, that's pretty much what major companies I've worked for will do with summer interns.

stavros · on March 18, 2019

I don't know, that hasn't been my experience at all in the companies I've worked for (maybe because there's no way I'd let it happen).

antt · on March 18, 2019

You can't prove me wrong since its source is not available.

klodolph · on March 18, 2019

I would believe that AWS is one click away from being unavailable for 12 hours, but not one click away from major irrecoverable data loss.

(Don't ask for a rigorous definition of "one click away", though.)

dodobirdlord · on March 18, 2019

For most AWS services it would be fairly difficult to cause multi-region damage by mistake.

nostrebored · on March 18, 2019

> experienced code reviewers verifying change sets using sophisticated deployment infrastructure targeting physical hardware spread out across one or more data centers in each availability zone

but the availability numbers speak for themselves :/

alekratz · on March 18, 2019

This is fascinating. Are there any other crazy "wtf, how has this site not died yet" stories from the inside?

Bartweiss · on March 19, 2019

There's an awful lot of less-critical stuff that users have tracked down themselves. A few random highlights:

- The mobile and desktop sites are completely separate products with vastly different behavior. Some privacy features (relevant to both) can only be accessed on one, some on the other. Tags are rendered in all-lowercase on mobile, but as written on desktop. Block quotes on desktop render as enlarged-font cursive on mobile, for some awful reason.

- Tumblr support(s/ed) font coloring, with no documentation of that fact. You enable it by using the HTML editor and picking among color tags with Friends-themed names like "Monica Pizazz Orange". Oh, and the preview feature won't honor the tags, but actually posting will.

- NSFW content is flagged even in drafts, but if that content is reviewed and approved, it's automatically posted publicly, not returned to drafts where it started.

- Tumblr's desktop sign up page use(s/d) semi-random images from the site as backgrounds. Yes, they did serve cartoon porn to people trying to make accounts.

- Certain posts were impossible to view. Tumblr accounts can have their own themed pages, or simply be popup sidebars over the main news feed. Tumblr "read more" content hiders took users from the news feed to the poster's account - if that account was in popup format, a readmore opened from the wrong location would simply force a circular redirect.

- All Tumblr links are actually pushed through a site-specific forwarding system to track users. As a result, Twitter and many other sites are inaccessible because they view all link clicks as bot traffic from a "single source".

nightpool · on March 21, 2019

Your info is somewhat out of date. Aside from #1, these are all bugs that have been fixed. #2 was only before the feature was officially launched. #3 was fixed within a few days. #4 wasn't a bug; serving artistic nudity was intentional and part of Tumblr's brand (just like an art museum would). #5 was a bug for a while and it sucked. I've never heard of #6 being an issue. It's true that they use a link tracking system, but I've never heard of it causing "bot traffic" issues; respectfully, that sounds like bullshit. While I hate it for privacy reasons, lots of sites use link tracking, like Google and Facebook.

Bartweiss · on March 25, 2019

I agree that most of these bugs are old; I figured the question included historical stuff, and I have a better knowledge of Tumblr's old bugs than of its new ones.

It looks like I was simply wrong on #2, thank you; I remembered it as something that had been around for ages but was noticed, then publicized. If it was found before a planned announcement, that's different.

#3 was fixed within a few days, but frankly I think "posting people's drafts with no warning" is a "damage done" thing, the same as an email client sending drafts to all listed recipients. There are reasons like the "private post" option that you would draft something and never openly publish it, and even beyond that it's reason to draft anything you might not want to publish as-is offline instead of in the site's draft feature.

#6 is complained about by plenty of other people, and happens to me perhaps 90% of the time. I realize I missed one thing: it's mobile-only. Opening a Twitter link on mobile produces a "you're rate-limited" blocking page which sticks around even if you try again later, but choosing "open in Chrome" to escape the Tumblr app immediately solves the problem. I haven't seen comparable behavior in any other app where I've followed Twitter links. Mobile-specific implies it's not purely the link tracking, granted, but it's very much a real Tumblr-specific issue.

tazjin · on March 18, 2019

My experience with Tumblr was generally that a large part of the content, especially larger media content like videos, failed to load most of the time. Makes me wonder if that's related ...

scarface74 · on March 18, 2019

I’m not saying it isn’t dumb, but that one fat finger would have to be

    aws s3 rm s3://bucket --recursive

It won’t let you just go into the console or delete the stack that made it if the bucket isn’t empty.

VWWHFSfQ · on March 18, 2019

there was a S3 sync client that some people used that did:

    aws s3 sync --delete ./ s3://your-bucket/

The delete flag was added by just a very innocuous checkbox in the UI. The result is that it removes anything not in the source directory. Kaboom. Everything's gone. The point is you have no idea what stuff is going to do even if you think it's obvious.
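To make the failure mode concrete, here is a small Python model of what a mirror-style sync with deletion does to the destination. It is a simplified sketch, not the AWS CLI's actual implementation: the function name and the example keys are hypothetical, and the upload side (size and timestamp comparison) is not modeled at all.

```python
def keys_removed_by_sync_delete(source_keys, dest_keys):
    """Keys a mirror-style sync with deletion would remove from the destination.

    Simplified model: anything present in the destination but absent from
    the source is deleted.
    """
    return sorted(set(dest_keys) - set(source_keys))


# Hypothetical example: the sync is run from a nearly empty local directory.
local_dir = ["notes.txt"]
bucket = ["notes.txt", "photos/2017/a.jpg", "photos/2018/b.jpg"]

print(keys_removed_by_sync_delete(local_dir, bucket))
# ['photos/2017/a.jpg', 'photos/2018/b.jpg']
```

With the real CLI, adding `--dryrun` to the sync command prints the operations it would perform without executing them, which is a cheap check before running anything with `--delete`.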

electroly · on March 18, 2019

Have you tried this? It takes forever to clean out a bucket. At the scale we're talking about, doing this on a single thread from the CLI tool means you could go home and come back the next day and cancel it then, and you still wouldn't have made a particularly big dent in the bucket. It's really a pain in the neck to delete a whole bucket full of data when you actually want to. It's "easy" to start off a recursive delete, sure, but I think you're overestimating the "kaboom" factor.

VWWHFSfQ · on March 18, 2019

not every business critical bucket has petabytes of data in it

electroly · on March 18, 2019

This one does. We're talking about Tumblr.

dahfizz · on March 18, 2019

Maybe the moral is that you shouldn't rely on third party clients for mission critical stuff if you don't know what they do.

VWWHFSfQ · on March 18, 2019

and also have backups like a normal competent person/organization does

foxtrottbravo · on March 18, 2019

I know that in this particular example that's good advice and more or less easily done.

Do you think it would be good to extend said argument to say scp / ftp clients?

bashinator · on March 18, 2019

awscli is a first-party client.

scarface74 · on March 18, 2019

He mentioned a third party GUI wrapper on top of the CLI.

PetahNZ · on March 18, 2019

This would take so many hours to actually run though, probably weeks for that amount of data.

itronitron · on March 18, 2019

maybe someone at Tumblr can test this...

Dunedan · on March 18, 2019

That's not accurate.

From the S3 management console user guide[1]:

> You can delete an empty bucket, and when you're using the AWS Management Console, you can delete a bucket that contains objects. If you delete a bucket that contains objects, all the objects in the bucket are permanently deleted.

[1]: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/delet...

onefuncman · on March 19, 2019

if you'd used the s3 management console, you'd know that it uses the same API as everything else, and so has to do the same list objects by page / delete a page dance just like everybody else... the only bulk optimization i can recall is the server side transfers for sync...
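A back-of-the-envelope Python sketch of that list-a-page / delete-a-page dance. The batch size of 1,000 reflects the per-request key limit of S3's multi-object delete as far as I know; the object count and request rate below are purely hypothetical, and the listing requests themselves are not counted.

```python
from itertools import islice


def batches(keys, size=1000):
    """Yield lists of at most `size` keys, mimicking page-by-page deletion."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def delete_requests(object_count, size=1000):
    """Number of delete requests needed for object_count objects."""
    return -(-object_count // size)  # ceiling division


def sequential_days(object_count, requests_per_second, size=1000):
    """Rough wall-clock days if requests are issued one after another."""
    return delete_requests(object_count, size) / requests_per_second / 86_400


print([len(b) for b in batches(range(2500))])       # [1000, 1000, 500]
print(delete_requests(1_000_000_000))                # 1000000
print(round(sequential_days(1_000_000_000, 5), 1))   # 2.3
```

Under those assumed numbers a single-threaded delete of a billion objects runs for days, which fits the point upthread that there is time to notice and cancel.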

aiven · on March 18, 2019

After porn ban they probably have only ~one petabyte.

supermanlocal · on March 19, 2019

And as an auditor, I know how the controls in SOC audits are designed in a way that misses the pressing issues of cloud!!!

undefined1 · on March 18, 2019

Did anyone in the company make a big deal out of this?

cortesoft · on March 18, 2019

When was this? Being owned by Yahoo, I am surprised they don't use NetApp.

zxcvbn4038 · on March 18, 2019

Tumblr rejected all things Yahoo, except the money, so the answer to just about anything Yahoo asked was either “no”, “get stuffed”, or silence and a note to David that he needed to escalate to Marissa.

On the other side the Yahoo services were so heavily integrated that it was hard to carve out any piece of them, and the few times we tried it was a slow and painful process because Yahoo’s piece was glitchy and unreliable outside of its home turf and the Tumblr engineers were defensive and argumentative about everything and not willing to help.

zimpenfish · on March 18, 2019

> Tumblr rejected all things Yahoo

Having worked at Yahoo, I understand this stance.

aasasd · on March 18, 2019

That's exactly how I imagined Tumblr's design and development, based on my multiple unsuccessful attempts, over the years, to find any useful navigation between blogs, or the function of reading comments.

johnvanommen · on March 18, 2019

> When was this? Being owned by Yahoo, I am surprised they don't use NetApp.

Dell used to offer an online backup service. It wasn't even running on Dell equipment!

Basically they acquired a company that offered the service, and while it would be "nice" if a Dell company ran on Dell gear, a lot of the time it's simply impractical/expensive to overhaul things.

soup10 · on March 18, 2019

i do this too with my data on a smaller scale, but i'm surprised tumblr does this because even with only a few million files s3 buckets that big are awkward to work with

mholt · on March 18, 2019

This is why I wrote Timeliner: https://github.com/mholt/timeliner - a tool to download all my content from essential cloud services (like Google Photos, etc) -- I don't like to trust the cloud as a master copy of my data.

nitemice · on March 19, 2019

I've been amassing a collection of scripts to basically do the same thing over the last couple of months, just running them all once a month with cron.

None of the services I've been backing up (Goodreads, Trakt, DeviantArt, Tumblr) are currently covered by Timeliner, but the extra twist of assembling all your data into a single timeline sounds kinda cool, so maybe it's worth contributing a few data sources.

zimpenfish · on March 18, 2019

I use(d) Perkeep (Camlistore) for this but the current version is broken wrt Pleroma (which might be Pleroma's fault) and Pinboard. I'll definitely give Timeliner (and possibly a Pinboard datasource) a try to fill in the gaps.

hiei · on March 18, 2019

This tool appears to be great! One question though, what does your storing solution look like?

mholt · on March 20, 2019

It's just a SQLite database, combined with a folder on the file system.

hiei · on March 21, 2019

Forgive me - I meant personal storage solutions.

dredmorbius · on March 18, 2019

Very interesting.

I've had a thought about a monthly "data preservation day" concept, (on the 11th, because: https://xkcd.com/1140/, and yes, I know: https://drhagen.com/blog/the-missing-11th-of-the-month/)

A multiplatform archival tool would be useful.

alkonaut · on March 18, 2019

I might not trust them 100%, but I trust them around 1000x more than I trust myself. I have had several data loss incidents and it was invariably my own fault. I simply cannot be trusted with my own data.

If one doesn't trust Google/Dropbox et al. to handle the data, then just use two of them, or use a personal solution and one of them. Just don't use only your home-rolled backup system because you don't trust any of the online storage providers.

Hoasi · on March 18, 2019

> This also goes for Google Drive, Dropbox, and many other websites (if not all)

Of course, but that is not a bad thing per se. Digital data archiving itself is unknown territory. What should happen to the data you put online is uncertain, even in the short term.

The analogy of a "cloud" is revealing. A cloud is fleeting by nature. Nothing lasts forever online, no matter what they say. And that is not mentioning compatibility issues or reading old files made with outdated apps.

Some files that are just above 10 years old can be hard to retrieve today.

josefresco · on March 18, 2019

Anecdotally my experience has been the inverse.

Number of my files lost by cloud drive providers: 0

Number of my files lost by me because I'm a dumbass: Tens of thousands, including some of my most cherished family photos

kyrra · on March 18, 2019

In the Google case, I don't believe the files were ever deleted. While the article says they were unable to download it again, my understanding is that Google never deletes or prevents access to your own files on Drive (in cases like this). So maybe something else was going on here I'm unfamiliar with.

Google was preventing the sharing of the files, but that should have been it.

(I'm a googler, but don't have any direct info on this specific incident)

kerneltime · on March 18, 2019

At one point Yahoo was invincible.. I really need a way to find an alternative to gmail for email where I am in control and can replicate it when I like.

SpaethCo · on March 18, 2019

You can be in control of your actual data pretty easily with gmail. One of the better options I’ve found for maintaining a local archive is the Got Your Back[0] script. It maintains an SQLite DB with all of the message tag associations, and stores all your emails in standard .msg format text files.

If you want additional peace of mind, you can spend a few pennies and use restic[1] to do regular incremental backups of the archive to Backblaze B2 to add yet another storage location and version index.

[0] https://github.com/jay0lee/got-your-back/wiki

[1] https://restic.net

asteli · on March 18, 2019

you can always set up an email account on another service and have the mail accessed through google's interfaces. i wouldn't recommend setting up your own email server unless you really like securing servers though

OrgNet · on March 18, 2019

The only reason i still use gmail is because of the search engine... im about to implement my own using existing techs so that I can move away from Google... why does search sucks so much in open source programs.. on a side note, Myspace is still active?

zimpenfish · on March 18, 2019

> why does search sucks so much in open source programs..

If you have/can get command line access to your mailboxes, mairix[1] is pretty good at indexing and searching.

[1] https://github.com/vandry/mairix

ksec · on March 18, 2019

I want a simple to use NAS that has NAND + 2x 2.5" HDD Mixed with ZFS. And Backup Option to likely B2 or other Services.

Preferably coming from Apple as new Time Capsule. But All these vendor want is Cloud Services Revenue.

Interestingly, there are a few Android handset makers with wireless + backup devices that back up while recharging your battery.

djsumdog · on March 18, 2019

It's a little different for Dropbox, MEGA and similar services because they sync your machines. So long as you have at least one machine with those tools installed and running, you have a copy (unless someone finds an exploit in their servers and deletes all their customer data ... so yea .. still keep backups).

johnchristopher · on March 18, 2019

No. It goes against the whole "what you post online will stay online forever".

Which is it ?

jraph · on March 18, 2019

If you hope it remains private or it gets forgotten, you should assume that it may go public / remain online forever. And archives exist for many pieces of public content.

If you hope it stays forever, you should assume it may vanish in a few seconds.

gdulli · on March 18, 2019

I've seen occasional attachments to years-old gmail messages get irrevocably lost. Well, irrevocably given the likelihood of ever getting human attention on it.

wishinghand · on March 18, 2019

This reminded me of Purevolume.com, which I just went to go check. Seems like it's a clickbait pop culture article site now. It used to be Soundcloud before Soundcloud, though it was mostly popular with punk, hardcore, and their various offshoots. All of that uploaded music...just gone.

It's not necessarily sites going out of business/losing data either. On Bandcamp, Melora Creager used to have three songs up called The Willow Tree Tryptych. I bought it but it's no longer available for download in my account, though it's listed there. I'll have to pirate it somewhere since I don't seem to have a download of it. I'll have to figure out that later since Spotify atrophied my pirating knowledge.

Always archive your digital purchases when you can.

sedachv · on March 18, 2019

mp3.com[1] predated Soundcloud by a decade, and was amazing both for listeners and for independent artists. They were paying artists royalties per-download/stream in 1999. The site shut down in 2003 and all the music was lost.[2] I still have music I downloaded from mp3.com that I have not been able to find anywhere else. This is why I still use and support P2P file sharing. After the 2003 experience with mp3.com it was obvious to me that using music streaming services was a bad idea.

[1] https://en.wikipedia.org/wiki/MP3.com#History

[2] There was some kind of deal with Trusonic/GarageBand.com where artists were able to access their tracks uploaded to mp3.com and transfer them to GarageBand.com for about a year from 2004 to 2005. It is unclear how many people actually did the transfer (I had an mp3.com page for an electronic music project and was never notified about this). GarageBand.com in turn closed down in 2010, offering migration to iLike. iLike was acquired by MySpace and rolled into MySpace Music in 2012.

https://en.wikipedia.org/wiki/GarageBand.com https://en.wikipedia.org/wiki/ILike

coroxout · on March 18, 2019

I miss mp3.com and still have a few favourite mp3s by other people saved on a hard disk somewhere (or so I thought - doesn't seem to be on this hard disk so I'll have to check for them tonight).

I wish I'd archived the band bios too, because now they're completely out of context, just some band names and song titles which aren't Googleable in any way. If any of the bios listed the musicians' names it'd be interesting to see what they're up to now, 20 years later.

(Ouch, that really was 20 years ago.)

thomnottom · on March 18, 2019

I've got probably a few thousand mp3s from there and other services (lots of eMusic samplers) that now have little to no context. Sorting through them lately, it's a tad depressing when I hear a really great song and can find almost no information about the band that made it.

Also heavily used drip.fm. When Kickstarter decided to change the service to Patreon-lite I asked about archiving the site because of all of the extra info (forget about the music, I wanted the metadata). They told me they couldn't do it.

patwolf · on March 18, 2019

I used to publish to mp3.com, and sadly I don't even have backups of some of my songs I put on there.

I never made any money from mp3.com, but I do have a couple backpacks that they sent me for no real reason--a glorious reminder of the dotcom era.

sp332 · on March 18, 2019

Soundcloud also seems to have a bit rot problem. https://twitter.com/textfiles/status/1102668648630681600

P.S. https://archive.org/donate

pmoriarty · on March 18, 2019

What happens to all the archived content when archive.org itself is gone?

Though a laudable effort, it seems as much a single point of failure as any other site.

int_19h · on March 18, 2019

I think it is easier to get continuous funding for it, because of the extra gravitas that comes from being an archive. Much like with libraries, the appeal of the idea itself is more important than the books inside. If you look at the list of their sponsors, they include e.g. the National Science Foundation and the Library of Congress.

That, and with more diverse stuff in one place, there's also more diverse interest in keeping it going.

userbinator · on March 18, 2019

archive.org apparently has torrents of some of its content too:

https://blog.archive.org/2012/08/07/over-1000000-torrents-of...

so they're not completely centralised but may actually be trying to decentralise things a bit.

ValentineC · on March 18, 2019

I'd be more worried about the copyrighted material that they can archive but aren't allowed to distribute.

dredmorbius · on March 18, 2019

Well, it's on Archive Team's watchlist:

https://www.archiveteam.org/index.php?title=Alive..._OR_ARE_...

There are two mirrors:

https://www.archiveteam.org/index.php?title=Internet_Archive

sp332 · on March 18, 2019

Definitely true. But the article mentioned donating to the Archive without providing a link to do it.

argd678 · on March 18, 2019

In some sense it’s worse since it’s expensive and complex to run, and there’s also no way to know what’s actually valuable data vs data that has the value of an 80s infomercial.

vondur · on March 18, 2019

I guess there is something to using a file system like ZFS?

jakelazaroff · on March 18, 2019

Purevolume shut down almost a year ago [1]. I was in high school during its heyday (and the earlier side of the years MySpace just lost). Those were formative years for my generation, and I have so many fond memories on both of those sites. It's so, so depressing that all of that stuff is just gone from the world.

[1] http://www.brooklynvegan.com/purevolume-shutting-down/

lukevers · on March 18, 2019

Found out about this last night. I wish I had known about Purevolume shutting down. So much local music and dumb things my friends and I wrote, gone.

sitkack · on March 18, 2019

You can email support and they will find the purchase. Or if you can dig up the email from the purchase, the download links should still work.

Tsiklon · on March 18, 2019

I've noticed some artists marking certain releases on Bandcamp as 'private'; they're all at the bottom of my collection. It usually happens when the record is picked up by a larger label. Did that artist delete them or mark them private?

josteink · on March 18, 2019

> Always archive your digital purchases when you can.

Absolutely, otherwise you might as well just be renting.

Also worth noting for this to be an option, you can’t purchase DRMed media. What good is your backup if you can’t decode, authenticate or play your own media?

baroffoos · on March 18, 2019

If you are looking for super obscure stuff then soulseek is the place to check. If you are looking for less obscure stuff but want the maximum quality then redacted is the place.

aasasd · on March 18, 2019

I was going to ask how Redacted compares to Waffles, but it looks like Waffles is almost a decade older, which likely translates to the number of releases pretty well.

Though, Waffles had a sizeable fall in activity in the last couple years—notably coinciding with the closure of What. One would think that What-expats would bring in fresh blood.

mr-ron · on March 18, 2019

After What died, there was a flurry of invites to both Waffles and Redacted. However, Waffles went down for some reason for a few days right when everyone was looking, so everyone went to Redacted.

coroxout · on March 18, 2019

It went down and lost control of its domain, eventually reappearing at a different TLD. A lot of people probably got lost in the move.

fao_ · on March 18, 2019

Is that a literal site called redacted or did you redact it?

sincerely · on March 18, 2019

http://redacted.ch

https://interviewfor.red/en/index.html

tqkxzugoaupvwqr · on March 18, 2019

A quick search leads me to believe it is an invite-only torrent tracker for music.

baroffoos · on March 18, 2019

Yep, it's the replacement for what.cd

crucialfelix · on March 18, 2019

My best friend from high school contacted me on Myspace. He had uploaded all his music, each band with a separate profile for posterity. Then he killed himself. I found out on Facebook. We formed a group and mourned in Facebook. Myspace died soon after that. Castles in the sand.

cozzyd · on March 18, 2019

And a million people who can't remember their MySpace passwords to delete their angsty teenager MySpace accounts jump with joy.

dredmorbius · on March 18, 2019

Probably already reprieved:

In 2013, MySpace suddenly purged most of its users’ content, including blogs, custom profiles, videos, and posts. There was no sunset, no death announcement that would allow active users to round up their data. It was an astonishing and quietly reported loss.

https://thebaffler.com/salvos/404-page-not-found-wagner

buboard · on March 18, 2019

I don't know why we don't have "digital safes" for our lives. People spend so much time creating those streams of data and just leave them hanging wherever on the internet. It is just not typical behavior for humans to leave all their stuff on the streets. Current consumer-level storage devices aren't very safe for very-long-term storage; I wonder if anyone is working on some kind of optical-based device or something else.

namibj · on March 18, 2019

BD-R is only plagued by delamination, which is gradual and can be countered by ~yearly visual inspections looking for delamination combined with some redundancy.

verytrivial · on March 18, 2019

M-Disk is a somewhat pricey but apparently legit archival format that appeared (from my point of view) just as the DVD/Blu-ray external writers market collapsed.

I now have an external HDD in a portable fire-safe that I know could go from 100% working to 0% working at any moment. The thing I liked about optical is you could have some hope of recovering most data as the media degraded, and basically all with judicious use of ECC. It's a shame.
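As a rough illustration of the redundancy idea, here is a minimal sketch of single-block XOR parity. This is an assumption chosen for simplicity, not what any particular disc format or tool does; real archival tools such as par2 use stronger Reed-Solomon codes that survive multiple lost blocks. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Compute one parity block to store alongside the data blocks."""
    return xor_blocks(data_blocks)


def recover_missing(blocks_with_gap, parity):
    """Rebuild a single unreadable block (marked as None) from the survivors."""
    missing = [i for i, block in enumerate(blocks_with_gap) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    survivors = [block for block in blocks_with_gap if block is not None]
    repaired = list(blocks_with_gap)
    # XOR of all surviving blocks plus the parity yields the lost block.
    repaired[missing[0]] = xor_blocks(survivors + [parity])
    return repaired


if __name__ == "__main__":
    blocks = [b"abcd", b"efgh", b"ijkl"]
    parity = make_parity(blocks)
    damaged = [blocks[0], None, blocks[2]]  # pretend the middle block rotted
    print(recover_missing(damaged, parity) == blocks)
```

With one parity block per group, any single unreadable block in that group can be rebuilt; losing two blocks in the same group is unrecoverable under this scheme.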

jstarfish · on March 18, 2019

Those safes are prone to mold. Not a good place to store backup mechanical equipment unless you compensate by opening it often and replacing desiccant packets.

SmellyGeekBoy · on March 18, 2019

Is a NAS with RAID not safe for very-long-term storage? That solution has served me well so far, anyway.

eythian · on March 18, 2019

No, the regular home consumer won't know what to do in 5 years' time when one of the disks is failing but all they can buy is SATA when it's expecting an IDE connector.

Also, next to no one knows what those terms even mean.

dd36 · on March 18, 2019

MDisc

lemcoe9 · on March 18, 2019

Like a CD or DVD?

JeremyBanks · on March 18, 2019

IIRC The lifespan of many CD-Rs isn't reliably above ten years.

mixmastamyk · on March 18, 2019

Mine from the late 90s work fine, for now.

JeremyBanks · on March 18, 2019

It seems like higher-quality CD-R should be able to last many decades, but some were produced using lower quality materials that are not as durable, and have been known to fail in as little as ten years. I haven't found a great reference, but it's discussed a bit at https://www.canada.ca/en/conservation-institute/services/con...

They could be a safe choice if you know exactly the type of media you're recording on. I'd just like to caution people against assuming by default that their old CD-Rs will be stable long-term.

_ps6d · on March 18, 2019

This seems to just be blogspam that links to two different reddit threads, and one's over a year old. The whole story appears to be based off a single screenshot of an email from over 7 months ago [1], where you can't even see the actual question they're responding to.

Is there a better source for this claim somewhere?

[1]: https://www.reddit.com/r/techsupport/comments/7uiv8b/myspace...

AnotherGoodName · on March 18, 2019

It's literally a banner on https://myspace.com/ right now

"As a result of a server migration project, any photos, videos, and audio files you uploaded more than three years ago may no longer be available on or from Myspace. We apologize for the inconvenience. If you would like more information, please contact our Data Protection Officer at DPO@myspace.com."

_ps6d · on March 18, 2019

Weird, I don't see that banner. Here's a forum post that mentions it from over 7 months ago though, and it sounds like it might only appear for visitors from certain locations: https://hydrogenaud.io/index.php/topic,114746.0.html

Either way, the message in that banner makes it sound both more severe (includes photos and videos too) and less certain. It's also strange that this is suddenly getting attention today if that banner's been there for so long.

AnotherGoodName · on March 18, 2019

Oh that's weird. I'm also getting the EU cookie acceptance banners. I'm in Australia (definitely not in the EU). Perhaps they tied the data deletion banner into the EU cookie acceptance banner?

You know I'm starting to think that MySpace isn't well run...

Screenshot of the banner I see for reference: https://imgur.com/GDrYqST

seltzered_ · on March 18, 2019

One of those threads links to a support article that seemingly hints at data loss: https://help.myspace.com/hc/en-us/articles/202233310-Where-I...

It looks like one of their triages has been to link to YouTube copies of tracks if available.

rchaud · on March 18, 2019

I suspected that this had been the case for years now. A few years back, when I heard Myspace had "relaunched" with a focus on music artists, I checked it out, as in the 2000s MySpace was basically what Bandcamp is now for independent artists; streaming music with options to buy tracks.

The artists' pages were completely blank, barring a few pictures and a description extracted via the Wikipedia API. No music available at all. This was in stark contrast to the original MySpace days when the profile pages would be chock full of streaming songs, tour announcements and interactions with fans.

dman · on March 18, 2019

Geocities, AngelFire, AudioGalaxy, Napster shutting down taught that to my generation.

sandes · on March 18, 2019

"Lost". I think storage servers are so expensive that they've decided to remove data.

bitxbit · on March 18, 2019

Just curious to see if there are any sort of government or private ‘Day 0’ instructions out there to help rebuild the world or to preserve human knowledge in case of apocalypse?

namibj · on March 18, 2019

Made it my life goal. Currently fighting minor issues of work/life balance to gain income security to dedicate significant time to it. Currently looking for dual-layer (if sufficiently cheap, also dual-side) BD-R sealed with desiccant and buried in suitable bodies of water (temperature stability against temperature-swing-induced fatigue of the data layers; yet retaining relative ease of access). Predicted cost 15€/TB (incl. sales tax) @ low redundancy (safe against non-deliberate attacks via conventional weaponry), roughly doubling for resilience against multiple isolated thermonuclear devices (even in pathological locations w.r.t. the archive) or tripling (45€/TB) for a current-arsenal worst-case WW3 not deliberately targeting the archive. All assuming mass-migration after 50 years.

michaelgrafl · on March 18, 2019

What are your thoughts on M-Discs?

dpcx · on March 18, 2019

This is probably related to what you're asking about? https://www.archiveteam.org/index.php?title=Main_Page

ForHackernews · on March 18, 2019

http://the-knowledge.org/en-gb/similar-projects/

_ooqq · on March 18, 2019

...MySpace lost these, too. Damn you Tom, I'm not proud of you.

actionowl · on March 18, 2019

Thanks Tom!

seltzered_ · on March 18, 2019

If one has a few archived tracks from MySpace that are now lost, what’s the appropriate way to share them? Contact the artist? Upload to YouTube? Or archive.org?

Nemo_bis · on March 19, 2019

archive.org is fine.

sytelus · on March 18, 2019

Storage cost of hosting 1000 songs on the web is about $1/yr. If these tracks are not generating at least that much revenue, then you are running the operation at a loss. I wonder whether hosting this much data without revenue streams to support the cost would have been viable long term anyway.
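A back-of-the-envelope check of that figure, as a sketch only: the 4 MB average song size is an assumption, and the price uses the roughly $30,000 per petabyte-month S3 number quoted upthread. It covers storage alone, not bandwidth, replication, or staff.

```python
# Rough check of the "1000 songs for about $1/yr" storage estimate.
# Both constants are assumptions, not Myspace's real numbers.
AVG_SONG_MB = 4.0          # assumed average size of one uploaded MP3
PRICE_PER_GB_MONTH = 0.03  # ~$30,000 per PB-month, the S3 figure quoted upthread


def yearly_storage_cost(num_songs,
                        avg_song_mb=AVG_SONG_MB,
                        price_per_gb_month=PRICE_PER_GB_MONTH):
    """Return the estimated storage-only cost in dollars per year."""
    total_gb = num_songs * avg_song_mb / 1000  # decimal gigabytes
    return total_gb * price_per_gb_month * 12


if __name__ == "__main__":
    print(f"${yearly_storage_cost(1000):.2f} per year for 1000 songs")
```

Under these assumptions the result lands in the same ballpark as the $1/yr estimate; cheaper storage tiers would lower it further.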

fao_ · on March 18, 2019

Doesn't mean they couldn't have archived it. Many places like archive.org or archive team will host it for free.

shmerl · on March 18, 2019

Which reminds me about Geocities MIDI collection set up by Internet Archive:

https://archive.org/details/TheGeocitiesMidiCollectionVersio...

gwern · on March 18, 2019

Are there any estimates of what the total loss is? And what else is covered by 'some' when they say 'We completely rebuilt Myspace and decided to move over some of your content from the old Myspace.'?

didgeoridoo · on March 18, 2019

Why refer people to the DPO? Her role is to ensure GDPR compliance, not to make sure data migrations go smoothly. Feels like a bit of buck-passing.

spydum · on March 18, 2019

Agree - that job just got a bit easier (less data to protect)!

eurticket · on March 18, 2019

Do people still use myspace?

r721 · on March 18, 2019

Alexa Global Rank 4,244

Alexa Rank in United States 2,079

https://www.alexa.com/siteinfo/myspace.com

SimilarWeb Global Rank 5,260

SimilarWeb Country Rank United States 1,644

Total Visits ~ 7.53M

https://www.similarweb.com/website/myspace.com

isostatic · on March 18, 2019

Many moons ago granny used to have 'alexa toolbar' junkware installed as one of the many internet explorer toolbars.

I assume Alexa has moved on from such tracking methods?

r721 · on March 18, 2019

They apparently use extensions now:

"Alexa's traffic estimates are based on data from our global traffic panel, which is a sample of millions of Internet users using one of many different browser extensions. In addition, we gather much of our traffic data from direct sources in the form of sites that have chosen to install the Alexa script on their site and certify their metrics."

https://www.alexa.com/about

"Q: What is the “data panel”?

A: Alexa’s data panel is the sample of global internet traffic that is used to calculate Alexa Ranks and estimate non-Certified metrics. The panel is comprised of millions of internet users using one of over 25,000 different browser extensions."

https://blog.alexa.com/top-questions-about-alexa-answered/

SimilarWeb works similarly too (I actually like it a bit more):

"We leverage hundreds of sources which we categorize into 4 distinct groups: 1. Global Panel Data from hundreds of millions of desktop/mobile devices 2. Global ISP Data from partners with millions of subscribers 3. Public Data Sources from over a billion sites and app pages every month 4. Direct Measurement Data from hundreds of thousands of sites and apps"

https://www.similarweb.com/ourdata

isostatic · on March 18, 2019

That really doesn't feel representative

int_19h · on March 18, 2019

Not as such, but there was a lot of music posted to it back in the day when it was the primary channel for bands to share with their fans. That music is no less valuable than it was. So yes, people are (were?) still using it to listen to some of that.

baroffoos · on March 18, 2019

Last time I checked they seem to have rebranded as some kind of entertainment news website.

Theodores · on March 18, 2019

I did about a week ago. I wanted to get a 'share on myspace' widget for comedy value.

What I found disturbing was the login screen that had more than the usual cookie notice:

> I understand that if I choose to post or share any sensitive data (defined as data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data, data concerning my health, sex life, sexual orientation, or children’s data) on Myspace that Myspace will process that data in connection with making the Myspace services available to me and expressly consent to such processing.

I can understand them wanting to ban my far-right-wing-neo-capitalist ramblings about Brexit, my racist 'jokes' and my Heaven's Gate religious rants but 'trade union membership'? Is collective bargaining in the workplace forbidden already?

What I found even more disturbing was how I was one or two clicks away from getting a prostitute. I don't ever get that on the normal internet I know.

I clicked a few other profiles, seems there are marketing losers who didn't get the memo and that is about it within the several astronomical unit search radius.

There are also the 'stars' on there and you will see the same faces. Katelyn Ryan is like the new "Tom Anderson" but I have no idea if she is a bot or if she wants to have my babies. There is no way of finding meaningful information out for that 'connection'. So to 'connect' with a zombie on myspace you would have to Google them.

It is a very bizarre website. It is built by zombies for zombies. And to think there was a time when I could find people on my street and say 'hi' to them via myspace, for them to be real and communicative.

Why they don't just pull the plug I do not know. Presumably they prefer paying the bills. It is shocking to find such a soul destroying site when you think how much time and money has gone into it.

I think mySpace is worthy of study. Boring Facebook blue was what people wanted, design and redesign didn't keep them on mySpace. Adverts were another thing, by the time it had to pay the billions the likes of Murdoch had spent on it the adverts had to be laid on way too thick. We get told the failure was in allowing user generated themes but that was truly creative rather than selfie-narcissistic. People were engaged in something, not zoned out scrolling through empty lives.

I didn't get my myspace share button. Which is stupid. How hard can it be for a social media site to have a 'share on service x' button? I only wanted one for retro comedy value - the old Twitter 't' logo instead of the bird, the aol email address etc. - just needed that myspace link to complete the set.

MagicPropmaker · on March 18, 2019

Who is this "Dr. Jena Jentzsch"? Is she the person responsible for losing the data, or is she just the contact for people now?