GIF site Gfycat announces mass deletions, threatens Archive Team with lawsuit

dredmorbius · on Nov 9, 2019

I've been on several sides of this type of situation.

As a content provider, what makes large-scale hosting possible is in large part the fact that access follows a very strong power-law distribution (Zipf's Law), with a minuscule fraction of content accounting for the vast majority of traffic. Front that on your CDN and the backing store is only occasionally accessed as cache expires or something becomes Teh New Hawtness.

Meme-hosting services are particularly suitable for this, for some reason....
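To put a rough number on the Zipf point, here is a minimal sketch. It assumes popularity at rank r is proportional to 1 / r**s with s = 1, and a catalogue of one million items; both are illustrative assumptions, not measurements from any real site, and `zipf_traffic_share` is a made-up helper name.

```python
def zipf_traffic_share(n_items: int, top_k: int, s: float = 1.0) -> float:
    """Fraction of requests that go to the top_k most popular of n_items,
    assuming the item at rank r is requested in proportion to 1 / r**s."""
    if n_items <= 0 or not 0 <= top_k <= n_items:
        raise ValueError("need n_items > 0 and 0 <= top_k <= n_items")
    weights = [1.0 / rank ** s for rank in range(1, n_items + 1)]
    return sum(weights[:top_k]) / sum(weights)


# Hypothetical catalogue: one million items, cache holds the top 1%.
share = zipf_traffic_share(1_000_000, 10_000)
print(f"top 1% of items serve {share:.0%} of requests")
```

Under those assumptions the top 1% of items account for about 68% of requests, which is why a fairly small cache in front of a slow backing store can carry most of the load.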

When an archive project starts, suddenly assumptions on which design, service, and provisioning decisions have been made fly out the window as everything becomes popular. It's a bit like the proverbial monkey trap -- there's only so much that can squeeze out through the pipe at once.

https://coaching-journey.com/coaching-story-monkey-trap/

And yes, such attempts can very much look like a DDoS to site engineers and ops teams.
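A toy simulation of why that happens, assuming an LRU cache sized at 1% of the catalogue and Zipf-distributed normal traffic with exponent 1. The sizes, the exponent, and the helper names are all hypothetical; real CDNs are more complicated than this.

```python
import random
from collections import OrderedDict


def lru_hit_rate(requests, cache_size):
    """Replay requests against a cold LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = total = 0
    for key in requests:
        total += 1
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


def zipf_requests(n_items, n_requests, s=1.0, seed=0):
    """Draw item ids with probability proportional to 1 / rank**s."""
    rng = random.Random(seed)
    weights = [1.0 / rank ** s for rank in range(1, n_items + 1)]
    return rng.choices(range(n_items), weights=weights, k=n_requests)


n_items, cache_size = 10_000, 100
normal = lru_hit_rate(zipf_requests(n_items, 20_000), cache_size)
crawl = lru_hit_rate(range(n_items), cache_size)  # every item exactly once
print(f"normal traffic hit rate: {normal:.0%}, archive crawl hit rate: {crawl:.0%}")
```

A crawl that touches every item once never hits the cache, so every request lands on the backing store. From the ops side that looks a lot like an attack even when the intent is benign.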

As an archivist, it's immensely frustrating to get scant, no, or rapidly-changing reports of content culls or service EOLs. Gfycat's problem here is somewhat self-inflicted as the window for deletion is so short. ArchiveTeam's efforts would have far less effect if spread over more time.
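The arithmetic behind "spread over more time" is simple enough to sketch. The item count below is a hypothetical figure chosen for round numbers, not Gfycat's actual catalogue size.

```python
SECONDS_PER_DAY = 86_400


def required_rate(n_items: int, days: float) -> float:
    """Average requests per second needed to fetch n_items within the window."""
    if days <= 0:
        raise ValueError("days must be positive")
    return n_items / (days * SECONDS_PER_DAY)


# Hypothetical archive job: 8.64 million items over different notice periods.
for days in (10, 30, 90):
    print(f"{days:>3} days -> {required_rate(8_640_000, days):.2f} req/s")
```

The required sustained rate scales inversely with the notice period, so tripling the window cuts the load the archive puts on the origin to a third.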

As a user of online services, finding my own content, and interactions with others, suddenly missing (and having to figure out how to fix such issues as broken image links) is immensely frustrating.

Or, conversely, there's the wish for content posted years ago to simply die an honorable death. The fact that many services (HN among them) don't provide a reasonable way to delete old content is problematic.

The fact that Gfycat have immediately jumped to threat of lawsuits, and don't appear to be talking with Jason Scott on Twitter, increases tensions and disappointment levels. Being reasonable, understanding, and human makes much of life vastly more tolerable.

HoppedUpMenace · on Nov 8, 2019

On a somewhat related topic, I noticed massive amounts of gifs that I would typically discover through Android's keyboard option (hooked up to Giphy I believe) were deleted sometime last year (tons of content, even obscure stuff, related to pop culture and movies, even DBZ abridged stuff). I was not sure if this was done by the website hosting these gifs or the users themselves but it was such a large number across so many genres that I suspected the website scrubbed them for whatever reason (also cannot be found through the main website itself).

Additionally, I've been looking for gif hosting websites and Google does not make one mention of Gfycat anywhere in their search results for keyword "gif websites", and this is after going through the first 5 pages of results, which seems a bit odd to me.

edit: I found Gfycat on the 7th page of Google search results, interesting...

walrus01 · on Nov 8, 2019

Good luck to them blocking any well-executed, geographically distributed, and well-architected scraping/mirroring effort. If you're willing to ignore a robots.txt file and do whatever is necessary to mirror content from publicly accessible HTTP daemons on the internet, there are a lot of ways to do it.

It sounds like the initial effort from Archive Team was a bit too concentrated and strained server resources.

bdcravens · on Nov 8, 2019

They have 10 days before the deletion takes place.

"On Nov 18, we are planning on permanently removing anonymous content that meets the following criteria: 1) less than 200 views, 2) older than one year, and 3) anonymous (made w/o an account)"

https://twitter.com/GfycatHelp/status/1191770073259577344?s=...
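Read literally, the three criteria in that tweet combine with AND. A minimal sketch of the predicate follows; the `Upload` record and its field names are hypothetical, and "older than one year" is assumed to mean more than 365 days old. Nothing here reflects Gfycat's real data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Upload:
    views: int
    created: datetime
    owner: Optional[str]  # None means uploaded without an account


def slated_for_removal(item: Upload, now: datetime) -> bool:
    """True only when all three announced criteria hold at once."""
    low_traffic = item.views < 200
    old = item.created < now - timedelta(days=365)  # assumed cutoff
    anonymous = item.owner is None
    return low_traffic and old and anonymous


cutoff_day = datetime(2019, 11, 18)
example = Upload(views=42, created=datetime(2017, 3, 1), owner=None)
print(slated_for_removal(example, cutoff_day))
```

Note that a popular anonymous upload, or an obscure one tied to an account, would survive under this reading.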

jumelles · on Nov 8, 2019

https://twitter.com/textfiles/status/1192518085997137920

"On November 5th, @Gfycat announced they were going to delete scads of content off their site in 15 days. @archiveteam began trying to download some of the at-risk works. @gfycat has threatened to sue and is demanding compensation for the downloads. We have stopped the project."

mr_toad · on Nov 9, 2019

I don’t think I’ve ever seen anything on Gfycat that was actually uploaded by or with the permission of the copyright holder.

Traster · on Nov 9, 2019

I can understand the motivation to archive the content about to get deleted (although I'm skeptical of the idea it has any value), but it sounds like they basically took no consideration of whether their archive effort would flood the site they're archiving. I think it's fair for Gfycat to be pissed off about suddenly being flooded, though it's not worth a lawsuit. Having said that, the content and website belong to Gfycat and they can choose who they serve that content to.

ajayyy · on Nov 10, 2019

They definitely did. ArchiveTeam said they contacted Gfycat, and that Gfycat said it would extend the deadline a bit. ArchiveTeam took that as Gfycat being okay with the archive and went ahead at the speed needed to finish the archive in the time left.

See https://dd.reddit.com/r/DataHoarder/comments/dt3aom/archive_...