Hacker News

How do we back up the Internet Archive? If they lose this case then it will be like losing The Great Library of Alexandria.

This is our culture. This is our heritage!

Edited to add: when I say “back up”, I mean preserve the data and the archival mission (minus the legal quagmires).

The technical challenges can be solved, and we should do this before it’s too late. It seems there was a previous effort, but it has lost momentum: https://www.archiveteam.org/index.php?title=INTERNETARCHIVE....




This is how you back up the Internet Archive: https://www.archiveteam.org/index.php?title=INTERNETARCHIVE....


“IA.BAK has been broken and unmaintained since about December 2016. The above status page is not accurate as of December 2019.”



A conversation about this is the stickied thread, in fact.


There are many other digital archives, and that is how the "single point of failure" problem you envisage is solved.

The Internet Archive is AFAIK the biggest, and the first - the important thing is for others to continue its good work (many have done this for years) - and not for us to rely too much on it or any other single archiving initiative.


It's possible that over the long term it could be backed up via torrents. However, in the short term the only way to save the archive is for IA to sell it. I hope they swallow their pride and do so.


Torrents are a distribution mechanism, not storage. Many IA items already have torrent links available (with a fallback to a "web seed" since they are mostly not seeded).


Roughly 44 million items are available via torrent (I maintain a catalog of IA items independent of IA, with each item's torrent file). Wayback data is not (to my knowledge). This is important, because IA can then act as a global metadata catalog for the items, with the underlying content served up by an uncoordinated fleet of seeders. I think many might agree that the time has arrived for this data to live on globally distributed storage nodes.

It would be helpful if IA published Wayback data files over torrents, alongside cryptographic signatures of the files. The signatures matter for attestation and provenance: Wayback data has been used in legal proceedings, and you would want that trust in the data maintained regardless of where the bits were retrieved from when hydrating the WARC client side.
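
To make the provenance idea concrete, here is a minimal sketch of the client-side check, under stated assumptions. It only compares a downloaded file's SHA-256 digest against a manifest of expected digests; that is an integrity check, not a signature. The manifest itself (a hypothetical mapping of file name to hex digest, not a format IA is known to publish) would still need to be signed by IA and verified separately. The function names are illustrative.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(path: Path, manifest: dict[str, str]) -> bool:
    """Check a downloaded file against a manifest of expected digests.

    `manifest` is a hypothetical {file name: hex SHA-256} mapping. It is
    assumed to come from a trusted, separately verified source.
    """
    expected = manifest.get(path.name)
    if expected is None:
        # Unknown file: nothing to attest against, so treat it as unverified.
        return False
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_of_file(path), expected.lower())
```

With a check like this, it would not matter which seeder supplied the bytes: a file either matches the published digest or it is rejected.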


I kind of hate to think that someone would want to buy it and attempt to turn a profit.


This is abuse of copyright.

IA did bad, and should not be rewarded for this behavior.


And (modern) copyright is abuse of a legal system.


I don't like copyright either; I think it is bad and that it (and patents, too) should be abolished. There are other opinions, however (for example, I have seen the suggestion that copyright be limited to only a few years, with a fee required after the first year, along with other changes to improve it), but I would be OK with just abolishing copyright entirely. I don't want to copyright my own writings either, and would rather they be in the public domain, so that is what I do.



