That sounds like an excellent opportunity for corporate IT departments to earn a good haul of PR and general kudos by making an effort to release their archived caches, wherever data retention policies have left them stored.
I have a huge soft spot for projects where the most gets done because nobody has to clear some known hurdle before they can usefully contribute.
Overnight I was fretting over the need to sort out any residual legal issues that might attach to digging out old cache dumps.
But I was forgetting that the very same companies have commonly invested in just the tech to sort out the problem.
The problem being one that can present itself in all kinds of ways.
Do you have any chance of finding adult content in your cache?
Do you care about how much you are seen surfing the competition's websites?
Will URLs reveal that anonymous forum login from 2002, slagging off your rivals' benchmarking?
Did you put Squid on your intranet or webmaster, without HTTPS, because your predecessor thought it was on the non-routable private range?
Did you use DNS in any way to point to document resources that are accessible to users via the proxy server and Squid?
Anyhow, lots of data protection suites have been used to purge archives and remove any trace of activity or files best kept private.
That's how the latest HDS kit is sold: a cluster FS with hyperconverged local nodes crunching security, audit and search results.
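For anyone without that kind of kit, a few of the screening questions above can be roughed out in a short script before a cache dump goes anywhere. This is only a sketch under my own assumptions: the function name `flag_url`, the `SENSITIVE_PARAMS` list and the reason labels are all made up for illustration, it works on plain URL strings rather than any particular Squid file format, and it makes no attempt at spotting adult content.

```python
import ipaddress
from urllib.parse import urlsplit, parse_qsl

# Hypothetical list of query parameter names that hint at a login or session.
SENSITIVE_PARAMS = {"user", "username", "login", "password", "passwd",
                    "session", "sid", "token"}


def flag_url(url, competitor_domains=()):
    """Return a list of reasons a cached URL deserves review before release."""
    reasons = []
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    try:
        # IP literals in private or loopback ranges point at intranet resources.
        ip = ipaddress.ip_address(host)
        if ip.is_private or ip.is_loopback:
            reasons.append("private-address")
    except ValueError:
        # Not an IP literal; a name with no dot is probably internal-only.
        if host and "." not in host:
            reasons.append("internal-hostname")

    # Credentials embedded as user:password@host.
    if parts.username or parts.password:
        reasons.append("credentials-in-url")

    # Query parameters whose names suggest a login or session identifier.
    for key, _ in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in SENSITIVE_PARAMS:
            reasons.append("sensitive-param:" + key.lower())

    # Visits to domains the owner has listed as competitors (assumed input).
    for domain in competitor_domains:
        d = domain.lower()
        if host == d or host.endswith("." + d):
            reasons.append("competitor-site")
            break

    return reasons
```

Anything that comes back with a non-empty list would go to a human for review rather than being dropped automatically, since a crude filter like this will have both false alarms and misses.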
I'm sure I'm only wishing on a prayer that you can find great troves for redecorating the empty space that the WWW once pointed to.
But imagine that we really could fill enough gaps in the dead link forests!
Even just attempting it could be a superb way to promote your storage products and bless the customers who offer up the raw stores, with lots of great ways to engage regular journalists with the subject.
I totally love best-effort projects, and especially the ambitious, culturally interesting ones.
So does Joe Public.
I'm in.
Where do we start?
(If I have the time, I'm quite serious about this. My profession is advertising, not online but the traditional way of doing it. I have just got my head back from exploding with the multitude of ways to sell, promote, demo, boast, reminisce, predict, forecast, and warn people about how society will crumble without personal all-flash arrays... That last one might be pushing it a bit, but I see a fantastic deal of business on the back of the idea. It's beautiful because everyone is invited to participate; no vendor or company is locked out. So the public is not getting the boring official line and canned quotes. This is real people showing the technology, but also showing the extent to which we discard valuable culture. For thousands of years mankind grew as our means of recording documentation, thought and expression grew. Look now how easily we will throw it all away! I know we can do better.)