Show HN: Oldest Search – Search for the oldest result on internet (oldestsearch.com)
295 points by jarrenae on May 10, 2022 | 87 comments
Oldest Search is a custom Google search that specifically targets the oldest entries available. I'm always curious about the first entries for certain data on the internet; it's a valuable perspective builder.

I personally like digitized news articles that were written in the pre-internet era. Unfortunately, some results don't work well because pages have been dated incorrectly. For example, searching "Covid" shows recent results.

I launch new projects like this daily: small tools to increase human agency. I'm also very open to suggestions to improve!




I learned something new about my father through an old news article that came up when searching his name. I won’t say what it was, but it is something personally meaningful to me that explains some childhood memories that never made sense to me before.

For curiosity I tried the same search for his name on google.com and went through about ten pages of results without finding this link.


Thank you so much for sharing. I didn't expect this at all, and this is probably the most rewarding thing to hear that has come out of my work personally in a while.

Best wishes.


This turned up the old school year books for my dad, including some photos I would probably not have seen otherwise. Saved into my family archives now as something for my son to see about his grandpa in the future.


I momentarily thought "Me too", but it turns out there was a kid about 10 years younger than my dad who went to the same-named school at the same time but in a different State.

So ultimately all that searching his name revealed was that he's never attempted to cultivate an online presence.


Thank you. You inspired me to do the same. Nice to remember what my parents accomplished.


Rhizome


The results aren't very meaningful. Seems like it is relying on self reported fields from sites, which don't make much sense in this context. So far I have found:

- LinkedIn profile results from my own name, all attributed to the 90s and early 00s (before LinkedIn even existed)

- Backdated or incorrectly dated blog posts (most of them on very modern sites)

- Wikipedia articles where the reported date is the release date of the film/book rather than when the article was published

- Results from journals which key on the publishing date of the journal rather than when the site was created

I'd expect a site like this to show me the first mention of a word or phrase on the internet, which obviously seems impossible (though maybe we could at least accurately go back to when we first started large scale indexing).


It's Google's fault, since they let sites backdate themselves and take the date as gospel. Luckily it's not all that widespread/deliberate, since backdating your site is basically asking Google to downrank you.

I'm sure Google must keep track of things like date of first index, but if they do they don't expose it.


Years ago, Google released a site where you could search one of their very early indexes of the web. I think it might have been around 2010 when they released it?

I haven't been able to find any reference to it since, but I might be misremembering some details.

Does anyone else recall something like that?

EDIT: Finally found it by searching HN, but sadly it's gone: https://news.ycombinator.com/item?id=319992

It was an index from January 2001. In 2008, they released a site to search it for their 10th anniversary.


There used to be an internal tool to view every crawl, ever, on a time slider. Like archive.org but with ridiculous depth.


Wow, that would be such an incredible historical resource.

Archive.org often has a scrape of the data, but the URL it was once hosted at is lost to time.


Why is it flashing a bluescreen at me every 25 seconds?

https://oldestsearch.com/blue.png

Code from their site:

    setInterval(() => {
      app.style.backgroundImage = "url(https://oldestsearch.com/blue.png)";
      setTimeout(() => {
        app.style.backgroundImage = "url(https://oldestsearch.com/bg.png)";
      }, 200);
    }, 25000);


It's the blue screen of death[1]. I guess it's paying tribute to the time when this used to be a frequent occurrence on Windows, which was contemporary to the early web. But yeah, flashing it that fast would confuse a lot of people.

[1] https://en.wikipedia.org/wiki/Blue_screen_of_death


That was fun. I googled my own name and got a bunch of articles written about reddit that quoted me, as well as the time I was quoted in the New York Times as a college freshman talking about how computers/email will completely change how students interact with each other and their professors.


Nice that some sites better remember ancient history than I do!


Haha, time definitely proved you right!


Childish as it is, I'm surprised that the earliest result for "goatse" (oh come on, someone had to do it) is 1/31/2001, considering both it and Google were around in the late 90's.


Thank you for your service


Wikipedia says goatse.cx was launched in 1999 but I'm fairly sure the picture is older than that - I saw it posted on an early discussion board which, IIRC, was around 1997.


I've seen a lot of results dated 2001-01-31 that were not published then. I'm guessing it's a default somewhere that gets incorrectly picked up.
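
One way to act on that guess would be to treat any date shared by an implausibly large share of results as a likely placeholder. A minimal sketch, assuming each result is a plain object with a hypothetical `date` field in ISO format (the real shape of the search results isn't shown in this thread, and the 30% threshold is arbitrary):

```javascript
// Hypothetical result shape: { title: string, date: "YYYY-MM-DD" }.
// Returns the dates that account for more than `threshold` of all results,
// which hints at a default value rather than a real publication date.
function findSuspiciousDates(results, threshold = 0.3) {
  const counts = new Map();
  for (const { date } of results) {
    counts.set(date, (counts.get(date) || 0) + 1);
  }
  const suspicious = [];
  for (const [date, count] of counts) {
    if (count / results.length > threshold) suspicious.push(date);
  }
  return suspicious;
}

// Made-up sample data for illustration only.
const sample = [
  { title: "a", date: "2001-01-31" },
  { title: "b", date: "2001-01-31" },
  { title: "c", date: "1997-06-02" },
  { title: "d", date: "2001-01-31" },
];
console.log(findSuspiciousDates(sample));
```

Results carrying a flagged date could then be hidden or pushed down the list instead of being shown as the "oldest" hits.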


Word of warning, don't look this up.


The key to this is just searching the archives of Usenet. That will almost guarantee you the "oldest" result since you're seeing essentially 20th century social-media-message-board streams that existed long, long before the web.


>The key to this is just searching the archives of Usenet

That's interesting... but how exactly do you do that? Seems like the search engines I tried never return anything more than 5 years old.


https://www.usenetarchives.com/ is a good resource for that!

EDIT: seems like search there is broken.


I wonder what they mean by "oldest entries" after doing a query for "Apple II". The first two results were "this day in history"-style pages, with the first page having a publication date in 2019. The third pointed to a Wikipedia article, and the next two pointed to abstracts of articles published in the years following the introduction of the Apple II (which I suppose is valid, even though it wasn't what I was expecting).

In order to be a meaningful tool, there needs to be a meaningful definition of oldest entries. Pulling up pages with accurate dates, such as the abstracts, or even when the pages were originally indexed would be far more useful. (Since I am assuming that it is next to impossible to ascertain when content was actually posted.)


Agreed, this is a sort of proof of concept version, and ideally we could more effectively filter out inaccurately dated pages.

Like I mentioned in my original comment, we're performing a custom Google search, so unfortunately we're reliant on those results. In the future I'd love to add additional sources like Usenet and historical documents!


I wonder what the cost per search result is on a query like this, compared to the average query. I can only imagine the cache misses, and deep digging, that this is causing!


I honestly need the opposite of this most of the time. It's almost muscle memory for me to do `Tools -> Any Time -> Past Year` on any search for technical information.

A lot of times I wish I could do 2 or 3 years ago but the Custom Range dialogue is garbage.


"You need to enable JavaScript to run this app."

This seems especially unappealing for people who like the old web.


Super cool idea! Thanks for sharing.

I searched for "fish" and a result from PubMed from 1977 was the first (non-ad) result. It seems to be taking the paper publication date rather than the date it was put on the internet (which is fair I suppose). Similarly, a result further down was from Poetry Foundation and the date seemed to be the poem's publication date rather than the page's. Just not exactly what I was expecting at first.


Definitely does not work for everything - Google likes to see dates on the site and associate it. Search "Super Mario 64" for a direct example.


Agreed. Better filtering would be helpful. I'd also like to add Usenet and book searches eventually as well.


It's unfortunate that there's like, an entire dark age of the internet of sorts on something like this because so much of the "eternal september" era of the web was on geocities and the vast majority of that is just gone now. Looking for anything on here that you'd recognize from that period and demographic is kind of depressing.


With this I can find stuff of mine back to about '95, but Google search still finds posts I made on Usenet in the 1980s.


In another one of my same day projects I'd like to add usenet results as well.


Way better search (or is it content?) on certain keywords. Feels like all the noise in `Latest Tweets` is filtered away.


Top result for "IBM" is dated March 31, 1983. But it's cheating, see for yourself https://www.science.org/doi/10.1126/science.220.4592.43


Top results for folk music are from the NYTimes... in the 70s.

So it is indeed the "oldest results that are on the Internet" , rather than the "oldest Internet results".


I just read the most interesting article about Lee Trevino from searching 'golf swing'. It was published in 1974.

[0] https://www.nytimes.com/1974/09/15/archives/how-to-help-golf...


For some things, like "lisp", it goes back to the 70s. That seems correct.

For others, like "slashdot.org", it only goes back to the turn of the millennium. It can't really be that the first thing indexed on Slashdot.org was CmdrTaco's statement of the sale of the site, can it?


Something is wrong.

Obviously I put in my own family name because I know both my brother and I have been active very, very early. Here's Google on the name 1993-1997: https://www.google.com/search?q=n%C3%A9gyesi&rlz=1C1CHBF_enC...

Here's oldest: https://cse.google.com/cse?cx=ae9362ae18f003da9#gsc.tab=0&gs... starts at 1999.


It's not flawless unfortunately. We're simply performing a custom google search. Eventually we'd like to improve the search filtering.


Some of you found cases where it says that something is more recent than it actually is.

I find it funnier when it finds results that are older than the real thing. How's that possible? For example, for "Tiktok", first result:

Make Your Day - TikTok: 9 Feb 1996 ...


I also searched Tiktok and my first result was a BBC News story called 'What is Tiktok' from 1991!

I suspect it's a typo in the publish date field.


Many websites abuse dates to try to display old stuff as recent, but I've never seen the opposite... after a few searches I see it's certainly a thing (less so in content other than in English).


> "Covid" shows recent results.

Covid wasn't around until 2019, so I can't imagine what you'd expect to find earlier than that? It's funny how it mis-dates a lot of the entries as being from decades earlier though.


It's exactly why it's a perfect test search to see which sites have inaccurate dates.

My personal theory is that blogs and companies set older dates on purpose to appear as a more legitimate source. But I have absolutely no evidence to back that up. I'd like to make results more accurate by adding other sources eventually as part of another one of my same day projects.


Pretty sure Reddit does that for SEO purposes. 5 year old threads (with no recent comments) have started showing up on the google serp even when I’m using a date filter to exclude everything >1 month old. The filter works for most sites, but a few consistently get through.


Maybe the address hasn't been indexed until now though (or reddit has updated their URL schema, causing everything to be "published" again)?

Dating a website is actually fairly tricky if there's no explicit <time>-tag (which there rarely is).
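
For the easy case where the tag does exist, reading it might look roughly like this. It's only a sketch: the function name is made up, and a regex is a simplification that a real crawler would replace with a proper HTML parser.

```javascript
// Returns the first <time datetime="..."> value found in an HTML string,
// or null when the page has no such element.
// Regex matching is a shortcut here, not a robust way to parse HTML.
function extractTimeTag(html) {
  const match = html.match(/<time\b[^>]*\bdatetime\s*=\s*["']([^"']+)["']/i);
  return match ? match[1] : null;
}

console.log(extractTimeTag('<p><time datetime="2001-01-31">Jan 31</time></p>'));
console.log(extractTimeTag("<p>born in April 1981</p>"));
```

The second call returns `null`, which is the tricky case: without the tag, the only dates left are ones mentioned in the text, and those may have nothing to do with when the page was published.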


Off-topic, but I always assumed there was some consistent disease naming convention in use here and that there must have been earlier instances of "COVID-YY", given that this isn't the first coronavirus disease that has been observed, but I couldn't find any examples of another one.

This is from an article [1] on Coronaviruses from the NIH website:

>SARS coronavirus (SARS-CoV), which emerged in November 2002 and causes severe acute respiratory syndrome (SARS); MERS coronavirus (MERS-CoV), which emerged in 2012 and causes Middle East respiratory syndrome (MERS); and SARS-CoV-2, which emerged in 2019 and causes coronavirus disease 2019 (COVID-19).

So SARS-CoV causes SARS, MERS-CoV causes MERS, SARS-CoV-2 causes COVID-19. I guess they didn't want to call it SARS-2? Who is "they" - who coined the term COVID-19?

Wikipedia [2] explains:

>During the initial outbreak in Wuhan, the virus and disease were commonly referred to as "coronavirus" and "Wuhan coronavirus", with the disease sometimes called "Wuhan pneumonia". In the past, many diseases have been named after geographical locations, such as the Spanish flu, Middle East respiratory syndrome, and Zika virus. In January 2020, the World Health Organization (WHO) recommended 2019-nCoV and 2019-nCoV acute respiratory disease as interim names for the virus and disease per 2015 guidance and international guidelines against using geographical locations or groups of people in disease and virus names to prevent social stigma. The official names COVID‑19 and SARS-CoV-2 were issued by the WHO on 11 February 2020. The Director-General, Tedros Adhanom explained that CO stands for corona, VI for virus, D for disease, and 19 for 2019, the year in which the outbreak was first identified. The WHO additionally uses "the COVID‑19 virus" and "the virus responsible for COVID‑19" in public communications.

So I guess it really is the first official "COVID-YY", although we have had diseases in the past that might have been given similar names (SARS -> COVID-02, MERS -> COVID-12) had the current naming guidelines been in place.

[1] https://www.niaid.nih.gov/diseases-conditions/coronaviruses

[2] https://en.wikipedia.org/wiki/COVID-19


A certain influential country didn't want a new SARS relative associated with them so they told the WHO to change the name.


Stop it, WHO issued best practices for naming new human infectious diseases four years prior - in 2015.

https://www.who.int/news/item/08-05-2015-who-issues-best-pra...

>Terms that should be avoided in disease names include geographic locations (e.g. Middle East Respiratory Syndrome, Spanish Flu, Rift Valley fever), people’s names (e.g. Creutzfeldt-Jakob disease, Chagas disease), species of animal or food (e.g. swine flu, bird flu, monkey pox), cultural, population, industry or occupational references (e.g. legionnaires), and terms that incite undue fear (e.g. unknown, fatal, epidemic).

>WHO developed the best practices for naming new human infectious diseases in close collaboration with the World Organisation for Animal Health (OIE) and the Food and Agriculture Organization of the United Nations (FAO), and in consultation with experts leading the International Classification of Diseases (ICD).


The date is when the url was first crawled, not the date of the current content.


I got a result dated 14 April 1981. It obviously wasn't published then, but did contain the words "born in April 1981"


I think google has some quirks with the last modified header in HTTP.

When doing a couple of searches for things I know existed before the listed date (e.g. astalavista, jabber ccc, MIT opencourseware etc.), it seems that Google chimes in with quotations in other sources like PDF files or papers, and if the PDF file is from an earlier date, it seems to recorrect the linked URL's last modified HTTP header date.

Interesting to observe, thought this was worth mentioning.


I don't get it. It pulls up journal entries from the 1970's on PubMed. Are those "on the internet"? What about Google Books results?


Very well done, this brings back a lot of memories and nostalgia.

For example, searching for: jeff bezos

returns the 1997 shareholder letter.

https://media.corporate-ir.net/media_files/irol/97/97664/rep...


The first result when searching my own name was a court record from 1980 (before I was born) where my name-fellow was convicted of selling cocaine.

Got a good chuckle out of that. I have a fairly uncommon name, but not uncommon enough that I don't share it with a few low-key famous people. But before all that, one of us was out there slinging snow. Small world?


I was surprised to see "covid" has a bunch of results from 2002 onwards; looks like CDC updates their dates weirdly

https://www.cdc.gov/coronavirus/2019-ncov/easy-to-read/index...


Mmmm how does this work? Searching for Harry Potter gives Wikia results which aren't the oldest one out there.


Found this gem by searching "bsod" – https://www.itprotoday.com/compute-engines/q-how-can-i-chang... – had no idea!


So you found the easter egg on https://oldestsearch.com

Every 30 seconds the background flickers


Indeed!


Nice idea though I tried with "ocaml" and one of the top results is:

    ocaml/num: The legacy Num library for arbitrary-precision ... - GitHub
    GitHub › ocaml › num
    31 Jan 2001 ...
Which is... seven years before GitHub was launched!
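
A cheap sanity check for cases like this would be to reject any result dated before its host even existed. A rough sketch, assuming a hand-maintained table of launch years (the table and function name are hypothetical; the 2008 figure for GitHub is the one implied above):

```javascript
// Illustrative table only: launch years would have to be maintained by hand.
const LAUNCH_YEAR = new Map([["github.com", 2008]]);

// True when a result claims a date earlier than its host's known launch year.
function predatesHost(url, isoDate) {
  const host = new URL(url).hostname.replace(/^www\./, "");
  const launch = LAUNCH_YEAR.get(host);
  if (launch === undefined) return false; // unknown host: can't judge
  return Number(isoDate.slice(0, 4)) < launch;
}

console.log(predatesHost("https://github.com/ocaml/num", "2001-01-31"));
```

This only catches impossible dates on hosts listed in the table; it says nothing about dates that are wrong but plausible.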


You should activate safe search for this. The oldest results on the internet for many women with distinct names may well be revenge porn... I was able to find an example of this pretty easily, it was a first-page result for an uncommon name.


amazing how some discussions boards from 2000 are still intact

goes to show how permanent the internet can be


Has results from early 2000s with my wife having my last name. Except she didn’t have my last name until 2015. She wasn’t even on social media until we were together after 2011. Something is off, at least for personal searches.


I found an Escape Velocity plug in https://artsforge.com/spaceport/override/home.html


Well these results are clearly wrong.

Everything looks like it's coming from January 31, 2001, and the third result for "meme" is the University of Tennessee's 2019 Women's Basketball Team roster.


For my search:

Several results from 2001, when the oldest result should be circa 2012.

One result was an indexed tweet, correct date is May 4 this year, but it was incorrectly dated as 2001.

However, I love this concept - it just didn't work for me.


I searched for covid 19 and got articles from 2001, they all look recent though


The URLs were first indexed at that time; they likely edited the content of those pages since. Newspapers have begun doing that with articles too, "updating" them. This is why we urgently need a permanent archive of news and government sites. Most of the data released on the internet is currently lost; I think I saw an estimate claiming something like 80% of internet data is lost every 20 years.


google should productise this... for every decade if it's easier on infrastructure, snapshot it and preserve, hell work with waybackmachine.. I know it won't bring back geocities but hey.


state gov does it pretty well, what a trip down the memory lane

https://2001-2009.state.gov/index.htm


Interesting first result for 'wiki' https://i.imgur.com/b6nAaxD.jpg


The oldest porn it could find was from 2005. That's not very old. I have fond memories of being kicked out from a lab in school in 1998 for looking at porn.


Searched up my family name and all I got is some unrelated person, hundreds of clothing sites (my last name is a piece of clothing in Polish) and some censuses


Very fun to see how trends got started! Searched quantum computing and the oldest result was Peter Shor's paper on what would become Shor's algorithm


I searched for Bitcoin and results came up for a Bitcoin etf and the timestamp claims it’s from 2001 but that’s definitely not the case.


My friend you just made me tear up a bit after searching for my name I found a bunch of old private world of warcraft servers I used to be a part of.

I must continue building my app https://github.com/sergiotapia/ekeko I think it will be a benefit to a lot of people, mostly myself.


search for 'green tea' puts sam's club at the top of the list for its lipton diet citrus brand. it was posted in 1977 and is still for sale!

Must be some miraculous high falutin green ass gingery spice tea that will make a camel fart louder than a donkey oinks.

=p

I love this.


Nothing was posted in 1977. The web didn't exist until 1991; some sites might map to FTP archives older than that, but the first Sam's Club opened in 1983.

and of course you know that...


"How can the net amount of entropy of the universe be massively decreased?"


You can find some neat references to rather hard to source books.

It's kind of cool


Search for "google" yields mostly results from 2005


Seems to have flaws, e.g. blockchain yields anachronisms


OK this is amazing


[deleted]



