The 1000 most-visited sites on the web (by Google) (fseek.me)
52 points by ddbb on May 28, 2010 | 52 comments



I extracted the data, added IP, country and response headers, and dumped it into a usable format:

http://openmymind.net/top1000data.txt

You can do some decently interesting analysis, like the fact that nginx is the front-end for nearly as many sites as IIS.


Thanks for the data.

"nginx is the front-end for nearly as many sites as IIS". Oh, I wish that were true, but according to my naive counting IIS is over 3 times more popular. Am I missing something?

$ wget http://openmymind.net/top1000data.txt

$ grep -c '"Server": "nginx"' top1000data.txt

39

$ grep -c '"Server": "Microsoft-IIS' top1000data.txt

149
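
One possible source of the gap: the first pattern ends with a closing quote, so it only matches a header that is exactly "nginx", while the IIS pattern is a prefix match that also catches versioned values. A rough Python sketch of a symmetric count, assuming the file contains lines shaped like the grep patterns above (the sample values and helper names are made up for illustration, not taken from the real file):

```python
import re
from collections import Counter

# Matches text such as: "Server": "nginx/0.7.65"
# The key name and quoting are assumed from the grep patterns above.
SERVER_RE = re.compile(r'"Server":\s*"([^"]*)"')


def server_family(value):
    """Reduce a Server header value to a product family, dropping the version."""
    # "nginx/0.7.65" -> "nginx", "Microsoft-IIS/6.0" -> "microsoft-iis"
    return value.split("/", 1)[0].strip().lower()


def count_families(text):
    """Tally Server header families across the whole dump."""
    return Counter(server_family(m.group(1)) for m in SERVER_RE.finditer(text))


if __name__ == "__main__":
    # Hypothetical sample lines, not real data from top1000data.txt.
    sample = "\n".join([
        '"Server": "nginx"',
        '"Server": "nginx/0.7.65"',
        '"Server": "Microsoft-IIS/6.0"',
        '"Server": "Microsoft-IIS/7.0"',
        '"Server": "Apache/2.2.3 (CentOS)"',
    ])
    for family, count in count_families(sample).most_common():
        print(family, count)
```

If the nginx entries in the dump carry version strings, this should report a higher nginx count than the grep above; if they do not, the 39 stands.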


Could you share how you extracted the data? I thought it was nifty the way you have it in that txt file. Thanks.


Why is dropbox 'myth and folklore'???


The categories in the JSON dump come directly from the original Google source, so I don't know (and Dropbox certainly isn't the only mis-categorized entry):

http://www.google.com/adplanner/static/top1000/


Oh god, and I've been trusting them w/ my backups!!!!!


Over what time period are pageviews calculated?


The page view figures are pulled from the original Google list. They have more information at: http://www.google.com/support/adplanner/bin/answer.py?hl=en&...

They don't say what time frame page views cover, but they do say unique visitors are counted over the course of 1 month, so it's probably safe to assume page views are also over 1 month.


They must have filtered porn. There is no way redtube.com is not in the top 100.


They did: " Keep in mind that the list excludes adult sites, ad networks, domains that don't have publicly visible content or don't load properly, and certain Google sites. "


A curious list of restrictions... I wonder what kind of site would be in the top 1000 that didn't have publicly visible content or load properly? Pure Flash sites, maybe?


Alexa has a top 500 list with google and porn sites included http://www.alexa.com/topsites


Sorry! I must have missed that.



Not sure why that's not the main link here instead of a useless blog article that removes half the data and adds no insight or extra information whatsoever.. :-)


Amazing that Facebook has ONE THOUSAND pageviews per unique visitor compared to about 100 pageviews per unique visitor on the rest of the sites.


# 824 stackoverflow vs # 994 expertsexchange

Perhaps another confirmation of what the SO folks have been saying about their popularity.


Note that StackOverflow is listed as being in the "Music" category :-) Wha..?


I always go to StackOverflow to find out what the kids are listening to these days.


#20 - ask.com

#252 - digg.com

#261 - justin.tv

Reddit not in the list... What I don't understand is how come sites like openoffice.org, kaspersky.com, mcafee.com are so high in the list. Do people really visit them that often?


It's more likely that their computers do, for things like auto-updates.


Ahh, that probably explains conduit.com at #29 as well.


And likely HP.com as well. Stupid printer drivers.


Given how inaccurate the categorizations are I think it's possible that Reddit got put in the "adult" category.


Anyone care to do a scatterplot of domain name length vs rank?


It looks somewhat periodic:

http://images.rdujour.com/wp-content/uploads/2010/05/domainL...

I did not include the length of the tld.
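
A minimal sketch of how the points for such a plot could be built, assuming "not including the TLD" means dropping only the final label (so bbc.co.uk still counts the ".co" part; that is a simplification). The function names and the input list are hypothetical, and rank is simply taken as position in the list:

```python
def name_length(domain):
    """Length of a domain with its final label (the TLD) removed."""
    domain = domain.strip().lower().rstrip(".")
    if "." not in domain:
        # No TLD to strip, count the whole string.
        return len(domain)
    # Drop only the last dot-separated label, e.g. "ask.com" -> "ask".
    return len(domain.rsplit(".", 1)[0])


def length_vs_rank(ranked_domains):
    """Turn a list of domains ordered by rank into (rank, name_length) points."""
    return [
        (rank, name_length(domain))
        for rank, domain in enumerate(ranked_domains, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical input order, for illustration only.
    for rank, length in length_vs_rank(["ask.com", "digg.com", "bbc.co.uk"]):
        print(rank, length)
```

The resulting pairs can be handed to any plotting tool as x and y values.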


Monthly uniques versus domain name length:

http://images.rdujour.com/wp-content/uploads/2010/05/uniques...


Depressingly, Mahalo is on the list (#946).


Dropbox on the list at 985. Impressive.


But reddit.com is nowhere to be seen... surprising.


My question is, if reddit were on that list, would the moderation system be able to prevent it turning into Digg? I don't think it could, although maybe the subreddit system could. Places like /r/politics/ are fairly useless for intelligent debate already. Such link sharing sites don't have the same "my friends are there" grounding force that Facebook has. All they have is the "vibe" and the community, so it is easy for users to run away to a new site.


And listed as a "Myth and Folklore" site. I wonder who came up with these "categories" - there are some crazy assignments in the list.


Keep in mind that the numbers listed are estimations.

For example, HubPages, the site where I work, is listed at #270 with 11M unique visitors and 97M page views.

Our monthly absolute unique visitors (according to Google Analytics) is more than three times that. Our monthly page views are greater than 100M.
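
For scale, the two listed figures work out to roughly 8.8 page views per unique visitor. A minimal sketch of that arithmetic, using only the listed numbers above (the function name is made up for illustration):

```python
def pages_per_visitor(page_views, unique_visitors):
    """Average page views per unique visitor over the same period."""
    if unique_visitors <= 0:
        raise ValueError("unique_visitors must be positive")
    return page_views / unique_visitors


if __name__ == "__main__":
    # Figures as listed for HubPages in the top 1000: 97M page views, 11M uniques.
    listed = pages_per_visitor(97_000_000, 11_000_000)
    print(f"listed: {listed:.1f} page views per unique visitor")
```

The same ratio can be computed for any row of the list to compare sites.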


"...as measured by Ad Planner."

The list is a decent guesstimate. Unless every single site on the planet uses the Facebook like button, Google Analytics, Google ads, or something else that tracks globally, there is no way to correctly measure UC or PI.


Did you guys notice Wordpress.com and .org in there:

.com: 120,000,000 uniques

.org: 8,100,000 uniques

Impressive.


It reassures me that foxnews.com is farther down than I would have expected.

43. http://bbc.co.uk

83. http://nytimes.com

179. http://reuters.com

257. http://foxsports.com

279. http://foxnews.com


Mashable - #696 Techcrunch - #850. Nice to see I'm not the only one that prefers Cashmore to Arrington.


Why? All Mashable does is post about Twitter trends and recycle Techcrunch posts a week later.


I really doubt that usgs.gov (#978) gets anywhere near the number of visits that nfl.com gets (#993).


Every time there's an earthquake in California, half the state goes there.


Wow, yahoo is up there. It's a shame they can't monetize their websites more.


I think their problem is that they are too big an operation. They probably could if they trimmed the company.


Geocities still running in Japan after being shut down in the west? Weird!


It's interesting how hard it is to come up with http://windows.com even though I tried explicitly searching for it


Where's YouTube?


"Keep in mind that the list excludes adult sites, ad networks, domains that don't have publicly visible content or don't load properly, and certain Google sites."


Where is Google in that list?


"Keep in mind that the list excludes adult sites, ad networks, domains that don't have publicly visible content or don't load properly, and certain Google sites."


the daily mail at 236 aarrgghhhhh!!!!!!!


Scribd at #102. Excellent. I would like to know how much financing each one of those sites has taken to date.


google isn't on that list?


where is slashdot?



