The Destruction of the Web (jacquesmattheij.com)
156 points by bussetta on June 13, 2013 | 96 comments



I've received a fair number of these, dating back to late last year and continuing to this day. I'm not sure when Panda came in, but I wouldn't say they've increased or decreased since they started.

The sites I run are all based around user generated content, and the links are all genuine instances of people sharing information and linking in the process. None of it is backlinks provided to benefit some third party, and we've never participated in link swapping or anything like that.

We think the link removal requests are dodgy.

My suspicion is that, by and large, the requests do not come from the companies actually associated with the linked site, and that when challenged, the senders of the requests have squirmed, apologised and claimed the request was sent by accident.

Example: http://pastebin.com/P9tsWL0x

Basically: I believe that a fair number of these requests are from SEO companies attempting to get competitor sites a lower pagerank so that their properties fare better.

Only a minority of requests seem to come from the companies linked, and in part I wonder whether other SEOs are cargo-culting the phenomenon by copying it without understanding it.

I forwarded an example to Matt Cutts a while ago thinking that this whole area feels spammy and dodgy, but I understand he's busy and must get a lot of mail.

I've not removed a single link as a result of these bizarre notices.


"I forwarded an example to Matt Cutts a while ago thinking that this whole area feels spammy and dodgy, but I understand he's busy and must get a lot of mail."

The question is why is there only "Matt Cutts"?

Google could have teams of "Matt Cutts" that are accessible and approachable.

Noting that in the traditional legacy business world (that everyone complains about all the time as the "old" way and needing "disruption") there are usually people that you can complain to about your problem and even an escalation procedure as well as monitoring at the company to make sure the right thing is happening with all the feedback.


There's actually a lot of Google employees who post around the web, take feedback and get it to the right teams at Google, etc. I'm just one of the more well-known people.

If the question is "Why doesn't Google have someone I can talk to about my issue?" then I think the answer is probably related to the scale of the web. Remember that Google gets billions of searches a day. There's also ~250M domains on the web. Even if we shifted every single Google employee to user or webmaster support, I'm not sure we could talk to everybody. That's why we look for scalable methods of communication, e.g. our webmaster console or webmaster videos.

Note that we have gotten better over time though. AdWords has phone support now, for example. And just yesterday we started sending example URLs when we send our messages about manual webspam actions. That helps because now webmasters have a better idea about where to look to solve their issues.


Thanks for your thoughts, I do appreciate it.

"There's actually a lot of Google employees who post around the web"

The overwhelming majority of people who would need help don't see or know who these people are.

I don't even know who these people are, and I've been on the web since before there was a Google. I know your name because I've seen it so many times in and around the "domain" business.

"then I think the answer is probably related to the scale of the web. Remember that Google gets billions of searches a day. There's also ~250M domains on the web."

Hence the problem. Google has tremendous power and is extremely profitable but isn't able to take care of all the people that it serves or touches. As a result people frequently feel they have gotten, for lack of a better way to put it, "the shaft".

"That's why we look for scalable methods of communication, e.g. our webmaster console or webmaster videos."

One of those "scalable" methods has to be the ecosystem that serves the same function as the ecosystem surrounding Microsoft: the ubiquitous "tech guy" you pay to figure it all out. AdWords is a good example. It's easier to pay someone who knows the lay of the land than to watch and learn it yourself from the mass of info you need to know to do ad buys the right way.


AdWords is a bad example; it's probably the most straightforward product Google offers, likely because its purpose and functionality are discussed in detail by Google.


That's completely unrealistic. We've been on one of the main forums for SEO for the last 2 weeks - there are a few decent ones at most (not 250M) - YELLING out that we need real help, and real communication. They have SCALED that communication down for you by front-paging it.

There are, at most, a few places like this on the web you could easily focus on.

Bonus: I've written to you 4 times, tried google+, twitter, the lot. Poke poke poke.

I'm armed with a comprehensive set of example data, and have 30 years of programming experience behind me. I've been doing this with a bunch of whitehat webmasters who have a combined audience of about 10 million people a day. We have received absolutely no indication that you have even seen anything we've posted or written, or asked about.

Seriously Matt, you're not exactly reaching out. Frank.


> None of it is backlinks provided to benefit some third party, and we've never participated in link swapping or anything like that.

The people sending out the e-mails don't necessarily know that though. Google won't tell sites which links are triggering penalties against them for obvious reasons, and there are a lot of spammy sites posing as user-generated content, far too many to manually review all the backlinks.

Worse, many of those sites aren't attempts at black-hat SEO for the pages they link to but are actually trying to rank for particular terms themselves. As far as I can tell, though, Google doesn't make that distinction: if you're linked from a whole bunch of spammy sites you will now be penalized.


buro9, what email address did you forward things from? I searched my email for buro9 but didn't see anything.


Would've been from david@ buro9.com which is a Google Apps domain.

I tend to permanently delete spam, phishing and related dodgy stuff once I've dealt with it in some way. It is just luck that I still had the one I stuck on pastebin, as it took some searching to find one that I hadn't nuked.

I'm fairly sure that pastebin is one of the couple I forwarded to you (kept the original in case you wanted the headers), and I believe your email is just matt.cutts@

Edit: Just delving into my inbox, here's another from September 2012: http://pastebin.com/8N3XhLVN . I don't have too many as I am thorough in my deletions; I've tended to receive them at a rate of one or two per week since late summer last year.

I haven't kept track of Panda as I don't really bother with SEO/SEM beyond making content readable, but a quick search says those updates were March/April this year? Hence my point that these take-downs precede that, and given what appears to be false representation, I believe these people are using FUD to harm competitors.


Gad, I am so sick of SEO. Not Jacques article, but SEO in general.

Google owns the game. They run the game on a computer. Ergo, if you want people naturally coming from Google, you must do things its computer likes.

Only they won't tell you that. Instead, they'll offer platitudes like "write good content and the users will come" when we all know you could write great content until the cows come home and if nobody links to you, you ain't getting no traffic.

And I think it's unfair to call all these guys leeches, miscreants, or whatever. I don't like a lot of the things they do, but I also respect the fact that I live in a first world country. I have a good way of living. If I were terribly impoverished and only had to spam a lot to feed my family? I'd do it. We assume everybody else on the web lives the same lives that we do. We also are getting this quasi-religious thing going on where Google must return what I want at the top of the search results. If it does not, somebody has sinned. I'm not drinking that cool-aid.

I'm with Jacques on the solution: a new protocol and the elimination of single points of failure. This thing where Google keeps updating its algorithm and tens of thousands of people keep gaming the system has to stop. It's not healthy behavior either for Google or for the spammers. And it's destroying the web.

Sidebar: you know, if you think about it, with all the walled gardens and vendors refusing common protocols and such, the web itself is under attack from multiple angles.


Are we really worse off today than we were in 2001? Google isn't perfect, but spam and SEO existed before them. They've beaten it back pretty well in my opinion.

Can I get more detail about the "new protocol" solution? The only anti-spam protocol solution I've seen discussed is authenticated identity for every packet. So no more anonymity, but much better reputation/authority/trust. I'm not down with that. Got something else?


This is Jacques' idea, but hell, I'm happy to speculate.

How about a double-blind authorship trust built into the http protocol? Put something like an anonymous secure PR inside the packets instead of sitting on top of the entire page transaction and being managed by one party.

That's plenty of hand-waving, I admit, but I think there's probably something there worth exploring.

As far as the "are we worse off" question, I'd love to show you my inbox and spam folder on my servers. The economics of this is driving us to the point where we are paying hundreds of thousands of intelligent people world-wide to spend their lives doing silly things so that Google can evolve its algorithm and we can have better search. Better for me, sure. Maybe you should talk to one of those guys who made enough to feed his family for a few months then lost it all on Panda.


> Instead, they'll offer platitudes like "write good content and the users will come" when we all know you could write great content until the cows come home and if nobody links to you, you ain't getting no traffic.

You should read this as shorthand for "build your business the traditional way and the users will come." Don't just count on ranking highly for a competitive query for your business to succeed. Build a brand and customers, and people will seek you out. You will have no difficulty ranking for [the name of your company] and this is where most good sites get most of their traffic from Google.


I'm a little baffled by the suggestions of a whole new protocol to fix this: it seems to me like there's a really easy solution.

Just say no to people who want to buy links, or votes, or ask you to take down links. Throw those e-mails in your spam folder.

I say this not as a Googler but as someone who struggled with boundary issues as a young adult, where people would ask me unreasonable things and I would comply or I would ask them unreasonable things and burden them. It is not your responsibility to feed someone else's family, unless, of course, you want to. It is not your responsibility to undo the link penalties that some spammer racked up with black-hat SEO, unless they're damaging your own site's reputation and you want to do something about it. It is not your responsibility to fix the web, unless, of course, you have a concrete improvement that you can convince people to try.

It is the SEO's job to find ways to add value without pissing people off. It is Google's job to keep their product useful enough in the face of people that want to abuse it that you keep using it. It's your job to make decisions that advance your interest without trampling on the interests of others.

BTW, Google's tried several attempts at verified authorship protocols, one of which I worked on:

http://www.ghacks.net/2011/11/17/fat-pings-what-are-they-why...

http://www.google.com/insidesearch/features/authorship/index...

https://code.google.com/p/pubsubhubbub/

They usually fail for lack of adoption. It's very difficult to get people to buy into a system that makes them do extra work for the general health of the ecosystem.

Also FWIW, if I were to leave Google and found a web business, I would spend very minimal time on SEO, despite knowing (in broad strokes) how the ranking algorithm operates and having a huge leg up on the competition. Why? Because there are several hundred people inside Google changing the algorithm, and just me on the outside trying to adapt to the changes. I'm far better off aligning my incentives with Google, so that all the work they do benefits me. I'd do this by providing a compelling enough experience for users that they keep coming back and talk about the product of their own accord, not by trying to force them to talk about it. Then all of Google's evals, metrics, etc. would say that my site should be on top, and so they'll tweak the algorithm to adapt to me instead of me tweaking my site to adapt to the algorithm.


"If I were to leave Google and found a web business, I would spend very minimal time on SEO, despite knowing (in broad strokes) how the ranking algorithm operates and having a huge leg up on the competition."

This is a critical point and needs to be underscored. Chasing SEO from a technical perspective is a fool's game. I think SEO can inform your choice of person-to-person marketing activities, but can never take the place of them.

I agree with everything you've said, but I would add one thing: as the complexity of Google's algorithm increases, there's more and more collateral damage. So, for instance, I had a micro-site I made several years ago. Back then the thing to do was to make sure you tweaked your on-page content so that the search engines better understood it.

Flash forward to today. Now if you "over-tweak" (who knows what that means) you get penalized. The same goes for a dozen other topics: things that used to be best practices, or at least work well, are now considered bad practices and you get punished for them. It's completely unrealistic to expect that I am going to have time to go back and re-honk around with stuff because in the great search engine wars the rules change every year.

As far as the protocol thing goes, who knows? I think it's important to realize that we all have the power to do whatever we want on top of TCP-IP. Let a thousand new ideas bloom. See if any of them gain traction.


For the most part you won't be penalized for "over-tweaking" on your site. Keyword stuffing and hidden text were dealt with in the early 2000's. If you are targeting keywords in the page title, headers, and body of your site at a reasonable amount, you don't have to worry about "over-tweaking". You do have to worry about over the top link exchanges, buying links, selling links, and too much commercially targeted anchor text pointing at your site.


I can't believe I'm reading this, it's like a freakin' miracle. We've been fighting a battle about this for 2 weeks over at WMW. Nobody is listening, it's sick and maddening - it's made 10x worse by people screaming 'Google is evil' and dominating any intelligent thread about the subject.

For the love of all that is awesome on the internet. Webspam team - take a step back, listen to what engineers / webmasters / intelligent people are telling you - make some serious changes soon and fast.

This whole thing is a proper, sod-awful mess.


> In the scientific world there are no spammers and there is no direct commercial advantage to creating a lot of nonsense paper that cite your own paper, also there is some oversight in the world of science and the people there have a reasonably high level of integrity.

Um...what? If it were anyone but the OP, who always writes with a lot of thoughtfulness and insight, I would've assumed the graf above is satire. Academic discovery and citation is very much being gamed; the only reason why we don't notice it more is because the academics don't have the same tools and infrastructure that web spammers do and, also, the world of academic research is not something the average person outside of academia closely parses.


What are some salient examples of gaming in the world of Academic discovery and citation?


This recent article comes to mind:

http://www.nytimes.com/2013/04/08/health/for-scientists-an-e...

> The scientists who were recruited to appear at a conference called Entomology-2013 thought they had been selected to make a presentation to the leading professional association of scientists who study insects.
>
> But they found out the hard way that they were wrong. The prestigious, academically sanctioned conference they had in mind has a slightly different name: Entomology 2013 (without the hyphen).

And the institution of medicine has long been plagued with accusations of fake studies, underwritten by drug companies:

http://www.nytimes.com/2011/07/29/opinion/useless-pharmaceut...

> LAST month, the Archives of Internal Medicine published a scathing reassessment of a 12-year-old research study of Neurontin, a seizure drug made by Pfizer. The study, which had included more than 2,700 subjects and was carried out by Parke-Davis (now part of Pfizer), was notable for how poorly it was conducted. The investigators were inexperienced and untrained, and the design of the study was so flawed it generated few if any useful conclusions. Even more alarming, 11 patients in the study died and 73 more experienced “serious adverse events.” Yet there have been few headlines, no demands for sanctions or apologies, no national bioethics commissions pledging to investigate. Why not?
>
> One reason is that the study was not quite what it seemed. It looked like a clinical trial, but as litigation documents have shown, it was actually a marketing device known as a “seeding trial.” The purpose of seeding trials is not to advance research but to make doctors familiar with a new drug.


I won't dig out examples, but I'd like to sketch the general game:

You need to have a good publication record (i.e., papers in journals and conference proceedings, some monographs with a prestigious publisher can also help). When you can't publish in high-impact journals/conferences, you lower your expectations and spread your papers over several journals/conferences, some _will_ publish you.

The next thing is to split up your research results over many papers in order to have many publications; differences between the papers are small (and you can reference yourself, i.e., the "bigger picture" of which this paper is a part). That's btw the same with grant applications: promise much, do only 30% and keep the rest as a follow-up (grant renewal). Splitting up results over several publications and grants, plus the usual academic behaviour (internal status games, academic nitpicking), delays research by 300%.

Then you can create citation cartels where you mutually reference with your colleagues.

But it's not that researchers are evil, often it's the funding source that uses those metrics (e.g., publication count) that are then gamed.


My concern -- and this is a total layman's observation -- is that if someone comes up with a good way to "spam" the academic research circuit, will we be able to tell? Most of us remember the recent Reinhart-Rogoff incident in which a massively flawed paper (think, Excel-based) wasn't challenged until a curious grad student took notice: http://phys.org/news/2013-04-excel-austerity-economics-paper...

There are other issues in all of that, but for the current discussion, I think it's enough to argue that discovery of exaggeration (the equivalent of resume padding) is still a very difficult problem in academic papers and citations. Do researchers have the tools to methodically sift out the good from the meh? It doesn't seem so. Combine that with the lack of incentive (as seems to be the case in Reinhart-Rogoff) to disprove published findings, and you have a scenario in which gaming the system seems quite doable.


Reinhart-Rogoff was not submitted to a peer-reviewed journal.


> Reinhart-Rogoff was not submitted to a peer-reviewed journal.

Yet among 400+ other citations, it was self-cited in a paper in the American Economic Review (see Google Scholar), which is peer reviewed and appears to have had a code submission policy since 2004 (see Wikipedia).

Perhaps if they submitted the paper directly it would have been turned down? That would make it an even better example of how to game citation count like SEO/PageRank.


Similarly, self-citation is a known issue.

http://www.rogerclarke.com/SOS/SCSP-09.html



This is a nice description of the hell of modern WWW.

But things are worse than that! With few exceptions (Stack Exchange and Wikipedia are notable) most searches will return sites that have been SEOd.

My Google search for [spectacles cases] returns these two sites on the first page:

(http://www.spectaclecases.co.uk/)

(http://www.aglassescase.co.uk/)

> Welcome to SpectacleCases.co.uk. You will find a wide selection of Glasses Cases / Spectacle Cases / Sunglass Cases.

> Made from leather, fabric, metal or plastic finished to a very high quality. Hard and soft cases for spectacles, glasses and sunglasses. We also have a good selection of cheap glasses cases which offer great protection for your glasses.

> Welcome to AGlassesCase.co.uk. The one stop shop for Glasses Cases, Spectacle Cases and Sunglass Cases. We also sell a number of Glasses Cloths

> Made from a range of quality materials including leather, fabric, metal and plastic all finished to a very high standard. We sell hard and soft cases for spectacles, glasses and sunglasses. We also have a good selection of cheap glasses cases which offer great protection for your glasses.

These two different sites are the same company.

Maybe they're a great place to buy spectacles cases from, but it's vaguely upsetting that Google can create freakin' awesome stuff (A self driving car! It is actually wonderful and futuristic) yet can't fix this stuff. Obviously, Google are not to blame, and really the problem is with sleazy SEO and odd behaviours by vendors.


I think Stack Exchange and Wikipedia are examples of SEO'd sites too. For example: Stack Exchange had problems with indexing before they added a sitemap [1]. Wikipedia takes a lot of care to create machine-indexable pages with quality content, that link back to their sources [2] and are linked to internally. Both follow SEO best practices as described in the Google Quality Guidelines [3]

Or your definition of "SEO'd" is leaning more to manipulative/sleazy.

In your example about the spectacle cases, would the other site being an affiliate make a (huge) difference? If Google punishes companies that have more than one website, people would shift to affiliates or try to hide that they own the websites in other ways. In a way I feel multiple McDonalds in one city is perfectly possible. However, there needs to be a balance between the "best" websites (be they from the same owner or not) and diversity of the results (a query deserves diversity).

[1] http://www.codinghorror.com/blog/2008/10/the-importance-of-s... "We knew from the outset that Google would be a big part of our traffic, and I wanted us to rank highly in Google"

[2] http://www.mattcutts.com/blog/pagerank-sculpting/ "In the same way that Google trusts sites less when they link to spammy sites or bad neighborhoods, parts of our system encourage links to good sites."

[3] http://support.google.com/webmasters/bin/answer.py?hl=en&ans...


Yes, you're right. I tend to use "SEOd" as being a bit sleazy, rather than just following best current practice.


The term "SEO" is a bit like "Hacker" in that to most people it carries a negative connotation.


I've noticed some people will use the terms "black hat SEO" and "white hat SEO" to distinguish between sleazy and not-sleazy SEO practices.


SEO = gaming search engines = negative behavior.


White hat SEO = following a search engine's rules = positive behavior.

Disparage not that which you do not understand.


If the goal of SEO is to manipulate a site's position in search engine rankings, I don't care what color the hat is. I don't want the site's owner to be able to do that. I want the results of my searches to be based on their relevance to my query, and not on the degree to which the people who stand to profit from my attention to their sites have manipulated the system.

The ultimate search engine would be ungameable, and SEO would be a waste of time because it would accomplish nothing. To the degree that SEO works at all, it is a bad thing.


Google itself explicitly encourages people to SEO their sites. So much so that they publish a SEO starter guide for webmasters and SEO professionals [1].

SEO isn't what you think it is, yet you continue to paint all of SEO with your broad brush. Stop.

[1] (pdf) - https://static.googleusercontent.com/external_content/untrus...


Are you claiming that "SEO" no longer stands for "search engine optimization", or are you claiming that "search engine optimization" is no longer about "optimizing" a site for "search engine" result placement?


I'm claiming that your characterization of all such "optimization" as malicious is misguided.

White hat is not exploiting the rules of a game, it is following the rules. Proper SEO according to G is doing everything you can to make your site interact well with crawlers. That's akin to following directions when applying for a job, or following the rules when playing a sport. You can't be considered for the job if you don't apply properly, and you'll soon be disqualified if you ignore the rules of a sport. Same with search: if your site doesn't comply with a search engine's standards for crawling, you won't have the opportunity to be ranked fairly.


It's all about the scope of what you would consider "SEO" projects. Would you consider creating search-friendly URLs an SEO project or just best practice?

That's one of the first things I think about when I look at a site from an SEO perspective: can the bots easily digest this site?


A large part of SEO is just making it obvious what your page is about. This helps both the user and the search engine. A lot of it is just usability, but usability specifically for a user who is coming from a search engine.


I think I need to link to the relevant XKCD here: http://xkcd.com/810/ . And to slightly paraphrase: "But what will you do when spammers provide useful and relevant content?" "Mission Fucking Accomplished."

I'm not saying that's where we are -- but that's hopefully the end goal.


I noticed this when looking to mail-order some espresso pods: there were eight versions of the same shitty German mail-order company in the top 30 results or so, each trying to look like an independent site targeting a different audience, which you only found out were the same when you tried to check out and saw the same ridiculous extra fees tacked on in the shopping cart. Took forever to find a legit store which was not run by the same guy.


Eight versions targeting different audiences? Sounds almost like something patio11 might suggest...


Twist: He earns all his money by selling nespresso capsules. There has never been any consulting or AP business. It's all coffee.


Do you remember your query?


Interestingly, though much of the content is the same, those domains resolve to different IP addresses, not just the same server tweaking the page for different domains as I thought.

    Non-authoritative answer:
    Name:    www.spectaclecases.co.uk
    Address: 83.170.88.134

    Non-authoritative answer:
    Name:    www.aglassescase.co.uk
    Address: 83.170.70.106

Presumably they have two different webservers, perhaps connected to the same ordering system/db server. I guess they could be using a virtualisation service and found this easier to set up.
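
For anyone who wants to reproduce the check, here's a minimal sketch using only the Python standard library (the two hostnames are the ones from the comments above; nothing else is assumed):

    import socket

    # Resolve both storefronts and compare the IP addresses they point at.
    for host in ["www.spectaclecases.co.uk", "www.aglassescase.co.uk"]:
        try:
            print(host, "->", socket.gethostbyname(host))
        except socket.gaierror as err:
            print(host, "-> lookup failed:", err)

The same thing can be done with nslookup or dig from a shell; the point is only that the two storefronts answer from different addresses despite serving near-identical content.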


The different IPs are for SSL and they are hosted here: https://vps.net


If you haven't seen it yet, you should check out millionshort.com . I think the idea is that it skims off the top layer of scum from search results, hopefully revealing good results that didn't SEO.


> In the scientific world there are no spammers and there is no direct commercial advantage to creating a lot of nonsense paper that cite your own paper

Yet. Google's on it though.

>> The launch of Google Scholar Citations and Google Scholar Metrics may provoke a revolution in the research evaluation field as it places within every researcher's reach tools that allow bibliometric measuring. In order to alert the research community over how easily one can manipulate the data and bibliometric indicators offered by Google's products we present an experiment in which we manipulate the Google Citations profiles of a research group through the creation of false documents that cite their documents, and consequently, the journals in which they have published, modifying their H index. For this purpose we created six documents authored by a faked author and we uploaded them to a researcher's personal website under the University of Granada's domain. The result of the experiment meant an increase of 774 citations in 129 papers (six citations per paper), increasing the authors' and journals' H index. We analyse the malicious effect this type of practices can cause to Google Scholar Citations and Google Scholar Metrics. Finally, we conclude with several deliberations over the effects these malpractices may have and the lack of control tools these tools offer.

http://arxiv.org/abs/1212.0638


"Destruction of the Web" is a bit hyperbolic. I tried the SEO game for the interest of my clients, I've come to realize that it's a better long term strategy to simply focus on good, relevant content and let nature take it's course. When you're publishing content targeted at a certain audience, spammy visits to your website just increase your bandwidth cost without much benefit.

Granted, there is inherent benefit in coming in on top of a Google search, but time and time again, I've seen good content naturally dominate and stay on top. Occasionally some black hat spammer comes out on top.

And truth of the matter is, I think it's google/<insert search engine here>'s job to figure out a way to discern good content from spam.

Google dominated early on because they were able to parse through a lot of the garbage and find you what you were looking for. There will always be spammers. And if this whole arms race thing is true, I think the spammers are likely to hit a ceiling before Google or another search engine does.

As a programmer, my initial instinct was to write code to counter what Google's algorithm would expect. Then, I thought, if I'm going to put in that much work, why the hell not build a better search engine myself?

I believe that if Google doesn't do a great job of getting the best possible results, not only do they face a threat from other giants like Bing, but also from crafty programmers who may be writing black hat SEO crap now and have the epiphany to try to build a better search engine (yes... I know... I think PG has a real point with the search engine thing).


Just wanted to say that I enjoyed reading this. :)


I'm "relatively" new to the web. I started using it around 2004. To me, the only way to browse the web goes through Google. I don't think there's a single day I spent in front of the computer without me hitting Google at one point. I even use Google search when I'm specifically targeting Wikipedia or StackOverflow.

For the people who got introduced to the web before me, how was "web browsing" done in the earlier decade of the Web? I'm assuming Google is not the first search engine available, but I'm pretty sure search engines were not the only way to go around.

I understand the concept behind the web, "a globe spanning network of computers linked by hyperlinks pointing to useful information". But was it as simple as that? You only had access to addresses you knew or links available on these pages? Where did you go to find interesting websites or how would you look for specific information (like, for instance, how would you research the working internals of a car engine for a school project?)


Things were a bit more varied. Netscape Navigator came with a default home page that included a number of different search engines which varied in different and sometimes interesting ways.

Because these search engines were not as powerful or accurate as Google at mining content from the web, you would tend to follow links more, relying on the overall navigational structure of the WWW rather than everything being the two step "Google terms -> Go to website in results page" process it typically is today.

Many people posted fragments called "web rings" on their web pages: these would be a linked "ring" of similar, related sites, grouped by a common interest in the subject matter. It was a useful (though very random and sometimes temperamental) way to discover similar sites and content.

Other sites had "guest books" where people could comment on the site and include a link to their own. There's little functional difference between guest books and today's blog commenting systems, except back then there was less traffic, so site owners tended to actually read the comments, respond to them often, and follow the links of the commenters back to their own sites.

Then SEO spam came along, and eventually most people realised 90% of the links in comments were probably not worth following. Then Google added nofollow, so the chances were the only person who would ever find your site through comments were the owner of the site you posted on - if you were very lucky and it wasn't buried in a sea of spam or one-shot snarks.


> For the people who got introduced to the web before me, how was "web browsing" done in the earlier decade of the Web? I'm assuming Google is not the first search engine available, but I'm pretty sure search engines were not the only way to go around.

Yahoo! used to have humans looking at websites and adding them to an index. Yes, someone would send in a link to a porn website, and someone on the porn indexing team would view the site and add it to an index.

Curation efforts like this were important. People built web-rings for similar content; Usenet FAQs listed useful sites.

But people didn't just use WWW. They used Usenet, sometimes Gopher or telnet, ftp, and email. Or they were part of some other online community that had a www gateway. The prices now seem eye-watering.

The electronic landscape in 1988 (https://news.ycombinator.com/item?id=3087928)

(http://i53.tinypic.com/2janfrd.jpg)

Compuserve - $11 per hour.

The Source - $8 per hour

Delphi - $6 per hour

BIX $9 per hour

And this is at a time when people had slow modems and usually paid for the telephone calls too. Thus, offline readers (things like BlueWave for email and fidonet) were popular.

Your last paragraph: I'd have a look through the newsgroup lists for relevant groups. I'd subscribe, fetch headers, look for a faq, retrieve the faq, and read that. I'd lurk the group for a bit, and try to do my own work. Then, after I'd learnt the group for a bit and participated in other stuff I'd try to ask a good question, with links to how far I'd got and an attempt at a correct reply.

Yahoo! was pretty good once you got the hang of rules. HotBot, dogpile, altavista (and astalavista) were also handy. But Google really was revolutionarily good.


Nice!

I remember around the time CompuServe bought The Source, CIS was charging based on the speed of your connection (I think $12.80/hr was for a 9600 baud connection). There were several products to minimize your online time that would script connections - go to your favorite forums, download new messages, post anything you had pending, and hang up.

When AOL came on the scene, they eventually overtook the old walled-garden players by tactics like providing access to Usenet and the web (http://en.wikipedia.org/wiki/Eternal_September was an unfortunate fallout from that), and provided a much cheaper pricing model. CIS tried to catch up by buying the Spry company, using its browser (basically a branded NCSA Mosaic), releasing scripts for Trumpet Winsock to get to the net from a CompuServe x.25 trunk, and added the option to use that same net connection to telnet back into CompuServe.

It was confusing, caused too much network traffic, and CompuServe mismanaged it all, and ultimately collapsed, to be bought out by AOL... who eventually lost relevance outside of the chat universe.

Good times, good times.


Seconding the emphasis on "Internet != Web" being more true for the 90's user. It wasn't until 1998 or so that the family computer transitioned from a dial-up shell account with a 5MB disk quota (crl.com, which no longer exists) to SLIP/PPP with NetZero. The Web experience via shell account was Lynx-only, and Usenet was still a major source of information, albeit riddled with trolling and spam by then.


It is precisely as you say: Simple as that. You knew of some sites, and you had links from those sites to others. Beyond that, you could take some big names and guess if they had a site. As for your "working internals of a car engine" example, in 1995 I would have instead cracked open a physical encyclopedia at a library. It really was the rise of search engines that helped grant the internet that "Learn/find anything about anything" aura.

All I remember interacting with until about 1997 was a handful of consumer brands' websites, brands I already knew of and whose domain names I could easily guess.


There were indeed search engines pre-google, the "big four" and a host of smaller ones.

There were also "directories" that were very useful. Yahoo's at least is still around. If you wanted to find a bunch of sites on a specific topic you'd use those. But a lot of the time you would follow links between sites - most sites would have a dedicated page of "links to other sites that we think are good/related". A sibling has mentioned webrings which were also a thing.

I never used AOL, but it had a "keyword" system that worked a bit like twitter - you could look up a keyword and see a bunch of sites that claimed to be related to it.

And it was a lot more common to type in a URL that you found offline - books came with their homepage printed in them (they probably still do).


Before Google there were plenty of ways to find links: other search engines (e.g. Lycos, Yahoo, Excite, AltaVista), link directories (e.g. Yahoo, Netscape Directory), magazines (e.g. Wired, .Net), and content-driven ISPs (e.g. AOL, CompuServe, eWorld).


> how would you look for specific information (like, for instance, how would you research the working internals of a car engine for a school project?)

Maybe I'm just an old fart (I'm 26 btw :P ) but I went to the library.


Quite. Encyclopædia Britannica, and the 5% rule for photocopying. Later it was CD encyclopædias like Microsoft Encarta - the use case that persuaded my parents to buy a new-fangled "sound card", as opposed to the ulterior use case (to play Doom)...


When I began in 1992 the "home page of the internet" was Usenet. The reason I installed my first web browser was to be able to follow URLs referenced by conversations on Usenet. (That first browser was obtained by "anonymous FTP", which is the main way that software and certain kinds of documents like Usenet FAQs were distributed.)

>for instance, how would you research the working internals of a car engine for a school project?

Before search engines, I would consult my personal library of books, which included a paper encyclopedia, or visit a library and look up "automobile engines" in the subject index of the card catalog. Or maybe ask a friend.

Of course, most of the things I look up in search engines nowadays I did not even try to research or ask about before search engines: I just did without the information. It wasn't so bad :)


What about something like a disavow.txt file that site owners could use to list domains or URLs with unwanted inbound links. It would be similar to the Google Disavow Links tool, but more open and standardized.

We could write a simple spec around it. I'd see it being similar to robots.txt in form and function... easy for a human to write, easy for a search engine to parse, easy to generate programmatically if you need to scale it up.
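
Purely as a sketch of what that might look like - the filename, directives and syntax here are hypothetical, loosely modelled on robots.txt, not anything search engines currently support:

    # /disavow.txt - hypothetical, served from the site root like robots.txt
    # Ignore all inbound links from these domains when ranking this site:
    Disavow-Domain: spammy-directory.example
    Disavow-Domain: paid-link-network.example
    # Ignore one specific inbound link:
    Disavow-URL: http://forum.example/some-spammy-thread

Parsing it would be about as simple as parsing robots.txt; a rough sketch in Python:

    def parse_disavow(text):
        # Parse the hypothetical disavow.txt into (domains, urls).
        domains, urls = set(), set()
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comment lines
            key, _, value = line.partition(":")
            key, value = key.strip().lower(), value.strip()
            if key == "disavow-domain":
                domains.add(value)
            elif key == "disavow-url":
                urls.add(value)
        return domains, urls

A crawler could fetch it alongside robots.txt and drop the listed edges from its link graph before computing rank.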

Also, it avoids the black-hat SEO problem since only folks with access to the site could control the content of the disavow.txt file.

Thoughts?


Google Webmaster Tools already verifies that the owner is for real. The disavow tool is 'safe' in this sense, and requests to remove links sent via email are definitely not safe in this sense (that's why I never comply with those; for all I know I'm aiding some black hat by killing backlinks of a legitimate site).


True, but that still leaves other search engines in the dark about which links should be ignored. I thought disavow.txt might be a better solution to help us avoid The Destruction of the Web.


Good point. Of course there are other search engines too, and a 'webmaster / search engine' interface that requires webmasters to have direct contact with a search engine, when the same thing could be fixed by something the crawler could pick up, is less elegant. I had not thought this through when I wrote my reply to you; you are absolutely right.


Google will allow you to "disavow" links from spammy sites that are pointing to yours, so there's no real reason to ask people to remove them.

Another problem that seems to be on the rise is corporate shills. In the past these were easy enough to spot, they were people who would go to blogs/forums and spam them with blatant attempts at promotion.

Now they seem to be getting much better: they will come to a site like HN, read the comments that people are posting, write their own comments in a similar style, not embed links, and fly under the radar as legitimate users.

There are marketplaces for selling aged accounts for these purposes. This makes me very skeptical of any type of product recommendation from a website user.


"Destruction of the Web"? This is really going to destroy the web? Though it is an issue, I hardly think that this is going to destroy anything. May inconvenience some things, cause confusion about when to link (and when to unlink) in the short term until an equilibrium is reached and the next SEO change comes along. But I think that this title itself is Link Bait (something else that might be "destroying the web" by this lose definition of destruction).


The author has a history of being very hyperbolic. I was hoping at least one other person would catch this and not take the article at face value. Spam is a problem, not the destruction of the web.


I'd go so far as to say sensationalist hyperbolic link-bait titles are a similar brand (FUD) of spam polluting the web.


Similar article and discussion from a few months ago:

A painful tale of SEO, spam and Google's role in all

http://scriptogr.am/slaven/post/a-painful-tale-of-seo-spam-a...

https://news.ycombinator.com/item?id=5296005


Just like I don't put all my API eggs in one big corporation's basket like the so-called "ecosystem" platform of Apple, Microsoft, et al.

I don't put all of my search eggs in the google basket.

I think that Google search was great for about a decade. In the past couple of years, I have started to have serious doubts about Google's search results. No, I don't want the results tailored to me, or to whatever IP address I am searching from. And I suspect there are lots of relevant sites that are not showing up. I also suspect that someone can do search better than Google.

There are other dominant and not so dominant search engines, please use them ...

https://duckduckgo.com/ http://www.yandex.com/ http://www.baidu.com/ http://www.bing.com/ http://gigablast.com/


I've never understood the concept of black-hat SEO.

If your site isn't worthwhile enough to be on the front page of relevant search results, why would you pay somebody to set up link farms and comment spam et al that might get your site up the rankings; instead of just paying Google to put your site as an ad when people run the relevant searches?

Either way it's going to cost you money, but one is guaranteed to work & have no negative repercussions; the other isn't. Is it just about price, or do the SEO guys have really good marketing, or what?


It is hardly that simple: the costs involved can be completely different. High organic rankings might create a positive ROI whereas PPC might be far too expensive. Also, if your content isn't really desirable at all, it is possible you will have trouble running AdWords for it as well.


So it's better to pay less for an unreliable solution that sucks, than pay more for a guaranteed place at the top of the front page?

And if your content "isn't at all desirable" then how is SEO going to help? "We have stuff nobody wants, if only we could get on the front page they'd buy it!" doesn't seem like much of a business plan.


Some will say that they think people don't click on ads (which in general is false). For some terms clicks can be very expensive.

There are plenty of spammy lead-generation campaigns that AdWords won't let you run that will absolutely make you a lot of money if you have a good search position - half the stuff on http://www.clickbank.com/ . Think diet pills and similar.


Good article, but the author takes a somewhat generous view of academic citations. There are spammers in academia -- e.g. editors and reviewers who block rivals from publishing or who demand citations of their own work in revisions.


Sure, things like these exist in academia, but you will rarely find a paper that only exists for the sake of promoting others, or an otherwise extremely useless one, for that matter.

For an unintentionally humorous counterexample consider Tai's paper, which has around 100 citations. ;-) http://care.diabetesjournals.org/content/17/2/152.abstract


Destruction just seems like such a harsh word.

Any time you measure something, you impact what you measure. Without being pedantic about the physics, look around: if you measure when people get in and out of work, they will optimize around that, and other things may suffer. If you measure defects in code, people will optimize for that.

But back to the web... Search, by definition, is a measurement or prediction of usefulness. It will definitely impact what gets searched. But that collateral damage can be minimized, and in some cases it can improve the web experience, though I wouldn't bank on it. I certainly don't view the Web as a pristine wilderness being destroyed.


Sorry, but for me this article is a bit BS, because what the author describes as the destruction of the web is, for me, the healthy and eternal balance between good and evil that runs through all of society.

On one side, systems like Google, Facebook... compete to be the best spam filters; on the other side, cheaters try to fight back by using more advanced spam techniques.

At the end of the day, the Web keeps improving and tons of new applications flourish. It also makes things harder for spammers, who can then decide to go white hat.

I must admit the quality of Google's SERPs has been going down since Panda, especially for long-tail keyword phrases, but look, Hacker News is one good example of an alternative to Google and Facebook, and the web is not just about Google...

I also believe black hats are the best friends of Google, because they push the level of anti-spam higher, to where newer or smaller search engines can't compete because of their lack of history.


The way to solve this problem that's been discussed before is to require authenticated identity in the protocol layer, so every packet can be traced to a real, live person.

But then that breeds new problems:

1) The NSA might like the ability to tie every packet to a person, but privacy and anonymity are generally good things.

2) Just because you have a reliable way to track/measure "real" reputation/authority/trust, will that stop people from abusing it? Did the offline version of this stop Paula Deen from building an empire on unhealthy eating only to later reveal her own diet gave her diabetes? No.

Human nature is driving a fair bit of this stuff, and has nothing to do with Google, the web, protocols, or spam. We always try to eliminate the "flawed human" from systems, and it never works.

3) The fact that reputation/authority/trust is unreliable might actually be a feature not a bug. For one thing, it allows some dude with no social capital to get a toehold and get his stuff in front of users. Generally Google allows this to happen, and if the content sucks, it falls away. I don't mind a bit of spam if it's the price for more diversity and opportunity for people outside the "lucky sperm club" to rise.

Overall, I don't buy this "destruction of the web" stuff. If anything, Google has made both the web AND search way better today. It's possible that Google's anti-spam strategies will hit a point of diminishing marginal returns, and the spammers will catch up and the balance will swing in their direction again, but so far that's not been the trend.

I think the equilibrium we're seeing is that Google allows a very small amount of spam tactics to work for a while, but they use other signals that keep that stuff from getting major traffic (e.g. eHow). So gaming the system can get you into the search rankings, but if you suck, you won't stay there.

But more importantly, I don't want to trade off privacy and anonymity to eliminate what amounts to a very small amount of spam.


Chris Coyier talked about it last year: http://chriscoyier.net/2012/08/17/sweet-spammer-justice/


Spam and SEO concerns are nowhere near the biggest threat to the WWW. Try censorship, DRM, surveillance, and security.


>I believe this will have to come in the form of a reboot, a protocol designed from the ground up to combat these issues and a way to search the web that makes it infeasible for a single party to control such a large volume of traffic

For all intents and purposes, Google is the Internet lobby that has the ear of legislators. They're not gonna stand for any kind of protocol reboot that would destroy their fundamental and wonderfully lucrative business model.


What Google seems to ultimately be doing with its Pandas and other attempts at stopping search spam is TEACHING businesses that the only safe way forward is great content and white hat practices. Sure, you might temporarily get some advantages with search spam but come next Panda and you might be totally screwed. Better play it safe and play nice with Google.


The problem is that "great content" (aka landing pages) is merely another form of spam.


Can you clarify? Why is "great content" == "landing pages"? How are landing pages spam?


So if I refer my link in this discussion, will that be spamming, or is it allowed, since I am trying to let people know about some SEO guidelines which I feel like sharing here?

http://webmasterfacts.com/google-webmaster-guidelines-seo/

But since the original Google page already explains much, I think there should be no provision for building pages with "verified" keywords either. The official T&C directly states that one cannot take control of the content, but one can certainly take control of the spam words. What if I had added, alongside the link above, a service from some SEO company?

Would that have been an ethical link-building strategy? I don't know much about how things run around the Google cubicles, but I know one thing for sure - no one is going to tell us that for the next 100 years.

@Matt: Keep the fight on @rest: Keep the questions coming.

Good Luck.


That was a long, windy read for … no clear argument and a ton of begged questions. Spammers have been trying to get backlinks since the 90s – no evidence that this has suddenly become unmanageable. There's a flat claim that nobody makes legitimate backlinks any more, which is both completely unsupported and transparently wrong.

“Destruction of the Web” is a bold title for what appears to be a minor kerfuffle affecting only web marketing types trying to scam search engines. If you produce decent content or something else useful, you can ignore it and carry on: backlinks will take care of themselves as they have for the last couple of decades.

Put another way: the link-bait title got me to click but the anemic post left me less likely to come back and uninterested in sharing. Fix that if you want better search engine rankings.


When I started getting emails from companies asking for the spam-links from my sites to be removed I assumed it was a good sign. If they're cleaning up their spam-links then presumably Google have found a way to make those spam-links not pay.

Hopefully fewer spam-links and robot-comments, no?


The problem is this opens up the possibility of negative SEO. Just start making random links to your competitors' sites instead of your own and it will negatively affect their ranking as google have admitted recently[1]. This just puts us back to square one - at least it seems to be the case that no matter what Google do, spammers will find a way to break it.

[1] http://support.google.com/webmasters/bin/answer.py?hl=en&ans...


You are assuming the requests to take down links are coming from the owners of the sites the links are pointing to…


Indeed.

Though I was mostly surprised to see them at all; they were all for a forum I thought I'd closed down years ago but which had apparently been collecting spam for quite a while.

And they were spam, certainly. If I'd been monitoring the thing they'd have been deleted right away.


One option would be to "avow" links rather than "disavow". Start with the assumption that all links are valueless and untrustworthy and only count those which the webmaster has marked as good.
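
As a rough sketch of that inversion - the names here are made up, and real ranking would obviously involve far more than a raw count:

    def avowed_inlink_score(inbound_links, avowed_domains):
        # inbound_links: iterable of (source_domain, url) pairs pointing at the site.
        # Only links from domains the target site's webmaster has explicitly
        # avowed count toward ranking; everything else contributes nothing.
        return sum(1 for source_domain, url in inbound_links
                   if source_domain in avowed_domains)

The obvious cost is bootstrapping: a new site with an empty avow list gets no credit from any inbound link until its webmaster starts curating.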


At the cost of result quality, one could exploit social networks as a possible solution: promote the pages that your neighbors visit for an extended period of time (and thus deem to be of high value) by a factor corresponding to said neighbor's distance from you. If the queries are slightly randomized such that their order differs per person and per session, then new possible results could tested.
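
A minimal sketch of that kind of scoring, assuming we already know each neighbour's dwell time and social-graph distance (the exponential decay, the 30-second threshold and the jitter amount are arbitrary choices for illustration, not anything a real engine uses):

    import math
    import random

    def social_boost(base_score, neighbor_visits, decay=0.5, min_dwell=30):
        # neighbor_visits: iterable of (graph_distance, dwell_seconds) pairs.
        # Visits shorter than min_dwell are ignored; closer neighbours count
        # for more via an exponential decay on distance.
        boost = sum(math.exp(-decay * distance)
                    for distance, dwell in neighbor_visits
                    if dwell >= min_dwell)
        return base_score * (1.0 + boost)

    def jittered_ranking(scored_results, jitter=0.05):
        # scored_results: list of (url, score) pairs. Slightly randomize the
        # scores per session so the ordering differs a bit each time and new
        # pages occasionally surface to be tested.
        return sorted(scored_results,
                      key=lambda pair: pair[1] * (1 + random.uniform(-jitter, jitter)),
                      reverse=True)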


I've been hearing about the so-called destruction of the web since the distant future, the year 2000.


Seems like there's something ironic about that giant "tag cloud" being on this page.



