Understand the JavaScript SEO basics (developers.google.com)
228 points by jaequery on July 19, 2019 | 60 comments



SEO consultant here (yeah, yeah).

Please, still in 2019, be careful with JavaScript unless you're willing to deal with a lot of uncertainty.

Google routinely says how things are supposed to work, but there are plenty of examples where their crawlers act inconsistently with those public statements. To be clear, that's probably not Googlebot's fault; we as a species do a lot of weird things when building websites.

Just finished a big project with a client whose customer-facing website was a client-side rendered React app. Googlebot was VERY sporadic about indexing certain parts of the site, and in general the site lost out to inferior sites in more competitive SERPs. Server-side rendering fixed a lot of this and rankings/traffic jumped.

It's also worth noting that the Bing/Yahoo crawlers (still a notable chunk of web traffic!) can't crawl JS. You can ignore this chunk of market share if you want but someone is going to happily take it.

As a general rule, my advice is always this: Make it as easy as possible for bots to crawl and index your site's content. The more hoops the crawler is jumping through, the worse your site will perform in search engines.


Bing Mobile Friendliness Test correctly renders JS pages. Fetch as BingBot does not currently. I logged a support request through Bing Webmaster and got a reply that their engineering team is working on it, so I would expect Bing to crawl JS sites just like Google in the near future.


Executing JS is expensive, so you should expect Bing to limit its willingness to index such sites, same as Google.


> Server side render fixed a lot of this and rankings/traffic jumped.

Good, they should just delist client side rendered web pages.


While I love to rag on SPAs as much as the next person, doing that is ripe for antitrust. "They're trying to force the web to look like what they want"


“You didn’t go to huge extra expense to index my website so I am going to sue you for anti-trust” - good luck. Google has been picky about executing JS all along.


Why? I enjoy writing client-side SPAs, and I can pump out a better UI/UX much faster than I could otherwise.

My users don't care how I build it, just that the site works and is enjoyable to use.

If I need server-side rendering I'll still write it with a client-side rendering framework (e.g. React), but tack on server-side rendering at the end.
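
If it helps, here's roughly what I mean (a minimal sketch, assuming an Express server and react-dom/server; the file names and component are illustrative, not my actual code):

    // server.js - hypothetical Express server rendering the same React app to HTML
    const express = require("express");
    const React = require("react");
    const { renderToString } = require("react-dom/server");
    const App = require("./App"); // the same component tree the client hydrates

    const app = express();

    app.get("*", (req, res) => {
      // Crawlers (and users) get real markup on first load...
      const markup = renderToString(React.createElement(App));
      res.send(`<!doctype html><html><body>
        <div id="root">${markup}</div>
        <script src="/bundle.js"></script>
      </body></html>`);
      // ...then the client bundle takes over in the browser.
    });

    app.listen(3000);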


I like to build my websites in COBOL but the browsers won't support it even though I'm more productive.

You are not building this for yourself if you need to be indexed by Google. You need to build it in a supported way.

The bigger question is why you think an SPA should rank well in general. It's only one page.


Yes, I agree - server-side rendering is likely better for SEO in most scenarios, and so I should consider it in a business context where SEO is essential.

This isn't everyone.

The comment I replied to said that Google should "delist client side rendered web pages", which is a terrible idea. Maybe I'm OK with the SEO hit? Maybe I've architected my SPA so that it can be indexed? (At least to some degree)

> The bigger question is why you think an SPA should rank well in general. It's only one page.

SPAs can have multiple pages (despite the name). Check out react-router [0]. Browser history can be manipulated to give the same "back button/forward button" functionality between logical pages. This is done for you by whatever framework you use.

For my use case, having the browser reload every time a page is changed would severely interrupt UX.

[0] https://reacttraining.com/react-router/web/guides/quick-star...
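
To make that concrete, here's a rough sketch of what "multiple pages" looks like inside one SPA (assuming react-router-dom v4/v5 as in the quick start above; Home/About are placeholder components, not from a real project):

    // App.js - one SPA, several logical pages, each with its own URL
    import React from "react";
    import { BrowserRouter, Switch, Route, Link } from "react-router-dom";

    const Home = () => <h1>Home</h1>;   // placeholder
    const About = () => <h1>About</h1>; // placeholder

    export default function App() {
      return (
        <BrowserRouter>
          <nav>
            <Link to="/">Home</Link> <Link to="/about">About</Link>
          </nav>
          <Switch>
            {/* BrowserRouter keeps browser history in sync, so back/forward work */}
            <Route exact path="/" component={Home} />
            <Route path="/about" component={About} />
          </Switch>
        </BrowserRouter>
      );
    }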


But to a crawler it's one "page". To humans it's different, but not to a robot. To a crawler, a "page" is still (mostly) one GET.


Yet here we are discussing TFA, which says otherwise, and crawlers are only going to become better at it.

This is a beautiful part of the new web: indexable applications.


Crawlers were supposed to have this down in 2012 (at least Google); now they recommend a sitemap page.


> The bigger question is why you think an SPA should rank well in general. It's only one page.

Spoken like someone who hasn't used the web in 5 years.


I hope my mailed letter will reach hn soon so I can reply.

My work has powered some of the sites you've visited over the last 5 years. I hope you've enjoyed the experience.

Still doesn't make an SPA worthy of ranking well in general... unless it's really a useful app with actual functionality instead of a shallow one-page website.

Reply with some sites you made that you feel should rank well. Let's see how they hold up.


You can’t just say something completely false, wait until someone corrects you, and then go off about how you’ve made websites that they’ve used in the last 5 years so you know more than them.

That’s called appeal to authority. It’s a logical fallacy. Your statement was false, regardless of how “authoritative” you are on the subject. If you meant something else by your statement, you should have expressed it better.


No one is saying they should get a boost in ranking or rank well just by being an SPA. You're the one suggesting they should be penalized just for the fact that they use client-side rendering.

You seem to suggest that an SPA can only consist of a single `view`, when in fact there's often no discernible difference apart from the underlying architecture. Maybe we need a new word, so people stop getting confused. SPA has nothing to do with the amount of content or views a website has.

It's just as simple to create a shallow, useless website in PHP as it is in React.


>My users don't care how I build it, just that the site works and is enjoyable to use.

Filter out poor users by making the site inoperable on older hardware. If they cannot afford a current-gen i9, they will not be able to afford whatever my ad network is pushing. Increases the conversion rate, so all is good.


My laptop from 2011/2012 loads everything perfectly, albeit a bit slower. I assume most people have a computer produced in the last decade.

Even my grandparents use an old iPad or cheap smartphone.

My machine isn't particularly powerful either - I use an old Macbook.


Even if it loads and performs OK, I still don't like it. Client-side rendering is making the web even more fragile - nothing is going to last anymore.


It is a website. If it requires an i9 to run, something went seriously wrong. Was this sarcasm?


Exaggeration.


I have also had a site which was purely client-side React fail to index properly - some pages were not picked up, others were indexed weirdly (some of the page content ended up in the title on Google somehow!). Migrated the site to use react-static (which was not too painful a migration) and all good now! This was before the change Google announced to use the latest Chromium for Googlebot, so I’m not sure if that would have solved the issue.


I echo what others like @sharkweek are saying - Google has been saying that they index JavaScript for quite a while now. At I/O 2019 they said the crawler should be on the latest Chromium codebase that supports all modern JavaScript features. I built some test cases [0] to see if this is true, and just couldn't get Google to index content inserted via JavaScript. For example, scroll down to the Popover button test case.

Maybe I'm doing something wrong? Open to feedback. Open-sourced code is here [1].

[0] https://shan.io/will-it-index/

[1] https://github.com/rayshan/rayshan.github.io/tree/master/wil...
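
To give a flavor of the kind of test case I mean (simplified and illustrative, not the literal code from the repo), this is content that only appears after a user interaction, which I couldn't get indexed:

    // Content injected only after a click - the sort of thing that didn't get indexed
    document.getElementById("popover-button").addEventListener("click", () => {
      const p = document.createElement("p");
      p.textContent = "This paragraph only exists after the button is clicked.";
      document.body.appendChild(p);
    });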


I can confirm that they do not parse low-ranking JavaScript sites, but they will parse higher-ranking JavaScript sites.

Bots aside (Twitter and Slack still don't parse JS), it is still in your best interest to server-side render or pre-render a JavaScript site. The time-to-first-paint difference always starts small, but as you add features the JS apps become so bloated you'd be better off starting over with a server-side rendered HTML app.

(I am the creator of https://www.prerender.cloud/)


Thank you! Good point about bots not parsing JavaScript. I find it increasingly important to build and test structured data so users sharing the content within these walled gardens will have a good experience.


For those here to find out how relevant server rendering is for SPAs in light of this, I think the answer is still very much 'better if you do'.

Some relevant excerpts from the article (emphasis mine):

> Keep in mind that server-side or pre-rendering is still a great idea because it makes your website faster for users and crawlers, and not all bots can run JavaScript.

> Once Googlebot's resources allow, a headless Chromium renders the page and executes the JavaScript.

So, there are still no specifics on what extra delays might be involved, but this seems to suggest that what I've heard might indeed be the case: JS-rendered content still takes much longer than static content (days to weeks) to show up in the index.

That is, they seem to be implying that while this works, static content is still preferred and takes priority.


This feels like such a smoke and mirrors thing. This gem:

> Write compatible code

> Browsers offer many APIs and JavaScript is a quickly-evolving language. Googlebot has some limitations regarding which APIs and JavaScript features it supports. To make sure your code is compatible with Googlebot, follow our guidelines for troubleshooting JavaScript problems.

Leads to this page:

https://developers.google.com/search/docs/guides/fix-search-...

Which says absolutely nothing. It details nothing about what is supported.

Utterly useless. No version of HTML running, no info about what gets run, no actual technical details at all.


Perhaps the biggest gotcha was that Googlebot was running a Chromium build that was years old and required polyfills that you wouldn't otherwise include anymore. Since May they've run a recent Chromium though: "The new evergreen Googlebot" https://news.ycombinator.com/item?id=20482235
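
For example, entry points used to start with something like this purely for the old Googlebot's benefit (illustrative, assuming a Babel + core-js setup):

    // index.js - polyfills you'd ship mainly for the years-old Chromium Googlebot ran
    import "core-js/stable";
    import "regenerator-runtime/runtime";
    // ...rest of the app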


Search Engine Optimization? More like Googlebot Optimization, I think.


When 95%+ of search traffic comes from Google, these two things are the same thing.


Isn't that his point?


You have to be crawled and rendered first


A little off-topic, but one of the most exciting things in my mind about Phoenix LiveView [1] is that you can get some dynamic web applications and not have to navigate this JS-rendering minefield. When a LiveView page is requested, the server sends over the full page as plain static HTML. The JS half of LiveView then upgrades the connection to a WebSocket, and your page becomes dynamic.

Not a good fit for everything, but it's great if you want to add some fancy form validation to, say, your landing page. You can render whatever you want with LiveView, get your fancy dynamic stuff, without having to worry about making your page more indexable.

[1]: https://github.com/phoenixframework/phoenix_live_view


1) huge elixir/phoenix fan, glad to see you promoting this here

2) but I'm really here to commend your comment on this now-flagged post that I now cannot respond to: https://news.ycombinator.com/item?id=20500181 . I don't know if HN mods understand the irony in deplatforming a post arguing against deplatforming, but I am 100% in agreement with you there.


1) :)

2) Thank you! Is it really flagged? That's too bad. And funny, in a sad way.


It got unflagged. ??? (Sorry for OT)


I use middleware on my SPA’s static server that sends bots a lite SSR version of the page that gets SEO’d up pretty well. Best of both worlds.
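
Roughly like this (a simplified sketch, assuming an Express static server; the user-agent list and getPrerenderedHtml are placeholders, not my real setup):

    // Hypothetical middleware: serve a pre-rendered snapshot to known bots,
    // and the normal SPA shell + JS bundle to everyone else.
    const express = require("express");
    const app = express();

    const BOT_UA = /googlebot|bingbot|twitterbot|slackbot|facebookexternalhit/i;

    app.use((req, res, next) => {
      if (BOT_UA.test(req.get("user-agent") || "")) {
        // getPrerenderedHtml is a placeholder for however the snapshot is built/cached
        return res.send(getPrerenderedHtml(req.path));
      }
      next();
    });

    app.use(express.static("dist"));
    app.listen(8080);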


Showing the bot something different from what the user sees could be seen as cloaking. We also do that in some cases. Just be really careful with that strategy.


Is this the first confirmation from Google that they render JS/SPAs? I've heard rumors before...


Not really, there have been a few posts and talks about it.

This was back in 2014: https://webmasters.googleblog.com/2014/05/understanding-web-...


They've confirmed it for years. It's less often than html only crawls though. Search their docs and you should find info about it.


You can see the results in 'fetch as Google' and in the structured data tool. Injected JS content is there, including changes to the page title and head.

As always take SEO advice from Google with a grain of salt.


> As always take SEO advice from Google with a grain of salt.

Why?


Well, if their SEO advice just allows you to get rid of impediments that keep you from rising to where you should be in the results, then you should trust them. But if SEO advice is a thing that can help you rise to a prominence that you should not have, then that advice would be detrimental to Google and they would not give it to you.

So really it matters if you believe in the light or dark sides of the SEO force.


A good search engine shouldn't process JS at all, and index only the content that is reachable via links pulled from HTML (which may, of course, be dynamic).

The processing of JS explains a lot of the sheer garbage that is in the Google results.

I think they do it because it ties into their business model. JS is needed to unearth the kind of crap that a certain segment of the user base is looking for. That certain segment is worth catering to because it consists of users who are likely to click on ads.

Otherwise, nobody in their right mind would integrate JS into a crawler. "I'm going to write a program that automatically finds random source code, much of it malicious, and runs it!" Come on; without a business agenda justifying it, it's a complete non-starter.


This is so out-of-touch with the modern web. Lots of totally legitimate websites, including major news outlets and Wikipedia, render their web pages on the client using JavaScript. Whether or not this is a good thing is a separate issue (personally I see nothing wrong with it), but it should be obvious that a useful search engine in 2019 needs to be able to index JavaScript-rendered content.


I just noticed I haven't been allowing any JS whatsoever from Wikipedia. Everything looks fine. I'm logged in and can edit, etc. (I fixed that now; all allowed).

But that's not my point: how did we go from "search engines shouldn't execute JS" to "you're out of touch if you think you can use the web without JS"?


I guess the assumption was that search engines should be able to access the web, like humans do.


I, for one, would welcome a force with the size and influence of Google telling devs to cut the Javascript crap. To me, it’s like violating the conditional rendering rule: there are plausible reasons to want to do it, but Google is right to push back on it.

It’s not as if totally legitimate websites don’t have reasonable technical alternatives here.


> I, for one, would welcome a force with the size and influence of Google telling devs to cut the JavaScript crap.

Wasn't this basically AMP though? The solution to the monster web developers were creating?


Hacker News hates that too.


No, that was a big ol’ web cache.

The solution to this is: “you have one year to make your webpages readable by the googlebot without javascript. Have a nice day.”


You’re forgetting that people use search to find things, not just to get their technical-snobbery nut off. If the thing I’m precisely looking for is a SPA, how could I or any average user care? Why would a search engine hide that from me?


>A good search engine shouldn't process JS at all, and index only the content that is reachable via links pulled from HTML (which may, of course, be dynamic).

A good search engine indexes based on what humans would consume.


JS doesn't ensure that; it is easily gamed to serve SEO rubbish to the spider, which is different from what it renders to a real user.


I'm not a fan of modern js-ridden site implementations but a search engine that displays a resource different from what will actually be consumed by the user is a bad search engine.

Perhaps you can put the blame on the creator of the site, but that won't bring us anywhere if we just allow people to game the system.


That's exactly what happens, though, and JS takes some of the blame.

The spider is indexing stuff that is the result of code execution that won't reproduce!


I quit catering to users who don't run JS a decade ago

"nobody in their right mind" would expect to use the web in the last 15 years or so with JS disabled and have it actually work


Pretty much the entire web works just fine without JavaScript. Indeed, most of it works better without it - everything loads faster and there are fewer annoyances like interstitial ads, auto-playing audio, etc. Unless you’re serving an actual app, like webmail or online office, a webpage that doesn’t even render without JS is an immediate bounce. I can get your content elsewhere.

I mean, almost the entire public web consists of blogs, forums, and news articles/essays. Almost all of these, with some modern exceptions, work perfectly fine. Of course, you may need JS to take full advantage of e.g. forum software features, but I can browse just fine. Sure, if you have a legitimate SPA like a mapping application or a game, JS is a very reasonable expectation. But it’s hard for me to see what users gain otherwise. Actually, I don’t even see what developers gain, other than an improved resume.


To add to this: when a client comes to my employer, if the site is required to work without JavaScript, it is the norm for that requirement to be explicitly specified. It’s otherwise assumed not to be an issue.



