Broken Links (tbray.org)
303 points by martey on Feb 10, 2011 | 90 comments



Despite all the FUD around hashbangs, the genuine problem I see with them is that they optimise for internal page loads, not for entry into a website. For example, with hashbangs, requests to twitter when logged in go like:

1) HTTP GET http://twitter.com/some_account [~500ms for me]

2) 302 redirect -> HTTP GET http://twitter.com/ [~600ms for me]

3) HTML tells browser to download some JS -> HTTP GET bundle.js [~500ms for me] (concurrently here we start getting CSS)

4) JS reads hashbang & request actual data -> HTTP GET data.json [~500ms for me]

... only after about 2 seconds can we start(!) rendering data. Now there's about another 2 seconds for all json data & CSS calls to complete. It takes upwards of 4 seconds for a twitter page to render for me (the Load event lies as it fires well before actual data shows. Try it yourself with your favourite browser inspector).

When not using hashbangs, a single HTTP request can get all the data for the page and start rendering it. One blocking CSS call (possibly cached) is all that's needed for styling.

Hence when I see an external link with a hashbang it frustrates me (barely perceptibly) because I know that when I load the page it's going to take longer than a normal HTTP request. Significantly longer. While subsequent page loads are faster, it's not these you want to optimise for if you care about bounce rates. This issue affects every new link you click into a website, so it affects an even larger number of requests than normal bounces.

Hashbangs are a good solution to an important problem, but I don't see them as a tool to build entire websites upon. Fortunately I see the performance issue as one which will result in people voting with their browsers and choosing sites which only use hashbangs when they genuinely improve the user experience - especially since they're easily visible in the url.


Basically, NewTwitter isn't a Web site, it's an app and you have to "launch" it before you can do anything.


But once you do launch it, everything is faster than it would have been if you were performing full page loads at each step. For sites you "live" in, the application route makes a lot of sense. This is the way GMail works and people seem to like it a lot.

Unfortunately, web applications and web pages are growing increasingly divergent. It is simply not feasible to take the performance of web apps to the next level without doing away with full page loads. This is why Facebook, Twitter et al are going the #! route. That's the cold hard truth.


Wouldn't js click handlers work just as well?

You follow a canonical link to the resource, get a real page back, with real links, but with js click handlers to enable AJAX-goosed speed for those with javascript enabled? And given that they imply a fallback for those times when javascript fails, aren't they actually better?

And GMail is a bit different than Twitter. It handles inward-facing data: content that no-one particularly wants crawled and that wouldn't benefit much from caching.


Unless you want your URLs to look like twitter.com/someone#!someone_else, you're going to have to take the multi-step page load at some point when you transition from the HTML version to the AJAXy one.


Not if you use the HTML history API: http://html5demos.com/history

Admittedly, you need a modern browser for that. But you can always present full-page-load HTML to users with older browsers and then provide AJAXy history-ified goodness to everyone else.


But Gmail is a webapp, not a website. You don't get a link to an email stored in Gmail. You can only access things once you're already inside.

Twitter is very different. If you're signing in yourself to update your own feed, that's one thing. But when you are trying to view the feed of someone else, that's a horse of a different color.

The question really is: should Twitter be an "app" or a "site"? If you allow links in from the rest of the web, it should be a "site".


Regardless of your question, the answer is that it's an appsite.


Which sucks, because I frequently end up "launching" it by clicking a link to a Twitter profile or tweet on another site, so every few minutes I have to wait for the entire thing to load - and I often end up with a dozen tabs containing the Twitter app when all I really wanted was a few hundred bytes of HTML containing a single tweet!


Presumably your browser caches the app so that you don't have to re-download it on subsequent visits.


The browser might only cache the js code, not the work that happens on load or the memory the app consumes.


Even with the assets cached, there's still an 8-10 second loading time (I just timed it). Compare that to a static HTML page showing the same content which would probably load and render in under a second.


Yea, in this case HTML/HTTP is being abused as a wrapper around an app written in JavaScript, which loads and manipulates the content you see in your browser.


Hashbangs are a workaround. A good _solution_ would be something that doesn't require running JavaScript and doesn't mess with URL/document models most of the Web is based on.

For example, browsers could implement partial caching. Here is how it could work. The first time the browser requests a page, it gets all the content in the response. However, some fragments of the content are identified as cacheable and marked with unique ids. When a browser requests a page for the second time, it sends a list of identifiers for the cached fragments to the server. The server then doesn't render those fragments, but places small placeholders/identifiers where they should be substituted into page content.

First request:

  GET index.html

First response:

  [cacheable id="abc"] [h1]This is twitter[/h1] bla bla bla, header content [/cacheable]
  ... Page content ...
  [cacheable id="xyz"] footer content [/cacheable]

Second request:

  GET index.html
  Cached: abc, xyz

Second response:

  [fragment id="abc" /]
  ... Page content ...
  [fragment id="xyz" /]


HTML5 "AJAX History", also known as History.pushState, can solve this problem. It allows a website to update its contents with AJAX, but change the URL to a real URL that will actually retrieve the proper resource direct from the server, while maintaining proper back-forward navigation.

See http://dev.w3.org/html5/spec/Overview.html#dom-history-pushs... for spec details.

It's in Safari, Chrome and Firefox. While Opera and IE don't have it yet, it would be easy to use conditionally on browsers that support it. I'm a little surprised that more sites don't use it.

EDIT: This chart shows what browsers it will work in: http://caniuse.com/history
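
For what it's worth, the basic pattern is tiny. A minimal sketch, assuming a hypothetical loadContent() AJAX helper and a link with id "next":

  // Only enhance if the browser supports the HTML5 history API.
  if (window.history && history.pushState) {
    document.getElementById('next').onclick = function () {
      loadContent('/page/2');                        // hypothetical AJAX loader
      history.pushState({page: 2}, '', '/page/2');   // real URL, no reload
      return false;                                  // cancel the normal navigation
    };
    // Back/forward fire popstate; re-render from the stored state.
    window.onpopstate = function (e) {
      if (e.state) loadContent('/page/' + e.state.page);
    };
  }
  // Older browsers simply follow the href and get a full page load.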


It's really great that in a few years, browsers will support a new AJAX technology that solves this problem that we wouldn't even have with sane, traditional URL schemes.


Maybe you're trying to be snarky, but I'll choose to take your comment seriously.

The AJAX approach to Web apps does provide a genuine user interface benefit. A full page load is very disruptive to application flow, and being able to have new data appear without incurring that penalty is great. Most of the time you only need to load a little bit of data anyway, and it's wasteful to reload all the markup that wraps it.

AJAX solves that problem, but it creates the new problem that your address field no longer properly reflects the navigation you do inside a web app. #! URLs are one approach to fixing it, and pushState will do even better. At that point, the user won't even have to notice that the Web app they use is built in an AJAXy style, other than the navigation being smoother and faster.


A full page load is very disruptive to application flow, and being able to have new data appear without incurring that penalty is great

For many sites I don't see any real disruption to application flow from just using normal links, though there are more full-fledged webapps (like gmail) where I would agree. Playing around with old v. new Twitter, the old one actually has considerably faster navigation performance, at least on my setup (and I'm using a recent Chrome on a recent Macbook Pro). Sure, some HTML header/footer stuff is being retransmitted, but it's not very big.


I would put it down to there finally being a distinction between web-sites and web-apps.

Within an app, my current context (which control has the focus etc) is important and a full page reload loses all of that.

Within a web site, as it is less interactive, this stuff doesn't matter so much.

Whether New Twitter is a site or an app is debatable (I say site, and therefore it shouldn't be using #!). And as for Gawker...


"Most of the time you only need to load a little bit of data anyway" - that's highly questionable as a general statement. In a rich UI like GMail, yes. But in examples like the new Lifehacker, you load a whole story, yet its locator is behind the hashbang.

Not every website is a web app. Just show one article or item or whatever the site is about under one URI.


That might be the case for LifeHacker (don't know; don't use it). But the example in Tim Bray's post is Twitter, which definitely needs to load a lot less data than its full interface markup on most navigations.


Lifehacker kind of looks nicer only reloading the story and not the whole page. It gives things an application feel rather than a collection of pages, and saves a heap of extra processing: why run the code again to generate the header, footer and sidebars when the version the user is seeing is perfectly up to date?


The experience is slicker - if you run a search on Lifehacker, you can click through and browse the results without affecting the rest of the page (including the list of results). With traditional page refreshes this would not be possible.


Is it ready for the mainstream?

Apart from not being supported in IE, the browsers that do support it still have quirks, e.g. your code has to manually track scroll state.


This solves two problems: (1) the only visible/bookmarkable URLs are those without a #!; and (2) initial page loads can be fulfilled by a single request to the server. It doesn't solve the problem of URL discovery, but two out of three ain't bad.


Not sure what you mean by URL discovery, although the links/hrefs should stay the same as the legacy ones. What you could do is progressively enhance normal links. Javascript could disable the default behavior of a link <a href="/about/team" data-remote="true">About Team</a> and check whether the browser supports History.pushState. If it does, it would just request the appropriate content for /about/team, update it client side, then update the url. If pushState is not supported, it could just request the link /about/team normally. This would be the ideal way to support both regular and progressively/JS-enhanced pages (for speed).
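
Roughly, as a sketch (the '#content' container and a server that returns just the page body for XHR requests are assumptions, not anything a particular site actually ships):

  // Progressive enhancement: only if the browser supports pushState.
  if (window.history && history.pushState) {
    var links = document.querySelectorAll('a[data-remote="true"]');
    for (var i = 0; i < links.length; i++) {
      links[i].onclick = function () {
        var href = this.getAttribute('href');
        var xhr = new XMLHttpRequest();
        xhr.open('GET', href);
        xhr.setRequestHeader('X-Requested-With', 'XMLHttpRequest');
        xhr.onload = function () {
          document.getElementById('content').innerHTML = xhr.responseText; // assumed container
          history.pushState(null, '', href);                               // real URL in the bar
        };
        xhr.send();
        return false;   // cancel the normal navigation
      };
    }
  }
  // Without pushState (or without JS at all) the links just work normally.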


Well, it "solves" it - you still have to download and parse a ton of Javascript before you even begin downloading the data...


CDNs make the download part much less of a problem.

And your server could easily send a fully rendered page on the first page load when it receives a full URL (one which was made by pushState and linked elsewhere) and still subsequently load pages via XHR. So it wouldn't have to parse any JS on first load -- subsequent loads would, but they'd be saving time from not downloading as much and not refreshing the entire page.


With pushState not widely implemented, you have three choices:

1) don't use AJAX in response to actions that alter the page content in a significant way. This of course forces page reloads and prevents the cool emerging pattern of not serving dynamic HTML at all, just exposing a REST API and doing the rendering client side.

2) you do the ajaxy stuff but you don't touch the URL. This leads to a nonworking back button and prevents users from bookmarking or sharing links to specific views. You can work around this google maps style with some explicit "link to this page" functionality, but I would guess, people just don't get that.

3) you do the fragment change thing, which allows for ajaxy page content changes while keeping the back button working and keeping links inherently shareable and bookmarkable, at the cost of that one redirect, at the cost of screen-scrapability, and at the cost of maybe confusing techies (normal people probably don't care either way)

pushState can look like salvation, but keep one thing in mind: to keep the page working for browsers without JS (and screen scrapers), you will have to do your work twice and STILL render dynamic content on the server, which is something people are now beginning to try to avoid.

Finally, as pushState is yet another not widely deployed thing, for the next five to ten years you would have to do all of this three times: dynamic HTML generation for the purists, pushState for the new browsers, and fragment change for IE.

Personally, I really feel that fragment change is a good compromise as it works with browsers and even in IE while still allowing the nice pattern of not rendering anything on the server and keeping the URLs shareable.

Maybe this current uproar is based on a) techies not used to this (normal people don't notice) and b) badly broken JS that sometimes prevents views from rendering AT ALL, but this is not caused by an inherent problem with the technology: if I screw up the server side rendering the page will be as empty as it is if I screw up on the client side.


The main problem with the fragment change solution is that it _doesn't work without JavaScript_. And we're not talking for the one user browsing the site - any links people post (on forums, mailing lists, etc.) that have fragments in them are simply unusable for people without JavaScript, as the server does not get sent the fragment - the best it can do is send a generic "oh, sorry, no JS" page back.

This would be a problem for search engines as well, if it wasn't for the awful translation Google said they'd do. It's just breaking the meaning of fragment identifiers completely, and that really makes me worried.


pushState with non-hash URLs doesn't require you to do server-side HTML generation. You can just send a stub page which looks at the URL and loads the right data, just as with hash URLs. To deploy it incrementally, you only really need one code path with a slight fork depending on whether the current URL contains a #! and whether the current browser supports pushState.


I know, but browsers without JS need the server side generated content for that URL or the original complaints just arise again (empty page, albeit with a different URL now)


I've proposed sort of possible solution here, in a related discussion: http://news.ycombinator.com/item?id=2197064


Might this confuse search engines? For example, Bing seems to use click data from IE users clicking links as part of its ranking, so perhaps the whole site being one URL would confuse it. More alarmingly, when people link to your site they will link to site.com/#xyz by copying from the address bar. So search engines will think all links to your site are to the homepage.


Isn't the underlying problem that web applications are often displaying combinations of content that doesn't have a natural URL?

Take New Twitter, for example. If I click on a tweet in my stream, it shows related tweets. If I drill down a few of those, at some point it becomes impossible to represent the address of the current state in a sane manner.

I think URLs are particular to the web (desktop apps don't have them) because the web is traditionally about content. Web applications are increasingly breaking that. Perhaps web applications and URLs don't go together all that well.

Don't get me wrong--I love URLs, and it's crazy for content sites like Lifehacker to break them for so little benefit. But maybe the reason for this hashbang trend is that URLs aren't expressive enough for some of these sites.


In that case "web application" is a misnomer. If the current state has no natural URL, it's not a legitimate part of the World-Wide Web. Instead the authors are tunneling a proprietary protocol over AJAX to carry opaque content to a single-purpose GUI app, just like all the terrible client/server apps from the 90s only slower.


If the current state has no natural URL, it's not a legitimate part of the World-Wide Web.

But most of the popular content accessible via the web now fits this description. Look at Google's own homepage, it's a complex Javascript application that's completely opaque. @bruceboughton is right, the problem isn't that people aren't respecting the WWW specification, it's that the specification is no longer adequate to describe what the Web has become.


Google has competent web developers who practice progressive enhancement. Their search form and results have stable (even sensible) URLs and are perfectly usable without trusting their js.


Sure the application has a URL. But once I load it, it constructs what I see on the page on the fly. I can turn Javascript off and load an alternate page, but if I leave it on I'm loading an application that's every bit as opaque as Flash.


Ran into another interesting shortcoming of hash-bang URLs last night looking through my referrer log. Loads of referring URLs of http://gawker.com/ and http://kotaku.com/ to my blogpost. But no mention at all of my blog-post or a link to it on the homepage.

First I thought they were referrer log spamming; then it dawned on me that fragment identifiers get stripped out of HTTP referers, making hash-bangs useless as a means of joining up distributed conversations on the web.

Somewhere on those two Gawker media sites there's a conversation going on about the use of hash-bangs. But nobody outside knows about it. It's a big black hole.


Can't it work both ways? Serve the #! links and provide canonical content located at the (almost) same uri sans #!.

If you visit http://mysicksite.com/article/1 javascript changes all the links to the #! format. Then when the user clicks the links they enter #! land.

Now the user copies a link from their address bar and puts it into the wild. Someone gets that link, http://mysicksite.com/#!/article/1, and visits it. Rewrite with htaccess or whatever method you employ to serve the content at http://mysicksite.com/article/1, using javascript to change all the links to the #! format.

I posted this in the reddit thread about the Gawker/lifehacker problems recently, but was too late for anyone to really give me a response. For those of you that have worked with these kind of systems before, would this solve the problem the original link was describing?

EDIT: Ahh I think I get the problem now, of course after I post it. Server doesn't get the data from the uri trailing the #! I think?


That is, indeed, the crux of the problem. Anything after the hash is client-only.


The problem in this situation is that you have a smart technical person arguing for technical purity, while at the same time (seemingly) ignoring the mostly non-technical considerations of user-experience and economics.

Yes, the old, conservative model of HTML is very simple, but when people use AJAX well, the user experience is enormously and materially improved. We're still early in the development of this medium, and many people will do it wrong. But even the people who do it right will probably seem inelegant and kludgey by the standards of the old model.

And yes, you can get both AJAX and clean URLs via (still poorly-supported) HTML5 History API and/or other progressive enhancement methods, but these may require a significant amount of additional effort. Maybe worth it, maybe not.

This topic reminds me of when sound was added to movies. "Tight coupling" and "hideous kludge" sound a lot like the arguments that were made against that too. The conventional wisdom was to make your talkie such that the story worked even without sound; one can still sometimes hear that, but it isn't, I think, a standard that we associate with the best movies being made today.


It's not really that bad. The people using hash-bangs are following a spec proposed by Google to make AJAX webpages crawlable:

http://code.google.com/web/ajaxcrawling/docs/specification.h...

So when you see the lifehacker URL in the article, you know that there's an equivalent non-AJAX URL available with the same content at:

http://lifehacker.com/?_escaped_fragment_=5753509/hello-worl...

There's no need to execute all the JavaScript that comes back from the server - if they're following the spec, all you have to do is escape the fragment and toss it over to a CGI arg.

Another option is progressive enhancement, where you make every link point to a valid page and then add onclick event handlers that override the click event to do whatever JavaScript you want it to. I think this is a far superior option in general, but it has various issues in latency and coding complexity, so a good portion of web developers didn't do it anyway.
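
For a crawler, the rewrite is just string manipulation. A simplified sketch (the real spec also percent-encodes a few special characters in the fragment):

  // Map http://example.com/path#!some/state
  //  to http://example.com/path?_escaped_fragment_=some/state
  function escapedFragmentUrl(url) {
    var i = url.indexOf('#!');
    if (i === -1) return url;                        // nothing to rewrite
    var base = url.slice(0, i);
    var fragment = url.slice(i + 2);
    var sep = base.indexOf('?') === -1 ? '?' : '&';  // respect an existing query string
    return base + sep + '_escaped_fragment_=' + fragment;
  }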


But as Tim says, the spec proposed by Google is only meant to fix some problems (can't be searched by search engines) caused by using this URL scheme. It isn't meant to be a one-guide-fits-all approach making AJAX content addressable.

In other words the spec treats one of the symptoms, not the original problem.


[EDIT: never mind, missed this response, similar in style but 2h earlier ... http://news.ycombinator.com/item?id=2197064]

Maybe I'm missing something, but it seems to me that there is a way to have your cake and eat it too in this case.

Say we have a site with a page /contacts/ which lists various contacts.

On this page there are completely normal links like '/contacts/john/', each link preceded by/wrapped by an anchor tag - <a href="john"> in this case.

If you visit this site without javascript enabled (e.g. you happen to be a web crawler), you just follow the links and you get just regular pages as always.

If however you have javascript enabled, an onclick handler on each link intercepts the click, fetches just the information about the contact you clicked on (using an alternate url, for example /contacts/john.json), cancels the default action and (re)renders the page.

Then it does one of two things:

- if pushState is supported it just updates the url
- if pushState is not supported it adds '#john' to the url

If someone visits '/contacts/#john' with javascript enabled, /contacts/ is retrieved and then john's data is loaded and displayed.

If someone visits '/contacts/#john' without javascript enabled, he gets the full contact list, with the focus on the link to john's page, which he can then click.

By using this scheme:

- search engines and other non-javascript users can fully use the site and see completely normal urls
- XHR page loads are supported
- XHR loaded pages don't break the back button
- XHR loaded pages are bookmarkable
- bookmarks to XHR loaded pages are fully shareable if the recipient has javascript enabled or pushState is supported, and at least not totally broken if not.

The only drawback I can see is the 'sharing bookmarks with someone who has no javascript support' issue - is that a real biggie? In addition of course to the 'made error in javascript, now all stops working' issue - but that is something that has not so much do with the #! debate as well as with the 'is loading primary content via XHR a good idea' debate.

To me it seems that current users of the #! technique have just gone overboard a bit by relying only on the #! technique instead of combining it in a progressively enhancing way with regular HTTP requests.
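
For completeness, the entry-point half of that scheme might look something like this (the /contacts/john.json endpoint and renderContact() are the hypothetical names from the example above):

  // On load of /contacts/, check for the '#john' style fallback URL and,
  // if present, fetch that contact's data and render it client side.
  window.onload = function () {
    var name = window.location.hash.slice(1);        // '#john' -> 'john'
    if (!name) return;                               // plain /contacts/ page
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/contacts/' + name + '.json');
    xhr.onload = function () {
      renderContact(JSON.parse(xhr.responseText));   // hypothetical renderer
    };
    xhr.send();
  };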


I posted more as a comment on the original story, but I have covered this issue in depth (from when Google initially proposed it, to when it was launched) here:

http://searchengineland.com/google-proposes-to-make-ajax-cra...

http://searchengineland.com/googles-proposal-for-crawling-aj...

http://searchengineland.com/its-official-googles-proposal-fo...

Of course, a better solution is some type of progressive enhancement that ensures both that search engines can crawl the URLs and that anyone using a device without JavaScript support can view all of the content and navigate the site.


I can't understand how hard it would be for someone writing a crawler to replace a hashbang (#!) with _escaped_fragment_.

For developers of AJAX apps it:

1. Improves productivity
2. Improves user experience
3. Is more efficient on the server as it prevents a lot of initializing code.

I think the old school needs to wake-up a bit!


Facebook had an outage some time back (I think this one - http://www.facebook.com/note.php?note_id=431441338919), and when everything got back to normal, the hash-banged URLs were gone. Was it related?


They are still there in IE, so I guess they have started using pushState where available.


Yet another annoying pontificating article about hashbangs. Why can't people accept that there is more than one way of doing things on the web?

Just because you don't like using hashbangs does not mean no-one else can.

Sure, use of hashbangs might make seo of your site harder. Yes, it might make it harder for hackers who want to do curls of your site's pages. But maybe this is not your aim with your site.

Maybe you want to give your users a slicker experience by not loading whole new pages but instead grabbing bits of new content.

The web is a place for experimentation and we as hackers should encourage such experimentation, rather than condemning it because it does not fit with how we think things should be done.


A while back, there was this pie-in-the-sky idea which was really interesting but not too practical, called Semantic Web. It didn't really pan out because it turns out that annotating your sites with metadata is boring and tedious and nobody really liked to do it, and anyway, search and Bayesian statistics simulated the big ideas of Semantic Web well enough for most people.

The ideas behind it still stand, though, in the idea of microformats. These are just standardized ways of using existing HTML to structure particular kinds of data, so any program (browser plug-in, web crawler, &c) can scrape through my data and parse it as metadata, more precisely and with greater semantic content than raw text search, but without the tedium that comes with ontologies and RDF.

Now, these ideas are about the structured exchange of information between arbitrary nodes on the internet. If every recipe site used the hRecipe microformat, for example, I could write a recipe search engine which automatically parses the given recipes and supply them in various formats (recipe card, full-page instructions, &c) because I have a recipe schema and arbitrary recipes I've never seen before on sites my crawler just found conform to this. I could write a local client that does the same thing, or a web app which consolidates the recipes from other sites into my own personal recipe book. It turns the internet into much more of a net, and makes pulling together this information in new and interesting ways tenable. In its grandest incarnation, using the whole internet would be like using Wolfram Alpha.

The #! has precisely the opposite effect. If you offer #! urls and nothing else, then you are making your site harder to process except by human beings sitting at full-stack, JS-enabled, HTML5-ready web browsers; you are actively hindering any other kind of data exchange. Using #!-only is a valid choice, I'm not saying it's always the wrong one—web apps definitely benefit from #! much more than they do from awkward backwards compatibility. But using #! without graceful degradation of your pages turns the internet from interconnected-realms-of-information to what amounts to a distribution channel for your webapps. It actively hinders communication between anybody but the server and the client, and closes off lots of ideas about what the internet could be, and those ideas are not just "SEO is harder and people can't use curl anymore."

I don't want to condemn experimentation, either, and I'm as excited as anyone to see what JS can do when it's really unleashed. But framing this debate as an argument between crotchety graybeards and The Daring Future Of The Internet misses a lot of the subtleties involved.


Very interesting points, but there are a couple of errors which undermine part of your point:

1. If the application follows the Google-proposed convention or similar, the crawler doesn't need a full-stack JS implementation; it just needs to do the (trivial) URL remapping.
2. Nothing in this hash-bang approach requires an HTML5-ready browser.


I tried both curl and wget last night (neither of these are HTML5-ready browsers), and neither of them could get content using the hash-bang URL. They both came back with an empty page skeleton.

Also, how do you reassemble the hash-bang URL from the HTTP Referer header?


Neither curl nor wget follow the Google convention for handling hashbangs as suggested by the parent, so I'm not sure what you're getting at with this reply.


Hash-bang URLs are not reliable references to content - that's what I am getting at. Curl and WGet are perhaps the most used non-browser user-agents on the web. And both of them are unable to retrieve content at a URL specified by a hash-bang URL.

In this context hash-bang urls are broken.


I'm sorry if I implied that curl/wget handle this already. However, they could handle this with a very small wrapper script, maybe 3 lines of code, or a very short patch if the convention becomes a standard. That's not nothing, but it's maybe 7 orders of magnitude lighter than a full JS engine, and it's small anyway compared to the number of cases that a reasonable crawler needs to handle.

Also, with that wrapper or patch, curl & wget will still not be remotely HTML5 ready, which I hope demonstrates that HTML5 is not a requirement in any way. A single HTML5-non-ready browser that can't handle this doesn't mean therefore that HTML5 is a requirement.


They aren't? You're only supposed to use them if you follow Google's convention, in which case they should be reliably replaced with a normal URL sans the hash. Of course your scraper must be aware of this, but it should be a somewhat reliable pseudo-standard (and it is just a stopgap after all).


We're talking about different internets, though. You're talking about the hypothetical patched internet that uses Google's #! remapping, whereas I'm talking about the internet as it exists right now. If I go to Gawker with lynx right now, it will not work, period. The fact that there exists the details of implementation somewhere—and the fact that the implementation is trivial—doesn't mean that it should become standard across the board.

I hate to invoke a slippery slope, but it seems a frightening proposition that $entity can start putting out arbitrary standards and suddenly the entire Internet infrastructure has to follow suit in order to be compatible. It's happened before, e.g. favicon.ico. All of them are noble ideas (personalize bookmarks and site feel, allow Ajax content to be accessible) with troublesome implementation (force thousands of redundant GET /favicon.ico requests instead of using something like <meta>, force existing infrastructure to make changes if they want to continue operations as usual.)

All of this is moot, of course, if you just write your pages to fall back sensibly instead of doing what Gawker did and allowing no backwards-compatible text-only fallback. Have JS rewrite your links from "foo/bar" to "#!foo/bar" and then non-compliant user agents and compliant browsers are happy.
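
i.e. something like this sketch, where non-JS agents keep the plain hrefs and JS-enabled browsers get the hash-bang versions rewritten in at runtime (the selector is just for illustration):

  document.addEventListener('DOMContentLoaded', function () {
    // Rewrite "/foo/bar" style hrefs to "/#!foo/bar" for the ajax router.
    var links = document.querySelectorAll('a[href^="/"]');
    for (var i = 0; i < links.length; i++) {
      var path = links[i].getAttribute('href');             // e.g. "/foo/bar"
      links[i].setAttribute('href', '/#!' + path.slice(1)); // -> "/#!foo/bar"
    }
  });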


> If I go to Gawker with lynx right now, it will not work, period.

As a specific issue, that seems like a minus, but an exceedingly minor one, as lynx is probably a negligible proportion of Gawker's audience. In principle, backwards-compatibility is a great thing, until it impedes some kind of desirable change, such as doing something new or doing it more economically.

> it seems a frightening proposition that $entity can start putting out arbitrary standards

I generally do want someone putting out new standards, and sometimes it's worth breaking backwards-compatibility to an extent. So it really depends on $entity: if it's WHATWG, great. If it's Google, then more caution is warranted. But there's been plenty of cases of innovations (e.g. canvas) starting with a specific player and going mainstream from there. I do agree that Google's approach feels like an ugly hack in a way that is reminiscent of favicon.ico.

> All of this is moot, of course...

This is good general advice, but it's not always true. At least one webapp I've worked on has many important ajax-loads triggered by non-anchor elements; it's about as useful in lynx as Google maps would be. The devs could go through and convert as much as possible to gracefully-degrading anchors, that would at least partly help with noscript, but it seems like a really bad use of resources, given the goals of that app.


Ah, but the #! is probably just using JS to access a well-defined API - the same API which anyone else can access in completely uncluttered, machine-readable form.

So perhaps the solution is for every #! page to have a meta tag pointing to the canonical API resource which it is drawing data from. Bingo, semantic web!


You also have to ensure every relevant site (in this example, every site that would have used hRecipe) uses the same API scheme.


You can still avoid loading whole new pages. You simply attach Javascript events to your anchor tags and do whatever Ajax content trickery you want that way. The page content itself is maximally flexible and useful to all agents if the URLs inside of it are actual URLs.


The only problem with that is you end up with a mix of both. If a spider collects all the non-ajax links and shows them to a javascript-enabled browser, the user will end up on e.g. /shop/shoes.

If the site is ajax enabled for a slicker experience, then as the user browses from here they might get something like this in their address bar:

/shop/shoes#!shop/socks

or even

/shop/shoes#!help/technical

which starts to look really weird. The google hashbang spec at least fixes this problem. The spider understands the normal URLs of the app, and will dispatch users to them.


Can you not use JavaScript to figure out your URL is a mess and redirect accordingly? JavaScript for redirecting people to the homepage of websites has been available on dynamicdrive.com for at least a decade now.

That's one redirect to the homepage (which you're already doing by 301-redirecting the JavaScript-free URLs anyway), so it's hardly going to be difficult.

I'm puzzled, considering the haphazard redirects already going on for incoming links to hash-banged sites, why this isn't a trivial problem.

Incoming link is to /shop/shoes#!shop/socks; JavaScript right at the top of /shop/shoes then does a window.location to /#!shop/socks.


two problems

1) The link is weird and confusing in the first place, /shop/shoes#!shop/socks refers to two different resources

2) The server will already have done work to find the shoes, when the javascript redirects to the socks page.


1.) Is a limitation of Google's crawlable Ajax proposal. That would probably not have occurred with a proper standards body. What sequence of events would have to happen to have that as an inbound URL? I sense some previous JavaScript would have to have failed to allow that scenario.

2.) The site is already paying this price by redirecting _escaped_fragment_ URLs, and the old clean style urls. All inbound links will have this problem, so you're only shifting some of the burden through this door instead of the others.


no, with google's proposal, the #! links are all from the site root, see Lifehacker and Twitter's implementation. So these ugly half and half URLs never exist, and you're not paying a double request price


Google's proposed kludge doesn't limit URLs to the site root - a path segment is documented. Have a read of it: http://code.google.com/web/ajaxcrawling/docs/specification.h...


ah you're right, and yes that could possibly introduce the issue of redundant work done on the server, depending on the implementation. However the two major implementations I've seen (Twitter and Lifehacker) use it from the root and so don't have that problem.



True, but the same non-reload could be accomplished with:

  <a href="realurl.html" onclick="javascript_magic(); return false">
And wouldn't break spiders.


It also wouldn't change the page URL, making the result of that click non-bookmarkable, or it would mess with the fragment again, sooner or later creating URLs that look like

/help/thing#!/something/otherthing

Which is equally confusing and more error prone for the developers, as /help/thing loads code specific to that view and then the /something/otherthing stuff is loaded too. Not unsolvable, and preventable by proper encapsulation, but stuff leaks, so mistakes will happen.


totally agree - if using hashbangs provides the best experience for your context, why not use it?


The point of mainstream sites indicating that the page has ajax with the URL path is to tell search engines. I have a feeling that what the author doesn't get is that it is very hard for search engines to tell the difference between ajax pages, static pages, and spammy keyword stuffed pages.

To me, it seems that Google recommends indicating ajax content in the path in the same way that our government issues concealed weapon permits. Yes, it's okay to have concealed content that loads on the fly as long as you are very clear about your intentions. Once again this is a usability issue that wouldn't be an issue if it weren't for spammers.


This rant would be more effective and persuasive if also directed at the Google engineers who made this hashbang style pervasive in Google Groups. I didn't think it would be possible to get deep links to old articles even worse than before, but they managed it.


It's interesting how many upvotes this is getting in a very short time. However, I don't think the average Twitter user cares about performance and URL elegance, so I doubt Twitter will change anything.


I have seen performance issues and outright broken behavior with Twitter's hashbang ajax loading scheme. In that respect, regular users will care (they just won't necessarily know what is causing the issue).


Probably not, but that doesn't mean people who do care shouldn't discuss the implications, or that Twitter shouldn't think there is a problem.


Considering that twitter is the main reason for the spread of the abomination that is URL shorteners you're probably right. They don't seem to care about the health of the web.


I think people are missing a huge benefit of the hashbang syntax: readable and copy/paste-able URLs. Without them, it's impossible to have an ajax application with a decent URL scheme.


You don't need the hashbang for that. You never did. Hashbang only tells google "munge around this shit to get an actual page".


What the hashbang does is update the URI in such a way that it’s possible to bookmark/share a URI (assuming you’ve coded your app properly). It is however just a stopgap measure until pushstate turns up - that way we’ll be able to have URIs that encode the document/state reference in the proper part, not the fragment.

Actually, you are right with regards to the bang part of hashbang - that’s just a Google defined way of letting them access the content.


I'm curious what the other solutions are then. The only one I can think of is the History.pushState, and that's only supported in newer browsers.

Let's say I'm writing a web based word processor, and a user clicks on a document. I want the URL to be a reference to that specific document. The only way to change the URL to be specific without requiring a whole-page refresh is to use the hashbang syntax.


> I'm curious what the other solutions are then.

The hash without the bang. It's only been done for about 10 years. You can put whatever you want after the hash. It's up to your application to decide the meaning of it.

For an example, see the OSX SDK documentations: http://developer.apple.com/library/mac/#documentation/Cocoa/...


well it looks like they are emitting crap like this

<a rel="nofollow" href="/#!about/" title="Click here to go to About">About</a>

without the hashbang shit, that's a nice normal link that any normal http client can work with. you can still layer on ajax to bind to that link, and use the fragment for the benefit of browser state and user interaction should you think that is a wise thing.


I think the real question here is whether the application should be loading the main content via AJAX in the first place. Tim argues here that it should not, for this use case.


Then that's what the article should say, instead of hating on the hashbang syntax, which is currently the best solution to the problem.

I also don't see why these web pages (specifically twitter) shouldn't be an ajax application - I think loading a page statically, and then the data via ajax is a Good Thing™.


Just a thought - but could a lot of people complaining about hashbangs still be browsing the web with lynx?


Hi, HTML/HTTP are the second worst application delivery platform available. Try not to be shocked.

Sorry, your other choice was #1.



