What's the shebang (#!) in Facebook and new Twitter URLs for?

_delirium · on Oct 16, 2010

To save an extra click, basically the only information in the stack overflow answer is: see http://code.google.com/web/ajaxcrawling/

alanh · on Oct 16, 2010

Right.

By the way, the whole idea of Ajax Crawling as defined per Google struck me as wholly unnecessary. If you are using unobtrusive JavaScript, and HTML like so:

    <a href="http://twitter.com/ev">@ev</a>

with JavaScript adding a click handler such that a user’s click on the above link actually only triggers an AJAX deal (FWIW, jQuery stub:)

    jQuery('a[href^=http://twitter.com/]').live('click', function(){
        // Do AJAXy thing instead, then
        return false;
     });

Then, assuming you serve up essentially the same content at /ev that you would display via AJAX, you have simultaneously just enabled your site for not only modern browsers, but also robots and dumb phone / JS-avoiding users. (And middle-clickers.) No goofy API required.

Edit I just realized, if you have links to `http://twitter.com/#!ev` in the wild, then the AJAX crawler thing becomes actually pretty useful.

lukifer · on Oct 17, 2010

The hash is also the only way (AFAIK) to insert an Ajax request into the browser history, thus enabling the use of the browser's back button.

sirn · on Oct 17, 2010

History API[1]? :) If History API is being used alongside with alanh's method, that would eliminate the links in the wild problem. Flickr's lightbox view[2] is using this method for its AJAX viewing.

The biggest holdback is Internet Explorer, though. (Well, just like almost everything else.)

[1]: http://www.whatwg.org/specs/web-apps/current-work/multipage/...

[2]: http://www.flickr.com/photos/cubagallery/3986524856/lightbox...

wanderr · on Oct 17, 2010

Yeah, the links in the wild are the problem. Users have a habit of copying URLs from the URL bar, unfortunately.

I think Google could have done this better; they could have made it so all ajax URLs are crawlable (standard robots rules apply) as long as your site opts in, then you'd only have to support pretty URLs, something you probably have to do anyway.

For example, now twitter needs to support twitter.com/#!/ev and twitter.com/ev and twitter.com/?_escaped_fragment_=ev

aw3c2 · on Oct 17, 2010

Users have a habit of copying URLs from the URL bar, unfortunately.

How else are people supposed to bookmark or share an URL in your opinion?

wanderr · on Oct 17, 2010

That was a joke. Our site and many others have explicit ways to get a non-ajax URL for sharing, but it is of course more convenient and more natural in a browser to just grab from the URL bar. It's unfortunate for us but not remotely unexpected.

bilban · on Oct 17, 2010

This is where I'd expect the <link rel="canonical" href="blah" /> element to be used. With browsers using this as the preferred url for a bookmark - and maybe a service for sharing or copying urls from the browser that was easy to use.

alanh · on Oct 17, 2010

That would break bookmarking a section of a long article.

joelgwebber · on Oct 18, 2010

And how, pray tell, would you expect a crawler to fetch the content at twitter.com/#!/ev, considering that the server doesn't even see the hash part of the URL? That's the whole point.

The simple fact is that plenty of sites (like Twitter, Facebook, and lots of Google properties) are using client-side code to build fast interactive sites, and it necessitates this kind of infrastructure.

nose · on Oct 17, 2010

It also prevents the user's id from being leaked in the http referrer header

timdorr · on Oct 17, 2010

Interestingly enough, this is actually gone in browsers that implement the HTML5 History API: http://www.facebook.com/note.php?note_id=438532093919

For instance, use Chrome and browse around the site. You'll notice there are no hashed URLs.

The same thing is done on Flickr. Look at this page: http://www.flickr.com/photos/timdorr/3707685058/in/set-72157... Now click on the sections under "This photo belongs to". As you expand them out, you'll notice that the URL in your address bar changes. This is particularly useful in Flickr because you can use the arrow keys to navigate through photos. And when you link to a page, your personal state might have been browsing through a set instead of a full photostream. This keeps the state intact when sending links to other people. It's a great usability feature.

nix · on Oct 16, 2010

The #! declares to crawlers that a crawlable version of the page is also available from the server by replacing "#!" with "?_escaped_fragment=".

This mapping is needed because the fragment string is never passed to the server, so it has to be encoded elsewhere and the query section is the only available place.

The "!" is needed because otherwise crawlers would start fruitlessly hammering all the existing sites that use '#' but don't support the '?escaped_fragment=" hack.

toolate · on Oct 17, 2010

Just wanted to point out that the query string var is "_escaped_fragment_" with a leading and trailing underscore.

thwarted · on Oct 16, 2010

This title is missing the !. A shebang is #!, a # is a hash. The OP has both chars.

phreeza · on Oct 16, 2010

Isn't that something HN does automatically?

edit: Yes, it does: http://news.ycombinator.com/item?id=1799047

dionidium · on Oct 16, 2010

Yes, I submitted the original title.

thwarted · on Oct 17, 2010

That's kind of annoying. Is this some kind of code injection countermeasure?

gojomo · on Oct 17, 2010

I suspect it's just to prevent gratuitous exclamation points in submission titles. ("Best ever new app!" "You won't believe this!" etc.)

thwarted · on Oct 17, 2010

Heh, that sounds like something that should be a per-user option.

qeorge · on Oct 16, 2010

"AJAX Crawling" is great, but the hash trick has been used with AJAX long before this to provide back button support in AJAX-heavy websites.

Before this technique was widespread, it was all too common to hit "Back" and be taken to the previous website you'd viewed, even if you'd viewed more than one page on the original site.

By changing the URL's fragment the website can add entries to the browser's history. That way when the user clicks Back the website can react appropriately, changing its state back one iteration. This preserves the expected behavior for the end user, while allowing the website to leverage the benefits of AJAX.

The shebang (#!) is merely a way to distinguish AJAX links from bookmarks, for the purpose of crawling.

masklinn · on Oct 16, 2010

> "AJAX Crawling" is great, but the hash trick has been used with AJAX long before this to provide back button support in AJAX-heavy websites.

> Before this technique was widespread, it was all too common to hit "Back" and be taken to the previous website you'd viewed, even if you'd viewed more than one page on the original site.

Uh yes, the original poster of the question is aware of that:

> As far as I can recall, earlier this year it was just a normal URL-fragment-like string (starting with #), without the exclamation mark.

His question was not about the hash, but specifically about the shebang.

jambo · on Oct 16, 2010

Although it's a technical requirement in the current implementation of #newtwitter for the back button & crawling to work, I think part of the attraction of twitter was having your own simple URL. Now http://twitter.com/ev becomes http://twitter.com/#!/ev, which I can imagine confusing non-technical users.

jamiequint · on Oct 16, 2010

http://twitter.com/ev still works though so its kind of a non-issue

jambo · on Oct 16, 2010

It still works, but now you have two URLs for one resource, and when you go to the original URL it redirects you to the #! path, which judging from my timeline, people notice.

jacobbijani · on Oct 20, 2010

It isn't linked anywhere though, so it's still an issue.

thirsteh · on Oct 16, 2010

The hash lets you change the URL without the browser actually switching pages, too. People can bookmark pages even though they're "fancy-loaded".

carbocation · on Oct 16, 2010

Yes. If you're using jQuery, Ben Alman's jQuery BBQ is great for this.

armandososa · on Oct 16, 2010

I asked that recently on Quora

http://www.quora.com/Twitter-com-2010-Redesign/Why-did-they-...

There, a Twitter employee confirmed that they used the hashbang to comply with Google's proposal.

davidmurphy · on Oct 17, 2010

I understand that it helps Google crawl, but this sure is an in-elegant solution from the perspective of end-users & URLs. Not cool.

gallerytungsten · on Oct 17, 2010

What's old is new again. Some may recall the days of "bang addressing" to send email beyond a local site.

joelgwebber · on Oct 18, 2010

I'd say it's a lot cooler than having to change the HTTP protocol and push new web browsers to everyone.

callmeed · on Oct 16, 2010

Interesting, I wonder if this can be used for Flash sites that do deep linking + SWFObject to show content to Google.

mhill · on Oct 17, 2010

Does anyone have suggestion on good practices to make an Ajax page crawler friendly? Since the anchor url (#) are generated on the fly by Javascript, how does the crawler know what anchor urls to follow given the parent page?

Are invisible urls (for archor urls) still frown upon by Google?

russell_h · on Oct 16, 2010

I don't know about the ajax crawling answer, but I did that in one application to make the back/forward buttons and linking work with AJAX content.

It was a major PITA because you basically have to implement URL routing both client and server side.

raganwald · on Oct 16, 2010

Can you explain a bit more about your circumstances so that I can understand why you needed to interpret URLs and anchors on the server and the client?

The libraries I have used (jQTouch and Sammy) serve one single page and the client interprets the anchor. Thus, there is no need to implement URL routing on the server since you simply serve the application and let it route the URL.

When you change states on the client, instead of changing state and then changing the anchor, you change the anchor and that triggers an event which causes the client to interpret the URL and change its own state. Thus, there is only one piece of code, ever, that handles routing.

russell_h · on Oct 16, 2010

In my case I needed to be able to degrade gracefully when Javascript wasn't available, which meant I had to be able to do it server side as well. If you don't have to support that use case its definitely a lot easier.

raganwald · on Oct 17, 2010

THAT is a very good explanation, thanks!

p.s. If this happens to me, I expect I will extend my experimental Javascript view-controller framework so that it runs inside node.js. That way I could serve exactly the same content whether I was running in the browser or in node and I wouldn't need two sets of view logic.

russell_h · on Oct 17, 2010

Yes! I haven't gotten too excited about the whole "run the same code on the server and the client" aspect of Node (mind you I've been very excited about other aspects of it, its my new favorite framework/whatever-it-is), but this is one application where I really think that could be beneficial.

fuzzythinker · on Oct 17, 2010

raganwald gave a good explanation in Stackoverflow: http://stackoverflow.com/questions/3009380/whats-the-shebang....

Thanks for your GO game link, learned about "fun" alternatives for the game I didn't know before. I'll try them next time I play, thanks!

fiveo · on Oct 16, 2010

GWT handles this out of the box.

ceejayoz · on Oct 17, 2010

So that's what it's like to have your SO reputation gain max out...

riffraff · on Oct 17, 2010

interesting, I had not heard of this. It still does not play nice with curl, but seems a sensible idea.