Some web development tips from a former Digg developer (duruk.net)
208 points by cancan on Aug 24, 2012 | 64 comments



> Know why it’s important that your GET requests should always be idempotent. For example, your sign-out should only work as a POST request so that someone cannot make your users sign out by just including an <img> tag in their forum signature.

You got that mixed up. Idempotent requests mean that the result is exactly the same even if the request is issued multiple times. In the case of a logout, idempotency is pretty much a given even when using GET requests. The real issue here is that GET requests should not change the state of the application, because browsers are happy to open the same URL multiple times without user confirmation. For instance, a "post/delete/last" URL that deletes your last post would be a terrible idea, because of the following scenario:

  1. The user goes to the "post list", */posts*
  2. The user hits "delete last post", and the site sends him to */posts/delete/last*
  3. The user goes somewhere else, */somewhere*
  4. The user decides he would like to go back, and clicks "back". His browser opens */posts/delete/last* without any warning. _Oops!_ He has just deleted another post without even noticing!
The <img> tag issue is a separate concern: that of Cross-Site Request Forgery (CSRF). The easiest way to protect against it is to require a single-use token for each request that changes the application state. You can read more about it on the Open Web Application Security Project website: https://www.owasp.org/index.php/Cross-Site_Request_Forgery_%...
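
Roughly, the idea looks like this (a minimal Node-style JavaScript sketch; all names are made up, and a real implementation would tie into your session store):

  var crypto = require('crypto');
  var sessionTokens = {}; // sessionId -> expected token

  function issueToken(sessionId) {
    var token = crypto.randomBytes(32).toString('hex');
    sessionTokens[sessionId] = token;
    return token; // render this into a hidden <input> on every state-changing form
  }

  function validateToken(sessionId, submittedToken) {
    var expected = sessionTokens[sessionId];
    if (!expected || expected !== submittedToken) return false;
    delete sessionTokens[sessionId]; // single use: consume the token
    return true;
  }

An attacker's <img> tag (or hidden form) has no way to know the token, so the forged request fails the check.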


The main problem here is that "idempotent" means something different in math/CS than it colloquially does in HTTP.

In math/CS, "idempotent" means "has the same effect when done 1 time as when done n>1 times."

In HTTP, GET requests are often described as "idempotent" by someone who actually means "nullipotent" (i.e. "has the same effect when done 0 times as when done n>0 times"). This is because the spec describes GET, PUT, and DELETE requests as idempotent - which they are, it's just that GET requests are nullipotent as well.
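
A toy JavaScript illustration of the distinction (everything here is made up):

  var posts = ['a', 'b', 'c'];

  // Idempotent but NOT nullipotent: calling it once changes state,
  // calling it five times changes nothing beyond the first call.
  function deletePost(id) {
    posts = posts.filter(function (p) { return p !== id; });
  }

  // Nullipotent (and therefore also idempotent): no state is touched at all.
  function listPosts() {
    return posts.slice();
  }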

Wikipedia mentions this briefly:

http://en.wikipedia.org/wiki/Idempotent#Computer_science_mea...


His solution isn't as secure as CSRF protection, but it would cover the basic case easily. All you have to do is pretend logging out isn't idempotent.


Logout is not idempotent. If you send a log-out request, the system will log you out. All subsequent attempts will do nothing.

Therefore the results are different for different states and it is not idempotent.


I don't think you quite understand the definition of idempotent. A process is idempotent if it can be applied multiple times without changing the result achieved from the first application.

In this case, logging out multiple times does not change anything from the first application.


You're making way too many assumptions as to what logout does behind the scenes. "Without changing the result" != "without changing state".

Sending notifications, updating counters, etc. could all be results of logging out.


I think it is easy for us to agree that, from the client's point of view, logging out of a website is idempotent.

Now, you have a point about idempotence from the server's point of view. However, it would take a _badly_ programmed website for the logout operation to _not_ be idempotent. Sending notifications, updating counters, etc. _without first checking whether the user is actually logged in_ is simply moronic. That simple check is what would make the logout operation idempotent on the server too.
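
Roughly what I mean (toy JavaScript sketch, nothing Digg-specific):

  var logoutCounter = 0;
  function notifyLogout(userId) { console.log('user ' + userId + ' logged out'); }

  // Guarding on the current state is what makes the server-side logout
  // idempotent: the side effects fire at most once, however often it's called.
  function logout(session) {
    if (!session.loggedIn) return;  // already logged out: nothing happens
    session.loggedIn = false;
    notifyLogout(session.userId);
    logoutCounter += 1;
  }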


> Therefore the results are different for different states and it is not idempotent.

No, from all states, the result is that you're logged out.


I don't think this is the right way to think about it. The ending "state" might be the same, but the output of the first logout operation would be "change of state: logged in -> logged out". In the second case, however, it would be "no change in state". It does different things under different circumstances.


From the Wikipedia article on Idempotence [1]:

  Similarly, changing a customer's address is typically
  idempotent, because the final address will be the same 
  no matter how many times it is submitted.
So, even if in one case there's an internal state change (going from an old address to a new one) whereas in the other there is not (going from the new address to the new one again), it is commonly considered idempotent because the end result is the same.

[1] http://en.wikipedia.org/wiki/Idempotence#Computer_science_me...


Okay, I understand that the formal definition of "idempotent" is different than what the author means. What is the correct term to use in this case?

Edit: Next paragraph says:

  This is a very useful property in many situations, as it means that an operation can
  be repeated or retried as often as necessary without causing unintended effects.
  With non-idempotent operations, the algorithm may have to keep track of whether the
  operation was already performed or not.
"A change in state" would be an unintended effect I think.


I think the best "term" you can use here is "side effect free". I almost wanted to say "pure" would work, but there is no real requirement that GET always return the same thing: it just needs to not change the state of the server in a way that a later GET could detect. (Honestly, you really only care about side effects you find bothersome, and if there is a term for "bothersome side effect free" it would probably come from medicine rather than computer science.)


"Idempotence is the property of certain operations in mathematics and computer science, that they can be applied multiple times without changing the result beyond the initial application."[1]

[1] https://en.wikipedia.org/wiki/Idempotence


What is the definition of "result"? My point is that the result in this case is a "change of state" vs. "no change of state".


The result of multiple log-out attempts will be "the attempter is logged out."


"Methods can also have the property of "idempotence" in that ... the side-effects of N > 0 identical requests is the same as for a single request." [1]

The HTTP spec clearly talks in terms of the effect of sequences of repeated operations, not in terms of the results of individual operations. The side effects of a single logout are the same as for 6 - you are logged out and whatever logout triggers exist are executed once.

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.1...


CSS/JS - Make sure you load these externally so that the browser can actually cache them.

CDNs - Make sure you add a Cache Control (max-age) header to your CDN sync. This doesn't happen automatically through most syncing mechanisms. Helps you save on those pesky HTTP requests that cost $$$.
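
If you serve the assets yourself (rather than letting an S3-style sync set the metadata), the header looks something like this (illustrative Node sketch, one-year max-age):

  var http = require('http');

  http.createServer(function (req, res) {
    res.writeHead(200, {
      'Content-Type': 'text/css',
      'Cache-Control': 'public, max-age=31536000' // one year, for fingerprinted assets
    });
    res.end('/* asset body */');
  }).listen(8080);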

Gzip - Do not gzip images. It's not worth it. For HTML/JS - YES!

Javascript - If you have ads, definitely load them asynchronously (they go through multiple servers and take ages..). This is really important as you want your document.ready to fire asap so that your page is usable.
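
Something along these lines (the URL is a placeholder, not any particular ad network):

  function loadScriptAsync(src) {
    var s = document.createElement('script');
    s.src = src;
    s.async = true;
    document.head.appendChild(s);
  }

  // Kick off the slow third-party stuff once the DOM is parsed,
  // so it never blocks document.ready.
  document.addEventListener('DOMContentLoaded', function () {
    loadScriptAsync('https://ads.example.com/ad.js'); // placeholder
  });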

POST - Always redirect after a post request to prevent reloads causing re-submits.
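
That's the Post/Redirect/Get pattern; a bare-bones Node sketch (paths are made up):

  var http = require('http');

  http.createServer(function (req, res) {
    if (req.method === 'POST' && req.url === '/comments') {
      // ... persist the submission here ...
      res.writeHead(303, { 'Location': '/comments' }); // 303 See Other
      res.end();
      return;
    }
    res.writeHead(200, { 'Content-Type': 'text/html' });
    res.end('<p>comment list</p>');
  }).listen(8080);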

Forms - Always have a submit button for accessibility.

Usability - Try using your site with a screen reader, don't neglect vision impaired people. (there are apparently a lot of them!)

data-x attributes will destroy your W3C validator checks. Use them if that's not important. (sometimes it just is...)

For external scripts that use document.write, go take a look at writeCapture. It's a document.write override which will make your external scripts asynchronous. (https://github.com/iamnoah/writeCapture)

I don't see why counts and pagination are such a big deal. I have done them correctly multiple times. Faceting might be hard though ;) Showing counts is a useful usability feature (or at least show a count when there is nothing - i.e. a zero count).

Those are the ones that I could think of right now. :) Great article, some good points in there!


Redirect after POST is not sufficient. Think about users clicking twice quickly (by accident or on purpose) before the browser receives the redirect. CSRF tokens could help if they're in place; however, disabling the trigger until the redirect arrives is better. Of course, this does not solve double-submits done via Ajax.


Ajax or no, disabling the submit button/event target is pretty trivial compared to the complexity of doing the rest of your app. Just disable the button when it's clicked, or add a disabled class to the link and a separate click event for it that stops propagation. Or even simpler, just hide it. And do it before it gets around to sending the request.
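
Something like this (the form id is hypothetical):

  document.querySelector('#comment-form').addEventListener('submit', function () {
    // The submission itself still goes through; this only blocks repeat clicks.
    var button = this.querySelector('button[type=submit]');
    button.disabled = true;
    button.textContent = 'Saving...';
  });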


That's what I tried to say.

Regarding Ajax, I just wanted to make clear that it's pointless to wait for the redirect... even if it's sent back as the response, it would not cause anything like showing a different page afterwards.


Author here. Glad you like it.

> data-x attributes will destroy your W3C validator checks. Use them if that's not important. (sometimes it just is...)

Are you sure that is still the case if you have the HTML5 doctype?


It should work fine with HTML5 :) It's just that HTML5 is not actually a proper spec yet...


Aside from possibly hurting the W3C Validator's feelings, does that matter?


Some companies' policies require validation, and they require specs that are nailed down. In those cases you'd end up using HTML < 5, and that won't validate with data-x.


I understand the allure of an objective way to evaluate the "quality" of your code... but that seems ridiculously naive. I'm pretty confident I could come up with something that uses features in the spec that nobody ever implemented, so it would be fully validated and correct and yet totally nonfunctional for actual users.


I'm curious why you wouldn't use <!doctype html>? Are you using something in XHTML that's deprecated in HTML5? Those are few and far between.


In that case, decline to work with said company.


Then the validator is not really a proper validator yet. It's 2012. HTML5 is a proper spec alright...


HTML5 is a working draft spec. It hasn't been finalised yet. This isn't usually important for most of us, but companies that like calling themselves ISO900X etc. don't usually want to work with draft specs.

http://www.w3.org/TR/html5/

It'll probably be finalised by 2014. I don't get the downvotes; if you don't agree, then why not ask/explain why?


The downvotes (I didn't even know you could downvote here!) are likely disagreeing with the idea of validation as a talisman: "It validates! Yay, it must work!" As HTML5 formalises much of what already exists, it's hard to move to HTML5 and break things - HTML5 isn't just video, audio, canvas, etc.; it's also a more sane doctype, the ability to omit attributes that only ever have one value (type on script elements, for instance), the ability to nest things inside anchors, etc.

Also, as we all know by now, XHTML doesn't make any sense with the browsers that exist - particularly as it's rarely actually valid XML, and even more rarely sent with the right MIME type.

In short, XHTML doesn't exist in any practical sense, and HTML5 subsumes HTML4 while fixing the stupid bits to match what browsers actually do. There isn't really any logical reason not to use HTML5 syntax, though of course, using the new features can be problematic.

I understand that your logic for validating is likely your company's decision and not your own view, and I'm not attacking your values or opinions in any way.


That is how I feel too. Never had any issues with HTML5.


Redirect after POST so users can use their Back button. Nothing more annoying than a "resubmit?" button, especially if resubmitting is dangerous.


> CDNs - Make sure you add a Cache Control (max-age) header to your CDN sync.

You need to set both the Cache-Control AND Expires headers.


https://developers.google.com/speed/docs/best-practices/cach... recommends Cache-Control max-age OR Expires. Cache-Control (max-age) takes precedence over Expires. You don't need both.


> Trying to load JavaScript dynamically is a good idea but a lot of the time, it’s not worth the effort if you can keep your JavaScript to a sensible size and load it all at once. This also helps with consequent page visits being fast.

As long as we're talking client-side, I couldn't agree more. It seems no matter how much I try to make things "easier" with YUI Loader or some clever AMD + loader solution, it always turns out to be a headache.


Agreed. My new system is that all critical JS (e.g. anything not related to ads, tracking, social buttons, etc.) is loaded all at once with the rest of the DOM. Then there is a separate async/lazy-load track for that other crap.


> == is bad. Don’t ever use it.

Could someone expound for the ignorant?


Lazy response, excerpted from a blog post[1]:

One particular weirdness and unpleasantry in JavaScript is the set of equality operators. Like virtually every language, JavaScript has the standard ==, !=, <, >, <=, and >= operators. However, == and != are NOT the operators most would think they are. These operators do type coercion, which is why [0] == 0 and "\n0\t " == 0 both evaluate to true. This is considered, by sane people, to be a bad thing. Luckily, JavaScript does provide a normal set of equality operators, which do what you expect: === and !==. It sucks that we need these at all, and === is a pain to type, but at least [0] !== 0.

[1]: Post from my blog, but I'm not linking to it because the rest is not useful to answer this question and I don't want to come across as a self-promoting-link-whore :)
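
For the curious, the cases from the excerpt, pasted into any JS console:

  [0] == 0;        // true  ([0] -> "0" -> 0)
  "\n0\t " == 0;   // true  (whitespace trimmed, string parsed as 0)
  [0] === 0;       // false (no coercion)
  "\n0\t " === 0;  // false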


It is not self-promotion if someone asks for it. So how about a link?




If the two sides of the comparison have different types, JavaScript tries to coerce the values using rules that are pretty strange. The recommendation I've heard most often is to always prefer === and !==.


Using minimal JavaScript - some web apps nowadays have the exact opposite philosophy, and there are client-side JS frameworks which facilitate that.


Yeah, it definitely doesn't apply to everything. Part of the reason is that Digg was mostly a read-only site, and it had to be fast and it had to work. I wouldn't advise the same thing for a real "web app".


I couldn't agree more. That's what I was thinking as I was reading it: a news site, speedy but boring to some degree. The data-x attribute example drove home that I should take only bits of the advice for my situation.




It's surprising that a former Digg developer describes pagination as hard. Really?


If you browse items 1-100, then 100-200, chances are you are going to see the same links twice and miss others, just because the result set changed between the two requests. And caching a snapshot per user seems a bit expensive.

Pagination is a mess on Reddit and HN, so maybe he considers pagination "hard to get right", since no social news aggregator gets it right.


Totally agree, pagination is painfully broken on HN.


It's not hard if you have no traffic.


I just tried ImageOptim on some PNGs. I think it performs poorly :S


Make sure to configure it first: you pretty much want to max out all the settings (and enable all the different tools), otherwise you will get underwhelming results.


I am satisfied with the results of optipng, which is free software.

I just run:

  optipng -o7 *.png

http://optipng.sourceforge.net/


"For example, scroll events fire only after the scroll has finished on mobile browsers"

This is not accurate at all for iOS Safari and Chrome... I just wrote some scroll-based events earlier this week and they work just fine.

There is some good stuff mixed in here, but a lot of it is misleading, poorly defined, or just flat-out wrong. The most accurate stuff is extremely common-sense, like "staging environment should mirror prod", "don't use == (JS)", "don't use doc write (JS)", etc.


No, he's definitely correct. See this Apple article for more information: http://developer.apple.com/library/iOS/#documentation/AppleA...


This article should be called: "Some web development tips from a former digg developer for developing a site EXACTLY LIKE DIGG"

Because most of this stuff is not applicable to webdev in general...


I found many of the article's points valid for my applications. Not all tips apply to any one's particular situation, but I, like many, am at a point where issues start popping up, and it's nice to see a general toolbox of tips to pick from. Like any tips from anywhere: "your results may vary; consult a professional adviser before acting on any advice."


Call me snide, but I'm reading this title like, "E-commerce tips from a former Pets.com marketer."


Seems needless, as though it were easy and obvious to discount all technical knowledge because of an association with a once-popular, now-declined site that failed for reasons that had precious little to do with its technology or its developers' ability.


I'm not discounting his knowledge, since Pets.com marketers presumably still have valid e-com skills.


Heh, thanks. (op here)


Actually, the Stanford ETL podcast with Tom Conrad (Pets.com, Pandora) was very insightful [1]. Just because a particular company failed does not mean they don't have good domain knowledge. In fact, I think I'd rather listen/read about failed companies and what they learned instead of successful ones.

[1] http://ecorner.stanford.edu/authorMaterialInfo.html?mid=2371


And in turn this comment reads like "I judge ideas based on their source rather than on their merit".


Not "based," but have you ever heard the saying, "consider the source?"



