We have broken HTTP (gadgetopia.com)
373 points by nkurz on Jan 11, 2015 | 139 comments



We suck at error reporting to the user. One of the legacy curses of UNIX is trying to encode all error info in "errno". Then came HTTP, which most languages try to treat as something like a file access. When there's some complex problem at the remote end, the error presented to the user has often been hammered down to some errno value that's vaguely appropriate.

This came up recently on HN in connection with GoGo WiFi on airliners, using satellite links. GoGo doesn't have the satellite bandwidth to let users view streaming video, so they block YouTube and some other sites. The problem is telling the user what's going on. They used an awful hack involving a self-signed fake SSL certificate for YouTube to try to get a coherent message to the user.

According to the IP standard, they should send back an ICMP Destination Unreachable message with a code 13, "Communication administratively prohibited". Most IP/TCP/HTTP/browser stacks won't get that info all the way up to the end user, especially the "Communication administratively prohibited" part. Even if the error code made it to the user, many customers would not interpret "Communication administratively prohibited" as "No, you can't watch Youtube because you're on an airplane and we don't have the bandwidth for that".


On a somewhat tangential note, GoGo doesn't use a satellite uplink. It uses terrestrial GSM links, which are basically cell towers with antennas angled towards the sky.

These are, of course, mostly only available during US domestic flights. For international flights, in-flight internet is provided through a satellite uplink, but GoGo doesn't provide that service.


> According to the IP standard, they should send back an ICMP Destination Unreachable message with a code 13, "Communication administratively prohibited".

Agreed. However, I've discovered that even folks who really should know better don't know what to do with anything other than port-unreachable:

In the past, I tried setting my ip[6]tables REJECT targets to send back admin-prohibit. For... reasons[0], I was REJECTing IPv6 traffic to Google's networks. I discovered that YouTube on a Nexus device running Android Jellybean would not fail over to IPv4 but would, instead, sit and spin for multiple minutes before reporting a fatal connection failure.

[0] At the time I had an IPv6 tunnel that (for some reason) transferred packets destined for Google very, very slowly.
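
For reference, the kind of rule I mean looks roughly like this (placeholder documentation prefixes and chain, not Google's actual networks):

    # IPv4: reject with ICMP type 3, code 13 ("communication administratively prohibited")
    iptables -A FORWARD -d 192.0.2.0/24 -j REJECT --reject-with icmp-admin-prohibited

    # IPv6 equivalent
    ip6tables -A FORWARD -d 2001:db8::/32 -j REJECT --reject-with icmp6-adm-prohibited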


Why not just reroute those addresses via a proxy to a static page explaining the situation?


They're under SSL. All a reroute would do is pop up Chrome's bad cert page.


Hell, it would have been far better to use traffic shaping and throttle those connections to (say) 1 bps than to do anything else.


Was HTTP ever non-broken?

I remember trying hard to get URNs to work, and proposed them for one project. They required a URN resolver, which is an entirely new set of infrastructure.

I tried using path segment parameters (I believe the author refers to syntax like 'http://a/b/c/g;x=1/y' rather than '/key1:value1/key2:value2/'.) At the time (mid-1990s) the tools I used didn't support it, and I never came up with a need for them. Still haven't. The current spec (ftp://ftp.rfc-editor.org/in-notes/rfc3986.txt) says that the text after a ';' is a "non-zero-length segment without any colon" and that the contents are "considered opaque by the generic syntax." "[I]n most cases the syntax of a parameter is specific to the implementation of the URI's dereferencing algorithm."

Regarding "404 (traditionally “Not Found”) means it was never here. 410 (traditionally “Gone”) means it was once here but is now gone"; the spec says:

> [404] The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.

It doesn't say anything about 404 meaning "was never here", and in fact includes the possibility that perhaps the server doesn't want to give more information.


And indeed the spec for 403 (Forbidden) includes 404 as a valid return code if you do not wish to reveal the fact that there's anything there:

> An origin server that wishes to "hide" the current existence of a forbidden target resource MAY instead respond with a status code of 404 (Not Found).


Which is precisely the default behaviour of IIS.


My problem with this article is not the general sentiment that developers should know the history of the technologies they work with, which I think is absolutely right, but instead how the author seems to put HTTP on a pedestal. It just happens to be the protocol TBL & company came up with in the early 90s for this thing that took off beyond anyone's wildest imagination. It's not some perfect work of art that should be revered. Also native apps don't make me sad. They're faster, more featureful, and have more consistent platform specific interfaces than their Web brethren. There are good reasons they exist and consumers prefer them.


This whole discussion reminds me of discussions about language evolution. Some people are prescriptivists and believe that learned language experts should determine what is and isn't correct usage. Others are descriptivists, arguing that languages are a living reflection of how they're used and that incorrect usage can become correct if and when it becomes widespread.

Similarly, we have standards boards, who are often very divorced from the realities of actually building real-world software and we also have the engineers building real-world software. Which version of HTTP is correct, the one codified by the W3C or the version codified by the engineers building Chrome, Firefox, IE, Apache and Nginx?

As a learning exercise, I implemented an HTTP server a few years ago and remember finding it fascinating that none of the popular user agents implemented the Keep-Alive header per the specification. And yet they all implemented it in a common, non-compliant manner. It made me realize that as much as I was using the spec as my guide, my implementation wouldn't be correct if I followed what it said for that header.


>Which version of HTTP is correct, the one codified by the W3C or the version codified by the engineers building Chrome, Firefox, IE, Apache and Nginx?

There is a reason why protocols are not languages. A protocol is a well-defined standard that engineers are supposed to follow. There really is no space for "descriptivists" in technology, and especially not in standardized protocols. What's that, the protocol is useless? Well, maybe we should come up with a new standardized one instead of bastardizing the already-existing one and creating a lot of confusion and ambiguity (which machines have a hard time dealing with).


Specs have ambiguities and bugs that may be resolved by standardizing implementations' behavior.


There is a major difference between the two: computers are dumb.

As a human, I can usually communicate with someone who uses a dialect or makes language errors. Computers are much worse at doing that, they basically need to know about every standard and non-standard behavior to correctly handle it. As such there is a very good reason for standards and thus standard bodies to exist.


I am guessing that the apps referred to are the ones which existed quite happily as web pages before moving to smartphones, where they became apps more likely for business reasons than for user-experience ones. The ones I can think of are Twitter, Facebook, and LinkedIn.

I am certainly not a proponent of the web as a platform for everything, but there are certainly some things it is rather good at, and making content discoverable in a consistent manner is one of those. If you hide the content in an app, it's rarely exposed.


This debate has been going on since the days of AOL and CompuServe. As here, it usually misses the differences between public content, private content, and corporately farmed content.

Twitter etc are corporately farmed content, and they all provide web and app access. The access method is just a front end.

Personal content in apps isn't usually all that interesting. It's hard to think why anyone would care about my Angry Birds scores, or what level I'm on in Candy Crush. (If I played Candy Crush, which I don't.)

If you want to make an app that shares content on the web, it's not difficult. I don't see a problem with the fact that it's extra effort, and if you're going to do it you need to justify that effort in some way.


HTTP is different, though. Developers are constantly creating thousands of HTTP services. What other Internet protocol can you say that about? Protocols like DNS, FTP, NNTP etc tend to be provided by a small number of fixed services. Once in a while there will be a framework for a non-HTTP protocol, like Zed Shaw's Lamson SMTP server, but that's uncommon and I doubt it's widely used.

I can't think of any other protocol where even the most unskilled developer is given the tools and encouraged to determine how the protocol is used.


Agreed. Apps didn't break HTTP. They consume APIs (which can also be used by web applications). Thus, designing the API is what really matters.


Maybe the simpler explanation is that HTTP/WWW was always broken. It's never been a great application platform. It's popular and universal and simple (sort of), so that's awesome, but in terms of great tech it's kind of been junk all along.

It's not like it ever was universally conformed to. It's not like there was this golden era. What would be the golden era? The web before AJAX? Geocities and Hotmail? Frames broke URLs more than almost anything. PHPNuke forums? Web apps sucked back in the early days.

Makes me think of a fun Alan Kay quote:

> The Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free? The Web, in comparison, is a joke. The Web was done by amateurs. -- Alan Kay.


Yea, the specification of status codes is cumbersome and annoying to deal with, like you're supposed to go find a list of what they are supposed to be on Wikipedia and then match it up to what's going on in your application as best you can. Also the PUT, POST etc. stuff. It's just a pain to try to match up your application's behavior to a crusty old spec that was never that good.

It's really refreshing to code with websockets and just write an application naturally. HTTP is good for GETing static web pages and that's about it.


Way to oversimplify. HTTP works ok for some super dumb use cases. But pray tell, what's the proper way to update multiple objects at once?

Simple verbs work for simple tasks - only. There is more to life than REST.


Updating multiple resources of the same type is easy: POST to the plural form of the resource.

If you need to update multiple resources of different types atomically, then you probably need a new resource abstraction that covers both of the resources anyway. Then just POST to that.
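
For instance, something like this (hypothetical resource name and payload):

    POST /articles HTTP/1.1
    Content-Type: application/json

    [
      {"id": 17, "status": "published"},
      {"id": 18, "status": "published"}
    ]

One request, one status code, and the server treats the batch as a single unit.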


Sounds like the OP's problem is not so much the protocol itself, as the long tail of developers using it who have no idea what it can actually do.

There are developers who do use most/all of HTTP. I'm one. I think I've used most of the obscure features he mentions, including URL parameters and all those status codes (excepting 402 Payment Required, which I've never had a need for, but I do know it exists).

Most of the features he mentions aren't very useful for browser interaction, but really shine for APIs. If you're trying to write an HTTP API and you haven't at least got to grips with all the status codes and methods, you should take the time to understand them all and when they should be used. It's a real joy to use an API that uses methods and statuses properly. If in doubt, read the GitHub API docs - their API is (mostly) very good in this regard.


Web developer for less than a year here, so just offering my perspective on what "new web devs" know these days. I know there are status codes beyond 200 and 404, and work with maybe a dozen in regular rotation. I know there are probably hundreds more, but until they are in more regular use I think it would just inhibit clarity to use them (just like how using less precise language paradoxically makes your prose clearer sometimes). This is sort of a chicken and the egg problem, but the paradigm could definitely shift if the marginal usefulness is high enough.

URN's I had not heard of, but the article didn't make it clear to me what problem they would solve.

Of course I know about HTTP verbs PUT and DELETE and HEAD-- is there really any web dev who doesn't? And I am pretty rigorous about using the appropriate one (unlike with status codes).

I don't have an opinion on how apps are isolated from the web and don't have interlinks between them. Don't really use mobile apps myself except utility ones like Spotify and Netflix.


You should put a bit of effort into making sure you use the most appropriate status code.

If a client isn't already aware of the code, it decides what to do based on the first digit:

- 1xx - continue receiving content

- 2xx - finished receiving content

- 3xx - follow redirect

- 4xx - client error occurred, show error page

- 5xx - server error occurred, show error page

So there is no harm in using better (but less commonly known) codes.
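
A rough sketch of what that generic fallback amounts to, in Python (the exception types and return strings are just stand-ins, not any particular client library):

    class ClientError(Exception): pass   # stand-in error types
    class ServerError(Exception): pass

    def handle(status_code):
        # Generic per-class handling when the exact code isn't recognized
        klass = status_code // 100
        if klass == 1:
            return "informational: keep waiting for the final response"
        if klass == 2:
            return "success: use the response body"
        if klass == 3:
            return "redirect: follow the Location header"
        if klass == 4:
            raise ClientError(status_code)   # our request was wrong; don't blindly retry
        raise ServerError(status_code)       # server-side failure; retrying later may help

So an unrecognized 451 still gets sensible 4xx treatment from a client that has never seen it before.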


- 1xx - Don't Refresh Yet

- 2xx - You Don't Need Refresh

- 3xx - We're dealing with Refresh for you

- 4xx - Don't Bother Refreshing

- 5xx - Refresh, You Might Get Lucky


5xx - try again later, if applicable.


> URN's I had not heard of, but the article didn't make it clear to me what problem they would solve.

It's a universal namespace, just like UUIDs, but with just enough extra sauce to make it theoretically possible to do interesting things with it. So 'urn:isbn:0451450523' might represent The Last Unicorn, and if you fed it to your Kindle it might enable you to buy it, and if you fed it into Nook ditto, and if you fed it into Project Gutenberg you might get a notice that it won't fall out of copyright until A.D. 3172, and if you fed it into your browser it might do any of those things or just look it up on ibiblio or Wikipedia.

Unfortunately, like many good ideas, it didn't take off because folks couldn't imagine a profitable use for it.


Looks a lot like the resource names on AWS [1]

<!-- AWS Elastic Beanstalk application version --> arn:aws:elasticbeanstalk:us-east-1:123456789012:environment/My App/MyEnvironment

<!-- IAM user name --> arn:aws:iam::123456789012:user/David

<!-- Amazon RDS tag --> arn:aws:rds:eu-west-1:001234567890:db:mysql-db

<!-- Amazon S3 bucket (and all objects in it)--> arn:aws:s3:::my_corporate_bucket/*

http://docs.aws.amazon.com/general/latest/gr/aws-arns-and-na...


It should. There's a reason for that. :)


What's the reason, and why didn't they just use URNs?


The reason is because they're based off the same principle. "ARN" stands for "Amazon Resource Name".

I don't know why they didn't use URNs. Probably because nobody else does.


I think the implication is meant to be "they look like URNs because they are".


Except that they're not; rather, they're ARNs. URNs must begin with 'urn:'. It doesn't make a lot of sense that they're not URNs, but…they're not.


My guess: "nobody uses URNs generally, so might as well save four bytes off every one?"


> I know there are probably hundreds more

Actually there aren't. There are like 60. It really shouldn't take too long to read over them: http://en.wikipedia.org/wiki/List_of_HTTP_status_codes


> URN's I had not heard of, but the article didn't make it clear to me what problem they would solve.

The idea of a globally unique, permanent identifier that would resolve to a URL to indicate the current location of a resource is fairly commonplace, isn't it? We've tried to bolt all sorts of equivalents onto the existing URL infrastructure without them.


Do you know where to find the canonical list of HTTP status codes? If the answer's "yes", you're probably OK.


Yes... Wikipedia.


To the downvoters: sure, Wikipedia is not the canonical source (that would be IANA's list[0]). But Wikipedia mostly gets it right and also lists some of the "non-standard but can be encountered in the wild" ones.

I actually prefer the Wikipedia page[1] exactly because it includes more than just the canonical subset. For example, 451 is not yet official, but pretty much as close as you can get to a standard without having an actual standard (well, there's a draft).

[0]: http://www.iana.org/assignments/http-status-codes/http-statu...

[1]: http://en.wikipedia.org/wiki/List_of_HTTP_status_codes


> Of course I know about HTTP verbs PUT and DELETE and HEAD-- is there really any web dev who doesn't?

Yes: I've interviewed some. Very few realise there's any more than GET/PUT/POST/DELETE, which astonished me: surely everyone's left an HTTP debugger running while updating an SVN repo or browsing a WebDAV share?
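
For the curious: a WebDAV session is full of methods most web devs never see. The request lines look something like this (trimmed; exact paths and headers vary by client):

    OPTIONS /dav/ HTTP/1.1
    PROPFIND /dav/reports/ HTTP/1.1
    MKCOL /dav/reports/2015/ HTTP/1.1
    PUT /dav/reports/2015/q1.txt HTTP/1.1
    DELETE /dav/reports/2015/old.txt HTTP/1.1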


Most of the good stuff here (using the full range of HTTP verbs and status codes, idempotent GET, stable URLs) is already common knowledge among web developers and built into any modern framework.

URNs are interesting but difficult to work with in practice; 402 Payment Required has been "reserved for future use" forever. Then there's the suggestion to use custom HTTP methods, which is ill-advised [1].

[1]: http://programmers.stackexchange.com/a/193826


Google cache:

http://webcache.googleusercontent.com/search?q=cache:0PUMGvU...

(Server is not responding at the time of writing, from two locations.)


Well, he does claim that we have broken HTTP. :)

I think the situation is improving and serious developers are using a broader range of HTTP features. Proper status code and verb usage has become very common with the proliferation of RESTful APIs.


> Do you know why your FORM tag has an attribute called “method”? Because you’re calling a method on a web server, like a method on an object in OOP. Did you know there are other methods besides GET and POST? There’s HEAD and OPTIONS and PUT and DELETE.

Except HTML(5) forms do not support any methods other than GET and POST.


A form is for user input and should always be "posted" through a gateway or handler on the server side. A PUT would imply that the payload is accepted as is. Also a PUT implies that a form cannot be sent twice to the same URI after the resource on the server has been created. So PUT is basically a nightmare for user interaction.

A DELETE has no payload.

OPTIONS and HEAD make no sense for a browser, because there is no information to show to the user.

So that leaves us with POST and GET. As I already stated, POST is the best way to send user data to a server. And a form with GET is just a URI builder.

Edit: Well no, that does not leave us with POST and GET, because there are also CONNECT and TRACE. But who wants to concern a user with those?


Yes, because a form is a collection of input fields, and it makes sense to POST input data to a resource. It does not make sense to, say, attach a collection of input fields to a DELETE or OPTIONS request.

You can still use other HTTP methods from a browser (using XmlHttpRequest), just not through a form.


With that logic, why do forms support GET then?


I am of course guessing here. It's possible that I'm completely wrong since I've never been in the working group and don't really know anyone on the working group. But given what I've read publicly this seems like a likely explanation:

Because not supporting GET would break a lot of current web applications, and HTML5 had a pretty clear goal of being mostly backwards compatible. The previous form element didn't restrict the method, and GET and POST are used everywhere. Removing GET would have caused too much confusion to be worth it.


GET is the most appropriate method for many forms, eg. search interfaces. It is not just a question of backwards compatibility.


GET in forms is useful for query interfaces, eg. a search engine. A search should be safe, idempotent and all parameters encoded in the URL. GET is the correct choice for that.


Because input fields aren't necessarily input, they can be query parameters as well. A GET form is a query builder.
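
e.g. a minimal sketch:

    <form method="get" action="/search">
      <input type="text" name="q">
      <button>Search</button>
    </form>

Submitting that with q=teapots just issues GET /search?q=teapots: nothing changes on the server, and the result is a bookmarkable URL.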


There was a time when GET forms were pretty common for search interfaces. It still works fine, but I think the sentiment against 'ugly' URLs mostly wiped them out.


Aren't GET forms still the default choice for search engines like Google? Otherwise you couldn't bookmark a search.


I always found this extremely odd. Does anyone have a satisfying answer as to why this is the case?


Because nobody uses them and supporting them can cause more problems than they are worth.

The author was ranting about unused features, but not all features are good ideas. Sometimes things that look good on paper don't work so well in practice.


> Because nobody uses them and supporting them can cause more problems than they are worth.

Can you substantiate that? Why shouldn't I be able to POST a new resource to a server and update an existing resource with PUT using Forms? Why shouldn't I be able to delete a resource using a DELETE method using a Form?

Doesn't make sense to me.


It's not a good idea for HTML to support REST?


REST-the-actual-paper is a monstrosity that, thankfully, no-one actually uses. "REST" as a name for the practice of using simple JSON and using existing HTTP features (methods, status codes) rather than using HTTP as a dumb transport layer is... an ok idea, but overhyped. The supported HTTP vocabulary is very limited, and adding new verbs like PATCH is a multi-year process that's never going to be well supported by every existing piece of software. So sooner or later you're going to have to layer a more complicated (business-specific) semantics over HTTP, and have e.g. a POST that actually has semantic meaning beyond being a POST. Given that, if you just used POST for everything, and had a simple encoding of method kinds and error responses over that... would that be so bad?

Those who forget SOAP are doomed to recreate it, badly. I worry that things like JSON-schema are headed in that direction.


REST has nothing to do with JSON. How dare you say that no one uses it while you're using the World Wide Web every single day?


Have you read the paper that coined the term REST? I challenge you to find a site that actually works the way that paper describes.


Absolutely. If you ever used a web browser (one that understands HTTP and HTML), clicked a link on any website, or submitted a form, you've most likely benefited from REST. Here's an example: https://news.ycombinator.com


So this site, among other things:

* Until recently, used stateful numbers for all links

* Uses the same location for different resources (e.g. https://news.ycombinator.com/item)

* Serves the same resources at multiple different locations, rather than redirecting you to a canonical location.

* Does posting, editing and deleting via POST, not CREATE/PATCH/DELETE

* Does approximately no content negotiation

* Does not support non-browser clients at all, which, well, I suppose you could make a perverse argument that that is HATEOAS.

In short, it behaves not at all like the system described in the REST paper.


HTML is a hypermedia format. REST is an architectural style that requires hypermedia in order to work. Hypermedia formats have factors (http://amundsen.com/hypermedia/hfactor/) that make them more suitable for some use cases and less suitable for others. HTML alone does not have idempotent updates (PUT, DELETE), but as REST allows for "Code-On-Demand", we can augment the user experience with Javascript, and it allows PUT/DELETE (with XMLHttpRequest).


HTML is designed for REST. Which is why a form does not support arbitrary HTTP methods, but only the methods which semantically make sense for a form.


Many web frameworks will allow you to fake PUT, DELETE, etc. via POST requests (usually using a special "_method" parameter).
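
The usual shape of it looks something like this (Rails popularized the _method convention; other frameworks differ in the details):

    <form method="post" action="/articles/17">
      <input type="hidden" name="_method" value="delete">
      <button>Delete article</button>
    </form>

The browser sends an ordinary POST; the framework's routing layer reads _method and dispatches the request as if it were a DELETE.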


Which is in contradiction with the HTTP spec. Because who lets a user just add and remove resources without using a process or handler to handle the user input?

I would not use such a framework, because it did not understand HTTP and HTML specs in the first place.


> Because who lets a user just add and remove resources without using a process or handler to handle the user input?

I'm not sure what you're talking about:

1) It's user input and the user can craft a DELETE request just as well as add a _method=DELETE to the form body.

2) Most frameworks use the _method parameter as the http method, meaning when routing or when the application asks for the method so as to take specific actions. No general framework just deletes or creates stuff for you (unless you specifically set it up to, and even then you can normally add an authorization check hook).


I may have misunderstood the point you tried to make. Let me try again to make mine.

The reason HTML uses an enumeration ("get" and "post") and maps those to HTTP methods ("GET" and "POST") is that they have special meaning for user interaction. With "get" a form becomes a URI builder. With "post" the user can send data to the server. You can call it a coincidence that the values of the enumeration match the HTTP methods. The reason a "post" form is mapped to an HTTP POST is that a server should always process user input - even if the behavior of your application results in deleting a resource.

So yes, okay, the frameworks behave correctly, because they process user inputs. My point was to question the intent or reason behind a "_method" parameter of those frameworks. If such a framework invented the parameter because it felt limited by the HTML specification, it did not understand the specification. If it is about automating resource management, then I'm fine with that.

BTW, which frameworks do you mean? I know of none (I never saw that in Spring Web or in JAX-RS implementations like Jersey).


Where we disagree is about the meaning of method in a form tag. Since it's called method, and since the options are a subset of HTTP's methods, it seems natural to assume that they are related. Missing HTTP methods does seem limiting with regards to a RESTful interface.


That would be one in a long line of quality engineering decisions by the HTML standards committees.


Why do you imply that the committee made a bad decision? Have you read and thought about the HTTP and the HTML specs? I guess not.


> Anyone working in this space 20 years ago couldn’t possibly have known of their problems so every problem deserves a new solution.

It's worth noting that sometimes the reason for developing a new solution is to be free of the baggage that comes with all the failed attempts on the old infrastructure...

e.g. JSON-based solutions over the many attempts to use XML (XML is a capable base, but many find it unwieldy to say the least)


Well, I'm sure the creators of JSON weren't thinking of XML use-cases at the time. When it comes to solution-agnostic, self-describing, self-validating semantic markup, I'm not sure JSON-LD buys you much over RDF serialized to XML (for example).


I can speak for myself and say XML died for me when it was decided that the standard interface to it would be stream parsing. I found that stuff incomprehensible. I get the perf benefits, but it was always premature optimization for me.

You throw a JSON string at a simple parser and get an object back. That's why it won a place in my heart. Nothing prevented xml from being that except that's just not what the lib authors did. There was so much talk about XML being the One True Data Structure I think it ended up being a jack of all trades, master of none.


The problem with XML was that people tried to make it semantic. You had XML Schema (awful), and because of schema you had to have namespacing (also awful).

Take that away - plain unnamespaced tags, with no semantics beyond "this is a character sequence", common-sense business-specific tag names, and no schema - and you get something that's just as nice to work with as JSON is. And there are plenty of libraries (e.g. Jackson) that will serialize back and forth to objects. But too many people drank the semantic kool-aid and tried to autogenerate classes from schemata or other such silliness.


In my experience as an interviewer, the majority of web developers know nothing beyond "the URL in the address bar is a GET" and "form submission 'must' be a POST"... Headers? What are those?! It's sad...


I hope you're wrong, because I find it extremely discouraging that a majority of web developers would know nothing (or at least so little) about HTTP methods. For now, I'll take your post as anecdotal evidence until a proper survey has been done...


Who's supposed to be teaching them, though? If they don't know this stuff exists, how are they going to know to look for it? I learned it because I was always a complete RFC nerd, but I know that many developers are not like that.


Any web developer who is writing not just front-end code but also backend services or APIs should have come across HTTP status codes and know the reason for their existence. I came across this quite early in my programming career when writing a very basic HTTP service (it's not that long ago, I've been programming for about 10 years), and the first things you learn are that your HTTP response will usually have headers, a body, and a status code.

In my experience, though, when working as a front end developer I've had to deal with developers (mostly in the .Net camp) who will gladly return errors with 200 OK and an xml/json object with the typical "StatusCode: xxx", where XXX is some made up error code that you'll need to reference in their application.
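
In other words, something like this (made-up payload):

    HTTP/1.1 200 OK
    Content-Type: application/json

    {"StatusCode": 1042, "Message": "Customer not found"}

...when the transport already has a perfectly good 404 for exactly that.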

I believe this is because many backend developers are in web development by accident, and don't really care much about the web as a platform.


This intrigued me:

> Don’t even get me started about URL parameters. No, not querystrings – there was originally a specification where you could do something like “/key1:value1/key2:value2/” to pass data into a request.

I couldn't find a specification, but I did find [1] which led to [2], both of which indicate that there's an idea of parameters within the URL hierarchy itself. It seems to me that it could be useful for API versioning at least.

As to the article itself, I completely agree. The problem is, though, that there's just not enough incentive for folks to actually use the Web standards suite the way (I want to believe) it was meant to be used. How much would it cost to make a fully-semantic, JavaScript-extended (but not -requiring), machine-usable website which is also adaptive and beautiful and appealing to humans? And how much benefit would one realise?

Rather than do things the Right Way, folks just hack through and do it A Way, and get on with life. I wish that it weren't so, I truly do, but it is. At this point, railing against it feels like railing against the tides.

[1] http://doriantaylor.com/policy/http-url-path-parameter-synta...

[2] http://tools.ietf.org/html/rfc3986#section-3.3


API versioning should be done with an HTTP header.


Wow. Dublin Core. Now that is a name I've not heard since around a decade ago.

It is amazing to look back at my own development history and remember that I too used to care about proper status codes, using metadata elements, and the like.

But after adopting so many, and seeing so many either fail to catch on, get torn apart by infighting, or get picked up and then set aside by the big names... it's not surprising that current developers don't think about them or don't care.


For the most part, I say "1000 times yes!" I love the general idea he expresses that developers MUST take the time to understand the giants on whose shoulders they stand. To that end, I have one noteworthy point (more of a clarification) I think is worth making:

> And, technically, you’re supposed to make sure GET requests are idempotent, meaning they can be repeated with no changes to a resource. So you should be able to hit Refresh all day on a GET request without causing any data change.

I think it's important to note here that idempotence does not mean that the resource will never change. It means that the resource will not change as a result of a GET request (barring some meta requests, for example `GET /my-last-http-request`). A resource can definitely change. I think the author would agree with me but found his language there a bit ambiguous. Allow me to quote the good reverend himself, Dr. Roy Fielding:

> "More precisely, a resource R is a temporally varying membership function MR(t), which for time t maps to a set of entities, or values, which are equivalent. The values in the set may be resource representations and/or resource identifiers." (emphasis mine) [1]

Immediately prior to this quote he references "the weather for Los Angeles today" as an example of a resource, complete with its own resource identifier (perhaps GET /weather/la/today, most-to-least specific). The contents at the end of that URI will most certainly change, but they shouldn't ever change based on the request.

[1] Section 5.2.1.1 https://www.ics.uci.edu/~fielding/pubs/dissertation/fielding...


To hit the most important point of what you mean by "change based on the request": this doesn't mean "changing your request changes the resource you get" (which wouldn't make any sense).

It means that the act of GETting the weather right now will never change the weather. (If you want to change the weather via HTTP, you'll need to do it with a method like "POST /weather/hill-valley" and a body like '{"rain": "stop"}'.)

Think of it like a getter function that gets syntactically hidden by the language: you wouldn't want "if (user.accesses < limit || user.accesses > threshold )" to have two different values for "user.accesses" based on the number of times you referred to it.

In a less ludicrous example, it also means "GET /tokens/reset/dsf46JtCb385PDs2" shouldn't log you in and invalidate the token, something that most password-reset pages fail to comply with.
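
A minimal Flask sketch of that last point (toy in-memory token store and made-up handlers, not how you'd persist anything real). The GET only shows the form; only the POST consumes the token:

    from flask import Flask, request

    app = Flask(__name__)
    tokens = {"dsf46JtCb385PDs2": "user42"}   # toy token store

    @app.route("/tokens/reset/<token>", methods=["GET"])
    def show_reset_form(token):
        # Safe: viewing this page any number of times changes nothing
        if token not in tokens:
            return "invalid or used token", 404
        return '<form method="post"><input name="password"><button>Reset</button></form>'

    @app.route("/tokens/reset/<token>", methods=["POST"])
    def perform_reset(token):
        # Unsafe by design: this is the one request that invalidates the token
        user = tokens.pop(token, None)
        if user is None:
            return "invalid or used token", 404
        new_password = request.form["password"]   # hash and store it properly in real code
        return "password updated for " + user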


Yes, thank you. Every framework I have ever used maintains this distinction between GET and POST. It's common knowledge.


Amen to this:

"My point is that a lot of web developers today are completely ignorant of the protocol that is the basis for their job."


What a ridiculous statement. Forgetting about the false hyperbole of "completely ignorant," HTTP is no more the basis of my job than the mixing of asphalt is the basis for a trucker's job. The basis of my job is making electronic tools for people to help them accomplish tasks. I would do that with or without HTTP (which I'm knowledgeable in thanks).


It's more like a trucker that only knows a couple of basic road signs.


I largely understand what he's saying, in terms of misuse of the protocol. I think you're going to find those sorts of things being the case for any toolset, though. Given enough time, and enough developers, you'll eventually run into people using it an entirely different way than you initially intended, regardless of the specifications or documentation written around it.

One thing I'm not sure I agree with, though, is the mobile application "linking" he mentions. When have non-web applications ever linked? What of all of the old "desktop applications" that everyone used to use? Those don't typically link, so why is there a different expectation for mobile applications? Oh, and most mobile apps do support linking (in some sense, though it's not really a primary feature). Perhaps I'm misunderstanding what he means with this.


I work at Branch (referenced in the NYT article) which tries to make it way easier for mobile devs to leverage better deep links to stuff in their apps.

Making the analogy to desktop apps is fair, but I think the better question would be: why not? The URI scheme registration mechanism in Android and iOS allows the dev to create a new OS-level namespace for his or her app and all the 'pages' inside. For example, http:// is the scheme for browser-app-based addresses, just as venmo:// is the scheme addressing all payments or user profiles in the Venmo app. Why not leverage this technology to make app content more open and easier to access?

Now, if we could just get rid of the app store enforced install process and have apps automatically retrieved and cached, things would be a lot smoother...


> Now, if we could just get rid of the app store enforced install process and have apps automatically retrieved and cached, things would be a lot smoother...

I totally see what you did there ;). http://xkcd.com/1367/

I feel the lack of linking in apps is done partially because of laziness, and partially on purpose. Most apps that are neither games nor utilities seem to be made to control the viewing experience and/or help the authors trick the users into parting with their money. Increasing interoperability seems to be counterproductive if the only reason for your app to exist is to earn a quick buck from the less savvy users.


We just discovered that one a few weeks ago - it was then we realized that we had dedicated our lives to the premise of an xkcd comic.

I disagree with your last comment, as you could make the same argument for websites in general - there will always be a spectrum of usefulness/scamminess. Utilities or games might not have the concept of 'pages' the way a news app or a social photo upload app does, but imagine being able to link to specific constellations in the star chart app, the various calculator faces of some popular calculator apps, or a user's city in Clash of Clans. It's going to be awesome.


"imagine being able to link to specific constellations in the star chart app, the various calculator faces of some popular calculator apps, or a user's city in Clash of Clans. It's going to be awesome."

I'm imagining it, and in most of those cases I'm imagining the horrible fragmentation clusterfk that will have to be solved in Yavascript - linking to "some popular calculator apps" (plural) means either providing a mess of links for the user to decide, or somehow guessing which one to show.

Two alternatives that both land in the "NAH GANNA HAPPEN" category:

* "everybody agree on a standard prefix for basic things like calculators" HAHAHAHAHAHAHAHAHAHAHA

* "allow client JS to detect installed apps" see also the detect-visited-sites-with-CSS info leakage for why this would be terrible...


     "everybody agree on a standard prefix for basic things 
     like calculators" HAHAHAHAHAHAHAHAHAHAHA
That's a URN. Unfortunately they never really got adopted. But they solve this exact problem.

urn:calc:rpn

You click the link and your mobile phone then looks to see if any of your calculator apps satisfy that urn.

If they find any they open it up or present the user with a choice between them or open the one the user configured as their choice.

If they don't find any they check the app store or web and offer the user the opportunity to install one that satisfies that urn.

Actually this is exactly what Android's intents do as well, minus the app search and install. But I don't think they use URNs under the hood.


I was actually going to reply something along the same lines. These sorts of services exist on both iOS and Android (can't speak to Windows Phone). Not for the provided example, but Android has the Intents you mentioned, and iOS has a similar concept for things like "apps that are capable of routing/mapping" and possibly other things. Neither of these is done through the example "standard prefix," but both work by adding some standard permission or key to your app.


>I totally see what you did there

...except the main cache on a web browser is nothing more than a speed boost, and cannot be relied on.



I'm familiar with the linking capabilities of apps, at least for iOS (haven't played with Android). I think it's a bit limited, as it currently stands, but I feel like the intent of the url schemes is somewhat different from the idea of linking on the web. On the web, you typically link to open content, whereas I feel like apps tend to be more personalized; seems like linking wouldn't work quite the same. Interested to think more about that, though.


I think he is referring to apps that only show content, like a news-site reader or something along those lines.

However, most of the apps I've seen have a Share feature which one can use to retrieve the link.


Desktop apps did link. Windows had OLE. Old versions of KDE had Kparts.


I'd say the reason why following the spec on status codes has fallen out of favor is that browsers don't do anything interesting or informative with them for most users.

On APIs, I'd say it's a sin to not return proper HTTP headers, but when the end user on a web site sees a File Not Found that's 200 Status OK versus File Not Found that's 404, it doesn't really matter.

And while it matters for things like the Googlebot, over the years developers stopped caring, because delivering an attractive and descriptive status page was more important than delivering an error status and not rendering a page at all.

I'm not saying it's right, but it was largely an ornamental step in the early days of the Web, so I understand why it disappeared. If browsers proactively provided data to users about HTTP status codes, I think they'd be adhered to much more.


Returning a 404 status code and displaying an attractive and descriptive status page are not mutually exclusive. If you go to http://www.google.com/unknown, you will get a 404, but you will also see a custom page.


Absolutely, I don't disagree. I'm saying that for most end users, there's no distinction. Status codes are for machines, but if early browsers did something noteworthy with those codes, developers wouldn't have been so quick to dismiss them in a way that is now largely habit.


Ironically, the URL of the post is http://gadgetopia.com/post/9236?retitle


And?


It doesn't exactly follow the recommendations in the second link from the post: http://gadgetopia.com/post/6346


Yes, and that is why the linked post and this post itself are bad. Both authors do not understand the principles of a URI in the context of the web.

To a client, the URI should be opaque. If a client has to start to make meaning of the contents of an URI, the server loses the ability to change its implementation (or just move resources around).

Edit: Ah, it's the same author for both blog posts. That explains it.


From the same page:

(And yes, I know Gadgetopia’s URLs are somewhat lame in this regard. There are reasons why I did it, but changing it now - See more at: http://gadgetopia.com/post/6346#.dpuf)


I had to chuckle at this. It's true of course that HTTP as a standard is greatly abused, and it is one of the big reasons that writing a web crawler is not just a matter of calling GET again and again.

In providing APIs for third parties that are rate limited, our APIs return a '509' error to tell you that you've exceeded your query rate. If you just sleep for a bit and then send the request again, you'll get an answer, but no: too many developers treat it as a 'soft' error and just retry immediately, which is useless. Some folks only want three errors, 'ok', 'dead', or 'try again', and can't deal with any more subtlety than that.
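
All we really want from clients is a bounded back-off, something like this (Python, using the requests library; the 509 handling and the assumption that Retry-After is given in seconds are specific to our API):

    import time
    import requests

    def get_with_backoff(url, retries=5, delay=1.0):
        # Retry on "slow down" responses instead of hammering the API
        for attempt in range(retries):
            resp = requests.get(url)
            if resp.status_code not in (429, 509):   # 429 is the standard Too Many Requests
                return resp
            # Honor Retry-After if present, otherwise back off exponentially
            wait = float(resp.headers.get("Retry-After", delay * (2 ** attempt)))
            time.sleep(wait)
        return resp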


I love implementing stuff like standardizations. Seriously, I love reading documentation that has lists of codes meant to concisely summarize specific frequently occurring events. I am the biggest nerd when it comes to poring through documentation that was designed specifically with the intent to unify a common need that started off fractured.

In general though, it's just hard to know exactly when and where to implement what standardization. It's not just HTTP. The entire world of programming and computer science would be much more elegant with intelligent standardization, but it's literally a hard problem. The source of information is the hard problem. It is subject to change.

How can you standardize something that is constantly evolving, evolving to the point that the functionality of the system is expected to exceed all known expectations of its operation and behavior? As developers, we are constantly striving to exceed our own boundaries, to do things better, more optimally, more safely, more legibly, more precisely, more correctly. Our expectations of ourselves are what screw us when we think about 'the big picture', because we apply the same model of reasoning to 'the big picture'.

Yes, we do not know what we do not know. However, we additionally do not know how to tell the difference between 'that which exists that we do not know' and 'that which does not exist that we do not know'. That is the core problem that lies underneath the failure to implement standards, and it is the reason people standardize things in the first place. Standardization prevents us from re-inventing the wheel. Standardization makes things safe. Standardization makes complex systems predictable even when information about adjacent systems is unknown. Standardization is a vehicle for implementing trust.

People get confused because there are too many words for the same thing - too many ways to describe the same things, too many languages to do the same thing, too many explanations meant to teach the same thing. Either we have to learn the hard way (by observing massive failure, or by experiencing it), or we have to hope we are implementing things the most correct way, and trust we will get there.


I feel like the ignorance of developers here is overstated. Most APIs that I know of make good use of the HTTP status codes. This includes APIs from big players like Google, all the way down to small startup APIs. Sure, I've seen bad ones, but I don't know that I would agree that it's the majority. Maybe I've just had good luck or lived a blissfully unaware existence as a developer. Hard to say.


This is just symptomatic of a saturated market. Between Google and Apple there are over 2.5 million apps, and probably over a billion websites or something like that, all of which are using HTTP(S). On top of that, the market for developers is exceptional, and because of this people are flocking to it by the minute. They are all excited to build a bridge without taking a class in engineering.


One of my personal pet peeves is the misuse of the 401 Unauthorized status code, issued to users who are already logged in. No, try a 403 instead, please.


What a pointless rant. Things evolve, even norms and specs.

There's a reason why URNs are not used in code: they are not enough by themselves; if you need more, you use URIs.

There's a reason people don't use every status code in every webapps. It's a waste of time, better spent elsewhere.

Most webapps use more than GET.

Etc. Anyway, HTTP is not broken; it's used everywhere by billions of users every second.


Perhaps if people understood better the technology that allows their business to operate, they wouldn't waste time versioning APIs and having to re-write clients (web, mobile...) every time the server changes something!


You can't expect everyone to master everything. It's like ranting about how the whole classroom didn't get an A+.

It's pointless ultimately.

The real question is how we can improve a protocol so it's better used by most people. Hence why it's evolving. Nothing should be set in stone forever, with everyone then expected to comply with it or die.


Agreed. This guy needs to take a break from the Internet for a few years.


Indeed. It is the internet, where everybody can write about something, even if he/she is (mostly) wrong. And posting it on HN or Reddit does not make it a better blog entry.


You know what really bothers me? It's not the abuse of HTTP status codes. It's the closed nature of apps.

How many of us got our start in web development by viewing the source of the page? By looking at how others did things? Hell, I still do that now.

Apps are the Microsoft of the Internet -- closed, proprietary and secretive. Open source and GitHub have helped this some, but meh.


> ...developers tend to think they can solve every problem, and they’re pretty sure that nothing good happened before they arrived on the scene. ... they feel no need to study the history of their technology to gain some context about it.

So, so true. In fact, this happens depressingly often when "technology" is defined as narrowly as "the app a developer is working on" and "history" is as short as "a month ago".

This has always happened, but it feels like it's gotten worse since "agile" became widespread. As if the principle of YAGNI ("You Aren't Going to Need It") has been turned into a poisonous assumption that nobody before you could have put any thought into the future. Rather than trying to understand the natural extension mechanisms that might already be in place, people just jump in and write more special-case code.


The current state of the web is the result of too many cooks in the kitchen. Nobody is ever willing to stand up and say no to countless new additions that often simply replicate old behavior, or don't add anything compelling or likely to ever be used by others. We even have outright gags in the official specs, like the 418 response.

It's already too late to make HTTP something sane. What we need are developers who understand and appreciate simplicity and minimalism, to make something awesome with a chance of actually catching on. (Please don't xkcd/927 me.) And they'll have to be fierce in protecting it against third-party extensions (which may not even be possible with something this popular.)

That's unlikely to happen, so I guess we'll just keep playing this perverted game of Jenga ad infinitum.


Ha, I had to look up 418.

418 I'm a teapot (RFC 2324): This code was defined in 1998 as one of the traditional IETF April Fools' jokes, in RFC 2324, Hyper Text Coffee Pot Control Protocol, and is not expected to be implemented by actual HTTP servers.

http://tools.ietf.org/html/rfc2324


Sorta related: the HTTP/2 error code ENHANCE_YOUR_CALM strikes me as a joke because of the name...

http://http2.github.io/http2-spec/index.html#ErrorCodes

A quick Google shows that it's previously just been used by Twitter to indicate rate limiting.


You might also appreciate: http://en.wikipedia.org/wiki/IPoAC



I've heard a couple of times that requests coming from JS "need to get a 200" in order to see the response data at all. Now, I've never worked with JS myself, and to me this sounds more like a badly-designed library problem, but it keeps becoming an issue every time I push for an API without broken error codes.


This is not the case with the great library superagent[0], which I can confirm with a little test case I wrote.

I have a server that returns a 401 when the Authorization header is missing, but I also like to return a JSON body with a little more information as to what went wrong (i.e. whether the header is flat-out missing or the credentials aren't being sent in the correct way, with a link to the documentation), and superagent allows you to output the `response.body` in the `end()` method callback.

I think of superagent as the requests[1] of js. In fact, I usually require it under the name requests as they feel so similar.

0: https://github.com/visionmedia/superagent

1: http://docs.python-requests.org/en/latest/

EDIT: link formatting


Just looked this up and you're right: in Firefox 3.6, there's a bug where the XMLHttpRequest response body won't be populated by Firefox if the status code is 400. Seems like a little-known bug that has been vaguely referenced elsewhere.

http://bugs.jquery.com/ticket/7868

http://stackoverflow.com/questions/5884286/jquery-ajax-with-...


That's a regression from 4 years ago. You can get the response body for any status code in JavaScript.


> requests coming from JS "need to get a 200"

I'm not sure what this is, but it could be some sort of expectation where all responses are expected to get a 200 along with wrappers like this:

    {"status":"error","reason":"not found"}
    {"status":"success","response":{...}}
And client-side devs using jQuery (for example) correspondingly expect to handle error conditions in the "success" callback instead of the "error" callback.

When someone used to doing things like the above talks to someone more accustomed to RESTish ways of doing things, quite a bit could get lost in translation and communication often completely breaks down.


All of the commonly used libraries have support for any type of status code; however, in the JS world, you need two functions to handle a response: one for success and one for error. The developers you have worked with either didn't know that or were lazy.


I liked the author's comment about using canonical URLs (with a link to another of his articles on this subject). This is something that I have been intending on doing for a while - I just pushed changes supporting canonical URLs to my main 3 web portals.


HTTP is too flexible not to theoretically break but in practice it can be used and abused and still deliver.


As a protocol, it works perfectly well for transporting hypertext, what's the problem?


Implementations.


It sounds like this guy just learnt about HTTP details.


HTTP comprehension and usage are critical. People wonder why websites don't perform and why APIs are difficult to program against. I have seen that developers have this weird need to create their own methods and means of communicating errors. Even WS-* developers don't use SOAP faults. That is mostly what we are talking about in this article: non-standard messages/state.

Treating your application and data as resources (REST is the style of the web) and leveraging the power of HTTP is so powerful. You are not a crabby old man. If you are, count me in; I'm 33 years old. I could make a career consulting and only fixing HTTP and resource abstraction/usage problems.

IMO though educating our fellow developers on HTTP is easy compared to the REST HATEOAS constraint.


> I have seen that developers have this weird need to create their own methods and means of communicating errors.

Developers ignore most of the status codes because they introduce unnecessary complexity on both ends (client and server) and do not always map well with the application logic. With JSON payloads, HTTP is used as a "light vehicle" to carry the application logic. Since the payload is going to be specially interpreted anyway, I don't see a benefit in cramming legacy HTTP cruft in the headers and in status codes.

The really annoying thing is the web server spitting out HTML on error conditions instead of the expected JSON.



