Sharefile is not even a HTTP API (since it doesn't use HTTP methods correctly).
For security purposes, authentication can be further increased by a POST of the "username" and "password" through the HTTP Headers as individual headers instead of the query string.
1. Passing creds via GET will show up in server logs.
2. The way REST works, you don't want to put anything in a GET which will change anything on the server (which includes generating an auth token). POST is the standard here.
Edit: It's better that they support POST for the auth creds in a header, but they should still be using HTTP standard authentication methods and return a 401 if unauthorized.
Also, look for already complete solutions for what you want. The whole point of REST is that you don't need to create all these query string parameters or put stuff in headers - the HTTP specs already include most of this functionality. For example HTTP already includes several authentication mechanisms[1], as does TLS[2], which are more secure (and standard!) than a form-based login.
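To make the "use standard HTTP authentication and return a 401" point concrete, here is a minimal sketch of HTTP Basic auth in Python. The `USERS` table, the realm name, and both function names are made up for illustration; a real service would check hashed passwords and compare them in constant time.

```python
import base64

# Hypothetical credential store, for illustration only.
USERS = {"alice": "s3cret"}


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for HTTP Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def check_request(headers: dict) -> tuple:
    """Return (status_code, response_headers) for the given request headers."""
    # A 401 response carries a challenge telling the client which scheme to use.
    challenge = {"WWW-Authenticate": 'Basic realm="api"'}
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "basic" or not token:
        return 401, challenge
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return 401, challenge
    username, sep, password = decoded.partition(":")
    if not sep or USERS.get(username) != password:
        return 401, challenge
    return 200, {}
```

The client side is just `{"Authorization": basic_auth_header(user, password)}` sent over HTTPS; nothing goes in the query string.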
Why should that particular book be the bible? I haven't read it, but I've flicked through it and seeing things like the suggestion that clients should "construct" URLs made me doubt its worth.
Shouldn't Fielding's original thesis be the bible?
Ok, maybe bible was used too loosely. In my experience, I found it a well written, well balanced, useful primer for real-world development of RESTful web services that also has enough depth to make it a reference I've returned to since. That's better.
You have no way of knowing what they log and don't log. If their server logs are compromised, you should be assuming their username/password database was as well.
And HTTPS requests are encrypted. The whole request, including the "GET /someurl&password=s33krit HTTP/1.1" part. As I said, using POST doesn't add any additional security to this.
I don't understand the point you're trying to make here, or the stridency with which you're making it. You in fact do know that the most popular web servers do log the URL, and do not log POST parameters or individual headers.
That's why professional security audits will ding you for putting anything sensitive in a URL.
Actually, we ding you for putting sensitive information in URLs that are used in a browser. The reason is that it will then be sent to other sites in the referrer header. When it is used for an API it doesn't matter. A security audit will check that logs are not logging sensitive information or that they are properly secured and encrypted if they do contain such information. The combination of telling users to put their password into the url and logging the url would be a problem, but not either thing on its own.
No, SomeOtherGuy. This is an HTTPS site. The referer header does not behave the way you're implying it does on HTTPS sites.
And having read and written this particular audit line item about 29074894389734897 times in the last 15 years, let me assure you that logging is clearly an issue.
The bit about how "if someone can see your logs you're already boned" is also message-board-logic more than reality. Logs are shipped all over the place. On a network pentest, one of the things you do when you pop a box is hunt for logs; even after you get root on the machine, you don't automatically have all the passwords that get stuck in the log files. Same goes for all the random LogLogic-style consoles those logs get fed to.
If you have more questions about this stuff, my contact info is in my profile --- just click my username above this comment. I just wanted to chime in to say that 'carbocation was exactly right, but my comments are now repeating themselves, and so I'm done on this thread.
I agree with your concerns. But my comment was talking about "typical server logs" and HTTPS-POST vs HTTP-GET, while yours is addressing different issues.
I addressed why what gets in logs doesn't matter: if their server is compromised you have to assume you are boned anyways. And I don't understand why your comment would be talking about "HTTP-GET". The API in question is dealing with HTTPS for both GET and POST requests.
I don't understand why your comments pointing out details of the HTTP spec and common-sense security considerations are being downvoted. I guess they are coming across as overly 'strident'? Anyway, I've found them helpful. Thanks.
If you're using SSL then form data in a POST request will be encrypted. HTTP headers are always encrypted using SSL. What wasn't clear to me from the documentation is whether the 'username' and 'password' are form data, or are actually custom HTTP headers. The latter choice would certainly be a facepalm.
So will everything else, including the URI being requested
I disagree from the data I'm seeing in the access logs from my SSL-hosted site running nginx. In the logs I can see lines such as:
GET /path/script?variable=blahblah&another_variable=123
EDIT since I appear to have lost the ability to reply to comments: I disagree with SomeOtherGuy2 that The fact that it may get logged is a red-herring.
Ignoring how secure a server with a rogue user accessing it is, it's possible that there will be more than one server involved in this scenario, and central logging servers are common. Will the traffic sent to the logging server be encrypted? And what if the logging server is compromised? You're essentially storing passwords in plain text.
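If credentials do end up in query strings, one mitigation on the server side is to scrub them before a line is written or shipped to a central log host. A rough sketch in Python; the parameter names in `SENSITIVE` are assumptions, not anything Sharefile documents, and this says nothing about how their logging is actually set up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of parameters that should never reach a log file.
SENSITIVE = {"password", "authid"}


def redact_url(url: str) -> str:
    """Replace the values of sensitive query parameters with a placeholder."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    cleaned = [(k, "REDACTED" if k.lower() in SENSITIVE else v) for k, v in pairs]
    return urlunsplit(parts._replace(query=urlencode(cleaned)))
```

This only helps for logs you control. Proxies, load balancers and client-side tooling may still record the original URL, which is the argument for keeping credentials out of it in the first place.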
It's all encrypted over the network. Obviously, your servers have to be able to decrypt URL strings (and everything else), or they wouldn't be able to respond correctly. I think the assumption is just that servers may by default log decrypted URLs for GET requests, but not POST requests. But, as people have noted, that's not exactly reassuring, since you have no idea how their logging is set up.
You are very confused. SSL is used to encrypt transmission between the browser and web server. Of course the web server decrypts the data it receives, otherwise it wouldn't be able to use it. I am saying you can not sniff someone's HTTPS traffic and see the urls they are requesting, so sensitive information being in the url is not a problem. The fact that it may get logged is a red-herring, as if someone has compromised the server to gain access to the logs, they can access whatever they want, your username and password included.
But you can be 99.994% sure they're writing your plaintext username and password into their logs if you're using GET, and if somebody breaks in and gets them, then they have your password, even if those passwords are properly hashed in the user database.
No, I wouldn't be 99.994% sure of that at all. In fact, I would assume that if they are suggesting that people use GET, that they are in fact not logging the query params, as any security audit would catch that.
And again, if they are compromised, then they are compromised. It doesn't matter if they have logging disabled, someone who would have access to the logs also has access to either the httpd account or the root account. Either way, they can already read your plaintext usernames and passwords directly when they are being submitted. Of course, they don't need your username and password anyways, as they already have full access to the system.
Picking a comment at random to thank you for at least trying to explain to people why their assumption of what's being logged relates in no way to security.
At least one person appreciates someone taking the time to correct this rather serious misunderstanding.
"Everything in the HTTPS message is encrypted, including the headers, and the request/response load. With the exception of the possible CCA cryptographic attack described in limitations section below, the attacker can only know the fact that a connection is taking place between the two parties, already known to him, the domain name and IP addresses."
It's not 'compliancy' as much as they're reinventing the wheel.
REST was born out of a philosophy that the HTTP protocol already solved much of what you wanted to do.
HTTP not only solved this, but solved this a long time ago and with 'great success' (aka The Interwebs ;-)
- Authentication mechanism
- Operations (CRUD) -> GET, PUT, POST, DELETE, ...
- Caching -> Use HTTP caching mechanisms...
- Resources -> URL's
- Formats -> mime-types + use a HTTP Accept: header
- ...
They reinvent the wheel on many of the bullet points above:
custom operations, custom format handling, custom authentication mechanisms, ...
That's why this really is one of the worst implementations of a 'RESTful' API.
Would you care to point out where in the HTTP RFCs it requires the use of certain methods for certain operations? The reality is, you can do whatever you like as far as HTTP is concerned. In fact, only GET and HEAD are required to be implemented, all the other methods are optional.
HTTP does not have "verbs". That is REST. Just because they aren't using a REST API, doesn't mean they are not HTTP compliant.
I'm not sure that it actually violates the letter of the law of the spec, but I think it definitely violates the spirit, based on this section: http://tools.ietf.org/html/rfc2616#page-51.
I thought that GET (along with a few other methods) were supposed to be "safe" and not result in any action on the server except possibly logging and stuff like that? Is this required to be HTTP compliant or is this more of a recommendation?
Custom headers do not REQUIRE a prefix. So the lack of prefix does not make it non-compliant. In fact, the HTTP RFC doesn't even say they SHOULD have a prefix. And RFC822 which defines headers only says that protocol mandated headers MUST NOT be prefixed with "X-". Also, there's a draft proposal to deprecate the "X-" header prefix as it does more harm than good: http://tools.ietf.org/html/draft-ietf-appsawg-xdash-02
We're so spoiled that now we're complaining about the APIs we do get? I would have died for stuff like this ten years ago when using data from other sites involved scraping, harassment, and trickery. Just like not everyone can produce a beautiful, accessible, standards compliant website, not everyone can produce a perfectly RESTful API. I give them kudos for opening up their system, or at least attempting to.
Though you make good points, perhaps Sharefile shouldn't claim that their API is RESTful when in fact it clearly isn't. There's nothing wrong with having a non-RESTful API, apart from it being far less clear and understandable as REST.
For many people, REST == ! SOAP. There is nothing more to it than that. And because REST is more an architectural framework than a protocol (remember, the protocol is HTTP), it's hard to teach them otherwise.
Kudos for the API, but lying about it being REST can cost developers significant time or at least blow estimates. If they butcher their API this badly, would you really put stock in the backend being written in any kind of sane fashion? Would you trust it?
Most of the proprietary APIs I've had to work with were just as bad. Sometimes I wonder if the NDA they make you sign before seeing the API documentation is so that you won't be able to show anyone else how bad it is.
I'm currently working with an unnamed credit API and all calls, regardless of status, return 200. Options can be strung together in a single GET parameter. All calls resolve to a single URL and the method is chosen by a GET param (which variant of that method is yet another param). And a person's information can be passed in via any of a slew of GET params, or as POST'd XML.
It's a mess... anyone up for a Stripe of the credit/authentication world?
I once spent weeks writing a wrapper for a particular service provider's API (which they charge a premium to access). Every request was a POST and every response was 200. In one function, timestamps were represented by a unix epoch, but others had the same field encoded as ISO 8601. The XML returned was not guaranteed to be well formed. Etc.
The kicker is that they declined permission to open source my wrapper because "the API is proprietary."
As @artanis0 (kind of) mentioned, it'd be great to see a HN post sometime soon called "How to write a good 'REST' API". Or even just some links to good tutorials that could get me (and others) started?
I've recently built a DB driven site that could possibly be extended with an API, and it's a good chance to learn something with a purpose behind it!
It's pretty straight forward, just do the following:
- Use HTTP verbs: GET to retrieve one or more objects, POST to create a new object, PUT to update an existing object, DELETE to remove an object.
- Address objects by collection and by individual object: /users/#{user_id} is a specific user you can PUT, GET or DELETE. /users is where you POST to in order to create a new user.
- Use HTTP codes to return the result back to the client (ie HTTP 200 when you get an object, 201 when you successfully POST an object, 404 when you try to GET/UPDATE/DELETE an object that doesn't exist).
- I find it good form to return objects in JSON with the type of object at the top of the data structure, ie {:users => [ #array of users here ]} or {:user => { #single user }}
- Use OAuth or some sort of token system for authenticating the calls, don't use HTTP Auth.
You can get pretty anal about things but if you follow the above you'll have a cleaner API than 90% of the APIs out there.
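The checklist above can be sketched as a tiny in-memory dispatcher. This is Python with no framework; `UserStore` and `handle` are made-up names, and the 204 and 405 responses are my additions beyond the codes the list mentions.

```python
import itertools


class UserStore:
    """Toy /users resource: a collection URL plus one URL per user."""

    def __init__(self):
        self._users = {}
        self._ids = itertools.count(1)

    def handle(self, method, path, body=None):
        """Return (status_code, payload) for a request."""
        parts = [p for p in path.split("/") if p]
        if not parts or parts[0] != "users" or len(parts) > 2:
            return 404, None
        if len(parts) == 1:  # the collection: /users
            if method == "GET":
                return 200, {"users": list(self._users.values())}
            if method == "POST":
                user_id = str(next(self._ids))
                self._users[user_id] = {**(body or {}), "id": user_id}
                return 201, {"user": self._users[user_id]}
            return 405, None
        user_id = parts[1]  # a single object: /users/<id>
        if user_id not in self._users:
            return 404, None
        if method == "GET":
            return 200, {"user": self._users[user_id]}
        if method == "PUT":
            self._users[user_id] = {**(body or {}), "id": user_id}
            return 200, {"user": self._users[user_id]}
        if method == "DELETE":
            del self._users[user_id]
            return 204, None
        return 405, None
```

For example, `store.handle("POST", "/users", {"name": "ann"})` creates a user, and a later `store.handle("GET", "/users/999")` comes back as a 404 instead of a 200 with an error flag buried in the body.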
> Use HTTP verbs: GET to retrieve one or more objects, POST to create a new object, PUT to update an existing object, DELETE to remove an object.
You forgot PUT to create a new object and POST to update an existing object.
> Address objects by collection and by individual object: /users/#{user_id} is a specific user
As far as REST is concerned, that doesn't really matter. The important thing is that the client doesn't generate the "/users/#{user_id}" URLs itself, but rather selects a URL from those it has been told about.
> I find it good form to return objects in JSON with the type of object at the top of the data structure, ie {:users => [ #array of users here ]} or {:user => { #single user }}
A big dumb mistake this api makes is sending commands and object ids via query parameters, instead of making them part of the url. The url should look like a file path that addresses an object -- the path of the objects shouldn't be in a query parameter. Especially when the object you're addressing is a file that has a logical location specified by a path.
Engineers like us can be such snots sometimes. If as a culture we want people to build great RESTful apis and a company attempts to do so, public shaming isn't the answer.
But. Their homegrown text serialization format is pretty wack. They couldn't just use CSV?
That culture of being "snots" comes from engineers in the real world. The one where this kind of attention to detail means buildings fall on people.
I agree that maybe it's not necessary to carry over that cultural legacy to the internet, although so far it's worked pretty well.
If social pressure (public shaming) isn't the answer then what kind of pressure should be used? Public shaming is a pretty civilized way to enforce rules when you have no top down control that dictates those rules, I can't think of any alternatives that would be less harsh.
According to some of the latest HN posts about Facebook, in our industry you're doing something wrong if the building doesn't fall over from time to time ; ).
Public shaming can be a good idea, especially on people who are knowingly breaking the rules and holding back progress (See IE6).
However, I'm just glad people are creating APIs, especially because basically no one gets REST right anyways (Hint: if you can't click around your api in firefox with the JSONView extension, it's probably not RESTful).
Agreed. I was mostly hoping to mitigate any conclusion that the design was a result of them being .Net developers. :)
Luckily there are some constructive comments in this post which point out some valid concerns. This is a good chance to share opinions and learn a bit. Unfortunately there are a lot of developers out there who are tasked with projects and have no one in-house with experience. I was in that boat and it took developing and maintaining a lot of bad APIs to learn what made a good one.
This reminds me of Shu-Ha-Ri (Cockburn from way back - Agile as Coop):
Shu - you don't know what you're doing (people here asking for help);
Ha - you closely follow the rules (people here complaining);
Ri - you understand the topic sufficiently to adapt and respond as necessary (people for whom this is really not a big deal).
[Actually, now that I read the Wikipedia page for that, it's not quite right - apologies to any martial arts people out there http://en.wikipedia.org/wiki/Shuhari]
Probably not, but after seeing how nice REST API's can be with sensible url mappings, HATEOAS, HTTP methods as verbs, etc. it's almost insulting for sharefile to call it RESTful (presumably because it can return JSON?).
Off the top of my head:
- API method urls are all the same .aspx, regardless of method used
- All calls are sent as GET (ignoring the whole point of HTTP methods in REST)
- No HTTP status codes as responses for automated parsing
- Custom authentication putting credentials in URL or in headers instead of relying on proven HTTP auth schemes we've been using for years (basic, form, etc)
- Return format is done with get arguments instead of HTTP content negotiation in headers (not so bad)
I'm with you on just knowing we can do so much better. Some of the RESTful HATEOAS-based APIs coming out are simply beautiful. However,
> - API method urls are all the same .aspx, regardless of method used
Extensions are meaningless. That is why we have Accept/Content-type headers. The HTTP spec even explicitly says to not use extensions to relay content information between client and server, from what I recall.
> - All calls are sent as GET (ignoring the whole point of HTTP methods in REST)
I haven't read Fielding's dissertation in a while, but using the "Coles Notes" version from Wikipedia, I do not actually see using the verbs as a requirement for REST. I do not recall REST even requiring HTTP. You can use any protocol you want, so long as it conforms to the principles.
RESTful, on the other hand, does specify the use of HTTP verbs, but the site in question makes no mention of being RESTful.
By not using the proper verbs, the site does appear to violate the caching rules of REST though, I'll give you that.
> - Custom authentication putting credentials in URL or in headers (neither of which are encrypted over https)
The entire https payload is encrypted, headers and all. REST says nothing about how authentication should be implemented.
> - Return format is done with get arguments instead of HTTP content negotiation in headers (not so bad)
This falls under the same as using extensions. Though I will agree with you that it is a reasonable compromise in some cases, such as using a browser where you can't reasonably set your own headers.
The query string and headers ARE encrypted in https. And forms auth, that's just a post and a cookie. You can't require API clients to support cookies.
But, and excuse me if I'm wrong on this I'm looking for clarification, you should be forcing API clients to sign their requests with some sort of token - which might as well be a cookie. The difference being that a browser keeps track of the cookies for you, and any other client will have to keep track of any auth info itself (in a sqlite DB, xml file, held in memory, etc)
> sign their requests with some sort of token - which might as well be a cookie
Using a token to sign requests is certainly a good idea, but you probably don't want to pass the token around using a cookie, even over HTTPS. Many browsers don't enforce good cookie security and will transmit cookies in the clear if an attacker can redirect the browser to a non-HTTPS URL on the same domain.
It's been a while since I've used AWS, but I consider their API to be the gold standard. IIRC, they have you concatenate a few fields plus a message length, and sign it with your secret key, and this signature is your authentication and anti-tamper mechanism.
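A rough sketch of that kind of request signing in Python, using an HMAC over a few request fields plus the body length. To be clear, this is not AWS's actual algorithm; the choice of fields, their order, and the function names are assumptions made for illustration.

```python
import hashlib
import hmac


def sign_request(secret: bytes, method: str, path: str,
                 timestamp: str, body: bytes = b"") -> str:
    """Sign selected request fields with a shared secret (HMAC-SHA256)."""
    fields = [method.upper(), path, timestamp, str(len(body))]
    message = "\n".join(fields).encode("utf-8") + b"\n" + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, path: str,
                   timestamp: str, body: bytes, signature: str) -> bool:
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(secret, method, path, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

The secret never travels with the request; only the signature does. A real scheme would also reject stale timestamps so a captured request can't be replayed indefinitely.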
Maybe AtomPub? It respects HTTP methods, uses URIs to represent resources correctly, uses <links> to be discoverable (hypermedia), is self-descriptive, is stateful... It uses Atom/XML instead of JSON, but still...
(but please correct me if it isn't a good example)
"API method urls are all the same .aspx, regardless of method used"
Isn't it more the fact that they spend all of their efforts documenting the URI structure and very little on the media types used - which is very unRESTful?
- Linked to that, a significant part of the documentation is about URLs to hit instead of being about document types
* Misuse of HTTP verbs. A lack of use (going through POST for everything) would be bad enough, but they go through GET, which should be 1. safe (should not alter the server's visible state) and 2. idempotent (should be callable n times with the same parameters returning the same result, as long as the server's state was not changed). This API deletes things through GET.
* Lack of use of HTTP headers, instead of using the Accept header the client has to use an informally defined query parameter
* Lack of descriptive content types (linked to previous point)
* Complete absence of use of HTTP status codes (standard or custom), the API returns a status of 200 in all cases
This is pure and simple (and terrible, since it's over GET) RPC tunneled through HTTP.
I'd like to know the same thing. Aside from everyone complaining that they used the term RESTful wrong, is there anything really wrong? It authenticates securely over HTTPS so I am confused what is wrong. Saying it's not compliant I don't really follow either. Thanks for any help.
To me there is nothing really wrong with this API except that they made the mistake of calling it a RESTful API and, as such, have incurred the wrath of REST purists.
If they were to change the auth method to use a secret key for signing requests and just call it an RPC API then I don't think there would be a problem.
I guess it's also a matter of taste. In any case, my taste is this:
RESTful APIs usually represent CRUD operations, and each of these letters can be beautifully mapped to request types:
- CREATE -> POST
- READ -> GET
- UPDATE -> PUT
- DELETE -> DELETE
Second point: {error: false, value: actual_data}. If we have an error variable, what is the HTTP error code good for then? Normal web servers use the HTTP error codes for a reason. Besides, with a standard web framework, a JSON response containing only "actual_data" means less code, fewer errors and so forth...
Third point: URLs should represent the hierarchy:
GET /users/43/bookmarks/32442?... is much more beautiful and straight-forward to work with than /api_handler.exe?user_id=43&bookmark_id=32442&operation=get...
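Part of why the hierarchical form is easier to work with is that dispatching it takes one pattern match. A small Python sketch; the route and the function name are made up for the example.

```python
import re

# Hypothetical route for the bookmark URL shown above.
BOOKMARK_ROUTE = re.compile(r"/users/(?P<user_id>\d+)/bookmarks/(?P<bookmark_id>\d+)")


def match_bookmark(path: str):
    """Return the ids encoded in the path, or None if the path doesn't match."""
    m = BOOKMARK_ROUTE.fullmatch(path)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}
```

The operation itself then comes from the HTTP method instead of an `operation=get` parameter.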
Regarding authentication: use a secret API key, that's simple and secure. Everybody does that, from small services to multi million user services like facebook.
As a hint: read the dissertation of Roy Fielding who "invented" REST. In my opinion REST means to exploit HTTP as far as possible instead of using any custom conventions.
I agree with most of what you said, but mapping CRUD directly onto the verbs is perhaps a little simplistic. See http://jcalcote.wordpress.com/2008/10/16/put-or-post-the-res... for a discussion on it. The very short version is that idempotency is important, POST must be used for non-idempotent updates, and there is no reason PUT can't be used to create if you know the ID of the resource.
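The idempotency difference is easy to see in a toy model (Python; `Collection` is a made-up class, not anything from the linked article): repeating a POST keeps creating resources, while repeating a PUT to a known ID leaves the same single resource behind.

```python
import itertools


class Collection:
    """Toy resource collection contrasting POST and PUT semantics."""

    def __init__(self):
        self.items = {}
        self._ids = itertools.count(1)

    def post(self, data):
        """Create a resource under a server-chosen ID. Not idempotent."""
        item_id = str(next(self._ids))
        while item_id in self.items:  # skip IDs already taken via PUT
            item_id = str(next(self._ids))
        self.items[item_id] = dict(data)
        return 201, item_id

    def put(self, item_id, data):
        """Create or replace the resource at a client-chosen ID. Idempotent."""
        created = item_id not in self.items
        self.items[item_id] = dict(data)
        return (201 if created else 200), item_id
```

So a client that times out and retries a PUT is safe, while a retried POST may leave a duplicate behind.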
The discussion here around the article I linked was my first introduction to PATCH. I would really like to see it implemented in Pyramid so my site can take advantage of it.
There are a few comments in this post that touch on the flaws of this API. Here's the gist:
1. HTTP methods are improperly used. The documentation states "All API calls should be sent as a GET HTTP request". GET is used to retrieve resources, and should be cacheable. It should NEVER change the state of a resource. Yet we see that it's used to delete, create, and modify resources! Examples:
Two things to note: 1) no 'aspx' bullshit, that's implementation and shouldn't be visible to the user, and 2) the ID is in the URI itself as opposed to the params. URIs should seldom, if ever, change. Parameter names may change, URIs shouldn't.
3. Violation of content type retrieval. Let's say I want a folder as XML:
Note that I'm using the Accept header. This allows for flexibility in accessing my resource. With any luck, the server will give me an XML file, and its content type should be "text/vnd.sharefile+xml" (vendor specific content types are best).
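On the server side, honoring the Accept header means picking the best match among the representations you can produce. A simplified Python sketch: it handles q-values and wildcards but ignores the specificity rules of the HTTP spec, and the function names are made up.

```python
def parse_accept(header: str):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def choose_type(accept_header: str, available):
    """Return the offered type with the highest q, or None if nothing matches."""
    best, best_q = None, 0.0
    for offered in available:
        offered_main = offered.partition("/")[0]
        for media_range, q in parse_accept(accept_header):
            main, _, sub = media_range.partition("/")
            matches = (media_range == offered or media_range == "*/*"
                       or (main == offered_main and sub == "*"))
            if matches and q > best_q:
                best, best_q = offered, q
    return best
```

When `choose_type` returns None, the server would typically answer 406 Not Acceptable instead of guessing a format.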
4. No link relations in the body. I'm mostly assuming this because I haven't seen sample response bodies, but almost no one does this even though it is a crucial aspect of a RESTful service. RESTful services describe the relationship between resources. Suppose I have an ordered collection of files. When retrieving a file, it should describe its relationship in this ordered collection. This is done either via the Link header, or some sort of scheme in your response body (e.g. JSON Schema describes 'link' attributes).
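For the ordered-collection case, the Link header variant might look something like this in Python. The `/files` URL layout and both function names are invented for the example; the `prev`, `next` and `collection` relation names are standard registered ones.

```python
def file_links(file_ids, current):
    """Describe where one file sits within an ordered collection."""
    index = file_ids.index(current)
    links = {"collection": "/files"}
    if index > 0:
        links["prev"] = f"/files/{file_ids[index - 1]}"
    if index < len(file_ids) - 1:
        links["next"] = f"/files/{file_ids[index + 1]}"
    return links


def link_header(links: dict) -> str:
    """Serialize {rel: url} pairs into a Link header value."""
    return ", ".join(f'<{url}>; rel="{rel}"' for rel, url in links.items())
```

A client can then walk the collection by following `next` links instead of building URLs itself.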
What would be the RESTful way to specify jsonp? (i.e. wrapping the json response with a jsonp callback function).
A semi-related question - this is something I've wondered for a while - does the XMLHttpRequest same-origin policy actually do anything for security? What kind of malicious resource might you try to fetch with an ajax call that can't just be wrapped in a jsonp callback?
That is possible (for stupid caches anyway, on purely technical grounds there should be no difference), it also looks much better to developers/users and is much, much easier to map/dispatch in most frameworks.
I was just talking about the restful part of the thing, your urls can look like this:
That may be so, however he asked what else is wrong with this API. An ugly URL scheme is something wrong with an API in my opinion and lots of people agree (see Rails, etc).
I thank you all for the great responses you have given so far. I meant my question in terms of RESTfulness, but poutine adds best-practice advice that I'm more than happy to know about as well.
Seeing their API just reminds me something. What is a good way to deal with methods that don't fit in the usual HTTP verbs (GET/POST/PUT/DELETE), like Rename, Grant, Revoke in their case?
In the past I've simply extended the uri to indicate additional methods like, POST /user/foobar/grant. I wonder whether putting those methods as additional HTTP verbs would be better.
Edit: extending HTTP verbs might not work too well with some firewalls/NAT that filter out non-standard verb requests.
Defining additional verbs is certainly allowed, but it's worth looking to see if your operations would be more easily mapped to a different resource model. For instance, a "rename" operation might work well as a PUT to a foo/name endpoint.
I don't hear people mention Drupal much here at HN. I assumed it was the learning curve; but anyway.. I point your interest over at https://drupal.org/node/783460 where the Services 3.0 module's REST API development framework is demonstrated with an example of creating a simple RESTful CRUD service. For ambitious WebApps & WebAPIs, Drupal provides great scaffolding for development. And it looks like the Drupal8 branch, a few years away, is 'headless' as in being configurable to only be an API endpoint with no "web site" necessary.
Using Drupal to create a RESTful API is like using a tank to hammer in a nail. It takes about 50 lines of code to write a simple RESTful API in pure PHP.
http://nordsc.com/ext/classification_of_http_based_apis.html
Sharefile is not even a HTTP API (since it doesn't use HTTP methods correctly).
For security purposes, authentication can be further increased by a POST of the "username" and "password" through the HTTP Headers as individual headers instead of the query string.
POST https://subdomain.sharefile.com/rest/getAuthID.aspx HTTP/1.1
Content-Type: application/x-www-form-urlencoded
password: yourpassword
username: email@address.com
What... I don't even...