Please review my API for HackerNews

erikpukinskis · on Sept 15, 2010

Not sure if you have some other strategies for the URL scheme, but I'd probably use RESTful paths... like:

/users/{username}/posts instead of /by/{username}

or

/posts/{id}/comments instead of /comments/{id}

And how come threads are indexed by userid but posts are indexed by username? These are the kinds of things that slip by developers and give them headaches. I can easily imagine not noticing the username/userid switch and being like "WTF!? 404?!" for a while.

I would expect the standard REST paths would a) make it easier to guess the paths and b) allow for simpler URL generation in client apps (you can generate the url for a user and then just tag on /comments or /posts to get the url for those things)
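
A minimal sketch of the URL-generation point, in Python. The base URL is a placeholder and the nested /users/... paths are the ones proposed above, not paths the API is known to serve; `user_url` and `sub_url` are hypothetical helper names.

```python
from urllib.parse import quote

# Placeholder base URL, for illustration only.
BASE = "http://api.example.com"


def user_url(username: str) -> str:
    """Build the URL for a user resource under the proposed scheme."""
    return f"{BASE}/users/{quote(username, safe='')}"


def sub_url(resource_url: str, collection: str) -> str:
    """Tag a sub-collection such as 'posts' or 'comments' onto a resource URL."""
    return f"{resource_url.rstrip('/')}/{collection}"


u = user_url("erikpukinskis")
posts = sub_url(u, "posts")
comments = sub_url(u, "comments")
```

The client only needs one function per resource type; every nested collection falls out of string concatenation.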

blasdel · on Sept 16, 2010

I'm sorry, but that isn't what RESTful means at all. Structured URIs like you mention can be pretty, but are completely unrelated to REST. In the real world they are often used in an anti-REST way, specified in advance instead of linked via hypermedia. If the client needs to use foreknowledge to construct URI strings, that goes against everything REST stands for.

REST is Hypertext As The Engine Of Application State — the default modus of ActionController::Routing::Routes has nothing to do with it.

A design that uses only opaque UUIDs as names for resources and reveals them to the client via links in the responses is perfect REST. Clean-looking URIs are a distraction, except that they tend to be easier to preserve across software rewrites.
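
To make the contrast concrete, here is a rough Python sketch of a client that never builds a URI itself and only follows links found in responses. The in-memory `RESPONSES` table stands in for a server, and the opaque paths, the `links` field, and the relation names are all invented for illustration.

```python
# Fake "server": opaque names for resources, with links embedded in each response.
# Every identifier and the "links" layout here is made up for illustration.
RESPONSES = {
    "/": {"links": {"user": "/r/9f1c"}},
    "/r/9f1c": {"name": "blasdel", "links": {"comments": "/r/47ab"}},
    "/r/47ab": {"items": ["first comment"], "links": {}},
}


def fetch(uri: str) -> dict:
    """Stand-in for an HTTP GET that returns the parsed response body."""
    return RESPONSES[uri]


def follow(start: str, *rels: str) -> dict:
    """Start at an entry point and follow link relations by name."""
    doc = fetch(start)
    for rel in rels:
        # The client only knows relation names; the URI comes from the server.
        doc = fetch(doc["links"][rel])
    return doc


comments = follow("/", "user", "comments")
```

The only URI the client knows in advance is the entry point, so the server is free to rename everything else.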

ronnier · on Sept 15, 2010

Great advice, I agree with you. I'll work on changing it and the docs, but leave the existing paths for some time. I've learned something, this was all worth it :)

pierrefar · on Sept 15, 2010

Looks clean and simple. Well done. I'd add in the documentation comments about caching (on your side and the requester's side) and rate limiting.

For those wanting debugging tools for JSON APIs (a common request for the APIs I operate):

* JSONView, an addon for Firefox that prints nice-looking JSON from the URLs of the API. https://addons.mozilla.org/en-US/firefox/addon/10869/

* Tidy JSON, a command line tool. http://www.raboof.com/Projects/TidyJson/

stwe · on Sept 15, 2010

An API that is subject to change should have a version number as a namespace somewhere in the URL. That way you can have different API versions running side by side, and it makes it less painful to move forward.
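
One way this could look, as a Python sketch: the version is the first path segment and selects the handler. The handler names, the `v1`/`v2` labels, and the `nextId` field are assumptions for illustration, not part of the API being reviewed.

```python
# Hypothetical handlers for two versions of the same resource.
def page_v1() -> dict:
    return {"items": []}


def page_v2() -> dict:
    # A later version can change the response shape without breaking v1 clients.
    return {"items": [], "nextId": None}


ROUTES = {
    ("v1", "page"): page_v1,
    ("v2", "page"): page_v2,
}


def dispatch(path: str) -> dict:
    """Route '/<version>/<resource>' to the handler registered for that version."""
    version, resource = path.strip("/").split("/", 1)
    return ROUTES[(version, resource)]()
```

Old clients keep calling `/v1/page` while new ones move to `/v2/page`.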

edparcell · on Sept 15, 2010

How timely. I just started writing a library to scrape data from Hacker News because I wanted to put the posts I'd upvoted in the sidebar of my blog.

Link: http://blog.edparcell.com/how-i-added-my-hacker-news-saved-s...

Your API has advantages and disadvantages against this approach: On the upside, it provides a uniform way for all languages to access content from HN, which is really cool.

On the downside, all requests through your API have to flow through your server - this makes me uneasy for two reasons: First that you could switch off your servers, esp. if take-up is high and you are not being compensated sufficiently for running them. And second, because I'm uncomfortable authenticating to an intermediary.

timmorgan · on Sept 15, 2010

Similarly, http://github.com/seven1m/hackernews if you want to hit HN directly with Ruby.

Tichy · on Sept 15, 2010

If PG isn't opposed to an API, maybe somebody could hack the HN code to add it natively?

anoopengineer · on Sept 15, 2010

For anyone interested, I have created a Java library wrapping the JSON APIs exposed by Ronnie.

http://github.com/anoopengineer/jhackernews

Currently supports only fetching of News pages - top pages, new pages and ask HN pages. Support for comments and voting to be added soon.

Licensed under Apache 2.0 license.

mcyger · on Sept 15, 2010

Very cool idea. I will definitely start using it on my iPhone.

Regarding security, you are proxying login credentials through your server. Is that correct? I'd suggest putting up information regarding your privacy policy, whether you store any credential information, and the security of your server(s).

ronnier · on Sept 15, 2010

That's a good idea. I'll put that up tonight.

FYI, I don't store any data at all. The username and password are required to get an auth token from HN, which is only needed for voting and commenting. The token is what's stored in the cookie that HN issues.

sjbach · on Sept 15, 2010

How about a way to retrieve comments or post ID for a given story URL? I often save articles and read them days or weeks later, and it would be nice if there were a simple way to find the associated HN discussion without risking upvote/story submission using the bookmarklet.

bittersweet · on Sept 16, 2010

I could probably build something like that on top of ronnier's API. I'm not sure, though, because I haven't had a chance to look at it and the API page is not loading for me at the moment.

ronnier · on Sept 15, 2010

I'm unable to do that because there's not really a way to do that on HN now. I don't store any data so I have nothing to query against.

ethikal · on Sept 15, 2010

I second this. It would be awesome if you could search by url.

petervandijck · on Sept 15, 2010

Looks really awesome, congrats. So it scrapes hackernews and then exposes the data as a JSON api?

ronnier · on Sept 15, 2010

Yes, and caches the data for a couple of minutes.

jacquesm · on Sept 15, 2010

If the data you get is old, then I'd suggest caching it much longer to increase the chance of a hit; it probably will not change anyway.

That would lessen the load on the HN servers considerably, especially if your service becomes more popular.

chopsueyar · on Sept 15, 2010

How often is it hitting the server and how many pages deep does it go?

ronnier · on Sept 15, 2010

It only hits HN when asked, and caches each request for 200 seconds. Additional requests return the cached version instead of re-scraping HN. It mimics what is on HN. So requesting /page only returns the first page. If you want to go deeper, you need to pass in the next page ID, which is returned when you request /page.
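
For readers who want to picture the caching behaviour, here is a small Python sketch of a time-based cache. It is not the actual implementation (which is C#); the `TTLCache` class and its injectable `clock` parameter are assumptions made so the example can be tested, and only the 200-second lifetime comes from the description above.

```python
import time
from typing import Callable, Dict, Tuple

TTL_SECONDS = 200  # cache lifetime mentioned above


class TTLCache:
    """Cache fetch results per key and refetch once an entry is older than ttl."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl: float = TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: serve the cached copy
        value = self._fetch(key)  # missing or expired: go back to the origin
        self._store[key] = (now, value)
        return value
```

With this shape, any number of requests for the same path inside one 200-second window cost the origin a single fetch.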

chopsueyar · on Sept 15, 2010

Good work, man!

TamDenholm · on Sept 15, 2010

Very awesome, needs a search, but yes, I'm sure it'll get use.

pvg · on Sept 15, 2010

It sounds like a plea to be smacked with a banhammer, more than anything else.

It seems to support automating things that are almost certainly better off left un-automated - posting, voting, commenting.

It doesn't support anything that might be interesting to automate - say, asynchronous notification on replies to me, posting or commenting on my url, mention of my name, mention of keywords I care about, etc.

It asks for HN credentials.

Nit, but still a little lame - lifts the HN favicon.

daleharvey · on Sept 15, 2010

ihackernews is currently the best browser for hacker news on the mobile by far (it's far better than the "app" in the android market).

this api is just extracting what he already built for ihackernews and allowing others to use it, I would be very very surprised if pg banned it, if it causes any issues then they can pretty surely be sorted out.

alttab · on Sept 15, 2010

I think the original commenter was hinting at the opportunity for the author to provide a service on top of HN data, like monitoring posts, replies, karma levels, or providing more categorized feeds based on preferences.

It could violate terms of service as it could at its most basic level scrape Hacker News for this information, but there hasn't been any issue with it so far.

pvg · on Sept 15, 2010

ihackernews is currently the best browser for hacker news on the mobile by far

Not really relevant to anything I said.

this api is just extracting what he already built for ihackernews and allowing others to use it

There are actually significant downsides to 'others using it'. One is that it becomes a single point of failure for anyone using it. It's essentially a proxy so one abusive user could make HN ban the whole thing and everyone else with it. Same goes for downtime, etc.

Similarly, it introduces a third party in the authentication process for relatively little value and significant risk.

I could well be missing something but I just haven't come up with very many reasons such a service is a good idea to counter the many obvious ways in which it is a bad one.

daleharvey · on Sept 15, 2010

"Not really relevant to anything I said."

By ignoring the fact that this was built for a practical purpose, and only mentioning ways that it could be abused implied that the author had bad intentions when writing it. I was clarifying that for everyone else; he should be thanked for ihackernews at the least.

I didn't say there were no downsides to people using it, but people can figure that out for themselves. There are also significant advantages: 1. not having to write the code yourself, and 2. caching and sharing the load from this domain. We already know that this site has stability issues and bots can quite easily affect the load. Even if all people do with this API is create new UIs for Hacker News, it would be worth it. I am pretty surprised that is the only useful thing you can see it being used for; there are obviously a lot of use cases.

pvg · on Sept 16, 2010

By ignoring the fact that this was built for a practical purpose, and only mentioning ways that it could be abused implied that the author had bad intentions when writing it

No, it didn't imply anything of the sort. And whether something is built for a practical purpose or not is, in fact, not relevant to whether it's stupid or not. My point was that I think having this as a public web service is stupid and I explained why. The practical purposes you speak of could have been achieved just as easily by releasing the code so people interested in such functionality can use it as a library or host the service themselves.

Robin_Message · on Sept 15, 2010

This looks really handy for getting hold of my raw data. One thing -- parentID for comments I fetched with http://api.ihackernews.com/threads/Robin_Message is blank -- is it meant to be or am I missing something? I'd expect it to be the id of the parent comment, and possibly for there to be an "On" field that takes me up to the top level.

ronnier · on Sept 15, 2010

Thanks, I'll look at this tonight and get it fixed.

Robin_Message · on Sept 16, 2010

    "children":[{"postedBy":"ronnier","postedAgo":"21 hours
    ago","comment":"\u003cfont \u003eThanks, I\u0027ll look
    at this tonight and get it
    fixed.\u003c/font\u003e","id":1694452,"points":1,
    "parentId":1694340,"postId":1694049,"cachedOn":
    "\/Date(-62135575200000)\/","children":[]}]}

Thanks!
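
A side note for anyone consuming this output: the `cachedOn` value uses the `/Date(milliseconds)/` convention (the `\/` in the raw JSON decodes to a plain `/`). A minimal Python sketch for decoding it follows; `parse_ms_date` is a hypothetical helper name, and the sketch assumes the number is milliseconds since the Unix epoch with no time-zone suffix.

```python
import re
from datetime import datetime, timedelta

_DATE_RE = re.compile(r"^/Date\((-?\d+)\)/$")
_EPOCH = datetime(1970, 1, 1)


def parse_ms_date(value: str) -> datetime:
    """Convert '/Date(<ms since Unix epoch>)/' into a naive UTC datetime."""
    match = _DATE_RE.match(value)
    if match is None:
        raise ValueError(f"not a /Date(...)/ value: {value!r}")
    return _EPOCH + timedelta(milliseconds=int(match.group(1)))


# The value from the response above, after JSON decoding.
cached_on = parse_ms_date("/Date(-62135575200000)/")
```

The value in the sample decodes to a timestamp in year 1, which suggests (as a guess) that the field was never set on this entry rather than holding a real cache time.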

thibaut_barrere · on Sept 15, 2010

Very nice!

Two questions. I'm curious, as I'm also in the process of indexing HN, and your API may help me avoid this:

- how much content are you actually indexing? Do you keep every single post or only the ones that make it onto the home page or Ask HN? How far back in time did you go?

- do you have some way to implement a full-text search (e.g. posts that contain a specific word, to be accurate)?

ronnier · on Sept 15, 2010

I don't store or save any data, other than an in memory cache. I just scrape, process, and output the data. Since I'm not storing data, I have nothing to search.

sahillavingia · on Sept 15, 2010

I may very well use this to launch a Hacker News reader for iOS. Is there space for this (would you want it)?

Raphael · on Sept 24, 2010

I couldn't wait for JSONP, so I whipped up a simple wrapper proxy (couldn't believe this didn't exist already).

http://jsonpify.appspot.com/?url=http://api.ihackernews.com/...
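
For anyone unfamiliar with the idea, the core of a JSONP wrapper is just padding the JSON with a function call. The Python sketch below is not the code behind the proxy above; `to_jsonp` and the callback-name check are assumptions for illustration.

```python
import json
import re

# Only allow plain identifier-style callback names, so the caller cannot
# inject arbitrary script through the callback parameter.
_CALLBACK_RE = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def to_jsonp(callback: str, payload: object) -> str:
    """Wrap a JSON-serializable payload in a call to the named callback."""
    if not _CALLBACK_RE.match(callback):
        raise ValueError("unsafe callback name")
    return f"{callback}({json.dumps(payload)});"
```

A page can then load the wrapped response through a script tag, and the browser calls the named function with the data.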

pyronicide · on Sept 15, 2010

This is awesome, I was looking for something exactly like this last weekend! No more scraping for me =)

mike-cardwell · on Sept 15, 2010

IIRC there is some Internet Explorer issue involving the "application/json" content type which makes it safer to just use "text/plain". Worth looking up...

RyanMcGreal · on Sept 15, 2010

You usually access an API programmatically rather than via the browser, so this shouldn't be an issue for most use cases.

pmjordan · on Sept 15, 2010

Unless it supports CORS for cross-domain access from the browser: http://www.w3.org/TR/cors/

Which it probably doesn't.

mike-cardwell · on Sept 15, 2010

It says in the first paragraph that he intends to support jsonp in the future. It will matter then.

abp · on Sept 15, 2010

Which language/framework have you used to create it?

ronnier · on Sept 15, 2010

It's all C#, ASP.NET MVC. I'm just exposing functionality that I had to build in for http://ihackernews.com.

vyrotek · on Sept 15, 2010

Awesome. I'm hoping that you already have a library of HN entities and are using the JsonSerializer? Would it be possible to share that C# library or some client code?

Of course I could just do it myself, but it seemed like silly work going from C# > REST > C# :)

abp · on Sept 15, 2010

Thanks, but I hadn't read your resume first. After that I thought that this was the answer. :)

johns · on Sept 15, 2010

Now you should write a client for the API using RestSharp ;)

gsiener · on Sept 15, 2010

I thought PG frowned on this kind of stuff?

ronnier · on Sept 15, 2010

Can you link me? If he does, I'll bring it down.

zackattack · on Sept 15, 2010

etiology of previous frowns suggests frustration due to load placed on the server. in this case you should actually be alleviating load on the server since so many people love to hack on HN for fun.

RyanMcGreal · on Sept 15, 2010

PG frowns on clever hacking?

adrianwaj · on Sept 15, 2010

At some point this API will fail if it gets popular to the point of triggering HN's abuse protector.

robin_reala · on Sept 16, 2010

OpenID login? Everyone seems to forget about this.

auston · on Sept 15, 2010

Thank you!