Hacker News Official API (github.com/hackernews)
500 points by marchupfield on Aug 21, 2022 | 131 comments



Using only the official API is quite cumbersome, as fetching a story with its comments requires one network request per comment.

Instead, consider the (lesser-known) Algolia API[0]. It supports fetching the entire comment tree in one request. Unfortunately the comments aren’t sorted, so you’ll still have to use the official one (or parse the HN HTML) if you need the ordering.

For my HN client[1], I parse the HN HTML (as it’s required for voting), and sort the comments from the Algolia API. It’s much faster than recursively requesting the official API.

[0]: https://hn.algolia.com/api

[1]: https://github.com/goranmoomin/HackerNews
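
To make the comparison concrete, here is a rough Python sketch of both approaches, using the documented Algolia item endpoint and the official Firebase endpoint (the item id is just a placeholder):

    import requests

    STORY_ID = 1  # placeholder: any HN item id

    # Algolia: the whole comment tree comes back nested in one response.
    tree = requests.get(f"https://hn.algolia.com/api/v1/items/{STORY_ID}").json()

    # Official API: one request per item, recursing over the "kids" ids.
    def fetch_tree(item_id):
        item = requests.get(
            f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
        ).json()
        item["children"] = [fetch_tree(kid) for kid in item.get("kids", [])]
        return item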


Yeah, I built a GraphQL API for HN for this reason. It lets you fetch all comments, users, and posts in a single request.

https://hngraphql.fly.dev/graphql


Can you share the code behind the API?



thanks


If you already parse the html, what's the point in calling the Algolia API at all?


I believe HN's website has a rate-limiter even for viewing, which the HN API doesn't have.


I found this interesting, looking at the Firebase API. I guess one could get the min/max item ids for the child posts, and query the range of ids (startAt(), endAt()).

Van Puffelen answers some of this here, but he doesn't get into specific queries: https://www.youtube.com/watch?v=66lDSYtyils

He also mentions here that Firestore offers a bit more for WHERE type clauses over Firebase:

https://stackoverflow.com/questions/26700924/query-based-on-...
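
For example, a key-range query over the Firebase REST API would look roughly like this. This is only a sketch; it assumes HN's Firebase security rules allow filtered reads at /v0/item, which I haven't verified:

    import requests

    # Firebase REST filtering: order by key and bound the range, so a whole
    # block of item ids comes back in a single request (if the rules allow it).
    params = {
        "orderBy": '"$key"',      # Firebase expects the quotes in the value
        "startAt": '"32500000"',  # placeholder ids
        "endAt": '"32500010"',
    }
    resp = requests.get(
        "https://hacker-news.firebaseio.com/v0/item.json", params=params
    )
    print(resp.json())  # a dict keyed by item id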


In response to people's complaints about the usability of the HN Firebase API: yes. We're going to eventually have a new API that returns a simple JSON version of any HN URL. At that point we'll phase out the Firebase API, with a generous deprecation period. I'd be curious to hear people's thoughts about what a generous deprecation period might be.


The great thing about the Firebase API is that it's live, so you can subscribe to updates to a post/comment/profile and see them as they're made (no polling, just websockets). I'd hope any future version would retain this feature.

I'm reluctant to lose it as I'm about to release a browser extension that relies on it pretty heavily :)

[Edit]: this feature is undocumented in the link above, but it's a standard feature of FB and works as expected in all client libraries https://firebase.google.com/docs/database
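
As an illustration, the same live behaviour is also reachable without a client SDK through Firebase's REST streaming (Server-Sent Events). A minimal Python sketch, assuming HN's instance keeps permitting streaming reads:

    import json
    import requests

    # Stream changes to a profile (or any item) as Server-Sent Events.
    # Firebase sends "put"/"patch" events with a JSON payload, plus periodic
    # keep-alives whose data is null.
    url = "https://hacker-news.firebaseio.com/v0/user/pg.json"
    headers = {"Accept": "text/event-stream"}

    with requests.get(url, headers=headers, stream=True) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            if line and line.startswith("data: "):
                payload = json.loads(line[len("data: "):])
                if payload is not None:
                    print("update:", payload["data"])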


Hmm. I'm sorry but I can't promise that. It isn't part of the HN API, though I can understand why people use it since Firebase offers it.


So is that to say it's just an unintended consequence of the data being hosted on Firebase? That's wild. Especially as I'd be surprised if most integrations aren't using this feature (IMO it's by far the best feature of the API).

To demonstrate, here's an example in 18 lines of code that v. efficiently listens to live updates to a profile (yours): https://codepen.io/theprojectsomething/pen/xxWBOWN?editors=1...


It was certainly an unintended consequence of the data being hosted on Firebase. I'm not saying it isn't a valuable feature! And if people are doing interesting and useful things with it, I'm open to implementing something to keep that going—as long as it's simple and there's a clear line between it and "reimplement Firebase", which would be a foolish thing for us to try to undertake.


Thanks, appreciate the insight! I think client pub/sub is the feature request, not firebase clone :)

To that point, there are definitely a few OSS solutions out there that provide this kind of functionality out of the box (recent YC alumni among them). But then who knows what problem you're trying to solve. Sounds like this is just rain on the tip of the iceberg.


Oh, does that mean the /v0/updates endpoint will disappear in the new API? I found it to be quite valuable.


It depends on how hard it is for us to implement something like that in HN's software. I'm sorry not to have a clear answer for you guys, but we haven't had a chance to think carefully about this and it will be a while before we do.


I'd say it depends on how straightforward the new API will be. The app I use (Materialistic) is both free and ad-free, so I don't expect the developer to drop everything they are doing to update it if it will require a lot of work. So 6 months minimum if everything is well documented, a year if it will be a hassle.


I use Materialistic also, but as far as I'm aware there hasn't been an update to the app in years. In addition, I believe the Materialistic app listing on Google Play has also been removed.

Do you have some updated version that I'm unaware of?


I would say a full calendar year is "generous" from the announcement date


Can you allow login and write operations via the API?

I want to build an HN web client, but it's not possible without scraping the site, which won't scale.


I'm typing this from the iOS app HACK, which somehow allows for writing comments, upvoting, etc.


They have to scrape the site in the background to work. While this is doable, any such client will need to follow the HN bot policy, which limits the scraping speed.

Another issue is a browser-only client. If HN has set proper CORS headers (I didn't check whether that is the case), then you cannot build a web client through scraping without a proxy.


Probably not in the first release, but hopefully after that.


6 to 12 months AFTER we have dark mode.


I'd say about 6 months.


Honestly, it would be nice if the Firebase API were updated. You could even build the former on the latter, so JSON urls pull from the API.

Just being able to get an entire thread in a single request, plus filter queries, would make things much easier.


How difficult would it be to host the same Firebase API on top of whatever will be driving the new JSON stuff?


Yes. At the very least some kind of pub/sub interface? Also worth noting that, if it's the data depth issue that's looking to be resolved, the Algolia API is often useful in places the Firebase structure falls short.


If you mean the documented API, it shouldn't be hard. If you mean reimplementing Firebase, that would be another matter.


I'd do the opposite: I'd build the new JSON stuff on top of the Firebase API.


Just keeping the old API functional for its clients, really.


As a dev who has built a Hacker News client app, I would say at least 12 months.


Wish the API would show comment votes instead of only having them up until 8 years ago, when displaying them was turned off on the main site. Aside from allowing users to view them via their own interface, do analysis, etc., removing them also broke HN’s comment search based on comment popularity.

Related:

https://news.ycombinator.com/item?id=32535637


I’d like to be able to conveniently access my most popular comments. There are quite a few things in there that I might like to write longer-form articles about.

(https://james.darpinian.com/blog/scraping-my-own-hacker-news... gives one way of doing it, by crawling your /threads?id=‹username› page, which does contain the point counts for each comment by you.)
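
A very rough sketch of that approach, assuming you copy your logged-in session cookie out of the browser and that your comment scores still show up as "N points" in the /threads HTML (both are assumptions about the current markup):

    import re
    import requests

    USER = "yourusername"                      # placeholder
    COOKIES = {"user": "yourusername&token"}   # placeholder: copy from your browser

    # Fetch the first page of your threads; scores are only rendered for
    # your own comments when you're logged in.
    html = requests.get(
        f"https://news.ycombinator.com/threads?id={USER}", cookies=COOKIES
    ).text

    # Crude: grab every "N points" occurrence. Pagination via the "More"
    # link is left out of this sketch.
    scores = sorted((int(s) for s in re.findall(r"(\d+)\s+points?", html)), reverse=True)
    print(scores[:10])  # your highest-scoring comments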


Which would be possible if HN were API-first and all data besides the recovery email were public. Otherwise, HN is simply not designed to be responsive to user system-design requests, but there have been a massive number of alternative systems built by users that built things HN would never have made.


(Aside: I finally got round to scraping my comments in that way. My comments follow the sort of distribution you’d expect, ranging from −4 to +76, but there’s one outlier at +283, https://news.ycombinator.com/item?id=27274146. People really hate stale bots on bug trackers.)


I don’t want vote information to be publicly available. Hiding those numbers was good for the site. Users have ways to get their own data. That’s enough.


What difference would exposing counts make? Why should this data be mineable?

We're already helping to create a training set of slightly above average English speaker discourse for free.


If it were up to me, all data would be publicly available aside from the emergency recovery email and password.

This would include mod commutations and mod logs, unless something was security related, in which case there would be a formal process with deadlines for release.

---

To address your specific question as to why the comment voting data would be of use: among other reasons, it would allow users to have custom filters. For example, posts and comments with high up/down votes tend to be flamewars. Mods also randomly boost/ding posts/comments.


It's not going to happen. Showing comments led to constant flamewars about unfair commenting. Even without the comment scores, the mods are still constantly having to remind people to stop arguing about scores. The decision not to display or make available other people's scores was litigated intensely; it is now part of the premise of the community.


By “showing comments led” do you mean showing comment points/votes? Also, if viewable comment points are gone forever, then the “popularity” filter should be removed from HN’s search and the old voting data removed from the public API.

To me, if users have concerns about “unfair voting”, the solution is not to hide the votes. Also, if comment points aren’t important, why even allow them? They are obviously important, and if a user is not able to respect guidelines not to complain about them, what’s the difference between that and the guidelines related to comments about being downvoted?


Like I said, this has been heavily litigated. So much so that I think you'd have difficulty coming up with an argument that hadn't been made already about it. Not making other people's comment scores available is, as I said, a part of the premise of the community. There are other communities that work differently. There isn't a single optimal set of policies.


You’re right, obviously the best way to secure the community was to obscure the votes of people that were upvoting toxic content instead of the mods/community just publicly tagging and flagging the related undesirable content and penalizing the voting weights of anyone upvoting that content.

If votes being public impacts voting patterns, then listing top users, having usernames publicly associated with comments, being able to follow a given user’s comments, etc., for sure does too. What’s the difference?


You're arguing here as if there is an argument to be had. I'm trying to help you understand that there is no such argument. Your claims have been considered and rejected hundreds of times over the last 10 years. It'd behoove you to consult the search bar below and read some of those old discussions. This isn't a case where the community made a rash decision and just doesn't realize how problematic it is; you're talking about what is probably the single most deliberate community management decision HN has ever made. It's not going to change.

A moment later

Maybe it'll help if you get your head around the goal of HN. There are a bunch of things you personally seem to want to accomplish with HN. But HN itself has just one goal: to nurture curious conversation. Other sites have other, equally valid goals: propagating the news as quickly as possible, or relentless focus on a particular topic, or getting the best Q&A pairs to the top of Google's search results. HN, though: curious conversation. That's the whole ballgame.

Anything that drags on curious conversation is a non-starter on HN, even if it might make a bunch of things better for you.

We don't have to guess about whether making comment scores available is a drag on curious conversations; it manifestly was, for years. People have an innate, visceral reaction to comment scores they feel are unjust, and they talk about them, and those conversations choke the landscape like kudzu.

A lot of things about HN make more sense when you accept the premise of the site, and understand that HN will make most sacrifices it can come up with to optimize for that premise.


Just to be clear, I’m intentionally not responding, since in my opinion you repeatedly ignored my points and, contrary to your claims, did not engage me out of intellectual curiosity.


I think tptacek is saying something meaningful in suggesting that there's already a long history to this discussion. Personally I disagree that the soul of HN is about curiosity. If you removed all the explicit comments about curiosity and showed the site to new tech people, I don't think there would be consensus on curiosity. If you showed a bunch of people the Rust subreddit, there would be consensus that its soul is Rust.

Personally I see this site as the Reddit of tech with 20% of its flavor coming from YC companies. That is how it behaves, as opposed to what the moderators say the site is about.


'curious conversation' is a much loftier, woolier and syntheticier goal than 'Rust' and HN strives towards it imperfectly. The measure, though, is 'does this or that change bring us closer to it' not 'can people unfamiliar with the site achieve mindmeld with dang by staring at the front page'.


I hear you, but what is the endgame with this? Does it lead to a better HN?

Lobste.rs actually exposes even less and doesn't provide an API because the users haven't consented to such intrusions.

Personally, I don't want to be able to review every vote for or against my HN account. It wouldn't be productive and would likely lead to me not wanting to be here at all. It'd be like having a super strong mirror and then looking at your skin and being confronted with too much information about the reality of how nasty looking and imperfect being human is. Basically not a healthy pastime.

FWIW, if you're truly passionate about this, it's not difficult to set up your own site with the lobste.rs open-source code. Maybe there is a whole subset of like-minded folks.

The lobste.rs origin story may also be of interest to you. The TL;DR is that jcs was being singled out and abused by pg on HN. Thankfully there is no evidence dang has done (or would ever do) this; it was certainly toxic behavior on Paul's part. Being a moderator isn't for everyone, it's a Certain Thing, and net-positive mods who avoid power tripping are truly rare gems on the Internet. Open information would guarantee such abuses couldn't happen in the dark, but at what cost to the best aspects enabled by the HN ecosystem?

References:

https://news.ycombinator.com/item?id=4452568

https://news.ycombinator.com/item?id=32112665


Again, just expressing my opinion, but another example would be that it would allow custom ranking algorithms like PageRank, TrustRank, etc.

And yes, aware of Lobste.rs mod logs, but Lobste.rs is basically dead compared to HN — largely because it’s an HN clone.


This is a website where people post links and other people comment below them. Most front page discussions don't surpass 50 replies. I don't really think we should care so much about this place to the point that we're willing to employ state of the art data analysis tools just to read a blog post and see if someone has critiqued it in the comments


This kind of stuff would probably just seed the sort of neuroticism that makes forums worse.


Yes, it is. And if you feel that way, that's fine. Why gatekeep?


What am I gatekeeping, exactly?


For starters, that you only want HN to be used the way you want to use it, which for sure is not the way I want to use it, and more than likely not the way a significant number of other users want to use it either. If that’s not gatekeeping, then what is?


> Lobste.rs is basically dead compared to HN — largely because it’s an HN clone.

My outsider opinion is that lobsters is dead because the sign up process is designed to keep people out. I appreciate there's also the network effect of people coming here because people come here, but I for sure would visit lobsters too if I could participate without having to know a friend to get past the velvet rope


> This would include mod commutations

In 2019 I set up a bot to store titles for HN stories as soon as they appear on the "new" page, so that I could then compare them to the current version.

I had also made a browser extension for FF and Chrome that compared the current version of any HN story to the stored one, and displayed a warning if they differed.

There were zero downloads of the extensions, so they fell into disrepair and don't work anymore, but the cron job for storing the titles is still active.

If there is any interest, I could resuscitate the extensions, or build some other system to make that information useful.


I wrote a script that will allow you to use cookies to download your comment scores [1]. I also did some analysis of my own scores [2]

[1] https://github.com/superb-owl/hacker-news-comments

[2] https://superbowl.substack.com/p/commenting-on-hacker-news


I found the official API quite a pain to work with while building myself an alternative "hn ui"[0]. I mostly switched over to using the Algolia API[1], which is a lot more "fun" to work with. I can only recommend checking that one out as well.

[0]: https://hnhub.dev/

[1]: https://hn.algolia.com/api/


I wrote a fairly simple "Who is hiring?" browser (using this API) that fetched the post listings and stored them locally in an HTML file accessible by pointing your browser at file:///... . It supported filtering by "Remote", "Interns", and "Visa" with buttons, had an ad hoc Regex filter, and the ability to remove posts using a delete button and not see them again (or to restore all deleted posts) using the localStorage api.

It was indeed barebones, but I'd like to think it helped me find a job. I would link the code (it's on GitHub), but today the "Who is hiring?" posts include a few considerably more advanced and capable "searchers" right at the top of the post:

> Searchers: try https://kennytilton.github.io/whoishiring/, https://hnjobs.emilburzo.com, https://news.ycombinator.com/item?id=10313519.

It was fun; I enjoyed working on it.
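
For anyone curious, the core of a tool like that is pretty small. Here's a rough sketch (not the original code) that pulls a "Who is hiring?" thread's top-level comments via the official API and keeps the ones mentioning remote work:

    import requests

    BASE = "https://hacker-news.firebaseio.com/v0"
    POST_ID = 12345678  # placeholder: id of a "Who is hiring?" thread

    def item(item_id):
        return requests.get(f"{BASE}/item/{item_id}.json").json()

    # Fetch each top-level comment and keep the ones matching the filter.
    post = item(POST_ID)
    for kid in post.get("kids", []):
        comment = item(kid)
        if comment and not comment.get("deleted") and "remote" in comment.get("text", "").lower():
            print(comment["text"][:120])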


I'd still like to see yours if you're up to sharing! :)


Hi, thanks for your interest!

I am definitely happy to share, but I didn’t expect any interest and I posted my comment under my HN pseudonym account. However, when I link to the repo on GitHub, that thin veneer of obscurity will disappear, and I’d prefer that it did not. Being a programmer, I am quite familiar with workarounds and I am happy to offer you a couple of different options:

1. You could add an email address to https://news.ycombinator.com/user?id=maxique and I can email you there - I’m happy to share my name and contact info in a one-off request, just not on an ongoing basis for anyone to see.

2. I could clone the repo onto my current machine and delete the .git dir, leaving the code and the README.md intact, produce an archive of that stuff, and link it here in this thread. While I’m sure it may be possible to use a code search tool to find the repo on GitHub after that, I am not terribly bothered by that.

I am open to other options as well if you have any suggestions. Finally, the code was written for whatever version of the Ruby interpreter (MRI) was current in 2016, in case that will dissuade your interest.

EDIT: I have a bias to action, appreciate interest in my work, and figured it was likely that you would see my reply at some later point and then I'd have to see your reply to my reply at some even later point, and we'd continue the loop, and eventually your interest would wane. So I went with option two. Here you go: https://we.tl/t-Bu257EdQn9

This is my first time using WeTransfer; let me know how it works out for you. I actually originally had a link that I hosted on file.io (which I've also never used before), but it looks like that gets automatically deleted after one download, which seems slightly excessive to me.


I wrote a simple HN data downloader based on this API. Last I remember, I was able to download the entire corpus by letting it run overnight.

https://github.com/ashish01/hn-data-dumps
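
For reference, the basic shape of a full crawl against the official API is roughly this (a naive, serial sketch; a real dump like the repo above would want concurrency, retries, and resumability):

    import requests

    BASE = "https://hacker-news.firebaseio.com/v0"

    # Walk every id from 1 up to the current maximum and persist each item.
    max_id = requests.get(f"{BASE}/maxitem.json").json()
    for item_id in range(1, max_id + 1):
        item = requests.get(f"{BASE}/item/{item_id}.json").json()
        # ... write `item` to disk or a database here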


Oh this might be exactly what I need. Giving it a try now. Thanks!


Firebase is owned by google. There are plenty of OSS solutions to create this exact realtime thing without getting too close to the elephant in the room.


I don't think it was owned by google when hn started using it for the API.


I think using Google or not is pretty irrelevant, but I looked because I was curious.

This repo started in 2015, and Firebase was bought in 2014. I'm really surprised; I thought it was way more recent than that.


The oldest snapshot from the domain on the wayback machine is from 12 days before it was acquired, and the API might be older than that.


Switch to Supabase?


This is so awesome. I very much like the spirit of giving access to the API as-is, with no fuss.


I'm currently writing an HN post score tracker (basically it plots, for each popular post, how its score varies over time from the moment it appears in /new), and it's really handy that you can just subscribe with eventsource [0] and get realtime updates when something changes, without having to manually poll each post and waste resources. Not yet sure about the latency though.

[0] https://github.com/jpopesculian/reqwest-eventsource/



Nice one, wish there was something similar for user rankings (with karma points).


I know, but it does not provide detailed raw data, making it impossible to do a bunch of interesting in-depth analysis. Also, I like building this kind of stuff to explore async Rust capabilities.


Also https://upvotetracker.com/

Gives a few more variables, and the different sample times show a different set of transient effects.


Tiny project I built using the same API https://github.com/imnitishng/hkrnws


If you want to see a small example of this API being used, here's a utility to generate an RSS feed of replies to comments you post that you can self-host:

https://gist.github.com/Q726kbXuN/15e61acc003bb6d46a458001fe...


Anyone know of a good tool to get something like push notifications when specific topics come up on HN? I've been thinking maybe I should build one, but I also feel like someone must have already done it by now.


You can check out this service [1] that I prepared for myself and then made publicly available. For now it only sends popular stories as notifications. Planning to add a keyword feature.

[1] https://hnn.avci.me/


Would you consider adding regex support as well?


Don’t have any suggestions but as an alternative to notifications, I wrote a script to check the HN API for keywords such as “outage” “downtime” “breached” and show the results on a TV screen in my office.

Has been quite useful for the ops side of what I work on!


Cronjob to the Algolia Search API, send to your email.
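
A minimal sketch of that kind of cron job, using the documented Algolia search_by_date endpoint (the email/notification step is left out):

    import time
    import requests

    KEYWORD = "outage"
    since = int(time.time()) - 3600  # anything posted in the last hour

    # Ask Algolia for recent stories/comments matching the keyword.
    resp = requests.get(
        "https://hn.algolia.com/api/v1/search_by_date",
        params={
            "query": KEYWORD,
            "tags": "(story,comment)",
            "numericFilters": f"created_at_i>{since}",
        },
    )
    for hit in resp.json()["hits"]:
        print(hit.get("title") or hit.get("comment_text", "")[:80])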


There's been several, but they always get abandoned after a year or two.


Similar to this, I created "HN Faves API", which is an API to fetch a user's favorited posts and comments on HN. Interestingly, HN doesn't have an official API for this.

On GitHub: https://github.com/plibither8/hn-faves-api

You can try it out here: https://hn-faves.mihir.ch


I've never used Firebase before. It took a little while to dig through the docs, but I finally found the web/JS version for observing values and one-time gets:

https://firebase.google.com/docs/database/web/read-and-write


I use this API in a few places such as a custom New Tab extension I made for myself on Chromium based browsers:

https://github.com/overshard/newtab/blob/master/newtab.js#L1...

Haven't had any issues with it!


If you're looking to get your comment scores, the API sadly will not oblige.

I wrote a script that uses your browser cookies to scrape them from the website here: https://github.com/superb-owl/hacker-news-comments


Annoying how it doesn't show the text or anything for dead items.

Is there any way to query this API as if showdead is switched on?


I used to query flagged comments from time to time using the API, just out of some morbid curiosity. It definitely used to work for that; I am not sure if flagged items are in the same class as dead ones, though, or if they have since deliberately eliminated this capability. I eventually found more interesting things to do :)


The problem was that many third party apps and clients were displaying flagged and [dead] comments as if they were normal comments. Then we'd get angry emails about all the horrendous comments we were "condoning" when in fact the users posting them were as banned as banned could be.


For a few years now I have been working on an HN client on the side. One thing that is really sad is that there is no way to handle authentication properly as a 3rd party. The user basically has to trust you with their password, or you have to "steal" the auth token from a login.


Again, it would also be great if we had a personal user API to get our own private info.


Agreed. A read-only API for my upvoted stories and comments would be great, so I can preserve this data and use it outside of HN.


It would also be nice if, for private per-user data, a user were able to authorize API access to it without risking account loss.


It always amuses me when the API link in the footer floats up as a submission every few years.


It's almost as if a color scheme of grey on grey makes features hard to notice.


Show HN: DeepHN – Full-text search of 30M Hacker News posts and linked webpages

20210413 https://news.ycombinator.com/item?id=26791582


The API is pretty nice to work with, I’ve used it to build an open source alternative HN client[0].

I do wish it would support login though, but you can’t have it all.

0. https://modernorange.io


What is the use case? I can't imagine anything besides getting a real time view on the frontpage or likes of my own comments that a monthly static data dump wouldn't solve.


Probably 3rd party clients that offer quality of life / usability improvements over the website.

E.g. I use Hackers [1][2] because it is much easier to read stories on mobile and supports dark mode. Unfortunately, it doesn't support commenting or downvoting (yet?)

[1] https://apps.apple.com/us/app/hackers-for-hacker-news/id6035... [2] https://github.com/weiran/Hackers


I am the developer of HACK (available on iOS, Android and MacOS) which allows commenting, upvoting, downvoting, favoriting, submitting new posts and also supports push notifications for replies. It is the top Hacker News client in the App Store.

https://apps.apple.com/ca/app/hack-for-hacker-news-developer...


Writing this reply with HACK; thanks for the app! If you are comfortable with it, could you share a rough revenue estimate for an app of such popularity? How do you approach promoting such a specific app?


Thank you! I'll give it a download


Has anyone done a mass archiving of HN via this API?

It would be a shame for this vast trove of expert knowledge and nerd sniping to vanish into the ether after the next hard drive apocalypse or similar.


I’d like to build following and alerts for commenters I like here… is it possible on top of this or Algolia?



Why not look at the API and see for yourself?


It would be interesting to see some live readout of the item updates (with context somehow).


Would it be feasible to make a Firebase-to-Supabase connector?

Having a self-hosted private HN Algolia (perhaps via Elasticsearch) would be nice. Algolia's forced fuzziness and limited number of results per page are a drag. Why can't they provide any search controls? Or at least support the "-" operator to exclude a term.


Would be nice if HN were API-first, which would address a number of the requests on this page.


Is this expected to be a read only api, or is write planned for the future?


Huh has the link to this on the bottom of the page been there since jan1?


There was a link to the API documentation at the bottom for as long as I can remember using HN, but I’m not sure if it always pointed to the GitHub-page linked in this post.


What interesting projects could be built/are built on top of this?


I've been hacking on a read-only client in Godot for a while. I don't know if that's "interesting" though.


> The v0 API is essentially a dump of our in-memory data structures.

TIL that HN doesn't use a "proper" / persistent DB. I wonder then how HN handles data persistence. Is it as simple as creating write-ahead logs + checkpointing at some interval?


Sure they do, what they are saying is that the data format is just the raw format they use in memory after loading it from a database.


No they don’t. This forum was created as a MVP of a webapp using PG’s Arc Lisp. All user and comment data are stored as tables serialized to text files or in memory. You can find an old version of the code at http://arclanguage.org


So they do use a database? In the form of text files. The comment I was responding to was edited, and before it was asking if they stored everything in memory and how they handled reboots.


Since there is no update shouldn’t the headline have the year?


An official RSS/Atom feed would be great too.



Ugh no... No idea how I've missed that.

Thanks!


I was actually wondering how I knew about it :p

I'm not sure it's linked anywhere visible, but it does have an entry in the html head....


There's also /showrss

I'm not sure if there are more tho, I know /ask doesn't have one.


How hard would it be for HN to make its own API, so that I could do something like:

curl https://api.news.ycombinator.com/


My understanding is that this is the official HN API. I think Firebase is a YC company. Others can correct me if I'm wrong.

See the link at the bottom of every HN item page.

Edit: I took a look; it seems like Firebase is owned by Google now, but it was originally a YC-funded startup, judging by this page: https://www.crunchbase.com/funding_round/firebase-pre-seed--...


Yes, Firebase is listed on YC’s list of companies they have funded here:

https://www.ycombinator.com/companies/firebase


How much does HN cost to run I wonder.


You may be able to determine that from this: https://news.ycombinator.com/item?id=16076041


Hopefully, v1 will include webhooks.


I wish it had auth



