Using only the official API is quite cumbersome, as fetching a story with its comments requires one network request per comment.
Instead, consider the (lesser-known) Algolia API[0]. It supports fetching the entire tree in one request. Unfortunately the comments aren’t sorted, so you’ll still have to use the official one (or parse the HN HTML) if you need that.
For my HN client[1], I parse the HN HTML (as it’s required for voting), and sort the comments from the Algolia API. It’s much faster than recursively requesting the official API.
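To make that concrete, here’s a rough sketch of the approach: one GET to the Algolia items endpoint returns the story with its replies nested under "children", and you can then reorder each level to match the comment order scraped from the HN page. The helper names and the rank-map shape are my invention, not the linked client’s actual code.

```python
import json
import urllib.request

def fetch_item(item_id: int) -> dict:
    """Fetch a story and its full comment tree in one request."""
    url = f"https://hn.algolia.com/api/v1/items/{item_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def sort_tree(item: dict, rank: dict) -> dict:
    """Recursively reorder each 'children' list using a rank map
    (comment id -> position, e.g. scraped from the HN HTML)."""
    children = item.get("children", [])
    for child in children:
        sort_tree(child, rank)
    item["children"] = sorted(
        children, key=lambda c: rank.get(c["id"], float("inf"))
    )
    return item
```

Unranked comments fall to the bottom via the `float("inf")` default, which is one reasonable choice when the scraped page and the API response disagree.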
I found this interesting, looking at the Firebase API. I guess one could get the min/max item ids for the child posts, and query the range of ids (startAt(), endAt()).
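If that idea works, the Firebase REST equivalent of startAt()/endAt() is a set of query parameters on the .json endpoint. I haven’t verified that HN’s Firebase rules actually permit orderBy queries on /v0/item, so treat this purely as a sketch of the URL shape:

```python
from urllib.parse import urlencode

BASE = "https://hacker-news.firebaseio.com/v0"

def range_query_url(start_id: int, end_id: int) -> str:
    """Build a (hypothetical, untested against HN) Firebase REST range
    query over item ids. orderBy='"$key"' orders by the item-id keys;
    startAt/endAt bound the range like the client-library calls."""
    params = urlencode({
        "orderBy": '"$key"',
        "startAt": f'"{start_id}"',
        "endAt": f'"{end_id}"',
    })
    return f"{BASE}/item.json?{params}"
```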
In response to people's complaints about the usability of the HN Firebase API: yes. We're going to eventually have a new API that returns a simple JSON version of any HN URL. At that point we'll phase out the Firebase API, with a generous deprecation period. I'd be curious to hear people's thoughts about what a generous deprecation period might be.
The great thing about the Firebase API is that it's live, so you can subscribe to updates to a post/comment/profile and see them as they're made (no polling, just websockets). I'd hope any future version would retain this feature.
I'm reluctant to lose it as I'm about to release a browser extension that relies on it pretty heavily :)
[Edit]: this feature is undocumented in the link above, but it's a standard feature of FB and works as expected in all client libraries https://firebase.google.com/docs/database
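You don’t even need a client library for this: Firebase’s REST API streams changes over server-sent events if you ask for text/event-stream. A minimal sketch (standard Firebase REST streaming; as discussed in this thread, there’s no guarantee HN keeps the behavior):

```python
import json
import urllib.request

def parse_sse(lines):
    """Yield (event, payload) pairs from SSE "event:"/"data:" line pairs,
    keeping only Firebase's put/patch events (keep-alives are dropped)."""
    event = None
    for line in lines:
        line = line.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event in ("put", "patch"):
            yield event, json.loads(line[len("data:"):])

def stream_item(item_id: int):
    """Subscribe to live updates for one item (network side; a sketch)."""
    req = urllib.request.Request(
        f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json",
        headers={"Accept": "text/event-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        yield from parse_sse(raw.decode() for raw in resp)
```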
So is that to say it's just an unintended consequence of the data being hosted on Firebase? That's wild. Especially as I'd be surprised if most integrations aren't using this feature (IMO it's by and far the best feature of the API).
It was certainly an unintended consequence of the data being hosted on Firebase. I'm not saying it isn't a valuable feature! And if people are doing interesting and useful things with it, I'm open to implementing something to keep that going—as long as it's simple and there's a clear line between it and "reimplement Firebase", which would be a foolish thing for us to try to undertake.
Thanks, appreciate the insight! I think client pub/sub is the feature request, not firebase clone :)
To that point there's definitely a few OSS solutions out there that provide this kind of functionality out of the box (recent YC alumni among them). But then who knows what problem you're trying to solve. Sounds like this is just the tip of the iceberg.
It depends on how hard it is for us to implement something like that in HN's software. I'm sorry not to have a clear answer for you guys, but we haven't had a chance to think carefully about this and it will be a while before we do.
I'd say it depends on how straightforward the new API will be. The app I use (Materialistic) is both free and ad-free, so I don't expect the developer to drop everything they are doing to update it if it will require a lot of work. So 6 months minimum if everything is well documented, a year if it will be a hassle.
I use Materialistic too, but as far as I'm aware there hasn't been an update to the app in years. In addition, I believe the Materialistic listing has also been removed from Google Play.
Do you have some updated version that I'm unaware of?
They have to scrape the site in the background to work. While this is doable, any such client will need to follow the HN bot policy, which limits the scraping rate.
Another issue is browser-only clients. If HN sets proper CORS headers (I didn't check whether that's the case), then you cannot build a web client through scraping without a proxy.
Yes. At the very least some kind of pub/sub interface? Also worth noting that, if it's the data depth issue that's looking to be resolved, the Algolia API is often useful in places the Firebase structure falls short.
I wish the API would show comment votes, instead of only including them up until 8 years ago, when displaying them was turned off on the main site. Aside from letting users enable them via their own interface, do analysis, etc., hiding them broke HN’s comment search based on comment popularity.
I’d like to be able to conveniently access my most popular comments. There are quite a few things in there that I might like to write longer-form articles about.
Which would be possible if HN were API-first and all data besides the recovery email were public. Otherwise, HN is simply not designed to be responsive to user system-design requests, though there have been a massive number of alternative systems built by users, building things HN would never have made.
(Aside: I finally got round to scraping my comments in that way. My comments follow the sort of distribution you’d expect, ranging from −4 to +76, but there’s one outlier at +283, https://news.ycombinator.com/item?id=27274146. People really hate stale bots on bug trackers.)
I don’t want vote information to be publicly available. Hiding those numbers was good for the site. Users have ways to get their own data. That’s enough.
If it was up to me, all data would be publicly available aside from emergency recovery email and password.
This would include mod communications and mod logs, unless something was security-related, in which case there would be a formal process with deadlines for release.
---
To address your specific question as to why the comment voting data would be of use: among other reasons, it would allow users to have custom filters. For example, posts and comments with high up/down votes tend to be flamewars. Mods also randomly boost/ding posts and comments.
It's not going to happen. Showing comments led to constant flamewars about unfair commenting. Even without the comment scores, the mods are still constantly having to remind people to stop arguing about scores. The decision not to display or make available other people's scores was litigated intensely; it is now part of the premise of the community.
By “showing comments led” do you mean showing comment points/votes? Also, if viewable comment points are gone forever, then the “popularity” filter should be removed from HN’s search and the old voting data removed from the public API.
If users have concerns about “unfair voting”, to me the solution is not to hide the scores. Also, if comment points aren’t important, why even allow them? They are obviously important, and if a user is not able to respect guidelines not to complain about them, what’s the difference between that and the guidelines about commenting on being downvoted?
Like I said, this has been heavily litigated. So much so that I think you'd have difficulty coming up with an argument that hadn't been made already about it. Not making other people's comment scores available is, as I said, a part of the premise of the community. There are other communities that work differently. There isn't a single optimal set of policies.
You’re right, obviously the best way to secure the community was to obscure the votes of people that were upvoting toxic content instead of the mods/community just publicly tagging and flagging the related undesirable content and penalizing the voting weights of anyone upvoting that content.
If votes being public impacts voting patterns, listing top users, having usernames publicly associated with comments, being able to follow given user’s comments, etc — for sure does too. What’s the difference?
You're arguing here as if there is an argument to be had. I'm trying to help you understand that there is no such argument. Your claims have been considered and rejected hundreds of times over the last 10 years. It'd behoove you to consult the search bar below and read some of those old discussions. This isn't a case where the community made a rash decision and just doesn't realize how problematic it is; you're talking about what is probably the single most deliberate community management decision HN has ever made. It's not going to change.
A moment later
Maybe it'll help if you get your head around the goal of HN. There are a bunch of things you personally seem to want to accomplish with HN. But HN itself has just one goal: to nurture curious conversation. Other sites have other, equally valid goals: propagating the news as quickly as possible, or relentless focus on a particular topic, or getting the best Q&A pairs to the top of Google's search results. HN, though: curious conversation. That's the whole ballgame.
Anything that drags on curious conversation is a non-starter on HN, even if it might make a bunch of things better for you.
We don't have to guess about whether making comment scores available is a drag on curious conversations; it manifestly was, for years. People have an innate, visceral reaction to comment scores they feel are unjust, and they talk about them, and those conversations choke the landscape like kudzu.
A lot of things about HN make more sense when you accept the premise of the site, and understand that HN will make most sacrifices it can come up with to optimize for that premise.
Just to be clear, intentionally not responding, since you repeatedly in my opinion ignored my points and contrary to your claims, did not engage me out of intellectual curiosity.
I think tptacek is saying something meaningful in suggesting that there's already a long history to this discussion. Personally I disagree that the soul of HN is about curiosity. If you removed all the explicit comments about curiosity and showed the site to new tech people, I don't think there would be consensus on curiosity. If you showed a bunch of people the Rust subreddit, there would be consensus that its soul is Rust.
Personally I see this site as the Reddit of tech with 20% of its flavor coming from YC companies. That is how it behaves, as opposed what moderators say the site is about.
'curious conversation' is a much loftier, woolier, and more synthetic goal than 'Rust', and HN strives towards it imperfectly. The measure, though, is 'does this or that change bring us closer to it', not 'can people unfamiliar with the site achieve mindmeld with dang by staring at the front page'.
I hear you, but what is the endgame with this? Does it lead to a better HN?
Lobste.rs actually exposes even less and doesn't provide an API because the users haven't consented to such intrusions.
Personally, I don't want to be able to review every vote for or against my HN account. It wouldn't be productive and would likely lead to me not wanting to be here at all. It'd be like having a super strong mirror and then looking at your skin and being confronted with too much information about the reality of how nasty looking and imperfect being human is. Basically not a healthy pastime.
FWIW, if you're truly passionate about this, it's not difficult to set up your own site with the lobste.rs open-source code. Maybe there is a whole subset of like-minded folks.
The lobste.rs origin story may also be of interest to you. The TL;DR is jcs was being singled out and abused by pg on HN. Thankfully there is no evidence dang has (or would ever) do this, it was certainly toxic behavior on Paul's part. Being a moderator isn't for everyone, it's a Certain Thing, and net-positive mods who avoid power tripping are truly rare gems on the Internet. Open information would guarantee such abuses couldn't happen in the dark, but at what cost to the best aspects enabled by the HN ecosystem?
This is a website where people post links and other people comment below them. Most front-page discussions don't surpass 50 replies. I don't really think we should care so much about this place to the point that we're willing to employ state-of-the-art data analysis tools just to read a blog post and see if someone has critiqued it in the comments.
For starters, you only want HN to be used the way you want to use it, which is for sure not the way I want to use it, and more than likely not the way a significant number of other users do either. If that’s not gatekeeping, then what is?
> Lobste.rs is basically dead compared to HN — largely because it’s an HN clone.
My outsider opinion is that lobsters is dead because the sign-up process is designed to keep people out. I appreciate there's also the network effect of people coming here because people come here, but I for sure would visit lobsters too if I could participate without having to know a friend to get past the velvet rope.
In 2019 I set up a bot to store titles for HN stories as soon as they appear on the "new" page, so that I could then compare them to the current version.
I had also made a browser extension for FF and Chrome that compared the current version of any HN story to the stored one, and displayed a warning if they differed.
There were zero downloads of the extensions, so they fell into disrepair and don't work anymore, but the cron job for storing the titles is still active.
If there is any interest I could resuscitate the extensions, or some other system to make that information useful.
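The core of such a bot is small. A sketch of the two halves, recording first-seen titles from /newstories and diffing them later; the dict-based storage shape is my assumption, you'd persist it however you like:

```python
import json
import urllib.request

API = "https://hacker-news.firebaseio.com/v0"

def fetch_json(path: str):
    with urllib.request.urlopen(f"{API}/{path}.json") as resp:
        return json.load(resp)

def record_new_titles(store: dict) -> dict:
    """Store the title of each /newstories item the first time it appears."""
    for item_id in fetch_json("newstories"):
        if item_id not in store:
            item = fetch_json(f"item/{item_id}") or {}
            if "title" in item:
                store[item_id] = item["title"]
    return store

def changed_titles(store: dict, current: dict) -> dict:
    """Compare first-seen titles to current ones; return {id: (old, new)}."""
    return {
        i: (store[i], current[i])
        for i in store
        if i in current and current[i] != store[i]
    }
```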
I found the official API quite a pain to work with while building myself an alternative "hn ui"[0]. I mostly switched over to using the Algolia API[1], which is a lot more "fun" to work with. I can only recommend checking that one out as well.
I wrote a fairly simple "Who is hiring?" browser (using this API) that fetched the post listings and stored them locally in an HTML file accessible by pointing your browser at file:///... . It supported filtering by "Remote", "Interns", and "Visa" with buttons, had an ad hoc Regex filter, and the ability to remove posts using a delete button and not see them again (or to restore all deleted posts) using the localStorage api.
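The filtering part of something like that is only a few lines. A rough re-creation (function and field names are hypothetical, not the actual code), combining keyword buttons with the ad hoc regex:

```python
import re

def filter_posts(posts, pattern=None, want_remote=False):
    """Filter job posts (dicts with a 'text' field) by a 'Remote'
    button and an optional case-insensitive regex."""
    out = []
    for post in posts:
        text = post.get("text", "")
        if want_remote and not re.search(r"\bremote\b", text, re.I):
            continue
        if pattern and not re.search(pattern, text, re.I):
            continue
        out.append(post)
    return out
```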
It was indeed barebones but I'd like to think it helped me find a job. I would link the code (it's on GitHub) but today the "Who is hiring?" posts include a few considerably more advanced and capable "searchers" right at the top of the post:
I am definitely happy to share, but I didn’t expect any interest and I posted my comment under my HN pseudonym account. However, when I link to the repo on GitHub that thin veneer of obscurity will disappear, and I’d prefer that it did not. Being a programmer, I am quite familiar with workarounds and I am happy to offer you a couple of different options:
1. You could add an email address to https://news.ycombinator.com/user?id=maxique and I can email you there - I’m happy to share my name and contact info in a one-off request, just not on an ongoing basis for anyone to see.
2. I could clone the repo onto my current machine and delete the .git dir, leaving the code and the README.md intact, produce an archive of that stuff, and link it here in this thread. While I’m sure it may be possible to use a code search tool to find the repo on GitHub after that, I am not terribly bothered by that.
I am open to other options as well if you have any suggestions. Finally, the code was written for whatever version of the Ruby interpreter (MRI) was current in 2016, in case that will dissuade your interest.
EDIT: I have a bias to action, appreciate interest in my work, and figured it was likely that you would see my reply at some later point and then I'd have to see your reply to my reply at some even later point, and we'd continue the loop, and eventually your interest would wane. So I went with option two. Here you go: https://we.tl/t-Bu257EdQn9
This is my first time using WeTransfer; let me know how it works out for you. I actually originally had a link that I hosted on file.io (which I've also never used before), but it looks like that gets automatically deleted after one download, which seems slightly excessive to me.
Firebase is owned by Google. There are plenty of OSS solutions to create this exact realtime thing without getting too close to the elephant in the room.
I'm currently writing an HN post score tracker (basically it plots score variations over time, since they were in /new, for each popular post), and it's really handy that you can just subscribe with eventsource [0] and get realtime updates when something changes, without having to manually poll each post and waste resources. Not yet sure about the latency though.
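The recording side of such a tracker can be a tiny pure function that only appends a sample when the score actually changed, so a realtime feed doesn't bloat the history with duplicates. A sketch (my own shape for the history, not the parent's actual code):

```python
import time

def record_score(history: dict, item_id: int, score: int, now=None) -> bool:
    """Append a (timestamp, score) sample for item_id, but only when the
    score differs from the last recorded one. Returns True if recorded."""
    now = time.time() if now is None else now
    samples = history.setdefault(item_id, [])
    if samples and samples[-1][1] == score:
        return False  # unchanged; skip the duplicate sample
    samples.append((now, score))
    return True
```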
I know, but it does not provide detailed and raw data, making it impossible to do a bunch of interesting in-depth analysis. Also, I like to build this kind of stuff to explore async Rust capabilities.
If you want to see a small example of this API being used, here's a utility to generate an RSS feed of replies to comments you post that you can self-host:
Anyone know of a good tool to get something like push notifications when specific topics come up on HN? I've been thinking maybe I should build one, but I also feel like someone must have already done it by now.
You can check this [1] service that I prepared for myself then made it publicly available. For now it’s only sending popular stories as notification. Planning to add keyword feature.
Don’t have any suggestions but as an alternative to notifications, I wrote a script to check the HN API for keywords such as “outage” “downtime” “breached” and show the results on a TV screen in my office.
Has been quite useful for the ops side of what I work on!
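A script like that can stay very small. A sketch of the scan, assuming the keyword list above; the function names and the cap on how many new stories to check per pass are my choices, not the parent's actual script:

```python
import json
import urllib.request

KEYWORDS = ("outage", "downtime", "breached")  # example terms from above

def match_keywords(title: str, keywords=KEYWORDS):
    """Return the keywords that appear in a title, case-insensitively."""
    lowered = title.lower()
    return [kw for kw in keywords if kw in lowered]

def scan_new_stories(limit=100):
    """Poll /newstories and yield (id, title, hits) for matching items."""
    with urllib.request.urlopen(
        "https://hacker-news.firebaseio.com/v0/newstories.json"
    ) as resp:
        ids = json.load(resp)
    for item_id in ids[:limit]:
        with urllib.request.urlopen(
            f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
        ) as resp:
            item = json.load(resp) or {}
        hits = match_keywords(item.get("title", ""))
        if hits:
            yield item_id, item["title"], hits
```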
Similar to this, I created "HN Faves API", which is an API to fetch a user's favorited posts and comments on HN. Interestingly, HN doesn't have an official API for this.
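Since there's no official endpoint, such an API has to scrape the /favorites pages. A sketch of the extraction step; the id-extraction regex is an assumption about the current page markup, not a stable contract:

```python
import re

def extract_item_ids(html: str) -> list:
    """Pull item ids out of a favorites page by matching item?id=<n>
    links; HN's markup can change at any time."""
    return [int(m) for m in re.findall(r"item\?id=(\d+)", html)]
```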
I used to query flagged comments from time to time using the API, just out of some morbid curiosity. It definitely used to work for that; I am not sure if flagged items are in the same class as dead ones, though, or if they have since deliberately eliminated this capability. I eventually found more interesting things to do :)
The problem was that many third party apps and clients were displaying flagged and [dead] comments as if they were normal comments. Then we'd get angry emails about all the horrendous comments we were "condoning" when in fact the users posting them were as banned as banned could be.
For a few years I have been working on the side on an HN client. One thing that is really sad is that there is no way to handle authentication properly as a 3rd party. The user basically has to trust you with their password, or you have to "steal" the auth token from a login.
What is the use case? I can't imagine anything besides getting a real time view on the frontpage or likes of my own comments that a monthly static data dump wouldn't solve.
Probably 3rd party clients that offer quality of life / usability improvements over the website.
E.g. I use Hackers [1][2] because it is much easier to read stories on mobile and supports dark mode. Unfortunately, it doesn't support commenting or downvoting (yet?)
I am the developer of HACK (available on iOS, Android and MacOS) which allows commenting, upvoting, downvoting, favoriting, submitting new posts and also supports push notifications for replies. It is the top Hacker News client in the App Store.
Writing this reply with HACK, thanks for the app! If you are comfortable with it, could you share rough revenue estimate for the app of such popularity? How do you approach promoting such a specific app?
Would it be feasible to make a Firebase-to-Supabase connector?
Having a self-hosted private HN Algolia (perhaps via Elasticsearch) would be nice.
Algolia's forced fuzziness and limited number of results per page are a drag. Why can't they provide any search controls? Or at least support the "-" operator to exclude a term.
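For context, the API does document a few knobs even though exclusion isn't one of them: query, tags, numericFilters, and hitsPerPage. A sketch of building such a query URL (the parameters are from the Algolia HN API docs; the helper itself is mine):

```python
from urllib.parse import urlencode

def search_url(query: str, tags: str = "story", per_page: int = 100) -> str:
    """Build an Algolia HN search URL using the documented parameters.
    There's still no way to exclude a term, as noted above."""
    params = urlencode({
        "query": query,
        "tags": tags,
        "hitsPerPage": per_page,
    })
    return f"https://hn.algolia.com/api/v1/search?{params}"
```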
There was a link to the API documentation at the bottom for as long as I can remember using HN, but I’m not sure if it always pointed to the GitHub-page linked in this post.
> The v0 API is essentially a dump of our in-memory data structures.
TIL that HN doesn't use a "proper" / persistent DB. I wonder then how HN handles data persistence. Is it as simple as keeping write-ahead logs + checkpointing every so often?
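To be clear about what I mean: the generic log+checkpoint pattern is "append every mutation to a log, replay it on boot, and snapshot periodically so replay stays cheap". An in-memory toy of the idea (purely illustrative; not HN's code):

```python
import json

class LoggedStore:
    """Toy log-structured store: state lives in memory, and every write
    is appended to a log so state can be rebuilt after a restart."""

    def __init__(self):
        self.state = {}
        self.log = []  # stand-in for an append-only file on disk

    def set(self, key, value):
        # Log first, then apply: a crash after the append still recovers.
        self.log.append(json.dumps({"op": "set", "key": key, "value": value}))
        self.state[key] = value

    @classmethod
    def replay(cls, log):
        """Rebuild state after a 'reboot' by replaying the log."""
        store = cls()
        for line in log:
            entry = json.loads(line)
            if entry["op"] == "set":
                store.state[entry["key"]] = entry["value"]
        store.log = list(log)
        return store
```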
No, they don’t. This forum was created as an MVP webapp using pg’s Arc Lisp. All user and comment data are stored as tables serialized to text files or kept in memory. You can find an old version of the code at http://arclanguage.org
So they do use a database? In the form of text files. The comment I was responding to was edited, and before it was asking if they stored everything in memory and how they handled reboots.
[0]: https://hn.algolia.com/api
[1]: https://github.com/goranmoomin/HackerNews