I built this small app in my spare time to aggregate books recommended on Hacker News. I personally find books recommended on HN to be super helpful, so I think this is the way that I can contribute back.
This book aggregation idea is not new. A bunch of sites have done similar things [1, 2, 3].
Yet one common limitation of those sites is that they have limited recall (i.e. not able to get a comprehensive set of book mentions), and thus don't paint an accurate picture of what the top books are. They're all based on insufficient rules, e.g., looking for Amazon Links. As you can see from my app, people often do not include Amazon links when recommending a book.
I wondered, why can't we just match book names? Well, it's not so easy. Some books have pretty short names, e.g. Meditations [4], or Steve Jobs [5]. Some book names might as well be the name of a movie, e.g. Ready Player One [6]. Simply matching the names of the books would produce a whole lot of irrelevant results.
This is where Deep Learning comes into play. Recent advances in large NLP models (transformers and BERT in particular) have made machine language understanding unprecedentedly accurate. That let me fine-tune a BERT model on a couple thousand labeled HN comments and predict accurately whether each word in a comment is part of a book title or not - a task commonly known as Named Entity Recognition (NER).
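For readers who haven't seen NER output before, here is a minimal sketch of how per-word predictions can be grouped back into titles. It is not the app's actual code: the tag names `B-BOOK`, `I-BOOK` and `O` and the function `extract_titles` are assumptions for illustration, and the tags themselves would come from the fine-tuned model.

```python
from typing import List


def extract_titles(words: List[str], tags: List[str]) -> List[str]:
    """Group consecutive B-BOOK / I-BOOK tagged words into title strings.

    Tag scheme (assumed for this sketch): B-BOOK starts a title,
    I-BOOK continues it, O marks a word outside any title.
    """
    titles: List[str] = []
    current: List[str] = []
    for word, tag in zip(words, tags):
        if tag == "B-BOOK":
            # A new title starts; close any title still open.
            if current:
                titles.append(" ".join(current))
            current = [word]
        elif tag == "I-BOOK" and current:
            current.append(word)
        else:
            # Outside a title (or a stray I-BOOK with nothing to continue).
            if current:
                titles.append(" ".join(current))
            current = []
    if current:
        titles.append(" ".join(current))
    return titles


words = "I loved Ready Player One and Meditations".split()
tags = ["O", "O", "B-BOOK", "I-BOOK", "I-BOOK", "O", "B-BOOK"]
print(extract_titles(words, tags))
```

Because the model decides per word and in context, "Ready Player One" the movie can in principle be tagged `O` where the same words in a reading list are tagged as a title.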
As a result, my app is able to present a whole lot more results while maintaining desirable accuracy. For example, NER works pretty well on the tough examples I mentioned ([4, 5, 6]). Compared to prior sites, my app captures 9-50X more mentions and thus presents a much more complete picture of what books are recommended on HN.
Furthermore, I've made sure that the comments are presented well in the UI because the recommendations are just as useful as the books. I highlighted the mentioned book name, and used a custom NLP-based ranking function to sort the comments. These are non-trivial improvements over prior sites, which I hope you can find useful.
Nevertheless, this app is not without limitations: 1) matching book names fails when two books have the same or similar names; 2) although not often, this approach wrongly classifies some short stop-word names [7]; and 3) sometimes NER fails to see that the commenter actually hates the book. These problems can be alleviated with more Deep Learning. For 1), one can use BERT to learn the authors mentioned, which can then be used as a filtering criterion. 2) and 3) should be fixable with more training data (currently there are only ~4,000 hand-labeled HN comments).
Lastly, I'd like to especially thank my gf who helped me label ~1,000 comments, which boosted the model accuracy by 5 percent! I also want to thank the people who create and maintain the HackerNews big query dataset [8]. And of course, thank everyone on HN who recommends books to others.
Hope you enjoy this app! Feedback and suggestions are welcome :)
Nice. The Hacker News archive contains a wealth of great information. I've previously performed similar extractions like OP but with grep and SQL. I've also looked for people who have accurately predicted the stock market (I did identify one pro investor. He's now into NFTs). I've found so much cool stuff, spending whole nights looking for interesting users and reading their entire post histories and being blown away by many insightful posts. I've been considering making a blog consisting entirely of insightful HN posts that I come across.
I think someone should start a fund that exclusively invests in startups that get torn to shreds on HN. “Why would anyone use this, I could build it in a weekend!” often means the startup goes on to reach a billion dollar valuation.
Here's my idea: model the sentiment of HN comments on successful, already launched startups using BOW models, then invest in the next ones.
Sounds like an IFF fallacy: sure, HN hates some things that turn out to be successful, but lots of things HN hates really are terrible (or turn out to be successful for unrelated reasons, like the team was great and they managed to pivot away from the terrible idea).
Anyone technically minded used FTP back in the day. So why was there a need for Dropbox (at that time)?
That is what makes it hard to invest if you think too much or "know" too much. You get blinkers that prevent you seeing what become obvious successes because of UX improvements for the non-technical crowd.
It’s not like… that’s precisely the scenario being referred to. The problem is that’s always the example, which suggests that the Dropbox event is likely closer to the exception than the rule. And its existence is used to deflect all criticisms, which is generally an indefensibly dumb strategy.
Knowing the subject well may put blinders in some situations… but what’s the alternative? Know the subject poorly, flail about wildly and hope you land something by pure chance? Obviously knowledge isn’t the problem here; you need it to qualify if it’s a good decision or not. It’s the over-specialization, combined with the lack of empathy for the average user that derived the dropbox event.
You mention the Amazon links are not affiliate links. As a default, that's a nice move, but I believe you are within your rights to add a toggle to enable affiliate links. The money probably isn't the point but it's nice to make enough to buy even a single beer or coffee from a side project, and honestly I believe just about everyone would toggle the option if they found the tool useful.
I would go further and suggest just having a link to wikipedia or goodreads or some other non-money-generating site and one (always with affiliate tags) to amazon.
IMO it's not a "bad surprise" to a user that an amazon link is an affiliate one, it's just annoying when the only way you can get information on a book is through affiliate links.
> As a default, that's a nice move, but I believe you are within your rights to add a toggle to enable affiliate links.
I'd go even further. It's a loud minority that even cares at all about this and they make it feel like everybody agrees with them, but most people don't care and you're totally within your rights to do it. You should go for it!
The downside of affiliate links that I'm aware of is that Amazon makes available the entirety of any orders credited to an affiliate, so if multiple items are ordered the affiliate is able to see the entire order.
Clicking through an affiliate link associates future orders for 24 hours, and once an order is placed no further orders are associated. (If an item is added to the cart while associated with an affiliate link it stays associated for 90 days, but it's not clear to me what information is provided to the associate regarding other items in the order if that item is eventually purchased.)
You mentioned transformers and BERT for large NLP models. I've been playing around with this too and it's a really powerful approach. Have you used spacy-transformers? [0]
The approach is pretty cool and can be used with BERT, GPT-2/Hugging Face etc.
I'm just starting to experiment with GPT-J and thinking of trying this approach also [1].
Anyway, totally awesome project and the results are really good. This stuff really is almost unreasonably effective!
This is a really good application of it. Getting NER right for something like book titles with so much name collision with other domains and entity types is really hard, and this works great on something that most people would never realize would be so hard!
This is really impressive! Can you please elaborate more on the way you labeled the data? I think usually there is a lot to learn from labeling methods.
I generated training comments by matching book names. Roughly, there are one in five of those comments that actually have a book mention. Then I use the Doccano labeling tool to label the tokens in the comments.
Ah right now ranking is really rudimentary: I just use TextBlob's SA library to get a sentiment score for the comment, and combine it with the comment length. In theory we can also use BERT to get a sentiment score, which should be more accurate I guess? Love to hear your thoughts.
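As a rough illustration of that kind of ranking, here is a minimal sketch that combines a sentiment polarity with comment length. The exact formula, the `length_weight` value, and the function names are assumptions, not the app's real scoring; the polarity is passed in as a number in [-1, 1] (the range TextBlob's `sentiment.polarity` uses) so the sketch has no dependencies.

```python
import math
from typing import List, Tuple


def rank_score(polarity: float, text: str, length_weight: float = 0.1) -> float:
    """Combine sentiment and length into one score.

    polarity: sentiment in [-1, 1], computed elsewhere.
    The log keeps very long comments from dominating the ranking.
    """
    n_words = len(text.split())
    return polarity + length_weight * math.log1p(n_words)


def rank_comments(comments: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Sort (text, polarity) pairs from highest to lowest score."""
    return sorted(comments, key=lambda c: rank_score(c[1], c[0]), reverse=True)


comments = [
    ("Bad book", -0.7),
    ("Great book, changed how I think about design and code", 0.8),
    ("Great", 0.8),
]
for text, polarity in rank_comments(comments):
    print(round(rank_score(polarity, text), 3), text)
```

Swapping in a BERT-based sentiment score would only change where `polarity` comes from; the combination step stays the same.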
Simple is good! Especially for ranking as the objective is hard to define. Having looked at some samples on your site, imo it's good enough :)
Though if you wanted to try other things just for fun, maybe:
- Count matching NER, comments with a lot of book recommendations tend not to detail why they like the specific one currently filtered
- maybe down weight comments that are too short (after the matched NER is subtracted) as they seem to just have a title+link?
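The two ideas above could be sketched as a post-processing step on whatever base score is already in use. This is only an illustration: the function name, the division by the number of titles, and the `min_words` threshold are all assumptions.

```python
def adjust_score(base_score: float, n_books: int, words_outside_titles: int,
                 min_words: int = 15) -> float:
    """Down-weight list-style comments and comments that are little more than a title.

    n_books: number of book titles the NER model found in the comment.
    words_outside_titles: comment length after the matched titles are removed.
    """
    # A comment listing many books says less about any single one of them.
    score = base_score / max(n_books, 1)
    # Scale down linearly when little text remains besides the titles.
    if words_outside_titles < min_words:
        score *= words_outside_titles / min_words
    return score


print(adjust_score(1.0, n_books=1, words_outside_titles=30))
print(adjust_score(1.0, n_books=4, words_outside_titles=30))
print(adjust_score(1.0, n_books=1, words_outside_titles=3))
```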
Not ML but while I'm here, on Android the comment section appears fine but has a horizontal scroll that seems spurious, with lots of blank space to the right (Galaxy S10, Chrome & Firefox).
Right, the model is not perfect with the limited training dataset I have (we hand labeled 4,000 comments, which is already tons of work for a side project). But the intention was to filter out negative ones.
You did a stellar job here, thanks so much for this addition to the community!
On labeling, if you have a method statement or some go-by reference, I am sure you would get some support here. I know I would help! Maybe package a few blocks of 100 unlabeled comments with a readme and see what happens?
It got some titles completely wrong too. For instance, “Open: an autobiography” by Andre Agassi was erroneously listed high in the title list with “139 comments,” however most of the comments are recommending various titles that start with “Open”, mostly titles with “Open source” in the name, but the ML is attributing them all to the Agassi book.
Lessons: My Path to a Meaningful Life by Gisele Bündchen, the top model, is probably the most out of place recommendation :)
None of the comments is about the book obviously, they just mention the word "Lessons".
> My own comment about how I hated Thinking, Fast and Slow seems to be counted as a recommendation.
What is the level of sentiment analysis in natural language processing? Would it be easy to add the feature, to recognize whether the book was mentioned in a positive or negative light?
> I have not yet read the good book Atlas Shrugged but be sure to check it out based on your recommendation.
You're delusional. Where did I ever recommend reading Atlas Shrugged? Ayn Rand is nuts.
If you want to see some amusing "recommendations" I'd check out The Communist Manifesto by Karl Marx and what comments it's drawn. I think the network trying to find recommendations needs to incorporate more sentiment analysis.
E.g. "Guards Guards by Sir Terry Pratchett is a great book" vs. "I've never read anything as slow and uninteresting as The Two Towers by J.R.R. Tolkien" or "I thought Seveneves by Neal Stephenson was good - but it probably should've been two separate books with the second half actually having some meat to it."
Yeah, I'm seeing some issues with Code by Petzold citing comments that are talking about e.g. Code Complete or just code in general, but with such a generic name (and given the forum) it's actually pretty impressive to me that most comments are identified correctly.
Edit: another one that is tough is Open by Agassi - seems most of these comments do not actually have anything to do with the book. I would guess most one-word titles will have similar issues.
That's a correct observation. I'm guessing it has to do with whether the words after Open are indicative enough to the model that they should be brought in together with Open. As I said in other comments, with more training data this issue will likely go away. And these tough comments are the best candidates.
It also seems to have trouble with books that aren't available on Amazon -- a fair number are available through other vendors or free on various websites, and if Amazon doesn't sell it, it's not being caught.
It works great, I've already seen a couple of things I fancy reading now.
I am amazed that the app can find any relevant comments mentioning "It" by Stephen King, for example. That seems like magic. A few IT and It's are in there too, but still impressive.
I was intrigued to see "Twilight" quite high on the list of fiction, doesn't seem like something the HN demographic would read. Turns out most comments were about the board game / video game Twilight Struggle, or Zelda Twilight Princess, and those comments mentioning the book were not doing so in a particularly positive way.
"One Hundred Years Of Solitude" is often commented as "100 Years Of Solitude" so the system can miss a few comments there.
For an improvement, it'd be handy if the comments that mention other books had a way to link you to that book, so you could see other comments. For example, if you click Dune, and see someone saying "yeah, I liked Dune, and also x, y and z", then those other books should be clickable too, so you could see what other people say about them. Seems like the information could already be there in the database as a list-like comment could potentially appear in many book recommendations, but who knows.
Very cool. This one’s wrong though: “Zero: The Biography of a Dangerous Idea”. Comments are talking about other books with “zero” in the title, such as Thiel’s “Zero To One”. Perhaps parse longer titles first, and eliminate them, before matching for shorter titles? Great MVP. Had in fact been thinking about how great it would be to gather book data from HN myself just yesterday. So am really happy to see that someone actually made it. Plus, it looks great and is fun to use.
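The "longer titles first" suggestion can be sketched with plain string matching. This is a minimal illustration, not how the app works (it uses NER rather than a title list); `match_titles` and the short-name title list are hypothetical.

```python
import re
from typing import Dict, List


def match_titles(comment: str, titles: List[str]) -> Dict[str, int]:
    """Count title mentions, matching longer titles first.

    Each matched span is blanked out before shorter titles are tried,
    so "Zero to One" is not also counted as a mention of "Zero".
    """
    counts: Dict[str, int] = {}
    remaining = comment
    for title in sorted(titles, key=len, reverse=True):
        # \b only works cleanly for titles that start and end with a word character.
        pattern = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
        remaining, n = pattern.subn(" ", remaining)
        if n:
            counts[title] = n
    return counts


titles = ["Zero", "Zero to One"]
print(match_titles("Thiel's Zero To One is great", titles))
print(match_titles("Zero is a fun read", titles))
```

This removes the overlap problem but not the ambiguity one: a bare "zero" in a sentence about numbers would still be counted.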
Thanks. In theory this is the model's fault that's not learning "Zero to One" should be considered as a whole book. One limitation I mentioned in my root comment. Should be fixable with more training data!
I did something similar with RoBERTa and my own Kindle library to graph (with D3.js) all mentions/citations between my books (which of my books cite other books I have). I sorted the final graph by publication date to see some cool historical patterns of books citing other, older books [1]
I also manually annotated ~1000 book mentions, but I combine RoBERTa with string search (I list all titles I want to search a priori) to reduce the number of false positives. I also augmented the dataset with thousands of book titles and metadata from Goodreads.
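A minimal sketch of that combination, assuming the model has already produced candidate title strings: keep only the predictions that appear in a title list fixed in advance. The function names and the normalization are illustrative assumptions, not the commenter's actual pipeline.

```python
from typing import Iterable, List


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(title.lower().split())


def filter_predictions(predicted: Iterable[str], known_titles: Iterable[str]) -> List[str]:
    """Keep model-predicted spans only if they match a title on the a priori list."""
    known = {normalize(t) for t in known_titles}
    return [p for p in predicted if normalize(p) in known]


predicted = ["Dune", "open source", "Snow  Crash"]
library = ["Dune", "Snow Crash"]
print(filter_predictions(predicted, library))
```

The trade-off is recall: anything not on the list is dropped, which suits a closed library better than an open-ended forum.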
The medium post is amazingly written! I basically did the same thing - and you beat me with the data augmentation piece. I tried using nlpaug [0] but it didn't improve the model performance. I'll definitely try swapping book titles around.
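For anyone curious, swapping titles in labeled examples could look roughly like this. It is a sketch under assumptions: the `B-BOOK` / `I-BOOK` / `O` tag scheme and the function `swap_title` are hypothetical, and only the first tagged title in the comment is replaced.

```python
from typing import List, Tuple


def swap_title(words: List[str], tags: List[str], new_title: str) -> Tuple[List[str], List[str]]:
    """Replace the first tagged title with new_title and re-tag it.

    Produces a new training example with the same context but a different book.
    """
    title_words = new_title.split()
    if not title_words:
        raise ValueError("new_title must contain at least one word")
    new_words: List[str] = []
    new_tags: List[str] = []
    replaced = False
    i = 0
    while i < len(words):
        if tags[i] == "B-BOOK" and not replaced:
            # Skip the whole original title span.
            i += 1
            while i < len(words) and tags[i] == "I-BOOK":
                i += 1
            new_words.extend(title_words)
            new_tags.extend(["B-BOOK"] + ["I-BOOK"] * (len(title_words) - 1))
            replaced = True
        else:
            new_words.append(words[i])
            new_tags.append(tags[i])
            i += 1
    return new_words, new_tags


words = ["I", "loved", "Snow", "Crash", "a", "lot"]
tags = ["O", "O", "B-BOOK", "I-BOOK", "O", "O"]
print(swap_title(words, tags, "The Art of War"))
```

In practice the replacement title would be drawn at random from a large list (for example Goodreads metadata), so the model sees many titles in the same sentence patterns.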
A few years ago I found an article that was something like '100 short books everyone should read before they're 40'. It was a mix of fiction and non-fiction. I've never been able to find it again! But I really liked the list because these are books you can consume in a few hours and may be life changing.
I remember a few of the titles: Games People Play, Meditations, The Prince, The Art of War. (I suppose it may have been non-fiction only, although I think The Awakening may have been on there.)
That's a bit like saying watching porn is more for personal development than fun. Perhaps you'll learn something, but it's incidental.
I've learned a lot from HN. But it wouldn't be good to fool myself into thinking that an employer wants to fund my personal development in this regard. Otherwise, they'd pay me to HN all day.
The crux of the issue is that it's impossible to work 8 hours every day. We all invent lies to fill the downtime.
Is all that hyperbole really necessary? Each new sentence seems primed to leak edge and corner cases. Without giving more attention to such a rhetorical blind spot, I wonder how one could imagine they know the crux from the passenger side door.
If it’s mistaken, it should be easy to explain why. Otherwise I’m inclined to believe it’s merely an uncomfortable truth.
Would your employer pay you to HN all day? If not, precisely how much of your day are they comfortable with you HN’ing? Are you sure it’s officially approved?
There’s often good intel on here that I have not been exposed to on other sources. Obviously spending 40 hours a week reading HN and getting into politics arguments on here is a waste, but there’s plenty of relevant news for most folks in tech if you stick to those topics.
Do you collect stats, how assertions line up with facts? Otherwise, what may seem likely and catchy, might just as well be opinionated, unsubstantiated and patchy.
HN really likes Neal Stephenson. I've never read a book of his that I didn't love, so I will definitely be looking through more of the recommended fiction from the community here.
I really liked Seveneves, but totally understand why someone wouldn't like parts of it. I am curious as to which third you skimmed, though, as the book is effectively three different books with very different moods.
The first third is very intense, I also was a bit fatigued for that middle third. When I have reread Seveneves I have skipped it. I really like the last third, but it does feel like a completely different book.
Only one I haven't read. Skip it I guess. Reading DoDo now, so verdict's out, but I absolutely adored every other one of his books. Fall was a pleasant surprise, since it seemed controversial.
Reamde would have been my first recommendation to introduce someone to Stephenson. Constant action, an unusually direct storyline for him, and not too too much time spent off in the weeds.
Of course some might argue those are the best things about his books, but while Cryptonomicon is perhaps my favorite sci-fi novel I've read, I think it takes a certain type of person to enjoy a book where the plot gets interrupted for 5 page descriptions of how to eat Cap'n Crunch or a who fucked who of the Greek pantheon.
1. I regret you earned $0 for helping me spend so much on books. Have you considered setting up affiliate links or a donation button? Maybe affiliate links as a service will be your next project.
2. The Amazon links are for Amazon.com, but I'm in Canada. Maybe easy internationalized Amazon affiliate links will be your next project.
After reading such comments here on HN, last month I got myself a local library card and it has turned out to be a great decision! I am using Libby app to get digital books and even audiobooks! Absolutely fantastic
The Libby app has also made some great usability improvements lately for people who have multiple library cards. If you live in an area with multiple library systems (like city and county), it's totally worth signing up for both.
Or just order used books from thriftbooks.com. It’s the only place I buy books for the last few years now. Cheapest prices (almost) always, but even if they’re not quite the cheapest for a particular title, the free shipping on any order over $10 always puts it over the top.
Note also that addall.com used book search doesn’t include thriftbooks in the results, so I just always go straight to thriftbooks and don’t bother searching addall.com anymore..
You know what, if commenter OP finds value in the services offered; and wishes to compensate the author of the software - just gonna say - I have no problem with that.
I regret that some people seem to think that any sort of compensation for services rendered or monetization in any way is automatically bad or wrong somehow.
If you walked into someone's house and looked at their bookshelves, and they had most of these on them, or their books were mostly in the union of a subset of these, I'd wonder what one might speculate about them.
Looking forward to the criticism that results from mapping the co-ordinates of this ontology, as one could weave a narrative around most of the books that aggregates them into types and categories themselves, then transmit the criticism without the substance to some believers, which could codify into an "anti-HN" ideology (which is just a peculiar form of fan club.) Calling "hacker-critical" as a pseudo academic backlash trend now, and a "hacker studies" course designed to encircle these ideas with criticism as levers to manage people who have them. Really, if you aren't using AI to create predictive levers about people's beliefs and behaviors to manage and extract value from them, what are we doing with it. :)
Super cool to create this though, as it would be really interesting for other communities, potentially subreddits.
The algorithm is definitely missing some recommendations in its current form, but I suspect if it weren't I'd be pretty close to your description.
Edit: I just did an HN search for the few books I couldn't find with this app and was able to find comments recommending all of them. Not sure if this means I need to branch out or just that HN reads a lot..
Cool project, and cool results. As an anthropologist who reads HN as a way to keep abreast of the tech community and tech insights, it's interesting to see Atlas Shrugged as one of the most often recommended books. Interesting and maybe slightly disturbing. HN would make for quite interesting source material for someone who wanted to study tech culture.
I'd be careful about that generalization. This software seems to be going more by mentions than by recommendations - e.g. the top reply to https://news.ycombinator.com/item?id=16323808 ("Ask HN: Which are the most damaging books you've read?") is being counted as a recommendation.
Sentiment analysis is hard. In fact I've never seen it work yet.
Great work, but do note that the list basically looks only slightly better than an Amazon list (Atlas Shrugged lol). I think some effort on more useful ranking (looking at metrics of controversy, or maybe PageRank) might make it more useful!
I am also curious to know if the # of votes is integrated into the ranking at all, possibly weighted. Could also attempt NLP Text Sentiment analysis to influence the model as well.
This is not a criticism of the work done, but I think the top 20-40 mentions are extremely obvious and a regular reader might be able to guess a good portion of these recommendations. What is really interesting, and what the “categories” begin to do, is tying the recommendations to explicit context. I didn’t dive too deep into the recommendations, but are the categories by book category, or by originating topic of conversation? It’s a narrow distinction, but a useful one. I’d love to see deep learning pull up a hierarchy of conversational topics on Hacker News and match recommendations to those trees.
BTW, going through that list, I see why I love the HN crowd. 70 % of those books I’ve read myself, and did so before coming to HN. There must be some strong personality type filtering going on.
I think it's been quite obvious there's some personality type filtering going on, as with most online communities. I'm quite curious how it'd be quantified. Surely software engineers, startup founders, ADHD, INTJ, and Myers-Briggs-is-bogus types are overrepresented. Might tell us a bit more about ourselves...
I'm pretty sure there's more Warhammer 40k books than there are days in the year... It's like someone heard the term "space opera" and thought that meant "soap opera in space".
Recommendations would include comments like, "This novel is really the one that ties the previous 37 books together." or "You might want to skip the next dozen books if you're squeamish about things that ooze."
While I don't consider the 40k books on par with the better science fiction out there, I do enjoy that they bring a bit of scale and what it means to space. It's a different take from the rosy, post-scarcity, future of space. Bad things are really bad. Unattended good things turn bad on their own just from drift.
Then there is the unashamed embrace of over-the-top in so many different ways.
I've always considered 40k a satire of the entire sci-fi genre (and in many ways a satire on modern politics). In that way, I find it quite refreshing. And your statement of "unashamed embrace of over-the-top" resonates quite well.
So I'm very curious how you managed to find book titles, I ran into a lot of issues trying to figure out, for example, with "Clean Code" whether to search for "Clean Code" or "Clean Code: A Handbook of Agile Software Craftsmanship" since people mentioning the book used both instances. And of course someone mentioning just "Clean Code" might be referring to the concept not the book. I ended up settling on `${titleMinusColon} - ${author}` but I'd love to know what your approach was given that you used deep learning to search.
EDIT: Just read your comment below on your approach, very interesting!
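For comparison, the title-minus-subtitle query described above is simple to sketch. The function name `search_query` is hypothetical, and the code assumes the subtitle always follows the first colon.

```python
def search_query(title: str, author: str) -> str:
    """Build a "<short title> - <author>" query, dropping any subtitle after the first colon."""
    short_title = title.split(":", 1)[0].strip()
    return f"{short_title} - {author}"


print(search_query("Clean Code: A Handbook of Agile Software Craftsmanship", "Robert C. Martin"))
```

Requiring the author next to the title trades recall for precision: it avoids matching "clean code" as a concept, but misses comments that name only the book.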
Found it interesting that I couldn't find results for Knuth (The Art of Computer Programming) or SICP on here. Maybe the casual way we refer to these texts is hard to detect as a reference to a book, or their importance is just implied community knowledge?
Not much - but it needs a new set of training data for research papers. Btw - there seems to be an existing website for this already: https://www.hackernewspapers.com/ Although it only looks for posts.
I'd assume that Arxiv links are often there. So it's a problem that can be addressed with an easier solution (just looking for Arxiv links).
This is awesome, thanks for putting it together! For me, I've had the most luck with HN fiction recommendations so I went there first. The distinction between "Literature & Fiction" vs "Science Fiction & Fantasy" seems to be a bit arbitrary. For example The Hobbit is in one category but LOTR is in the other, Neal Stephenson and Andy Weir are in both, etc. It might make sense to just merge all the fiction together. Plus that way you can short-circuit any debates about "is science fiction literature" :)
Edit: another little nit: it looks like quite a few books list audiobook narrators as coauthors
Interesting idea, but this is mentions of books, not recommendations. It includes comments by someone that's reading the book, has it on their reading list, or read it and thought it was terrible.
The intention was to only show recommendations. But because of limited training data (we hand labeled ~4000 comments), the model wasn't able to filter out bad ones effectively. More training data should be able to solve it.
Nice! I built a similar NER for book recommendations fine-tuned on a manually labeled dataset of book recommendations mentioned in podcast transcripts.
The whole project is open-source and I already added a few podcast shows with all their book recommendations (I have to add a lot more though): https://github.com/JohannesHa/PodcastBookLibraryMonoRepo
This is awesome. The best thing is that it's so fast to navigate. I like how the HN comments are styled just like on HN.
A couple of thoughts:
* It would be great if each book were to have its own URL (for sharing).
* Consider allowing the search to allow author input, e.g. if I want to find the book 'Who' by Geoff Smart, the single-word title isn't specific enough to show that book at the top of the search results.
If the dataset were perfect, maybe. But, if a book with a single-word title has only few comments, it's plausible that most/all of those comments are false matches.
In the case of the book I searched 'Who', showing it in 4th position seemed about right.
What data source are you using for the books, authors and covers? I looked at OpenLibrary [1] but the covers are not the same, so I suppose it is something else? Maybe Amazon directly somehow?
A loss which might be worth debugging (maybe it contributes to the whole pattern of losses -- didn't dig deeper): "Brave: A Teen Girl's Guide to Beating Worry and Anxiety" is never explicitly mentioned, but all ~60 misclassified references are actually referring to "Brave" browser or "Brave New World" book.
Does it take into account negative reviews/comments? I have seen that Why We Sleep is being recommended in the 6 months tab, but, while it was received with a lot of praise, it was soon criticized by other researchers in the field and I would expect that the HN crowd would have followed that trend.
When I labeled the comments, I didn't label books that were criticized. So in theory the model should filter out negative reviews. But currently the training dataset is pretty limited in size so you still can see some negative ones. I suspect that with more training data this problem will go away.
Can’t find my recommendations for J. Scott Turner’s “The Extended Organism”
To summarise: organisms evolving to change the environment around them to their benefit. I went to Foyle’s one day with buying a book on termite mounds in mind; that is one chapter in the book.
I found out a year too late that UCL had hosted a talk by Dr Turner.
Neither list seems to include much fiction. I've been reading "Diaspora" based on HN recommendations, as well as "Snow Crash", "Cryptonomicon", and "The Martian".
("Cryptonomicon" was a good story, but I really hated all the jumping around every five pages. "Diaspora" has sort of so-so writing, but very hard-math sci-fi and quite interesting ideas. I think it's the "hardest" sci-fi book I've read, which includes "The Martian" and the red-green-blue Mars books.)
I just bought Cryptonomicon. Looking forward to reading it.
The book I am reading right now is also chosen based on HN recommendations (The Talent Code). And I am about to read the GTD book by Allen. (I hate self-help generally with very particular exceptions)
This is brilliant. Thanks for doing this. One tiny request -- can you please link the primary title to the homepage. I want to be able to click somewhere to come back to the home page after browsing around a bit.
After packing/archiving my library of physical books around 2010-11, I went all digital. However, I came back and now try to stay roughly at a 1:4 (physical:digital) book ratio, after my daughter complained about me reading on the Kindle: "Are you really reading a book?"
I re-started buying physical books in 2018. I have a knack of buying books recommended by Hackernews comments and the curated list of people I follow on Twitter. I re-started with less than 10 physical books around mid-2018. Between me and my daughter, we might have crossed 200+ physical books. I need to figure out a better way to deal with this.
Will do, good point! For now you can just delete all paths so you can go to https://hacker-recommended-books.vercel.app/ which btw redirects to the top book of the all-time list of all categories.
This is very impressive, well done on deploying this.
95% of every book I have ever read or owned is in the first 20 pages.
It's almost just as fun to read the comment chain about each book.
You must be independently wealthy because I know no one cares if there is an affiliate link. I believe affiliates are always paid to the last cookie you have.
I'm reminded of Goodhart's Law... So long as your project remains secret it'll be valuable. Once someone sees money being made from it, it'll kick off ingenuine recommendations... anyway... high quality problem to have I guess!
But what's wrong with using Amazon affiliate links? If anything, monetizing would be great since it would give you more incentive to maintain this wonderful application. And it doesn't cost us users anything.
Nice work! I noticed it accurately picked out solaris and associated a few recent comments, none of which were about the operating system.
There was a fantastic HN comment[0] which actually spurred me to buy and read it. Do your queries go back far enough to pick this one up? It's an interesting example where one sentence mentioning the author alludes to an association with the title of the book in a sentence that is explicitly about the OS.
Love it, will you do a write-up on how to replicate this with other sources? I'm currently analyzing both Indie Hackers and StartupsForTheRestOfUs Interview Transcriptions and this could be a fun analysis!
It's a really interesting project. And I am sure it's really hard.
I was curious how many times some common textbooks were mentioned but didn't find them via the search, which could be user error. But to give a specific example. None of the books in this comment thread were found:
Noticed an issue. Some, but not all, comments referencing Strunk and White's Elements of Style are showing up instead as Erin Gates' Elements of Style: Designing a Home & a Life
Good catch! This is the limitation mentioned in my root comment - the algorithm will fail when two books have similar names. The partial solution is to look at authors too when available. Something to be included in the future.
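A minimal sketch of the author check described above, assuming each candidate book is a plain dict with hypothetical `title` and `author` keys (this is not the app's actual data model, and `pick_by_author` is an illustrative name). It keeps a candidate only when exactly one of the same-titled books has an author name word appearing in the comment, and otherwise reports the mention as ambiguous:

```python
import re
from typing import Dict, List, Optional


def name_tokens(text: str) -> set:
    """Lowercase alphabetic tokens, dropping very short ones like 'jr'."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if len(t) >= 3}


def pick_by_author(comment: str,
                   candidates: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the single candidate whose author is named in the comment.

    Returns None when no author, or more than one author, is mentioned,
    so the caller can treat the mention as ambiguous.
    """
    comment_words = name_tokens(comment)
    matches = [c for c in candidates
               if name_tokens(c["author"]) & comment_words]
    return matches[0] if len(matches) == 1 else None


# Hypothetical candidates sharing a similar title.
candidates = [
    {"title": "The Elements of Style", "author": "William Strunk Jr."},
    {"title": "Elements of Style: Designing a Home & a Life",
     "author": "Erin Gates"},
]

comment = "Strunk and White's Elements of Style is still worth rereading."
match = pick_by_author(comment, candidates)
print(match["title"] if match else "ambiguous")
```

Returning `None` rather than guessing keeps precision high: a comment that names no author stays unassigned instead of being attached to the wrong edition.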
Amazing and super useful: If I start reading today, and I read a book a day, it'll only take 112 years to finish, assuming that no additional books will be recommended in the next century.
Interestingly I recently read "A Fire Upon the Deep" based on a HN recommendation. A quick search showed it has many mentions, yet it's not listed in your app at all.
Slightly off-topic from the post. The best book recommendations I've gotten came in one of the following ways; a dedicated recommendation service or app never worked for me:
- told by a friend
- someone I admire read a book and commented on it
- I'm working on some problem, and during that exploration I came across books.
- random people mentioning a book on a platform like HN on a topic/post of my interest.
Yeah, in my experience a lot of 'general' book reviews are super critical and don't really try to hook you. Going through several reviews, you just come away with the collected gripes and nitpicks of what is otherwise a good book.
I find that I get sold on a lot more when it is just a random single comment on some thread somewhere that focuses on a single aspect of a book.
If you can find a hyper specific subreddit/forum/etc. for a sub-genre you like, then you will spend more time reading books than reviews...
> random people mentioning book on platform like HN on a topic/post of my interest.
Same! Some of my favorite book recommendations have especially come from this one, I don't know why but a one line comment on a HN thread of "what book changed your life" has become my favorite way for discovering books.
Interestingly, it cannot differentiate between the different Harry Potter recommendations (the original books, fanfics, and that book on philosophy that mentions Harry Potter).
"Atlas Shrugged" is a polarizing book: people tend to either love it or hate it. And the people who love it love to tell other people how great it is, whereas many of the people who hate it just don't talk about it (because there's generally little need to talk about the badness of bad books).
I think a book list is more useful if it has some books in it that some love and some hate, rather than only books that no one minds very much. Maybe some of them will turn out to be ones I love.
Great idea. I think it's interesting how the hivemind decides what are great books. There are some objectively great books in the list, but also very debatable ones.
For example, The Design of Everyday Things has some interesting content, but I found it almost unreadable. It's such a poorly organized and written book that its design seems to go against almost every rule discussed in the actual content. Keeps surprising me that this is such a highly praised book.
This is amazing! I cashed in a bunch of Audible credits on this one. Having the comments right there was super helpful to understand the context of the mentions and other books that were related.
Do you think it would work on podcast transcripts? The formatting of the titles wouldn't be nearly as regularized so it might have quite a few false positives/negatives, but there's a lot of book conversations out there.
I have another suggestion. Can you please set up the Amazon links with the option they have to lead me to my home country's (correct) store? I don't know how it's done, but I have seen links that open in the right store for me; otherwise, it takes too much clicking or copy-paste-searching.
I just bought 5 Children's Books and I don't mind if you can benefit from the affiliate link that goes from your account.
The longer extracts are more useful than the shorter extracts.
For Brave New World, I noticed the first 100 - 200 comments are short, and not useful as reviews so much as indicators of preference. Then after that, the comments are longer, and hence, more useful because they explain something.
It would be useful to be able to filter word length so as to be able to distinguish between Opinion Mode vs Review Mode.
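A minimal sketch of such a filter, assuming comments are available as plain strings. The function name `split_by_length` and the 40-word default threshold are hypothetical choices for illustration, not something the app provides:

```python
from typing import List, Tuple


def split_by_length(comments: List[str],
                    min_review_words: int = 40) -> Tuple[List[str], List[str]]:
    """Split comments into (opinions, reviews) by whitespace word count.

    Comments with at least `min_review_words` words count as reviews;
    shorter ones count as opinions. Input order is preserved.
    """
    opinions, reviews = [], []
    for comment in comments:
        if len(comment.split()) >= min_review_words:
            reviews.append(comment)
        else:
            opinions.append(comment)
    return opinions, reviews


comments = [
    "Loved it.",
    "A dense but rewarding look at how control shapes society.",
]
# A tiny threshold is used here only so the short example splits both ways.
opinions, reviews = split_by_length(comments, min_review_words=5)
print(len(opinions), len(reviews))  # → 1 1
```

Word count is a crude proxy for usefulness, so exposing the threshold as a user-adjustable setting would probably work better than fixing it.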
In anticipation of getting flagged into oblivion, am I the only one who's disappointed in this selection of books?
Of course, taste is subjective, and it should perhaps be expected that much of the list is in line with what is read by the general public, but many of the books are either presenting fact or attempting to convince the reader of the veracity of a certain viewpoint. I'd like to read more open-ended works that ask for interpretation on the part of the reader or, at the least, don't explicitly spell out what they want the reader to walk away with. (certainly some books here fit the bill, e.g. Infinite Jest, Pride & Prejudice, etc.). Again, interests are subjective.
I think that's just the nature of pulling books from HN comments - a lot of those comments are trying to convince people of a viewpoint, so it seems unsurprising that this is the kind of list you'd end up with.
Not good or bad, just a function of where they're coming from.
And as for book recommendations, Children of Time by Adrian Tchaikovsky.
Personally I wouldn't recommend others' books to someone who is left unfulfilled by such a huge list. I would rather recommend writing or other subjectively-pinned activities, to hold the subject accountable and help them stay out of the critic zone long enough to find their way into more fulfilling growth.
I’ve frequently suggested Patterns of Democracy and it doesn't seem to have picked it up, despite apparently picking up books with only a single mention.
The problem with reading book lists like this is that nobody has time to read all the books. There's a ton of crap out there and I want HN to help me filter through it.
Thus the problem with existing solutions is NOT "limited recall" or "insufficient rules" or "no Amazon link".
And the problem with this "solution" is that there is no justification for why a book is great and applicable to my circumstances, and people have to trust your black box. Otherwise I'm likely to waste my time, just like reading books from any other crappy recommendation engine.
With a deep learning model reducing all the reviews to "book names" you've successfully removed the value of the book discussions themselves. Therefore, for me this engine and all similar engines are strictly worse than simply going through the actual big threads themselves, i.e. https://news.ycombinator.com/item?id=21900498
Edit: I've just seen the embedded comments by switching to a desktop browser. It's a nice addition. However, for me to make sure I'm not wasting my time going through arbitrary books and comments, I would need to know why a book is ranked highly compared to other books. And I want to be sure that ranking is tailored to me, at a very, very high accuracy.
> With a deep learning model reducing all the reviews to "book names" you've successfully removed the value of the book discussions themselves.
It literally shows each comment in full that it extracted the book name from. It also includes a link to the comment in the original thread. What more could you possibly want?
Oh, I was on mobile and could not see the comments section. It's interesting for sure. But what I want in particular is to learn why a book is ranked highly compared to other books. And I want to be sure that ranking is tailored to me, at a very, very high accuracy.
There's no way around the black box element of a book review, but Nassim Taleb suggests waiting a few decades and, if the book is still well known, then reading it.
Wow. What a helpful piece of advice (I guess he's smarter than the rest of us so it's hard to understand the genius of his strategy). Any mention of the cost of missing out on the content in the book for a few decades?
The idea behind reading older books is that they're already proven to be useful - it's a filter for you to spend less time on useless information. It's generally called the "Lindy effect"
Love this question. I could imagine him suggesting reading academic papers for cutting-edge things, like his 'barbell' exercise strategy of mostly walking with occasional HIITs.
I know this but I wasn't able to fix it. Would love suggestions on how to keep the scroll position in one div (for the books) but not the other (for the comments) when doing client-side navigation using Next.js...
I was going to make this suggestion too, but now that I saw this comment, instead I'll suggest asking on the next.js subreddit? It seems somewhat active, maybe someone there can help - https://old.reddit.com/r/nextjs/
This is great. I just read Permutation City, which I coincidentally see recommended on HN all the time, so I was surprised not to see it in the search results or the top of the fiction or scifi lists. Any idea why that is?
Interesting lack of 1984, even though it is mentioned way too often. The lesser known "Animal Farm" and other dystopias like "Brave New World" and "Fahrenheit 451" are here.
Unrelated to the above, but it would also be nice if the site could search by author (I don't seem to get any hits when putting in author names) or even year of publication.
There are suspiciously few misspellings in the names of the books, even with words like 'millionaire' and 'righteous'. There's also 'Calculus' with 132 comments, and just browsing through the comments I can find at least four different books referenced and some just talking about calculus in general.
That said it's an interesting project that clearly took some effort to put together.
Edit: Saw that the author commented on 'books with similar names'. Many of the comments I saw had the book's author's name in them as well; the next iteration should perhaps look for and match on those too.
Interesting. I wonder what my book's title needs to do to be classed as a book title by this engine. It's actually mentioned in the same comments as a few of the peers in its topic (Rust).
Very useful. Just one thing: You should reduce the long tail and filter out all books which have been mentioned in only one comment. I think these books are not representative for hackernews.
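A minimal sketch of that long-tail filter, assuming mentions are available as `(comment_id, title)` pairs. That input shape and the name `drop_single_mentions` are assumptions for illustration, not the app's actual schema:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def drop_single_mentions(mentions: Iterable[Tuple[int, str]],
                         min_comments: int = 2) -> Dict[str, int]:
    """Count distinct comments per title and keep titles above a threshold.

    A title mentioned several times inside one comment is counted once.
    """
    distinct_pairs = set(mentions)
    counts = Counter(title for _, title in distinct_pairs)
    return {title: n for title, n in counts.items() if n >= min_comments}


mentions = [
    (1, "Dune"),
    (2, "Dune"),
    (2, "Dune"),      # repeated inside the same comment, counted once
    (3, "Solaris"),   # only one comment, so it is filtered out
]
print(drop_single_mentions(mentions))  # → {'Dune': 2}
```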
Thank you for this app. I would consider removing the nested scroll bar for the comments. Also, on iPhone, in the last 6 months view I don’t see any links to Amazon.
God bless you! This is amazing stuff. Could you write some sort of whitepaper on this topic? It seems like a really good text extraction/cleanup project.
A cursory look showed me that most if not all of the books have good reviews on Amazon. Somehow, I hoped that HN'ers would point to books with bad reviews (that is, misunderstood books).
Thinking out loud here: what is the difference between the Google search algorithm and AI-based deep learning? Both are trying to do the same thing, I guess, namely structuring unstructured data?
If you write a series of blog posts about these books, and talk about whether we should read them or not, that would be totally cool to see on HN and also very meta.
In some cases there are books that some user recommended and then the book listed is not that one.
As an example, lots of people are recommending Spivak's "Calculus", or "Calculus Made Easy", from some other author, but the one listed is "Calculus" from James Stewart.
Same happens for the book "Calculus: Early Transcendentals".
Sun Tzu's Art of War is also repeated with two different editions.
Surprised to see “A Pattern Language” on there. I’ve read most of it in preparation for building my house. It’s more of a dictionary than a book but it’s unbelievably useful. It’s just a huge list of little things that an architect would notice over the span of his career. Little things that are important but not obvious. If you’re building a house, another really good book is “What Not to Build.”
I also recommend “Islamic Imperialism” from Yale, “The Bomb in My Garden” by Mahdi Obeidi and “Nothing to Envy.”
(Plus the fact that it's a good book on its own terms. At least, it is so far as I can tell; I am not an architect and maybe some of the advice in it is actually terrible. But it seems almost always reasonable and frequently insightful, and it's well written, and the "pattern language" idea that software engineering borrowed from it is a nice one. (Though the software-engineering borrowings don't generally amount to actual pattern languages as opposed to miscellaneous grab-bags of alleged patterns.))
I built this small app in my spare time to aggregate books recommended on Hacker News. I personally find books recommended on HN to be super helpful, so I think this is the way that I can contribute back.
This book aggregation idea is not new. A bunch of sites have done similar things [1, 2, 3].
Yet one common limitation of those sites is that they have limited recall (i.e. not able to get a comprehensive set of book mentions), and thus don't paint an accurate picture of what the top books are. They're all based on insufficient rules, e.g., looking for Amazon Links. As you can see from my app, people often do not include Amazon links when recommending a book.
I wondered, why can't we just match book names? Well, not so easy. Some books have pretty short names, e.g. Meditations [4], or Steve Jobs [5]. Some book names might as well be the name of a movie, e.g. Ready Player One [6]. Simply matching the names of the books would produce a whole lot of irrelevant results.
This is where Deep Learning comes into play. Recent advances in large NLP models (transformers and BERT in particular) have made machine language understanding unprecedentedly accurate. It enables me to fine-tune a BERT model on a couple thousand labeled HN comments and predict accurately whether each word in a comment is part of a book or not - a task commonly termed as Named Entity Recognition (NER).
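To make the per-word prediction concrete, here is a minimal sketch of the step that would follow it: turning one "is this word part of a book name?" flag per word into title strings. The flags below are hard-coded stand-ins for model output (no BERT model is loaded here), and the function name `extract_titles` is hypothetical rather than taken from the app:

```python
from typing import List


def extract_titles(words: List[str], is_book: List[bool]) -> List[str]:
    """Join each run of consecutive flagged words into one title string."""
    titles: List[str] = []
    current: List[str] = []
    for word, flag in zip(words, is_book):
        if flag:
            current.append(word)
        elif current:
            # A run of flagged words just ended; emit it as one title.
            titles.append(" ".join(current))
            current = []
    if current:
        # The comment ended while still inside a title.
        titles.append(" ".join(current))
    return titles


words = "I loved Ready Player One more than the movie".split()
# Stand-in for per-word predictions: True marks a word inside a book name.
flags = [False, False, True, True, True, False, False, False, False]
print(extract_titles(words, flags))  # → ['Ready Player One']
```

A plain yes/no flag cannot separate two titles that sit directly next to each other; tagging schemes that also mark the first word of each entity handle that case.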
As a result, my app is able to present a whole lot more results while maintaining desirable accuracy. For example, NER works pretty well on the tough examples I mentioned ([4, 5, 6]). Compared to prior sites, my app captures 9-50X more mentions and thus presents a much more complete picture of what books are recommended on HN.
Furthermore, I've made sure that the comments are presented well in the UI because the recommendations are just as useful as the books. I highlighted the mentioned book name, and used a custom NLP-based ranking function to sort the comments. These are non-trivial improvements over prior sites, which I hope you can find useful.
Nevertheless, this app is not without limitations: 1) matching book names would fail when two books have the same or similar names; 2) although not often, this approach would wrongly classify some short stop-word names [7]; and 3) sometimes NER fails to see that the commenter actually hates the book. These problems can be alleviated with more Deep Learning. For 1), one can use BERT to learn the authors mentioned, which can be used as a filtering criterion. 2) and 3) should be fixable with more training data (currently there are only ~4,000 hand-labeled HN comments).
Lastly, I'd like to especially thank my gf who helped me label ~1,000 comments, which boosted the model accuracy by 5 percent! I also want to thank the people who create and maintain the HackerNews BigQuery dataset [8]. And of course, thank everyone on HN who recommends books to others.
Hope you enjoy this app! Feedback and suggestions are welcome :)
[1] https://news.ycombinator.com/item?id=15169611
[2] https://news.ycombinator.com/item?id=10924741
[3] https://news.ycombinator.com/item?id=12365693
[4] https://hacker-recommended-books.vercel.app/category/0/all-t...
[5] https://hacker-recommended-books.vercel.app/category/1/all-t...
[6] https://hacker-recommended-books.vercel.app/category/0/all-t...
[7] https://hacker-recommended-books.vercel.app/category/12/past...
[8] https://news.ycombinator.com/item?id=19304326
P.S. The Amazon links are NOT sponsored. This app is free of monetization.