Show HN: I scraped 3B Goodreads reviews to train a better recommendation model

vessenes · 2025-11-06T20:40:01 1762461601

OK, I just added books until you told me I had too many. Fun idea! I have a couple of suggestions:

* UI - once someone clicks "Add" you really should remove that item from the suggested list - it's very confusing to still see it.

* Beam search / diversification -- Your system threw like 100 books at me of which I'd read 95 and heard of 2 of the other 3, so it worked for me as a predictor of what I'd read, but not so well for discovery.

I'd be interested in recommendations that pushed me into a new area, or gave me a surprising read. This is easier to do if you have a fairly complete list of what someone's read, I know. But off the top of my head, I'm imagining finding my eigenfriends, then finding books that are either controversial (very wide rating differences amongst my fellow readers) or possibly ghettoized, that is, some portion of similar readers also read this X or Y subject, but not all.
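To make the "controversial among my eigenfriends" idea concrete, here is a minimal sketch. It assumes "eigenfriends" can be approximated by the k users whose shelves overlap most with mine (Jaccard similarity), and that "controversial" means high rating variance among them. The function names and the toy data shape are made up for illustration; nothing here reflects how the site works.

```python
from statistics import pvariance


def jaccard(a, b):
    """Overlap between two sets of book ids; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def controversial_books(me, ratings, k=3, min_raters=2):
    """Rank books I haven't read by rating variance among my k nearest readers.

    me: set of book ids I have read
    ratings: dict of user -> {book id: star rating} (hypothetical toy shape)
    """
    # "Eigenfriends" approximated as the k users with the most similar shelves.
    neighbours = sorted(
        ratings, key=lambda u: jaccard(me, set(ratings[u])), reverse=True
    )[:k]

    # Collect the neighbours' ratings for every book I have not read.
    by_book = {}
    for user in neighbours:
        for book, stars in ratings[user].items():
            if book not in me:
                by_book.setdefault(book, []).append(stars)

    # Keep books rated by enough neighbours, most divisive first.
    scored = {
        book: pvariance(stars)
        for book, stars in by_book.items()
        if len(stars) >= min_raters
    }
    return sorted(scored, key=scored.get, reverse=True)
```

Books that similar readers disagree on float to the top; books they all rate the same sink to the bottom.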

Anyway, thanks, this is fun! Hook up a VLM and let people take pictures of their bookshelf next.

kace91 · 2025-11-06T23:56:36 1762473396

(From the site) >If you visit the "intersect" page, you can input multiple books and find the set of users that have read all of those books. This can be useful for finding longer tail books that weren't popular enough to meet the threshold. For instance, if you like reading about the collapse of the Soviet Union, you could put in "Lenin's Tomb" and "Secondhand Time", and see what other books the resultant users have read.

This is how filmaffinity works, which is the best recommendation system I've tried. They have a group of several dozen 'soulmates', which are users with the most similar set of films seen and ratings given; recommendations are other stuff they also liked, and you get direct access to their lists.
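The intersect idea quoted above can be sketched in a few lines. This assumes an in-memory dict of user shelves, which is a toy stand-in; the function name and data shape are hypothetical and not the site's actual implementation.

```python
from collections import Counter


def intersect_recommend(seed_books, shelves, top_n=5):
    """Find users who read every seed book, then count what else they read.

    seed_books: iterable of book titles or ids
    shelves: dict of user -> set of books read (hypothetical toy shape)
    """
    seeds = set(seed_books)
    # Users whose shelf contains all of the seed books.
    readers = [user for user, books in shelves.items() if seeds <= books]

    counts = Counter()
    for user in readers:
        counts.update(shelves[user] - seeds)

    return readers, [book for book, _ in counts.most_common(top_n)]
```

Because it only counts co-occurrence among a small, specific group of readers, this kind of lookup can surface long-tail titles that a popularity threshold would drop.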

>then finding books that are either controversial or possibly ghettoized

Naively, I’d say the surprises are going to be better if you filter for more different friends, rather than more controversial books among your friends. As in “find me a person that’s like me only in some ways, tell me what they love”. Long term this method is much better at exposing you to new ideas rather than just finding your clique's holy wars.

dbl000 · 2025-11-06T21:40:13 1762465213

Echoing what everyone else has said here - awesome site, love how fast it was.

I did notice that when I put in a single book in a series (in my case Going Postal, Discworld #33) that tended to dominate the rest of the selection. That does make sense, but I don't want recommendations for a series I'm already well into.

Also noticed that a few books (Spycraft by Nadine Akkerman and Pete Langman, Tribalism is Dumb by Andrew Heaton) that I know are on Goodreads and reviewed didn't show up in the search. I tried both the authors' names and the titles of the books. Maybe they aren't in the dataset.

It did stumble with some more niche books (The Complete Yes Minister). Trying the "Similar" button gave me more books that were _technically_ similar because they were novelizations of British comedy shows, but not what I was looking for.

For more common books though it lined up very well with books already on my wishlist!

costco · 2025-11-06T22:01:35 1762466495

Yes I would say the handling of series is probably the biggest problem. Once my test metrics got to a point I was happy with and my quality spot checks passed (can I follow the model's recommendations from one generic history book to Steven Runciman, also making sure popular books don't always dominate the results), I was ready to release because I had been working on this project for so long. The solution is probably using the transformer model to generate 100-200 candidates and then having a reranker on top.
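One possible shape for that reranking step, as a sketch only: take the candidate list and greedily pick books, discounting a book's score each time its author has already been picked. The penalty scheme, the function name and the tuple layout are assumptions for illustration, not the approach the site uses or plans to use.

```python
def rerank(candidates, penalty=0.5, top_n=30):
    """Greedy rerank that discounts repeated authors.

    candidates: list of (title, author, model_score) tuples, e.g. the
    top 100-200 items from the base model (hypothetical shape).
    Each earlier pick by the same author multiplies the score by `penalty`.
    """
    remaining = list(candidates)
    picks_per_author = {}
    ranked = []
    while remaining and len(ranked) < top_n:
        # Effective score shrinks with every earlier pick by the same author.
        best = max(
            remaining,
            key=lambda c: c[2] * penalty ** picks_per_author.get(c[1], 0),
        )
        remaining.remove(best)
        picks_per_author[best[1]] = picks_per_author.get(best[1], 0) + 1
        ranked.append(best[0])
    return ranked
```

The same pattern works with a series id in place of the author, which would push books 2 through 5 of a series down without removing them.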

walletdrainer · 2025-11-07T01:31:10 1762479070

Not just series, but I seem to mostly get a list of other books from the same authors.

The recommendations from other authors are good, but as far as I can tell I’ve read every single one of them.

Continuing to aggressively add everything it recommends eventually does seem to result in some interesting books I wasn’t familiar with, but I also end up with more and more books that are of zero interest to me.

For what it’s worth, I started with:

  Infinite Jest David Foster Wallace
  Europe Central William T. Vollmann
  Gravity’s Rainbow Thomas Pynchon
  White Noise Don DeLillo
  One Hundred Years of Solitude Gabriel García Márquez

It is possible that there simply aren’t many books like these in existence, so the pool of relevant recommendations gets exhausted fairly quickly. I’d guess trending towards unrelated popular books is also just a feature of the source data, that largely sums up my experience with goodreads anyway.

Very cool project though. I did end up ordering a couple of new books, so thank you very much.

mscbuck · 2025-11-06T22:54:21 1762469661

Awesome site and speed!

My advice from someone who has built recommendation systems: Now comes the hard part! It seems like a lot of the feedback here is that it's operating pretty heavily like a content-based system, which is fine. But this is where you can probably start evaluating on other metrics like serendipity, novelty, etc. One of the best things I did for recommender systems in production is having different ones for different purposes, then aggregating them together into a final list. Have a heavy content-based one to keep people in the rabbit hole. Have a heavy graph-based one to try and traverse and find new stuff. Have one that is heavily tuned on a specific metric for a specific purpose. Hell, throw in a pure TF-IDF/BM25/SPLADE based one.

The real trick of rec systems is that people want to be recommended things differently. Having multiple systems that you can weigh differently per user is one way to be able to achieve that; usually one algorithm can't quite do that effectively.
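As a rough illustration of per-user weighting across several recommenders, here is a minimal sketch. It assumes each system returns a ranked list and blends them with a weighted reciprocal-rank score. The recommender names and weights are invented, and reciprocal rank is just one convenient way to mix lists whose raw scores are not comparable.

```python
def blend(recommenders, weights, top_n=10):
    """Combine ranked lists from several recommenders using per-user weights.

    recommenders: dict of name -> ranked list of book ids, best first
    weights: dict of name -> float; names missing from it count as 0
    """
    scores = {}
    for name, ranked in recommenders.items():
        weight = weights.get(name, 0.0)
        for rank, book in enumerate(ranked):
            # Reciprocal rank: 1 for the top item, 1/2 for the next, and so on.
            scores[book] = scores.get(book, 0.0) + weight / (rank + 1)
    # Highest blended score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda book: (-scores[book], book))[:top_n]
```

A reader who wants the rabbit hole gets a high weight on the content-based list, while a reader asking for discovery gets a high weight on the graph-based one, without retraining either model.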

blehn · 2025-11-06T22:09:04 1762466944

You should filter out authors from the input books in the output. If I liked a book by an author, surely I'd read more of their work if I wanted to; recommending them isn't helpful. Along the same lines, I think interesting recommendations tend to be the ones that (1) I like and (2) I didn't expect. The more similar the recommendations are to the input, the more likely I already know them, and the more likely to create a recommendation echo chamber.
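A sketch of what such a filter might look like as a post-processing step, with a switch so it can be turned off. The function name and the (title, author) tuple shape are hypothetical.

```python
def filter_known_authors(recommendations, input_books, exclude_authors=True):
    """Drop recommendations by any author who already appears in the input.

    Both lists hold (title, author) tuples (hypothetical shape).
    Pass exclude_authors=False to leave the recommendations untouched.
    """
    if not exclude_authors:
        return list(recommendations)
    # Case-insensitive comparison of author names.
    known = {author.casefold() for _, author in input_books}
    return [
        (title, author)
        for title, author in recommendations
        if author.casefold() not in known
    ]
```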

Semaphor · 2025-11-07T04:19:00 1762489140

> You should filter out authors from the input books in the output.

No, or at least make it configurable.

I’d agree for series, but not for Authors, just because I once read a book by someone doesn’t mean I even know they have other stuff, the list of Authors I read and enjoyed is very long.

honkycat · 2025-11-07T00:45:13 1762476313

yep, was gonna say this. Getting recommended all of the same books I've already read isn't great

yoz-y · 2025-11-06T21:03:14 1762462994

It works pretty well in the sense that after inputting only a few quite diverse books it gave me recommendations for a lot of books that I’ve already also read and enjoyed.

I would also really like a possibility to add negative signal. It did also recommend books that seemed interesting to me but I ultimately didn’t like.

Overall quite impressive.

majormajor · 2025-11-07T01:25:28 1762478728

Neat! It's a validation of the model that 75%+ of the recommendations are things I've read and also enjoyed, with a few "read, didn't like" and some more "didn't read, don't really want to."

But I think to break the content-bubble effects to find the longer tail, some way to reject or blacklist things - and have that be taken into effect in the model - might help.

daemonologist · 2025-11-07T03:49:24 1762487364

Likewise, I put in six of my favorites and had already read (and enjoyed) 29 of the 30 recommendations (I'll have to check out Blindsight by Watts). Working great but it would be cool - as with pretty much every recommendation algorithm ever - to have more of a "discovery" capability.

robertritz · 2025-11-07T02:55:06 1762484106

To add to this, YouTube afaik uses multiple models to sprinkle in new content alongside your usual recommendations just for this.

maxglute · 2025-11-07T03:57:45 1762487865

Would be nice to not recommend books by author already added for novel discoverability.

gen220 · 2025-11-07T03:25:28 1762485928

The best way I’ve found for finding predictably enjoyable fiction is to read interviews with the authors I like, and read about the works and authors they admire or are influenced by. Or who they exchanged letters or communications with, if they’re long dead and no interviews proper exist.

Strongly recommend giving that a try yourself. And trying to build an algorithm around it!

Here’s an example: Tolstoy really admired Turgenev, who was friends with Theodore Storm and Gustave Flaubert, and greatly admired Gogol.

If you like Anna Karenina you’ll probably find something of value in Torrents of Spring, Immensee, Madame Bovary or Dead Souls.

It spiders out pretty quickly!

zeroq · 2025-11-07T02:48:42 1762483722

Great work!

Some five years ago I was day dreaming about recommendation engine for movies where you could say "hey Ciri, give me a good gangster flick", and it will come up with something that you haven't seen yet but you'd definitely love.

To my amazement almost everyone, even true AI believers, thought it was impossible to achieve. :d

But my question is - having such huge dataset, do we really need AI for it? SASRec/RAG is sexy, but could the same result be achieved with simple ranking and intersections like lastfm did in the past with music?

Some twenty years ago I came up with an idea of a "brain" data structure for recommendations where you have all your items (books, movies or articles) modeled as a graph, and whenever you pick something it makes a ripple effect, effectively raising the scores of every adjacent item in a cascade.

Just like your brain works - when you stumble upon something new it immediately brings back memories of similar things from the past. I never had the opportunity to implement it and test in real life scenario, but I'd be surprised if a variant of this is not widely used across different recommendation systems, like Amazon.
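For what it's worth, the ripple idea is close to what is usually called spreading activation. A minimal sketch, assuming an adjacency-list graph and a boost that decays by a fixed factor per hop (the decay factor and depth limit are arbitrary choices for illustration):

```python
def ripple(graph, start, boost=1.0, decay=0.5, max_depth=3):
    """Spread a score boost outward from `start`, shrinking it on every hop.

    graph: dict of item -> iterable of adjacent items
    Returns dict of item -> score increase caused by picking `start`.
    """
    scores = {start: boost}
    frontier = [start]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for item in frontier:
            for neighbour in graph.get(item, ()):
                # Each item is raised once, via its shortest path from start.
                if neighbour not in scores:
                    scores[neighbour] = boost * decay ** depth
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return scores
```

Picking an item raises its direct neighbours by half the boost, their neighbours by a quarter, and so on until the depth limit.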

esafak · 2025-11-07T02:55:34 1762484134

last.fm used a primitive machine learning algorithm too, else what are you going to rank by?

zeroq · 2025-11-07T04:12:44 1762488764

Did they? I recall a similar site from as far back as 2008. Might be them or something similar.

Anyway. I can totally see such site running purely on statistics. Every song, every artists, every genre is a bucket. You listen to a song you put a drop in these buckets. Once there's enough water running we can compare you to other users and their buckets.

It might be hard to run at scale in real time, but c'mon, it's about as complicated as a junior-level leetcode assignment.
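A small sketch of the bucket idea, assuming each listen is a (song, artist, genre) tuple and that users are compared by cosine similarity over their bucket counts. The choice of cosine similarity is an assumption; the comment only says the buckets get compared.

```python
import math
from collections import Counter


def buckets(listens):
    """Put one drop in the song, artist and genre bucket for every listen.

    listens: iterable of (song, artist, genre) tuples (hypothetical shape).
    """
    counts = Counter()
    for song, artist, genre in listens:
        counts[("song", song)] += 1
        counts[("artist", artist)] += 1
        counts[("genre", genre)] += 1
    return counts


def similarity(a, b):
    """Cosine similarity between two users' bucket counts."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Recommendations would then come from whatever the most similar users have in their buckets that you don't.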

varenc · 2025-11-06T21:08:55 1762463335

I love this site, and the approach! Great seeing someone making good use of Goodreads data.

Sadly my experience with the book recommender isn't too great because of the 64 book limit. If I import either the most recent or least recent 64 books, 95% of the books it recommends to me are books I've read. Though it was helpful for spotting a few books I've read that I didn't log on Goodreads. Guess I'm pretty consistent.

costco · 2025-11-06T21:17:59 1762463879

I think I will expand the input books limit (sadly requires retraining) and or the output books limit of 30.

lifeisstillgood · 2025-11-07T04:13:32 1762488812

Goodreads - “hey those user written comments belong to us, you need to pay us”

HNUser - “OpenAI told you to go swivel until they made a billion and you accepted that. Samesies.”

walthamstow · 2025-11-06T20:51:20 1762462280

Works pretty well with cookbooks. Very cool work.

One suggestion would be to make the search less strict on diacritics. Searching for popular cook J. Kenji López Alt was only successful if I entered the correct O.
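One common way to get accent-insensitive matching is to fold both the indexed titles and the query before comparing them. A minimal sketch using only the standard library; whether this would be applied before indexing or handled by the search engine's own settings is left open here.

```python
import unicodedata


def fold(text):
    """Strip combining accents and case so accented and plain spellings match."""
    # NFKD splits an accented letter into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()
```

With this, `fold("J. Kenji López Alt")` and `fold("j. kenji lopez alt")` produce the same string, so either spelling finds the author.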

MattGrommes · 2025-11-06T21:28:01 1762464481

This is cool but I'd love the option to filter out the author of the book you entered. I put in Shroud by Adrian Tchaikovsky and almost all the books are others by him, which is fine but doesn't really mix up the stuff I'm reading.

aj_hackman · 2025-11-06T20:34:05 1762461245

Thank you! Because of this, "The Making of Prince of Persia: Journals 1985–1993" by Jordan Mechner is on its way to my house.

qingcharles · 2025-11-06T20:59:36 1762462776

You definitely will not regret that purchase. It's a very enjoyable read.

simlevesque · 2025-11-06T23:31:59 1762471919

The "How it works" page is way too short :) I'd love to see some scripts, know the hardware you use, etc...

By the way you could use Summa FTS Wasm + DuckDB Wasm to have the same website without any backend except file hosting. Maybe even just DuckDB Wasm with its FTS would be enough. Summa FTS is very similar to Meilisearch in essence because they're both derived from Tantivy.

https://izihawa.github.io/summa/quick-start/

costco · 2025-11-06T23:44:33 1762472673

I use a Hetzner server with Ryzen 7 3700X and an SSD.

I think I could get the model to work with ONNX web but it'd be a 2GB download so the user experience wouldn't be too great. My Meilisearch index is ~40GB but I don't know how much that could be compressed down.

Here's how the similar page for books is generated, which I forgot to mention on the "how it works" page: https://gist.github.com/chris124567/8d06d64bfe827cb7f6121f93...

mcbrit · 2025-11-06T21:15:21 1762463721

I don't know. I entered, trying to be popular but at least slightly? opinionated:

Tigana, Hyperion, A Fire Upon the Deep, Blindsight, Moby Dick

and I got a list. Sure, read all that or wasn't interested for reasons, I added (only Neuromancer on initial recommendations):

Neuromancer, VALIS, Quantum Thief, Towing Jehovah.

List did not get more interesting.

Book recommendations are still kind of difficult.

mcbrit · 2025-11-06T21:19:29 1762463969

If I provide that list, a (real) person doesn't ask me if I've read the Hobbit.

teaearlgraycold · 2025-11-06T21:27:15 1762464435

I don’t think past liked books are nearly enough information to provide a good book for you today. You need a lot more information about the state of someone’s mind.

mcbrit · 2025-11-06T21:30:09 1762464609

You're talking to a dude. (in my case.) I mentioned 8 books.

I won't tell you exactly what to do, but one way to do it is to measure your surprise with me choosing each of those 8 books when you provide a recommendation back to me of what I should read next. I think I get kind of that experience talking to someone about books.

The algorithm didn't do that.

teaearlgraycold · 2025-11-06T22:04:17 1762466657

Talking to someone about books gives you so much more information than a book list. Their expressions, their accent, their energy level, their clothes, and many other things help to provide supplemental information.

foresterre · 2025-11-06T23:24:53 1762471493

It seems to work decently even with just one or two titles for popular titles, but less so for the niche.

For example, the title "Impro: Improvisation and the Theatre" by Keith Johnstone, linked by another article posted to HN today gives back the following suggestions:

- Truth in Comedy: The Manual of Improvisation by Charna Halpern
- Steve Jobs by Walter Isaacson
- 1984 by George Orwell
- Harry Potter and the Sorcerer's Stone (Harry Potter, #1) by J.K. Rowling
- Sapiens: A Brief History of Humankind by Yuval Noah Harari
- The Alchemist by Paulo Coelho
- The Tipping Point: How Little Things Can Make a Big Difference by Malcolm Gladwell
- Dune (Dune, #1) by Frank Herbert

It's a bit unfortunate that all suggestions are fairly popular titles, which are fairly easy to find, while the unpopular or niche may be just as well written but a lot harder to find.

Within niche topics or books, it is also usually harder to provide multiple similar enough titles up front.

costco · 2025-11-06T23:33:27 1762472007

It's recommended that you put at least 3 books in. If you would like recommendations just based on one book, click the similar button on the book, it should take you to this page: https://book.sv/similar?id=297914

NitpickLawyer · 2025-11-06T21:01:14 1762462874

Interesting. I tested it with sci-fi, and it definitely recommends good books, but not sure how accurate it is at surfacing the sub genres / themes. For example for [aurora -ksr, seveneves, project hail mary, ender's game] it gave me dune. Which is a great book, but not in the "first-ish contact" style I hoped it would be.

Another thing I noticed is that it tends to recommend 2nd and 3rd books in a series, which is a bit so-so. If I add the first book in a series, I probably already read the whole series...

28304283409234 · 2025-11-06T21:10:58 1762463458

Came here to say this (recommending book 2 and 3 in a trilogy). Great app otherwise!

androng · 2025-11-06T22:03:13 1762466593

I tried to import my book list with "Import goodreads" button and inputting https://www.goodreads.com/user/show/68515148-andrew but it said "import failed, see console"

costco · 2025-11-06T22:04:59 1762466699

Worked for me, could be due to server being overwhelmed

Here is the URL with your books: https://book.sv/#52752877,46049530,18437030,52480873,3260654...

rapatel0 · 2025-11-07T00:18:01 1762474681

So I tried a few disparate books independently:

- Guns, Germs, and Steel
- The Alchemist
- The Ramayana
(and a few others)

Harry Potter and the Sorcerer's Stone came up in all of them near the top. :D

costco · 2025-11-07T00:35:47 1762475747

> Note 1: If you only provide one or two books, the model doesn't have a lot to work with and may include a handful of somewhat unrelated popular books in the results. If you want recommendations based on just one book, click the "Similar" button next to the book after adding it to the input book list on the recommendations page.

cfraenkel · 2025-11-07T01:44:37 1762479877

FYI, on this android tablet (android v12 / FF 144.0.2), the 'start typing a book title...' field doesn't do anything. On the Mac, it brings up a list of matches to select from.

giobox · 2025-11-07T01:24:19 1762478659

Considering how much treasure has been poured into building recommendation engines for just about everything online, books have always been very difficult for me to find recommendations that work. Interested to try it!

sodality2 · 2025-11-06T22:13:29 1762467209

This is fantastic!!! I've added many results to my want-to-read list, they're very on-point from very few inputs. It would be really cool to import from a user ID, where you can choose some subset of your read list to inspire new suggestions, while excluding all books in your want-to-read and already-read lists. But that's an ongoing scrape to maintain, it's a cat and mouse game you probably don't want to start. I wonder what the legal status of scraped training data is... if you don't reproduce any of the review data I presume you're fine?

costco · 2025-11-06T22:22:16 1762467736

You can import the first or last 64 books of your read, to-read, or currently-reading shelves if you press the "Import Goodreads" button and provide your Goodreads ID.

sodality2 · 2025-11-06T22:34:49 1762468489

D'oh, didn't even notice that button :P Wow, that greatly improved the recommendations, it even found a book I wouldn't say is particularly related to the others but I found it interesting-sounding. Thanks for such a cool site!!

Jayakumark · 2025-11-07T00:49:12 1762476552

Cool work. How much did it cost to train? Will the training code be open source?

spullara · 2025-11-06T23:36:59 1762472219

I would love to be able to filter the resulting list by removing certainly all books that are in the same series, but I think removing all books by authors that I have already listed would be great to get new things that I haven't already read. The resulting recommendations maybe included 1 new book for me.

xkbarkar · 2025-11-06T21:47:42 1762465662

Have nothing to add that hasn’t already been commented. Like the entries in the add list stay. Other than that, my recommendation list keeps coming up with books I have already read and loved and I am hitting the limit :(.

So filtering would be great.

I have seen a few versions of the same books listed more than once.

Loved this. Hope you get to tune it a little.

Also, thank you for not ruining the site with a single popup, email subscription list offer, chatbot, wheelspin from hell anywhere.

Blessings from the popup hating part of the interwebs.

jamesponddotco · 2025-11-06T20:36:35 1762461395

The recommendations are pretty good; even though I only input six books, it was enough for it to recommend books I have on my wish list. Definitely going to play around some more. Plus, the website is super fast, very impressive.

Any chance we could get an API going at some point? Are you planning to open source the work?

I'm interested in the scraping of Goodreads too. I'm building a book metadata aggregation API and plan on building a scraper for Goodreads, but I imagine using a data center IP address will be a problem very fast. Were you scraping from your home network?

costco · 2025-11-06T20:47:06 1762462026

Thank you for the compliments :) I used 50-100 datacenter proxies. I just logged requests made by the iOS app with Charles and then recreated the headers to the best of my ability though the server did not seem to be very strict at all. Worth noting though that static residential proxies are not too expensive these days anyways.

Re the API: The model does actually run fairly well on CPU so it probably wouldn't be too expensive to serve. I guess if there is demand for it I could do it. I think most social book sites would probably like to own their recommendation system though.

goatsi · 2025-11-06T21:13:07 1762463587

Speaking of sustained scraping for AI services, I found a strange file on your site: https://book.sv/robots.txt. Would you be able to explain the intent behind it?

costco · 2025-11-06T21:47:30 1762465650

I didn't want an agent to get stuck on an infinite loop invoking endpoints that cost GPU resources. Those fears are probably unfounded, so if people really cared I could remove those. /similar is blocked by default because I don't want 500000 "similar books for" pages to pollute the search results for my website but I do not mind if people scrape those pages.

dbl000 · 2025-11-06T21:42:59 1762465379

I would love an API or the dataset if you could share it somehow! Just to play around with my own book lists.

_virtu · 2025-11-06T21:51:26 1762465886

Hey OP I’m building a bookclub app. Do you happen to have an api I could plug into? I’d love to add this to our member suggestions section.

nickthesick · 2025-11-06T22:40:25 1762468825

I have a web app https://bookhive.buzz which is a GoodReads alternative based on BlueSky’s protocol. I scrape all of the book data from Goodreads too.

I would love to be able to add a recommendation system based on this.

nsypteras · 2025-11-06T21:11:05 1762463465

I'm impressed it recommended so many books I've already read and liked! I have a big reading backlog but once it's whittled down I will likely come back to this. One feature request would be to also show a "why this is recommended" for each recommendation so I can further narrow down the list for what I'm looking for.

the_coffee_bean · 2025-11-07T02:25:12 1762482312

Amazing work. Do you plan on publishing the training code?

qingcharles · 2025-11-06T21:02:02 1762462922

I put in a bunch of books and hit recommendations and... I'd already read 95% of them, so at least we know it works well! (checking out the other 5% now)

p.s. one idea: when you click [Add] on the recommended books list, it should remove it from that list

p.p.s. if there is a way to filter out the spam "Summary of ____" books, that would be good too

jacquesm · 2025-11-06T21:47:01 1762465621

I have a hard time remembering titles of books I've read if they are not directly related to the subject matter. No problem remembering the content though. With movies I remember both.

skayvr · 2025-11-06T21:06:27 1762463187

I've worked in recommender systems for a while, and it's great to see them publicized.

SASRec was released in 2018, just after the transformer paper, and uses the same attention mechanism but different losses than LLMs. Any plans to upgrade to other item/user prediction models?

costco · 2025-11-06T21:13:17 1762463597

I'm not an expert by any means but as far as sequential recommendations go, aren't SASRec and its derivatives pretty much the name of the game? I probably should have looked into HSTUs more. Also this / sparse transformers in general: https://arxiv.org/pdf/2212.04120

skayvr · 2025-11-06T21:34:44 1762464884

There's a few alternatives, but SASRec is a good baseline for next-item recommendation. I'd look at BERT4Rec too. HSTU is definitely a strong step forward, but stays in the domain of ID models. HSTU also seems to rely heavily on some extra item information that SASRec does not (timestamps).

Other models include Google's TIGER model which uses a VAE to encode more information about items. Similar to how modern text-to-voice operates.

costco · 2025-11-06T22:20:27 1762467627

Thank you for the recommendations. I didn't try BERT4Rec because I assumed it would perform the same or worse as what I already had after having read https://dl.acm.org/doi/pdf/10.1145/3699521. The TIGER paper seems interesting - I definitely want to explore semantic IDs in general and also because I think it could allow including more long-tail items.

bigskydog · 2025-11-06T21:29:10 1762464550

Recommend OneRec which is an improvement of HSTU and it recently became open source

noir_lord · 2025-11-06T20:42:04 1762461724

It has a tendency to recommend books in the same series as are input (putting aside that if I like a book in a series I've likely already read the series).

It did suggest Murderbot Diaries (not on the input but a series I have read and did like) and an Adrian Tchaikovsky I hadn't read :).

costco · 2025-11-06T20:51:26 1762462286

It's explicitly trained to predict the next book read in a sequence, which is why you get that behavior. There's probably a better way for me to handle it rather than having 5 books from the same series tend towards the top though.

noir_lord · 2025-11-06T22:00:29 1762466429

If you have the data to know the other books in a series maybe split the results so you have "books in series" in one column and "books not in a series mentioned" in the other but other than that it did a better job than Kindle recommendations which are often hilariously off the mark.
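That two-column split could be as simple as the following sketch, assuming each recommendation carries a series name (or None for standalone books). The function name and data shape are hypothetical, and it presumes series metadata is available at all.

```python
def split_by_series(recommendations, input_series):
    """Partition recommendations into (same series as the input, everything else).

    recommendations: list of (title, series name or None) tuples
    input_series: set of series names that appear among the input books
    """
    in_series, other = [], []
    for title, series in recommendations:
        # Standalone books have series None, which is never in the input set.
        target = in_series if series in input_series else other
        target.append((title, series))
    return in_series, other
```

The original ordering is preserved within each column, so the model's ranking still applies inside both lists.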

bananaflag · 2025-11-06T20:48:30 1762462110

Yeah the hardest problem for recommendation systems is to find non-Star Wars books which are like some specific Star Wars books and unlike some other Star Wars books. I would say it's AGI-complete ;)

noir_lord · 2025-11-06T22:11:36 1762467096

Ironically that is one of the few uses where I've found an LLM to actually be useful.

ChatGPT does a fairly good job at letting you negate/refine whatever it was you were looking for.

dylan604 · 2025-11-06T23:52:39 1762473159

I would expect a recommend of Star Trek if it were AGI-complete just to troll

fennec-posix · 2025-11-06T22:57:12 1762469832

Very neat. Even found a couple of Cold War-setting books to read and an entire series of 6 books on the same topic, all from searching up Team Yankee.

Thanks for the new reading list :D

comrade1234 · 2025-11-06T20:40:24 1762461624

I gave up on goodreads reviews. I've been burned too many times by highly rated books that weren't that good. If you're into (horny) ya romance fantasy then goodreads is great, but it's not for me. I haven't really found a substitute.

owenversteeg · 2025-11-06T21:07:05 1762463225

Any broadly used ratings system is total garbage. Goodreads ratings, Google Maps ratings, Amazon reviews, Vivino for wine, et cetera. Even assuming the reviews are real and genuine, most people just aren’t good at writing reviews, and the handful that are often have wildly different criteria than you. Someone already commented with one enthusiast site - and sure, enthusiast sites are often better than the mainstream option (see also: CellarTracker for wine) but honestly my advice is to get good at determining the quality of the thing yourself. For books there are a ton of hints about what you’ll be getting. “NYT Bestseller”, “xyz book club”, certain publishers, who’s quoted on the back, when was it published, who wrote it? All of those things can help you rapidly identify books. I personally dislike most modern books and prefer the “classics”, so a lot of this is only useful as a negative signal, but even then there are positive signals, for example a reference to a much older book.

HeinzStuckeIt · 2025-11-06T21:34:24 1762464864

GR is also great if you are into academic nonfiction, Classics, poetry, etc. The site does, after all, let you track and review any publication with an ISBN. What my peers and I use it for is worlds apart from the romance novel or LGBT young-adult book reviewing community that often puts GR in the news, and far away from all the drama that rages around genre fiction.

jamesponddotco · 2025-11-06T20:45:32 1762461932

I'm not into the social aspect, so Goodreads was never an option, but Hardcover[1] seems like a pretty good alternative.

[1]: https://hardcover.app

jimmoores · 2025-11-06T21:36:35 1762464995

I unexpectedly liked this. I thought the recommendations were actually useful.

parkersweb · 2025-11-06T22:23:25 1762467805

I sadly didn’t share that experience - I fed it my goodreads most recent - but it largely picked up on 2 or 3 series I’ve been slowly working my way through so that most of the recommendation list was ALL the other books in the series (and the spin-off series) so I didn’t really get anything useful…

nwhnwh · 2025-11-06T21:30:58 1762464658

I entered "Alone Together: Why We Expect More from Technology and Less from Each Other" and I received books about Steve Jobs, Harry Potter and "The Subtle Art of Not Giving a F*ck". Like how???

costco · 2025-11-06T21:34:44 1762464884

If you want recommendations solely based on one book, please try the similar page: https://book.sv/similar?id=13566692

These seem to fit the description you are going for better. The model is trained to predict the next book in the sequence. Those other books you listed happen to be very popular, so in the absence of information about you (only having 1 book), the model will tend to recommend those.

BeetleB · 2025-11-06T21:38:27 1762465107

> Provide 3+ books for best results.

__alexander · 2025-11-06T21:19:45 1762463985

Care to share the scraped data? I would love to play around with it.

costco · 2025-11-06T21:26:27 1762464387

Not sure if I can. At the very least book descriptions most likely could not be distributed. There is an academic dataset with around 200M reviews though: https://cseweb.ucsd.edu/~jmcauley/datasets/goodreads.html

demaga · 2025-11-06T21:22:31 1762464151

I am not sure about legal side of things here, but a Kaggle dataset would be really cool

guelo · 2025-11-06T21:37:36 1762465056

I'm surprised he got that much data. Goodreads uses several tricks to try to stop scrapers, for example pagination only works up to a few pages.

jacquesm · 2025-11-06T21:45:36 1762465536

They might send him a bill for use of resources.

cjaackie · 2025-11-07T02:22:30 1762482150

I’m wondering about how ethical it is to load down a resource in this way, open to opinions. There is a mention “I didn’t hammer down the servers” but what does that really even mean? The site isn’t being used as intended and just curious how other people feel about that.

thinkcontext · 2025-11-05T20:57:26 1762376246

I'm impressed! It didn't take many books for it to start suggesting other books that I liked and it showed me several solid choices I'm adding to my queue.

esafak · 2025-11-06T20:37:23 1762461443

It is interesting that you chose a contextual recommender when you would think book affinity is not very susceptible to context. Did you try other models too?

stevage · 2025-11-06T22:32:21 1762468341

This is great. It would be really nice to be able to reject suggestions though.

djoldman · 2025-11-06T21:19:22 1762463962

Can you share the details about the Meilisearch instance? How big is the box and database size?

costco · 2025-11-06T21:31:59 1762464719

Everything (namely Meilisearch, Postgres and the web server in Go) besides the model inference is running on a Hetzner server with a large SSD and an "AMD Ryzen 7 3700X 8-Core Processor." The data.ms directory is about 40GB. Once the HN traffic dies down I will probably move the model back to the Hetzner server so I don't have to pay $0.15/hour for an A4000.

skerit · 2025-11-06T20:37:46 1762461466

Please make this for tv series too!

dbingham · 2025-11-07T01:20:59 1762478459

See, now this is an excellent use of LLMs (if we're going to be using them at all). Low stakes if it gets shit wrong, but can provide some really useful and surprising answers!

One request: it would be nice to not have to add Goodreads, since I don't use it. I'd love to be able to enter a couple of book titles or an author and just get recommendations!

costco · 2025-11-07T01:40:23 1762479623

You don't have to import your Goodreads profile. You can type titles and authors in the box and find books to add to the list that way.

momocowcow · 2025-11-06T21:03:50 1762463030

Whatever I put in, it wants me to read Sapiens :_(

oever · 2025-11-06T22:30:07 1762468207

Can confirm. Stallman, Torvalds, Orwell, Harari

https://book.sv/#2300585,644416

jauntywundrkind · 2025-11-06T21:35:36 1762464936

Where do nice scrapes like this end up? Are there BitTorrents out there for scrapes like this?

Honestly this would finally be the web2.0 we all wanted & hoped for. It's against majesty that it's all captured owned user content that is legally captured by essentially public message boards/sites.

tristor · 2025-11-06T22:42:34 1762468954

Two bugs to know about. First, you are using a deprecated API call that fails in Firefox. Second, you are using an HTTP endpoint that fails to upgrade to HTTPS to call the GoodReads API, which also fails with HTTPS-Only enabled in both Chrome and Firefox.

The idea seems good, but since I can't import my GoodReads successfully, it's hard for me to try

costco · 2025-11-07T00:33:48 1762475628

I use `fetch` on relative endpoints so that's odd. There shouldn't be any external API calls on my website other than whatever the Cloudflare captcha uses. I also use HTTPS-only in Chrome and did not experience any issues. I just tested Firefox with HTTPS-only on/off and Safari on my phone and I was able to import shelves for multiple users. Are you sure that you do not have any privacy settings on (can you access your shelf in Incognito mode)?

submeta · 2025-11-06T21:17:51 1762463871

Like the idea! Wondering: Weren’t the early LLMs trained on data in Goodreads as well? I can upload and ask ChatGPT as well, and it will give me similar recommendations, no?

brailsafe · 2025-11-07T00:37:44 1762475864

In some sense, it seems to work well, but the results are sort of nothing special and that's not what I'd personally hope for. I put in three books that are unrelated and got results that compare to a standard book store, either from the same series or other meme startup tech bro recommendations that I'd often literally see on the same shelf. I can't say it's not good, because obviously that's how people browse books and that's what you'd get from reviews, which is perhaps why I never consult reviews for anything.

I put in Thinking in Systems and got a bunch of engineering management stuff, which I don't care about. Deep Work of course gave me all the Rich Dad Poor Dad, Steve Jobs bio, tim ferriswheel crap, which shouldn't surprise me at all. The Girl with the Dragon Tattoo gave me the rest of the series.

Thematic similarity + popularity just seems boring, I'd like something that surfaces unusual deep cuts that I wouldn't necessarily find at the book store on the same shelf, but maybe that I could find if I went to a great library and might be out of print, or that I could find on libgen.

With these:

- Thinking In Systems: A Primer

- Paddle to the Amazon: The Ultimate 12,000-Mile Canoe Adventure

- The Elements of Typographic Style

I was kind of hoping to at least get "Grid Systems in Graphic Design" or something, but mostly got Alchemist, Zen', Into the Wild, almost comically mainstream cuts that of course in some cases I've already read or could find in a Cupertino trash can, not that any of them are not worth reading necessarily, but very typical.

An option to surface rarer choices that combine signals from all the books on the list would be neat. In the above case, that might be the least-read real adventure book that somehow touches on the economics of the places travelled through with musings about signage, or that just happens to use prose similar to what Robert Bringhurst used to make print design theory not dull. Recommendations that only someone with a real sweaty and weird Venn diagram of genuine personal deep interests might conjure up, and that a normal person might say "why the hell would I ever read that" about, but that are otherwise amazing books that are just slept on and might never have found a market; or maybe thematically dissimilar + conceptually similar in aggregate + unpopular. I'd like to be able to input a seed of inspiration that I haven't been able to find the next deeper step in, rather than all the books on how to start a startup in the garage I don't have. If it's James Hoffmann's book on brewing coffee at a high level, I wouldn't want another YouTuber's book on brewing coffee at a high level, I'd want the Physics of Filter Coffee, or something in an adjacent sphere grid / tree branch that gives me a way to pursue depth AND breadth but not necessarily the same book by someone else, or the same book with different characters. If I've found a seedling or a mushroom, I'd like to explore the root system of that fruiting body, and then at a certain point find a new seedling based on what I've learned so far, or the one video with 50 views that's somehow the best explanation of how to handle back-pressure in highly concurrent systems after I've realized that I don't know shit about concurrency, but not so deep in the stack that I can't bridge the gap; make the series for me.
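One simple way to approximate the "rarer choices" option, as a rough sketch: keep whatever relevance score the recommender already produces and subtract a penalty that grows with how many readers a book has. Everything here is an assumption for illustration: the function name, the `alpha` knob, and the relevance and reader numbers are invented, and this is not how the site ranks anything.

```python
import math

def rerank_for_rarity(candidates, alpha=0.5):
    """Re-rank (title, relevance, reader_count) tuples, penalising popular books.

    alpha is a hypothetical knob: 0 keeps the original relevance order,
    larger values push widely read books further down.
    """
    def adjusted(candidate):
        _title, relevance, readers = candidate
        return relevance - alpha * math.log10(max(readers, 1))

    return [c[0] for c in sorted(candidates, key=adjusted, reverse=True)]

# Invented relevance scores and reader counts, purely for illustration.
candidates = [
    ("The Alchemist", 0.90, 3_000_000),
    ("Into the Wild", 0.85, 1_000_000),
    ("Grid Systems in Graphic Design", 0.70, 5_000),
]

mainstream_first = rerank_for_rarity(candidates, alpha=0.0)
deep_cuts_first = rerank_for_rarity(candidates, alpha=0.5)
```

With `alpha=0.0` the order is unchanged; with `alpha=0.5` the little-read design book moves to the top even though its raw relevance is lowest. A log penalty is just one choice; it only handles the "unpopular" part of the wish above, not the "conceptually similar in aggregate" part.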

Granted, my take here might just be an indictment of reviews in general, or at least those sourced from a generic site like Goodreads/Amazon, which is all about popularity and armchair criticism.

costco · 2025-11-07T01:14:47 1762478087

I would agree the results are generally OK but do not feel magical in most cases (I think in some specific cases they do though). The results can be not great if you add books across many disciplines. For instance if you add "The Elements of Typographic Style" and "The Design of Everyday Things" (https://book.sv/#671857,18518), you do get "Grid Systems in Graphic Design" but under its German name "Rastersysteme für die visuelle Gestaltung."

cyrusradfar · 2025-11-07T02:09:56 1762481396

I think this is cool and super fast -- kudos on whatever tech you needed to tackle to make it so.

I don't see anyone mentioning safety or ethics, so I'll just put it out there that this has some safety and ethical implications you should consider.

Consider "inflammatory" books and how they could be used to harm a group of people. Although I recognize folks post this "publicly", I think the intersection feature provides more than Goodreads.

Let's say, people who have read "Mein Kampf" & "The Anarchist Cookbook" or some other combination that says "Antifa" to the current regime.

I'd recommend you keep a list of books that you always consider private, and allow users to add to that list so it's more scalable. If folks try to intersect with anything in that list, you can warn that you don't allow intersection with private books.
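A rough sketch of what that check could look like, assuming the intersect feature is a set intersection over per-book reader sets. The function, the exception class, and the IDs are hypothetical and not taken from the site's code.

```python
class PrivateBookError(ValueError):
    """Raised when a query includes a book on the private list."""

def intersect_readers(book_ids, readers_by_book, private_ids):
    """Return users who have read every book in book_ids.

    Refuses the whole query if any requested book is on the private list,
    so the intersection cannot be used to enumerate its readers.
    """
    blocked = set(book_ids) & set(private_ids)
    if blocked:
        raise PrivateBookError(
            f"intersection is not allowed for private books: {sorted(blocked)}"
        )
    reader_sets = [readers_by_book.get(book_id, set()) for book_id in book_ids]
    if not reader_sets:
        return set()
    return set.intersection(*reader_sets)

# Hypothetical book IDs and user names.
readers_by_book = {
    101: {"ann", "bob", "cat"},
    102: {"bob", "cat"},
    999: {"cat"},
}
private_ids = {999}

shared = intersect_readers([101, 102], readers_by_book, private_ids)

try:
    intersect_readers([101, 999], readers_by_book, private_ids)
    refused = False
except PrivateBookError:
    refused = True
```

Here `shared` is the set of users who read both public books, and the second query is refused outright instead of returning a filtered result, which matches the "warn that you don't allow intersection" behaviour suggested above.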

Anyway, super fun demo!

zeroq · 2025-11-07T02:30:33 1762482633

It always baffled me how we censor "Mein Kampf" but we - as a society - are super fine with either Alex Jones shouting about lasers and lizard people or Joe Rogan leaving an open mic to people claiming there are nuclear plants and space stations buried under pyramids [1].

Mein Kampf is an absolutely terrible piece of literature, not because of its message but because of its quality. It's exactly something I would expect to find in Alex Jones's cell if we sentenced him to a year of solitary confinement.

[1] just a tiny exaggeration

colechristensen · 2025-11-07T02:43:47 1762483427

I don't want "safety" or "ethics" if the requirements for them are banning or hiding books based on somebody's ideology whether or not it agrees with mine.