> a reading ease of 74.9 (fairly easy) Well, Joyce did say that it was just a lo...

acabal · 2024-05-31T16:13:50 1717172030

Yes, it's a bit silly! The reason the score is so off is because we use the Flesch Reading Ease algorithm[1] to calculate it, which was designed for the US Navy to be able to score technical manuals. It works very well for most prose too... except highly modernist prose!

[1] https://en.wikipedia.org/wiki/Flesch-Kincaid_Reading_Ease

AlbertCory · 2024-05-31T16:42:59 1717173779

Thanks. Maybe a simple fix is: don't use it for fiction. Since that's not its intent.

acabal · 2024-05-31T17:01:57 1717174917

It works just fine for fiction. Ulysses is a very special edge case in the pantheon of all literature, so it's no surprise it doesn't work well for this one case.

AlbertCory · 2024-05-31T20:22:28 1717186948

> It works just fine for fiction

how about some other well-known novels and their scores?

throwup238 · 2024-05-31T23:08:51 1717196931

You can sort their list by reading ease: https://standardebooks.org/ebooks?page=21&per-page=48&sort=r...

A lot of John Stuart Mill and John Dewey

AlbertCory · 2024-05-31T23:52:51 1717199571

OK, I tried that. Among the hardest fiction:

Moll Flanders
Tristram Shandy
Gulliver's Travels
Robinson Crusoe

dudinax · 2024-06-01T00:42:11 1717202531

The Sun Also Rises is easier than Winnie the Pooh. I can buy that.

bryanrasmussen · 2024-05-31T19:15:23 1717182923

How's it handle Finnegans Wake?

Avid8329 · 2024-05-31T19:28:49 1717183729

Ulysses has mostly "real" words while Finnegans Wake is largely made of portmanteaus. It'll be interesting to see the results!

lapetitejort · 2024-05-31T22:31:33 1717194693

I just ran it and got "segmentation fault (core dumped)". Is this one of Joyce's silly sentences he's famous for?

AlbertCory · 2024-05-31T23:53:26 1717199606

He was such a futurist, that Joyce /s

robin_reala · 2024-05-31T19:25:00 1717183500

Got another 11 years to wait before that enters the US public domain, unfortunately.

hedora · 2024-05-31T16:56:54 1717174614

As an end-user of Standard Ebooks, I've found it works pretty well on average.

readthenotes1 · 2024-05-31T18:49:23 1717181363

Reading ease: each word makes sense. -25.1 points for no 4 words in a row making sense.

retrac · 2024-05-31T19:40:26 1717184426

It's fairly easy... if you also speak French, Italian, Latin, and probably Ancient Greek. I don't and I know I missed a lot. I remember a lot of bilingual French/English wordplay throughout the work. He was multilingual and the puns/kennings are also.

comonoid · 2024-06-01T15:05:11 1717254311

That should read "a reading ease: Ulysses".

perihelions · 2024-05-31T16:16:34 1717172194

- "a reading ease of 74.9 (fairly easy)"

Yeah, that's a very unnecessary misuse of AI.

Is there an open-source human rating site for serious books, in how difficult they are to read—how tedious, how erudite, how much pain you have to go through to get whatever reward you think you get at the end? With Ulysses near the edge of one axis, Moby Dick demarcating another... Surely this is all common knowledge to bookish people, but, where do they write it down?

robin_reala · 2024-05-31T16:22:19 1717172539

Hardly AI, just a simple Python function that implements the Flesch reading ease algorithm: https://github.com/standardebooks/tools/blob/effcf0f6db05729...

xandrius · 2024-05-31T16:25:27 1717172727

Everything is AI to an untrained person.

hedora · 2024-05-31T16:57:32 1717174652

Also, these days, most AI is just a simple python program.

tiagod · 2024-05-31T18:14:49 1717179289

Well, sure, because all the complexity emerges in the weights

erikpukinskis · 2024-05-31T17:46:40 1717177600

AI stands for “artificial intelligence” and I think an algorithm which decides how easy a book is to read qualifies as some sort of intelligence.

picture · 2024-05-31T17:57:15 1717178235

The input to the algorithm is literally three numbers: total words, total sentences, total syllables. If this counts as AI, then your thermostat or film camera feels pretty AI too.

https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readabi...
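
For illustration, here is a minimal Python sketch of the standard Flesch Reading Ease formula working from those three counts. The function name is hypothetical and this is not the Standard Ebooks implementation; the constants are the ones from the published formula, and counting words, sentences, and syllables is assumed to happen elsewhere.

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Return the Flesch Reading Ease score for the given counts.

    Higher scores mean easier text. Hypothetical helper for illustration only.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("total_words and total_sentences must be positive")

    # The two factors the formula penalizes.
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words

    # Constants from the published Flesch Reading Ease formula.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    # 100 words, 10 sentences, 150 syllables
    print(round(flesch_reading_ease(100, 10, 150), 3))
```

Nothing in the function looks at vocabulary or meaning, so a text built from short sentences of short words scores as easy however strange the words are.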

wan23 · 2024-05-31T18:47:54 1717181274

Perhaps we could use AI to give us a score for how AI a given AI is

logicallee · 2024-05-31T20:33:21 1717187601

ChatGPT gives it a score of 0.

https://chatgpt.com/share/c1e700bc-7353-427e-a953-1e234f5e96...

erikpukinskis · 2024-06-03T20:07:41 1717445261

Good data point, but a little bit biased.

perihelions · 2024-05-31T19:14:00 1717182840

Right, and whether you call it "General AI" or "trivial Python script", my complaint stands: that it's a misfeature for the user, the novice reader user, the English-as-a-foreign-language user, who relies on a machine review that tells them reading Joyce is "easy English". That would seriously suck if that happened to someone, though I assume that's statistically unlikely (particularly given Joyce-is-difficult-English is a widely-known meme). It'd be an unpleasant experience, like being told glue is tasty on a pizza.

I *get* that my opinion is an unpopular and minority one, so I accept the downvotes and ridicule, fine. This is the minority viewpoint I hold, I stubbornly stand by; and the hill I will die on. That it's disrespectful to users to inject unvetted machine scoring into book reviews; it's a malfeature and should not be a socially accepted practice. Treat the human user with awed respect; where you can help them, help, and where you don't know, say nothing—don't let loose some talking Python script. The user doesn't know the limitations of your script; the user doesn't know the language you posted on your page isn't authoritative language and is prone to major errors.

rpdillon · 2024-05-31T20:09:36 1717186176

> That it's disrespectful to users to inject unvetted machine scoring into book reviews

Very, very far from being unvetted. This algorithm has been used, unchanged, for the 50 years since Flesch–Kincaid was developed. I've used this metric for my entire life as a rough indicator of difficulty, and it is widely accepted. But it's a limited metric: it has two factors for difficulty that generally rate text as more difficult if it has more words per sentence and more syllables per word. It's a good heuristic, but as with all heuristics, there will be edge cases, and Ulysses is one of them.
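
For reference, the two factors can be seen in the standard Flesch Reading Ease formula. The worked numbers below are made up for illustration (8 words per sentence, 1.4 syllables per word), not measured from any book:

```latex
\text{score} = 206.835
  - 1.015 \cdot \frac{\text{total words}}{\text{total sentences}}
  - 84.6 \cdot \frac{\text{total syllables}}{\text{total words}}

% Illustrative values: 8 words per sentence, 1.4 syllables per word
\text{score} = 206.835 - 1.015 \cdot 8 - 84.6 \cdot 1.4
             = 206.835 - 8.12 - 118.44
             = 80.275
```

Both terms are subtracted, so longer sentences and longer words each lower the score, and nothing else about the text enters the calculation.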

As I do with all critiques, I guess I'd ask you to make a better suggestion for Standard Ebooks. Given their resources, and the available alternative of "have a panel of diverse humans read every book and grade its difficulty", your position is dangerously close to letting the perfect be the enemy of the good. Is your argument that Standard Ebooks would be a better product if they didn't include reading ease metrics? If so, I respectfully disagree.

> Treat the human user with awed respect; where you can help them, help, and where you don't know, say nothing—don't let loose some talking Python script. The user doesn't know the limitations of your script; the user doesn't know the language you posted on your page isn't authoritative language and is prone to major errors.

I don't think this is fair. Reading ease has flaws, but is widely accepted (although seemingly poorly understood, despite its simplicity). The guy who runs readable.com (DaveChild) responded to a post on Reddit about reading scores a few years back (that thread was also filled with tons of misinformation about how this is some black-box AI algorithm that's making everyone stupid), but his comment was quite well-grounded:

> Readability scores are fairly crude, almost by design, because they were all created at a time when they had to be worked out without computers. But they do give a decent idea of the overall readability of a piece, and that helps you to see if your content is too wordy. They are not, by themselves, an indicator of quality. They are not a substitute for proofreading and editing. But they are a useful tool to have in your arsenal.

This is a balanced, practical opinion. Life is filled with proxy metrics that are flawed, from insurance risk and credit ratings to SAT scores and the ability to do whiteboard-coding. In context, I think Standard Ebooks made exactly the right choice to incorporate some measure of reading ease in their offering, even if it doesn't get it 100% right 100% of the time.

AlbertCory · 2024-05-31T20:25:20 1717187120

I see several people calling this an edge case. That might well be, but how about giving us something to compare it to, in the realm of early- or pre-20th century novels?

perihelions · 2024-05-31T20:24:15 1717187055

- "Very, very far from being unvetted. This algorithm has been used, unchanged, for the 50 years since Flesch–Kincaid was developed."

I mean that the instance is unvetted: the machine score is generated automatically, and placed on the website automatically, and no human in the loop checks if it's reasonable or not. Not that the general algorithm is un-reviewed.

- " But they do give a decent idea of the overall readability of a piece, and that helps you to see if your content is too wordy. They are not, by themselves, an indicator of quality. They are not a substitute for proofreading and editing. But they are a useful tool to have in your arsenal."

This is very fair.

- "Life is filled with proxy metrics that are flawed, from insurance risk and credit ratings"

And a lot of them are very rightly illegal to score algorithmically in the EU (for important decisions), without manual oversight, because of the possibility of egregious and unaccountable machine error. The trend of abdicating human agency is not overall a wholesome one.

I'm coming from a place where I do read books (despite the fact I write HN comments like an illiterate stoned baboon, I'm trying my hardest really I am), and they come lovingly edited by obsessed people who put probably thousands of hours into editing each one, individually, with commentary essays that are up to 50-100 pages long, fastidiously crafted to guide the novice explorer. Standard Ebooks is neither a publisher nor attempting to replace publishers. But: it's viscerally disturbing to me to see robots taking the hallowed place of human scholars in annotating (in this narrow example, scoring) books, and when they go badly wrong like this Joyce example, it's very upsetting, and makes me (irrationally?) think there's some terribly dangerous cultural normalization for replacing authentic human intelligence with fake, stupid, hopelessly lost machine imitations. And we'll lose many valuable things and our humanity in the process.

I sincerely apologize to anyone I've annoyed with this (I infer I've annoyed a lot of people). I'm just very upset with seeing fake machine stuff everywhere.

AlbertCory · 2024-05-31T20:35:23 1717187723

Re "unvetted" and the difficulty thereof: there are already reviews of its difficulty elsewhere on the Web, e.g. from Goodreads:

https://www.goodreads.com/review/show/6752242
https://www.goodreads.com/review/show/4827595524

1. Telemachus. Difficulty: 0
2. Nestor. Difficulty: 0
3. Proteus. Difficulty: 9
4. Calypso. Difficulty: 5
5. The Lotus Eaters. Difficulty: 4
6. Hades. Difficulty: 3
7. Aeolus. Difficulty: 5
8. The Laestrygonians. Difficulty: 5

etc.

squigz · 2024-05-31T17:59:26 1717178366

Who even mentioned AI here?