
> a reading ease of 74.9 (fairly easy)

Well, Joyce did say that it was just a lot of jokes.

=========================

Episode 14 - Oxen Of The Sun

DESHIL HOLLES EAMUS. DESHIL HOLLES EAMUS. DESHIL HOLLES EAMUS. Send us, bright one, light one, Horhorn, quickening and wombfruit. Send us, bright one, light one, Horhorn, quickening and wombfruit. Send us bright one, light one, Horhorn, quickening and wombfruit.

Hoopsa, boyaboy, hoopsa! Hoopsa, boyaboy, hoopsa! Hoopsa, boyaboy, hoopsa.

Universally that person's acumen is esteemed very little perceptive concerning whatsoever matters are being held as most profitable by mortals with sapience endowed to be studied who is ignorant of that which the most in doctrine erudite and certainly by reason of that in them high mind's ornament deserving of veneration constantly maintain when by general consent they affirm that other circumstances being equal by no exterior splendour is the .....




Yes, it's a bit silly! The score is so far off because we use the Flesch Reading Ease algorithm [1] to calculate it, which was designed for the US Navy to score technical manuals. It works very well for most prose too... except highly modernist prose!

[1] https://en.wikipedia.org/wiki/Flesch-Kincaid_Reading_Ease


Thanks. Maybe a simple fix is: don't use it for fiction. Since that's not its intent.


It works just fine for fiction. Ulysses is a very special edge case in the pantheon of all literature, so it's no surprise it doesn't work well for this one case.


> It works just fine for fiction

how about some other well-known novels and their scores?


You can sort their list by reading ease: https://standardebooks.org/ebooks?page=21&per-page=48&sort=r...

A lot of John Stuart Mill and John Dewey


OK, I tried that. Among the hardest fiction:

Moll Flanders
Tristram Shandy
Gulliver's Travels
Robinson Crusoe


The Sun Also Rises is easier than Winnie the Pooh. I can buy that.


How's it handle Finnegans Wake?


Ulysses has mostly "real" words while Finnegans Wake is largely made of portmanteaus. It'll be interesting to see the results!


I just ran it and got "segmentation fault (core dumped)". Is this one of Joyce's silly sentences he's famous for?


He was such a futurist, that Joyce /s


Got another 11 years to wait before that enters the US public domain, unfortunately.


As an end-user of Standard Ebooks, I've found it works pretty well on average.


Reading ease: each word makes sense. -25.1 points for no 4 words in a row making sense.


It's fairly easy... if you also speak French, Italian, Latin, and probably Ancient Greek. I don't, and I know I missed a lot. I remember a lot of the bilingual French/English wordplay worked, though. He was multilingual, and so are the puns/kennings.


That should read "a reading ease: Ulysses".


- "a reading ease of 74.9 (fairly easy)"

Yeah, that's a very unnecessary misuse of AI.

Is there an open-source human rating site for serious books, in how difficult they are to read—how tedious, how erudite, how much pain you have to go through to get whatever reward you think you get at the end? With Ulysses near the edge of one axis, Moby Dick demarcating another... Surely this is all common knowledge to bookish people, but, where do they write it down?


Hardly AI, just a simple Python function that implements the Flesch reading ease algorithm: https://github.com/standardebooks/tools/blob/effcf0f6db05729...


Everything is AI to an untrained person.


Also, these days, most AI is just a simple python program.


Well, sure, because all the complexity emerges in the weights


AI stands for “artificial intelligence” and I think an algorithm which decides how easy a book is to read qualifies as some sort of intelligence.


The input to the algorithm is literally three numbers: total words, total sentences, total syllables. If this counts as AI, then your thermostat or film camera feels pretty AI too.

https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readabi...
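
For illustration, here is a minimal Python sketch of that three-number calculation. It uses the published Flesch Reading Ease constants; the function names and the label bands are my own assumptions for this example (this is not the Standard Ebooks code), and syllable counting, which is the fiddly part in practice, is left out entirely.

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch Reading Ease from three counts (a higher score means easier text)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Conventional interpretation bands as (lower bound, label).
# The exact labels are an assumption made for this sketch.
BANDS = [
    (90.0, "very easy"),
    (80.0, "easy"),
    (70.0, "fairly easy"),
    (60.0, "standard"),
    (50.0, "fairly difficult"),
    (30.0, "difficult"),
]


def describe(score: float) -> str:
    """Map a score to the first band whose lower bound it reaches."""
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    return "very difficult"


if __name__ == "__main__":
    # Made-up counts, purely to show the call shape.
    score = flesch_reading_ease(total_words=100, total_sentences=5, total_syllables=150)
    print(round(score, 1), describe(score))
    print(describe(74.9))
```

Nothing here looks at the words themselves, only at how many there are and how long they run.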


Perhaps we could use AI to give us a score for how AI a given AI is



Good data point, but a little bit biased.


Right, and whether you call it "General AI" or "trivial Python script", my complaint stands: it's a misfeature for the user, the novice reader, the English-as-a-foreign-language reader, who relies on a machine review that tells them reading Joyce is "easy English". That would seriously suck if it happened to someone, though I assume that's statistically unlikely (particularly given that Joyce-is-difficult-English is a widely known meme). It'd be an unpleasant experience, like being told glue is tasty on a pizza.

I *get* that my opinion is an unpopular and minority one, so I accept the downvotes and ridicule, fine. This is the minority viewpoint I hold, I stubbornly stand by; and the hill I will die on. That it's disrespectful to users to inject unvetted machine scoring into book reviews; it's a malfeature and should not be a socially accepted practice. Treat the human user with awed respect; where you can help them, help, and where you don't know, say nothing—don't let loose some talking Python script. The user doesn't know the limitations of your script; the user doesn't know the language you posted on your page isn't authoritative language and is prone to major errors.


> That it's disrespectful to users to inject unvetted machine scoring into book reviews

Very, very far from being unvetted. This algorithm has been used, unchanged, for the 50 years since Flesch–Kincaid was developed. I've used this metric for my entire life as a rough indicator of difficulty, and it is widely accepted. But it's a limited metric: it has two factors for difficulty that generally rate text as more difficult if it has more words per sentence and more syllables per word. It's a good heuristic, but as with all heuristics, there will be edge cases, and Ulysses is one of them.
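
To make those two factors concrete, the standard Flesch Reading Ease formula can be written as below. The symbols W, S and Y (total words, total sentences, total syllables) are my own notation for this sketch.

```latex
\mathrm{FRE} = 206.835 \;-\; 1.015\,\frac{W}{S} \;-\; 84.6\,\frac{Y}{W}
```

As a rough worked example with made-up numbers: going from 15 to 30 words per sentence costs about 1.015 × 15 ≈ 15.2 points, and going from 1.4 to 1.6 syllables per word costs about 84.6 × 0.2 ≈ 16.9 points. Nothing in the formula looks at vocabulary, syntax, or meaning, which is presumably how a text built from short words and short sentences can land in the "fairly easy" range.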

As I do with all critiques, I guess I'd ask you to make a better suggestion for Standard Ebooks. Given their resources, and the available alternative of "have a panel of diverse humans read every book and grade its difficulty", your position is dangerously close to letting the perfect be the enemy of the good. Is your argument that Standard Ebooks would be a better product if they didn't include reading ease metrics? If so, I respectfully disagree.

> Treat the human user with awed respect; where you can help them, help, and where you don't know, say nothing—don't let loose some talking Python script. The user doesn't know the limitations of your script; the user doesn't know the language you posted on your page isn't authoritative language and is prone to major errors.

I don't think this is fair. Reading ease has flaws, but is widely accepted (although seemingly poorly understood, despite its simplicity). The guy who runs readable.com (DaveChild) responded to a post on Reddit about reading scores a few years back (that thread was also filled with tons of misinformation about how this is some black-box AI algorithm that's making everyone stupid), but his comment was quite well-grounded:

> Readability scores are fairly crude, almost by design, because they were all created at a time when they had to be worked out without computers. But they do give a decent idea of the overall readability of a piece, and that helps you to see if your content is too wordy. They are not, by themselves, an indicator of quality. They are not a substitute for proofreading and editing. But they are a useful tool to have in your arsenal.

This is a balanced, practical opinion. Life is filled with proxy metrics that are flawed, from insurance risk and credit ratings to SAT scores and the ability to do whiteboard-coding. In context, I think Standard Ebooks made exactly the right choice to incorporate some measure of reading ease in their offering, even if it doesn't get it 100% right 100% of the time.


I see several people calling this an edge case. That might well be, but how about giving us something to compare it to, in the realm of early- or pre-20th century novels?


- "Very, very far from being unvetted. This algorithm has been used, unchanged, for the 50 years since Flesch–Kincaid was developed."

I mean that the instance is unvetted: the machine score is generated automatically, and placed on the website automatically, and no human in the loop checks if it's reasonable or not. Not that the general algorithm is un-reviewed.

- " But they do give a decent idea of the overall readability of a piece, and that helps you to see if your content is too wordy. They are not, by themselves, an indicator of quality. They are not a substitute for proofreading and editing. But they are a useful tool to have in your arsenal."

This is very fair.

- "Life is filled with proxy metrics that are flawed, from insurance risk and credit ratings"

And a lot of them are very rightly illegal to score algorithmically in the EU (for important decisions), without manual oversight, because of the possibility of egregious and unaccountable machine error. The trend of abdicating human agency is not overall a wholesome one.

I'm coming from a place where I do read books (despite the fact I write HN comments like an illiterate stoned baboon, I'm trying my hardest really I am), and they come lovingly edited by obsessed people who put probably thousands of hours into editing each one, individually, with commentary essays that are up to 50-100 pages long, fastidiously crafted to guide the novice explorer. Standard Ebooks is neither a publisher nor attempting to replace publishers. But it's viscerally disturbing to me to see robots taking the hallowed place of human scholars in annotating (in this narrow example, scoring) books, and when they go badly wrong like this Joyce example, it's very upsetting, and makes me (irrationally?) think there's some terribly dangerous cultural normalization of replacing authentic human intelligence with fake, stupid, hopelessly lost machine imitations. And we'll lose many valuable things and our humanity in the process.

I sincerely apologize to anyone I've annoyed with this (I infer I've annoyed a lot of people). I'm just very upset with seeing fake machine stuff everywhere.


Re "unvetted" and the difficulty thereof: there are already reviews of its difficulty elsewhere on the Web, e.g. from Goodreads:

https://www.goodreads.com/review/show/6752242
https://www.goodreads.com/review/show/4827595524

1. Telemachus. Difficulty: 0
2. Nestor. Difficulty: 0
3. Proteus. Difficulty: 9
4. Calypso. Difficulty: 5
5. The Lotus Eaters. Difficulty: 4
6. Hades. Difficulty: 3
7. Aeolus. Difficulty: 5
8. The Laestrygonians. Difficulty: 5

etc.


Who even mentioned AI here?



