
There is no generally accepted definition in linguistics, but AI researchers have come to the consensus that a word is one or more LLM tokens ;)

I’m not a linguist, but I would define a word as a part of a sentence composed out of one or more syllables, with word boundaries either implicitly or explicitly specified by different methods in different languages, e.g. by using pauses, longer or shorter phonemes, by using accents, rhythm, or intonation, or simply by remembering words as part of learning a lexical vocabulary.

A word is something that can be assigned to a part of speech (noun, verb, adjective, adverb, etc.).

Depending on the language, it’s not always clear whether prefixes and/or suffixes are part of the word or separate words.

Similarly with compound words: do they count as a single word or as multiple words?

A short sentence in one language may enter another language as a single opaque word.



