
> The fact that the linguistic attributes are difficult to put boundaries on is extremely common for linguists: we won’t even claim to tell you what the definition of “word” is!

The definition of a "word" is always straightforward: a word is an atomic unit of language.

However, which units are or aren't atomic varies according to what it is you're measuring.

Lexically, "catch fire" is an atomic entity, which cannot be understood as the sum of its parts. It's just one part, and it needs its own dictionary entry, separate from "catch" and from "fire".

Syntactically, "catch fire" is definitely not atomic, because the past tense is "caught fire". From this perspective, it's enough to know "catch" and "fire".

Syntactically again, we can see that "an elephant" is in variation with "two elephants" / "my elephant" / "every elephant" / etc., and it's clear that "an elephant" is not atomic, but is understood as the composition of "a(n)" with "elephant".

Phonologically, as the citation-form spelling above hinted, "an elephant" is atomic; the article cannot exist independently and must attach to another word. Without knowing what that word is, you won't know how the article is pronounced.

Specialized terms for both of these types of phenomena exist - lexical words that are too large to be syntactic words are called idioms; syntactic words that are too small to be phonological words are called clitics. But the general lesson is that, despite the definition of "word" being clear, membership in the category varies according to what aspect of the language you're looking at.




By god you've done it, you've solved linguistics!


This doesn't even begin to cover things, even for English. First of all, "catch fire" is at least partly understandable from its constituent parts - "catch" has a great variety of related meanings, and they all have to do with something taking hold of something else; I'm sure any English speaker who has encountered both words would intuit the meaning of "catch fire" without any problem, especially if they also encountered "to catch a cold". Of course, the meaning is slightly different, and it is quite invariant.

Your analysis of the phonetic atomicity is also unsatisfactory. First, the article can very well be pronounced independently - I can say "English has a single indefinite article, with two forms: 'a' or 'an'". Secondly, the 'a' form can be pronounced in two different ways, depending on how you want to highlight it within a sentence: "he ate a piece" could use the schwa, or the "long a" if you want to highlight the article itself: "he ate [ay] piece, not your piece". So the article's pronunciation can change independently of the word it is applying to. Finally, in at least some English accents, many words can be pronounced differently in certain sequences than in others - for example, in modern Southern English, an "r" sound is introduced almost always in speech when a word that ends in a vowel sound is followed by another word that starts with a vowel sound, e.g. "I saw-R-it". By your description, neither "saw" nor "it" is an individual word phonologically, since you don't know how they will be pronounced unless you know the following word.

Overall, the atomicity of a linguistic construct is highly debatable, even in a particular context.


> First of all, "catch fire" is at least partly understandable from its constituent parts - "catch" has a great variety of related meanings, and they all have to do with something taking hold of something else

If you want to analyze it that way, you'll find that the semantics are the reverse of what you predict: when you catch fire, it's the fire that takes hold of you.

> First, the article can very well be pronounced independently - I can say "English has a single indefinite article, with two forms: 'a' or 'an'".

This argument is predicated on forgetting the difference between use and mention. What part of speech would you say an is in that sentence? Is it an article?

> Secondly, the 'a' form can be pronounced in two different ways, depending on how you want to highlight it within a sentence

Yes, problems arise when you need to place sentence-level stress on a feature that is too weak to bear stress. The same problem occurs for any English clitic, including 's, which in the general case doesn't even include a vowel. Most notably here, there's nothing about this specific to a before consonants; the rules for placing stress don't know what word you're following a with. If you need to stress an, the usual choice is /æ/. But also notably, when native speakers do this, they recognize it as a problem - it's just one they may not be able to work around.

> Finally, in at least some English accents, many words can be pronounced differently in certain sequences than in others - for example, in modern Southern English, an "r" sound is introduced almost always in speech when a word that ends in a vowel sound is followed by another word that starts with a vowel sound, e.g. "I saw-R-it". By your description, neither "saw" nor "it" is an individual word phonologically

This is not a word-level phenomenon in any way; intrusive R also occurs between syllables of a single word, as long as there's an appropriate vowel-vowel sequence. Placing one between saw and it would not normally be viewed as altering the pronunciation of either word (Which one do you think is altered? I guess by nonrhotic standards it would have to be it), but as the application of a general rule.

Placing /n/ between a and elephant is not the application of a general rule, it's the application of a rule specific to a.

> Overall, the atomicity of a linguistic construct is highly debatable, even in a particular context.

You're saying that people argue over which items count as words, not that they argue over what it means to count as a word.


> If you want to analyze it that way, you'll find that the semantics are the reverse of what you predict: when you catch fire, it's the fire that takes hold of you.

That is one of the meanings of catch, just like when you catch a cold, the cold takes hold of you, or when you catch your foot on something, that thing took hold of your foot.

> This argument is predicated on forgetting the difference between use and mention. What part of speech would you say an is in that sentence? Is it an article?

Fair enough, though I would still argue that being able to make a noun out of the article in this way relies on its forms having a stable, recognizable, individual pronunciation.

> This is not a word-level phenomenon in any way; intrusive R also occurs between syllables of a single word, as long as there's an appropriate vowel-vowel sequence.

Well, we are trying to define what a "word" even is, so you can't bring this distinction in. A priori, "saw it" could be a word, just as my whole comment could be a single word. We are trying to come up with a formal definition of what it means to be a word; if we want "an elephant" to be a single word and "saw-r-it" to be two words, we need to come up with a distinction between these that doesn't presuppose that "saw" and "it" are separate words.

> Placing one between saw and it would not normally be viewed as altering the pronunciation of either word (Which one do you think is altered? I guess by nonrhotic standards it would have to be it), but as the application of a general rule.

Depending on the exact accent, not all words follow this rule. In certain accents, at least, it is quite specific to words that have an 'r' in their spelling (well, to words that historically had an r sound that was lost, and is usually preserved in the spelling), so "four o'clock" would get a linking R, but "saw it" would not. So at least in these cases, by your definition, we'd have to say that "four" is not an individual word, phonetically speaking. Also note that linking/intrusive R doesn't normally appear inside morphemes, only at morpheme boundaries, such as between words or between a stem and a suffix. So you get [Kafka-r-esque] in certain accents, but never inside, say, "dais".

> Placing /n/ between a and elephant is not the application of a general rule, it's the application of a rule specific to a.

This could also have been a general rule that happens to apply to a single word in modern English. Regardless, as I have mentioned before, if you want to define "phonological word" as a unit whose exact pronunciation is only knowable when you have all parts present, then lots of phrases are phonological words in English, unless you add a lot of exceptions to your definition.

> You're saying that people argue over which items count as words, not that they argue over what it means to count as a word.

These are not that different in practice. If you have a good formal definition, then you should be able to say for any object whether it is a word, a part of a word, or a sequence of words. If you can't do that, you don't really have a definition. The definition you gave basically equates "word" with "atomic", but then if "atomic" is not well defined, or even definable, then we're back to square one of not knowing what "word" actually means.


Relevant xkcd: https://xkcd.com/793/



