"shi1 shi4 shi2 shi1 shi3" - a Chinese tongue twister (yellowbridge.com)
77 points by soyelmango on Sept 7, 2010 | 52 comments



This is almost certainly written in literary classical Chinese (even if written recently). Basically, in Chinese, words that were once pronounced differently merged in pronunciation in modern Mandarin, though since the writing system is quite divorced from sound, the written forms never changed (at least, not until Simplified Chinese under Mao).

What that means is that a lot of classical Chinese literature has a bit of this problem, in that if you read it aloud, it is tricky to fully comprehend. (This poem/story takes it to an extreme, obviously.) For a long time, up through the early 20th century, it was fairly common for written things to be written in this literary Chinese even if the writer would say their ideas completely differently. A "write as you speak" movement has largely changed that: in modern written Mandarin, many words are two or three syllables long, and written with two or three characters. Etymologically, the words may have derived from compounding monosyllabic words, but in the modern spoken (and now also written) language, the sub-word syllables are simply no longer words in any meaningful sense.

(An analogy on a much smaller scale in English: many dialects of English merge /ɪ/ and /ɛ/ before a nasal sound, so that "pin" and "pen" sound identical. Speakers of those dialects will often say "straight pin" or "ink pen" (or "pig pen" or "cow pen") to distinguish which kind of pin/pen they mean; they may or may not make the distinction in writing. So the writing is unambiguous, the speaking is unambiguous, but reading a written thing is ambiguous. When it's just one or two words it's no big deal, but imagine that nearly every word suffered from that problem....)

EDIT: It turns out there's a Wikipedia page on the poem (naturally, should've checked that first). A lot of this is covered there: http://en.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_D...


One solution to the "many characters sounding the same" problem was turning Chinese into a tonal language. I guess technically that makes them sound different. (In modern Mandarin Chinese there are 4 tones; in older times there were more; Cantonese reportedly still has 9.)

Even with tones, though, there are still too many characters sounding exactly the same. In a Chinese dictionary (you might want to pause for a moment and imagine what a Chinese dictionary looks like: if you come across an unknown English word in writing, you can look it up by spelling, but what's the equivalent thing to do for a Chinese character?), under any one sound specification, you can easily find 10-20 characters.

Where I disagree is that, yes, classical written Chinese is tricky to understand, but not because it consists mostly of one-syllable words. It is tricky to understand in the same sense that Shakespearean (or Chaucerian) English is tricky for modern English speakers: unfamiliar vocabulary and old-style grammar. If this opinion is valid, I further propose that, when the Chinese talk, they might rely more heavily on semantics to parse sentences in real time, since based on sound alone there might be too many character candidates. In other words, they use what's being talked about to do some pretty heavy-handed pruning as they process incoming syllables.


On Chinese dictionaries. I actually don't think there is any difference from English ones, at least in the way I use them. When I use an English dictionary I use the word's spelling. I don't check how the word is pronounced when I do this. When I use a Chinese dictionary I use the number of strokes that make up the character, which puts all the characters in a sort of order not unlike an alphabetical one. I also do not need to know how the word is pronounced when I do this.

There is no such thing as "turning Chinese into a tonal language." Chinese is tonal to begin with. For native speakers, a different tone does sound as distinct as a different phoneme. The problem the poem exemplifies is the sort we run into when we reduce the language into only the phonetics -- a problem I'm not convinced is unique to Chinese. In fact, just the post above pointed to an example of such in English.


The overloading of the shi sound in mandarin approaches absurd levels.

I've always liked:

四 是 四 , 十 是 十 , 十 四 是 十 四 , 四 十 是 四 十 , 四 十 四 只 石 狮 子 是 死 的

sì shì sì shí shì shí shí sì shì shí sì sì shí shì sì shí sì shí sì zhī shí shī zǐ shì sǐ de.

Translation: 4 is 4, 10 is 10, 14 is 14, 40 is 40, 44 stone lions are dead

Then get a southerner with, say, a Sichuan or Yunnan accent to say this (in Mandarin). They cannot properly pronounce shi (they say it like si). The above just sounds like an angry bee. Given that shi and si are used a lot, this becomes a real pain for a non-native speaker.

This makes it tough when you're buying something that is 44 RMB and you can't tell whether they said "is 14" or "44" or what.


My impression (from very brief study of Mandarin) is that Mandarin has a lot more homonyms than most languages, e.g. lǐ has a ton of meanings: http://en.wiktionary.org/wiki/l%C7%90 It also seems to me that a lot of Mandarin phonemes are much closer together than most languages, e.g. ch and q are both a similar "ch" sound (likewise sh and x). Or maybe these sounds just seem similar to me because of my English upbringing?

Do linguists have a way to quantitatively measure how close together the sounds and words are in a language? Some sort of Shannon entropy measure, maybe? Or a way to measure how spread out words are in "phonetic space". I couldn't find anything, but I'd like to know if there's a way to measure these things objectively.
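
For what it's worth, the Shannon-entropy idea is easy to sketch. The snippet below is only an illustration, not an established linguistic measure: it computes the conditional entropy H(word | sound), i.e. how many bits of uncertainty remain about which word was meant once you have heard the syllable. The function name, the toy lexicon and its frequency counts are all made up for the example; a real estimate would need a pronunciation dictionary with corpus frequencies.

```python
import math
from collections import Counter, defaultdict


def homophone_entropy(lexicon):
    """Return H(word | sound) in bits.

    `lexicon` maps (word, sound) pairs to frequency counts.
    0.0 means every sound identifies exactly one word.
    """
    by_sound = defaultdict(Counter)
    for (word, sound), count in lexicon.items():
        by_sound[sound][word] += count

    total = sum(lexicon.values())
    entropy = 0.0
    for words in by_sound.values():
        sound_total = sum(words.values())
        for count in words.values():
            p_joint = count / total        # P(word, sound)
            p_given = count / sound_total  # P(word | sound)
            entropy -= p_joint * math.log2(p_given)
    return entropy


# Hypothetical toy lexicon with invented, equal frequencies.
toy_lexicon = {
    ("是", "shi4"): 4,
    ("事", "shi4"): 4,
    ("十", "shi2"): 4,
    ("狮", "shi1"): 4,
}

# The same lexicon with tone digits stripped, as if tones were ignored.
toneless_lexicon = {
    (word, sound.rstrip("1234")): count
    for (word, sound), count in toy_lexicon.items()
}

with_tones = homophone_entropy(toy_lexicon)
without_tones = homophone_entropy(toneless_lexicon)
print(with_tones, without_tones)
```

On this toy data the ambiguity goes up when the tones are thrown away, which is the effect the poem exploits. Comparing that number across languages would at least give a rough, frequency-weighted answer to "how much homophony is there", though it says nothing about how close distinct sounds are to each other.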


On the first part: See my other post for more, but basically, much (not all) of the homonymy is from older and/or written-only forms; and yes, ch and q don't sound any closer to a native Mandarin speaker than, say, sh and s do to a native English speaker or u and ou to a native French speaker.

On the second part: no defined measure that I know of. It would be a little tricky in that language is a moving target, everyone speaks it slightly differently, and even for a single speaker the "location" of a particular phone is more of a probability distribution even after you factor out varying context. That said, there definitely are charts that map out the space and take a stab at identifying the prototype location of each phone in the sound space, so it's not entirely implausible that you could summarise that with a distance measure. I strongly suspect that the value of the measure would not vary much among languages with similar-size phoneme inventories, though.


In real life speech, people don't say such things in Chinese, just as English speakers don't say "Suzie sells sea shells by the sea shore". If they do, they'll say it slower to avoid ambiguities, maybe rephrase it as well, e.g. "You remember Suzie? She's into sea shells. She sells them by the beach."


Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo. :-)

http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo...


If we remove the adjectives formed from proper nouns, turn the nouns into "people", turn the verbs into "bully", and add a pronoun and an adjective, we get

    people whom other people bully bully people.
This makes sense.

But when we remove the pronoun and adjective and we get

    people people bully bully people
which is as far as I can tell gibberish (without changing meanings such that they don't correspond with the original sentence). It can't have anything to do with many people persons, because in the original there's an adjective there.

If we assume that, just as there are socially adept people, whom we call "people people", there are also socially adept buffalo called "buffalo buffalo", we can get "Buffalo buffalo buffalo buffalo buffalo". But to go beyond that is to claim an incoherent sentence missing many words is coherent and grammatical.


It seems to me that if you replace "whom other" with "that" you end up with a similar meaning:

    people that people bully bully people
I've found it fairly common to hear people drop the "that" when speaking - so in this case:

    people people bully...
the first people refers to the people being bullied while the second people refers to the people doing the bullying. It's a bit clearer if you replace some of the nouns - for instance:

    dogs people bully bully cats
which I believe is a perfectly legal English sentence.


dogs people bully bully cats

Aha! It finally makes sense. Thanks.

You should edit the wikipedia article, because your explanation is way better than the one contained therein.


"Buffalo buffalo" is supposed to indicate "buffalo from Buffalo", the place.

And "people people bully bully people" makes sense to me if I think about it, although it doesn't sound natural due to the word repetition.

Maybe "people others bully bully others"?


Wikipedia has a good interpretation:

[Those] buffalo(es) from Buffalo [that are intimidated by] buffalo(es) from Buffalo intimidate buffalo(es) from Buffalo.


Even though it was tough to go through slough in the snow, I ploughed through without a thought.


I've heard Slough is tough enough to go through even without snow :P


And there's still space for 'thorough' and 'trough' in that story...


Here's the wikipedia entry on this poem along with the audio.

http://en.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_D...


It's interesting to compare the original version in Classical Chinese with the vernacular Chinese one.

The Classical Chinese version just sounds (err, reads) so much more awesome, like a smooth-talking wisecracker who can't be bothered to spend extra breath, while the vernacular version reads like a seven-year-old's essay.

I always had the impression that Classical Chinese was painstakingly crafted and preserved by a small group of elitists who were obsessed with aesthetics and abhorred ease of learning. The ultra-minimalist syntax seems very "inorganic", but it makes every line poetry and an invitation for puns. As a result, most people were illiterate, but if you were well-read, boy, the fun you could have with words.



...that's what Shi said. :-)


Hmmm, there is potential here for multi-lingual homophonic exercises...


Stop this shi7 before it gets out of hand.


I downvoted you, and expect to be downvoted myself ...

But I laughed.

/me is, apparently, twelve.


It's an amusing one because it's easier to say for foreigners learning Mandarin than for Cantonese-speaking Chinese who then learn Mandarin.

In my Chinese class in the north they just use it as a way of testing southern Chinese speakers' ability at speaking Mandarin.


While it's true that native Cantonese speakers can generally be identified, it's not the case that they lack the ability to speak Mandarin. They just speak Mandarin with a southern regional accent. Native Mandarin speakers from Taiwan are also easily distinguished from Beijingers.

Generally, cantonese speakers actually pick up mandarin a lot more easily than mandarin speakers pick up cantonese.

Adopting an accent can be a lot harder than adopting a dialect because 1. your brain doesn't parse the new sounds well if you didn't grow up hearing them and 2. you have no experience creating these sounds and it's a lot harder to learn to produce them. There's a lot of literature about native accent adoption in second language acquisition that suggests an age dependency due to increased neural plasticity in youth (even though there's a lot of dispute about that critical period for language (not accent) acquisition).


The number after the word only indicates what tone it is. Always good to review which tone is which:

shi1 = shī

shi2 = shí

shi3 = shǐ

shi4 = shì
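
The mapping above is mechanical enough to sketch in a few lines. This is a rough illustration, not a complete pinyin library: the function name is made up, it assumes lowercase input with a trailing tone digit, and it uses the usual placement rule (mark "a" or "e" if present, the "o" in "ou", otherwise the last vowel).

```python
# Tone-marked forms of each vowel, indexed by tone number minus one.
TONE_MARKS = {
    "a": "āáǎà", "e": "ēéěè", "i": "īíǐì",
    "o": "ōóǒò", "u": "ūúǔù", "ü": "ǖǘǚǜ",
}


def numbered_to_marked(syllable):
    """Convert numbered pinyin such as 'shi4' to 'shì'.

    Syllables without a trailing digit are returned unchanged;
    tone 5 (neutral) just loses its digit.
    """
    if not syllable or not syllable[-1].isdigit():
        return syllable
    base, tone = syllable[:-1], int(syllable[-1])
    if tone not in (1, 2, 3, 4):
        return base

    # Decide which vowel carries the mark.
    if "a" in base:
        idx = base.index("a")
    elif "e" in base:
        idx = base.index("e")
    elif "ou" in base:
        idx = base.index("o")
    else:
        vowels = [i for i, ch in enumerate(base) if ch in TONE_MARKS]
        if not vowels:
            return base
        idx = vowels[-1]

    marked = TONE_MARKS[base[idx]][tone - 1]
    return base[:idx] + marked + base[idx + 1:]


title = "shi1 shi4 shi2 shi1 shi3"
converted = " ".join(numbered_to_marked(s) for s in title.split())
print(converted)
```

Running it on the submission title gives the same five syllables with the diacritics in place of the digits.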


True, it is also good to have an idea which tone means what. As far as I know it's: high-level, rising, falling-and-then-rising and falling (the shape of the marks over the i gives the hint) http://en.wikipedia.org/wiki/Mandarin_phonology#Tones


That is awesome! Illustrative diacritics, who would have thought? I still would love to hear the sentence spoken.


http://www.mdbg.net/chindict/rsc/audio/voice_pinyin_cl/shi1.... (shī)

http://www.mdbg.net/chindict/rsc/audio/voice_pinyin_cl/shi2.... (shí)

http://www.mdbg.net/chindict/rsc/audio/voice_pinyin_cl/shi3.... (shǐ)

http://www.mdbg.net/chindict/rsc/audio/voice_pinyin_cl/shi4.... (shì)

For the falling-then-rising (shǐ), changing the "shi3" to "chi3" in the URL gives a better idea, I think, of what the inflection sounds like.


Agree on the last point -- at least, for Taiwan. In fact, those idealized graphs caused me a bit of a headache early on. Native mandarin speakers in Taiwan tend to drop the end of the 3rd tone, so it falls slowly then holds at the bottom for a minute, then rises slightly or even just cuts off.


I've asked a few Japanese people to say "She sells sea shells by the sea shore..." and it comes out sounding a lot like that... Tricky because Japanese uses Shi and Chi sounds but not Si on its own.

Don't worry, they get plenty of revenge on me.


there is a whole lot of Chinese stuff like this:

http://gist.github.com/568487


Ironically, when pronouncing this you don't move your tongue at all.


Not quite true: although the tongue is not as obviously involved as it is in stop sounds (like /t/ or /k/), friction between the tongue and the air is what makes the sound /ʂ/ (here transcribed as "sh"), while during a vowel sound (/i/), the air passage is open. So the tongue does have to move a little relative to the roof of the mouth; just not very much.


I really don't move my tongue when I say these in a row. Is this because I'm speaking with a Beijing accent? Apparently what's coming out for me is: ʂ̺ɻ̩. Or should that combination also force me to move my tongue?

When I lay off the accent and move the i to the front of my mouth a bit, I do seem to move my tongue a little. I'm not sure I'm doing it right though, the i's in non-Beijing Mandarin are actually always a bit stressful for me, but I'm pretty sure I'm pronouncing my "shi" correctly for Beijing.


It's likely that your tongue isn't moving much. It's also possible that the tongue isn't moving front-to-back (as it would for the English word "she") or that it isn't moving with respect to your lower jaw or that it isn't moving very much compared to nearly any other set of sounds you might utter. But the tongue must be moving if you are making any differentiated sound at all there.


I like to use this dictionary to check pronunciation, meaning and stroke order:

http://www.mdbg.net/chindict/chindict.php?page=worddict&...


Not exactly a tongue twister, but here's something similar from Korean: 가가가가가? (gagagagaga?)

In the Southeastern (Gyeongsangdo) dialect of Korean, this means "Does he have the surname 'Ga' ?"


"This thread is useless without" mp3s. In all seriousness, I'd love it if somebody took a crack at this because I'd love to hear it. (And I'm sure the karma would flow.)



Found one on a spam site

http://www.dvpod.cn/TC9KZ2E4/


It'll be boring since most of what you hear will be shi.


A recording a friend of mine made: http://kassens.net/shi-shi-shi-shi-shi.mov


Imagine how hard it must be to make understandable text-to-speech / speech synthesis for Chinese.


...and how hard it would be for voice recognition systems too.


That's unlikely. How do Chinese people write a sentence on the computer? Old systems would have you write the letters, eg.: "shi", then display a long list of all the possible characters with that sound (for all 4 tones). Newer systems use context to figure out the right characters for the whole sentence at once (probably using naive Bayes --- see Peter Norvig's spell checker), so that I can just type "Wo shi Jianada ren" and the output would be the correct 6 Chinese characters. I never need to specify the tones.

So I would assume that voice recognition could do even better by analyzing tonal information.


Thanks for the link to Peter Norvig's spell checker - interesting reading there.


A Spanish tongue twister: Tres tristes tigres tragaban trigo en un trigal.


The most complicated twister I know is in Russian:

  Карл у Клары украл кораллы, а Клара у Карла украла кларнет
(Karl u Klary ukral korally, a Klara u Karla ukrala klarniet — Karl stole corals from Klara and Klara stole a clarinet from Karl). Audio: http://en.wikipedia.org/wiki/File:Russian_tongue-twister_-_K...

Alternating "rl" and "lar" make it tricky.


What does it mean? Three sad tigers...


Three sad tigers ate wheat in a wheat field.


something something ate corn in a cornfield?



