
I'm a native English speaker. I studied a little French and speak (and can read/write) some German (with limited vocabulary). For the English speaker, German grammar is painful enough (3 genders, agreement of nouns and adjectives by case, number and article, separable verbs, etc) but, based on what reading I've done, German is still "Latin lite".

A friend of mine (also native English speaker) lives in Taiwan and has learnt traditional Chinese (rather than simplified, as is now taught and used in China). The character memorization, to me at least, is horrifying. But the grammar is fairly simple (apart from tonal variation in words).

The evolution of languages (linguistics I guess) interests me greatly so first up, thanks for the post.

With all these different language systems, one has to wonder how they evolve.

Several events in history are of particular interest.

The first is what happened to English. English in the 10th century was basically the same language as German (Althochdeutsch or Old High German, to be precise). In 1066, the Normans conquered England, bringing French which became the official and court language of England for several centuries.

This had several important effects:

Firstly, many words migrated to English from French and Latin (although there were Latin words previously). Often the English form was "low brow" whereas the Latin/French version was "high brow".

Secondly, without a central authority enforcing a language standard, the language evolved hugely. Old English for Modern English speakers is basically unreadable. Middle English is mostly comprehensible. This was a massive change.

Along the way, English basically lost the concept of case (apart from pronouns), gender (again, apart from pronouns) and word agreement. Grammar was also greatly simplified (almost everything in English is done on word position rather than word ending like most Indo-European languages).

The TL;DR version of this is that there is a reasonable case to be made that a central authority actually stifles language "innovation". If true, one could argue that how arbitrary (rather than regular) a language is reflects the degree of state control over that language during its history.

I don't presume to argue that this is true but it's an interesting idea.

The second interesting historical event was the switch in 1929 from the Arabic alphabet to a Latin alphabet for Turkish, which brought a massive increase in literacy in the following years (since the alphabet is simple and phonetic).

The third interesting event is the rise of computers. I would argue that it was almost inevitable that this would happen in a country with either the Latin or Cyrillic alphabet. The reasons are:

1. Limited number of characters (think: keyboards and character sets); and

2. Limited variation in those characters (in English: uppercase and lowercase). Compare that to Arabic.

IMHO (1) is incredibly important. In Mandarin, if I want to tell you a new character I have to show you. There is no way to describe it. In English, a new word can be communicated verbally. I don't think you can overstate how important this distinction is.

Asian computer use has evolved a number of schemes to get around these issues, such as character combinations to represent certain characters (which, again, you have to learn) and, more recently, the use of graphics pads to draw characters.

The last thing I wanted to mention was this paper [1], which shows a mathematical relationship between languages becoming regular and the frequency with which words are used (the more a word is used, the longer it takes to become regular).
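As far as I recall, the paper behind [1] (Lieberman et al., Nature 2007) puts the relationship as a square-root law: the half-life of an irregular verb scales with the square root of its usage frequency. A minimal sketch of that scaling, assuming I'm remembering the law correctly; the function name and the example frequencies are made up for illustration:

```python
import math

def relative_half_life(freq_a, freq_b):
    """How many times longer verb A should stay irregular than verb B,
    assuming half-life is proportional to sqrt(usage frequency)."""
    return math.sqrt(freq_a / freq_b)

# Hypothetical frequencies: verb A is used 100 times as often as verb B.
ratio = relative_half_life(100, 1)
print(ratio)
```

Under that assumption, a verb used 100 times as often would take about 10 times as long to regularize.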

Anyway, enough rambling on my part.

[1]: http://www.physorg.com/news111241495.html




There is a certain amount of misinformation in this post that I'd like to clear up:

1) In the 10th century, (Old) English and (Old) German were already somewhat divergent, and not really mutually intelligible, but closely related.

2) The same goes for Norman and French. While in the same language family, these were not quite the same language.

3) Have you read any Chaucer? I'm not sure I'd call Middle English "mostly comprehensible". It takes a lot of work and a good ME dictionary at hand. By contrast, Early Modern English (think Shakespeare) is more in the line of "mostly comprehensible".

4) It's not just the fact there was no central authority that caused English to massively change. There still isn't a central authority, but change has been slow and non-radical for the last 500 years or so. Having a central authority doesn't really prevent change, either, although it might help to slow it down.

5) It's not at all clear what you mean by "arbitrary" in your claim about language change and state control, but you certainly have not backed up any claim about languages under "state control" being more "regular".

6) The alphabet switch for Turkish did not succeed because the Latin alphabet is "simple and phonetic". The Arabic alphabet is also simple and phonetic, as is (or as could be) any alphabet. It was not a great fit for the Turkish language, however, and the Latin alphabet (with a few additions) mapped better.

7) But much more important than any facts about the Latin alphabet was the fact that Kemal Ataturk was a dictator who pushed through a massive literacy programme on the population (which, it should also be said, was basically supportive of this goal). Had he done an alphabet reform instead---adding a few letters to the Arabic alphabet to make it a better fit for Turkish---and accompanied it with the same literacy policy, it would have done just as well. (Better, arguably, because it would have left the writings of the Ottomans much more accessible to the literate modern population.)

You might be right about computers arising in alphabet countries (I'd broaden that to any non-ideographic writing system, including Greek, Korean, and those of the Indian subcontinent and southeast Asia), but it might be more accurate to say that if computers had arisen first in China they would have looked very, very different. (Even in Japan, they might have gone the route of kana-only systems first, in a similar way to how early western machines had ALL CAPS interfaces.)


On the simplification of roman script, I've just travelled through Vietnam, and learned that Ho Chi Minh did a similar thing to Ataturk - he mandated that Vietnamese be written in Roman script (as set down by a Portuguese bloke several centuries ago) and not in the traditional Chinese script. His reasoning was that the easier it is to learn, the better for the general population.

I can't say what Vietnam looked like before the change, but certainly today there's writing blazed over everything in the cities - and there's very little in the way of images or pictographs to indicate what a shop might sell to an illiterate (I may have been unaware of other indicators). They do make prodigious use of diacritics to adapt Roman script to Vietnamese, though.

It was while puzzling over these diacritics that I finally realised that English uses (needs?) these as well, we just don't write them down - wind (moving air) and wind (make a coil) are pronounced differently, but without diacritics, someone has to tell you how to do so.


Be careful who you call a dictator. Although Ataturk was President of Turkey for 15 years (until his death in 1938), he encouraged a multi-party system. During his lifetime, however, several parties were formed and then either dissolved themselves or were dissolved after an assassination plot against Ataturk was uncovered. It was only in 1945, after Ataturk's death, that the multi-party system in Turkey took off for real.


From what I can tell Ataturk was mostly a benevolent dictator.

(And like a good wine, he gets better with every passing year since his death. When I was in Ankara in 2008, they had pictures / flags of him on the high rise buildings covering five storeys.)


Actually, isn't the concept of a kana-only character system how early, domestically-marketed microcomputers in Japan did in fact work? The only example I can think of is a home entertainment console, but the Nintendo Family Computer generally used kana with a few highly common kanji.


Yes, early computers in Japan used katakana only. They were half-width (normally they are written in a square box) so as to be compatible with the Latin alphabet. This is also how telegrams were sent, starting from when Japan began modernizing in 1868.
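Those half-width katakana still survive in Unicode as compatibility characters, and in Shift JIS they take a single byte each, same as Latin letters. A quick sketch using only Python's standard library (the variable names are just for illustration):

```python
import unicodedata

half = "ｶﾀｶﾅ"  # "katakana" written in half-width katakana (U+FF76, U+FF80, U+FF76, U+FF85)

# NFKC normalization folds the half-width forms into ordinary katakana.
full = unicodedata.normalize("NFKC", half)

# East Asian Width property: "H" = half-width, "W" = wide.
widths = (unicodedata.east_asian_width(half[0]),
          unicodedata.east_asian_width(full[0]))

# Bytes needed per character in Shift JIS.
byte_lengths = (len(half[0].encode("shift_jis")),
                len(full[0].encode("shift_jis")))

print(full, widths, byte_lengths)
```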


First: I lived in China for a year studying Mandarin, and spent 3 years studying at school.

You're wrong in that there is no way to describe characters in Chinese to other speakers. There are certain words you use to describe strokes. The equivalent in English would be something like "there is a cross on the left and a flower on the right." Chinese speakers do this all the time, so I'm kind of surprised you jump to this conclusion when it's clear you aren't knowledgeable on the issue.


Care to explain how you'd explain this [1] to someone else such that they could reproduce it in a readable fashion? Or one of these [2]?

[1] http://necromanc.blogspot.com/2006/05/most-complicated-chine...

[2] http://www.chinese-forums.com/index.php?/topic/437-most-comp...


A challenge!

First of all, [1] is ridiculous. Second of all, you just do it recursively. I would describe the first example roughly as follows:

Walking radical (162); cave top (116); to the left a left-right combination with "moon" on the left (74) and a thread radical (52) above "long" (168) on the right; in the middle "speech" (149) above "horse" (187); to the right, a left-right combination with thread above "long" on the left, and a knife radical (18) on the right; under all of that, a heart radical (61).

Numbers refer to the chart here

http://www.yellowbridge.com/chinese/radicals.php

In practice, it is very common for two Chinese people to meet for the first time and explain to each other which characters their names are written with. This normally does not involve pulling out pencil and paper.


That's the equivalent of explaining how to spell "supercalifragilisticexpialidocious" -- yeah, it's an English word, but it's ridiculous and nobody really expects you to know how to spell it.

Most simplified characters can be explained with a handful of strokes. Furthermore, many of them can be broken down into "radicals" which are commonly repeated patterns.


I know #1 looks completely insane, but I have been studying Chinese for about 2 years, and I just started writing 3 or 4 months ago, and the breakdown of it is actually fairly simple. There's actually a somewhat limited set of characters that are reused over and over again, and I have already learned to write every one of the pieces (*edit: In that particular character). It's already been broken down in a post above me so I won't do it again...

Some characters would be harder for me to remember than this, because this particular character is made up of commonly used components.


齉 == nose left, bag right.

龘 == three dragons stack

驫 == three horses stack

It's pretty simple and straightforward.
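To make the "describe it by its parts" idea concrete, here is a minimal sketch. The table is hand-written and hypothetical (it is not from any real decomposition database), and the layout names "left-right" and "stack" are just labels I picked:

```python
# Hypothetical hand-written table: character -> (layout, components).
DECOMP = {
    "齉": ("left-right", ["鼻", "囊"]),
    "龘": ("stack", ["龍", "龍", "龍"]),
    "驫": ("stack", ["馬", "馬", "馬"]),
}

def describe(ch):
    """Describe a character by its components, recursing into any
    component that is itself listed in the table."""
    if ch not in DECOMP:
        return ch  # treat as a piece the listener already knows
    layout, parts = DECOMP[ch]
    inner = ", ".join(describe(p) for p in parts)
    return f"{ch} = {layout}({inner})"

for ch in DECOMP:
    print(describe(ch))
```

A fuller table would let the recursion bottom out at radicals, which is roughly what the spoken descriptions do.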


> IMHO (1) is incredibly important. In Mandarin, if I want to tell you a new character I have to show you. There is no way to describe it.

Have you actually studied hanzi? It is very easy to describe a character verbally, and if you live in Asia for any period of time, you will see that people do this quite often. There are only 214 Kangxi radicals[0] (plus some variations based upon how much space is available). Clearly not the same as having 26 letters, but not unmanageable by any stretch of the imagination.

The second difference is that characters are "spelled" in 2 dimensions. Once again, there is a set of rules for radical placement, and if you're familiar with these (as you would be if you'd studied Chinese or Japanese), it is very straightforward.

0: http://en.wikipedia.org/wiki/Kangxi_radicals


This is true, but I've noticed that in practice, Chinese or Japanese trying to identify characters to each other (rather than look up an unfamiliar character in a dictionary) tend to draw rather than list the components.


I think it depends mostly on how hard the explanation is or how available paper is.

I had a friend of mine show me how to explain how to write her name in Japanese. There are even special names for the different radical forms (e.g. ninben vs. ningen).


I think some of the radicals are homonyms of other radicals. Most of the radicals have a lot of near-homonyms. Also, those radicals can change their shape depending on what position they are in.

Yes, it's possible, but nowhere near as easy.


>English in the 10th century was basically the same language as German (Althochdeutsch or Old High German, to be precise). In 1066, the Normans conquered England, bringing French which became the official and court language of England for several centuries.

Old English and Old High German being "basically the same language" is a very ignorant thing to say and categorically wrong. Old High German might have been much closer to Old English than modern standard German is to English, but they weren't mutually intelligible and they were clearly different languages even at that time. Major differences between the two had arisen many centuries earlier. Take for example the High German consonant shift. Now, would you please explain why you would say that both were "basically the same language"?


Old English (aka Anglo-Saxon) was a Low German dialect, closer to Dutch in many ways, and to Danish in others, than to High German. Which isn't surprising when you consider the Angles, Saxons, and Jutes came from the North Sea coast region from southern Denmark to the eastern Netherlands.


No, it was not a Low German dialect. It was what is called an Ingvaeonic dialect, most closely related to Old Saxon and Old Frisian. Old High German is what is sometimes called Istvaeonic (and the Franconian dialects that were to become Dutch are also part of that group). There are a bunch of important differences between Ingvaeonic and Istvaeonic, and both are quite, quite different from North Germanic.


> In Mandarin, if I want to tell you a new character I have to show you. There is no way to describe it.

This isn't strictly true. Some characters are simple combinations of radicals and other characters and can be sufficiently described as saying "the radical for x and the character for y." For example, the word for hungry (饿, è) is a combination of the radical for "eat" or "food" (饣, shí) and the character for "self" or "me" (我, wǒ).

I'm only a beginning student of the language, so I can't claim that many other characters are as simple to describe.


Chinese characters may seem intimidating until you remember that we read and memorize English words, not letters, when reading - and Chinese characters are much the same. If you don't know a character you can slow down and figure it out based on its composition.

The Chinese composition just has many more building blocks than the English alphabet (a few hundred, IIRC).


Interesting post and interesting linked article. One part strikes me as especially noteworthy:

“Lieberman, Michel, and colleagues expect that some 15 of the 98 modern irregular verbs they studied -- although likely none of these top 10 -- will regularize in the next 500 years.”

Now, is this based on previous evolution of verbs? Because the past thousand years have had a certain feasibility of communication, locally, nationally and globally. I think the past 100 years with the advent of radio, television and now the internet, language is really going to evolve at a rate previously unseen in history.

The whole globe is connected, textually, verbally and visually, and it's immediate and constant. For the past thousand years, the only way to spread your novel usage of a particular word or grammatical construct was to either go to some venue and talk, send a letter, or write a book. Now you can spread your literary love everywhere, constantly and with a wide audience. And not only to people with your local dialect, but every dialect. What a melting pot.

I'm quite a lover of language evolution. I moved to Italy a year ago in a very multilingual office, and my French and Italian colleagues noted how nice it is in English that you can verb nouns. It hadn't occurred to me that this wasn't possible in French or Italian. I expressed that though English is quite liberal and almost anything goes in a lot of areas, I still wish that people were more accepting of linguistic novelties. People scorn you if you play with language, or actively drop old ways, or invent new words, with the exception of high school kids who, in my experience, are the most inventive English speakers I've seen. When I was in school the amount of new language and idioms introduced every week was overwhelming.

I'm quite descriptivist, though. I like dictionaries that are extremely up to date, like [Wordnik](http://www.wordnik.com/), that encourage people to just use words freely, and take 3 seconds to explain to their partner in conversation what the word means, without fear that their new word isn't cromulent! (I just added "cromulent" to Chrome's dictionary.) Some words I like to use when talking to myself (hey, kids do it, so sue me), are words that don't exist already but are the 'root' of existing words, like inane (“That's quite ane.”), edible (“I think I'll ed some peanut butter sarnies”), etc.

Anyway, I'm rambling, too.


On the other hand, prescription has a far bigger reach today than even thirty years ago. Trivial example: When everything you write gets spell checked automatically, new orthography develops slower.

Also immediate communication can slow down a language, and homogenize it. Radio and TV certainly brought the German dialects closer together.

It depends on the patterns of communication. The internet allows lots of small groups to interact with each other all over the world. That has a different effect than the few to many pattern you get with traditional mass media.

You might enjoy http://verben.texttheater.net/Englisch and if you know German, you might enjoy http://verben.texttheater.net/ even more. On the German version they are doing stuff like inane, ane, or overwhelmed, underwhelmed, whelmed.


Chromulant: a fake word legitimised on a personal basis by adding it to Chrome's dictionary. (verb: Chromulate).


I don't know if it's completely true that English has less character variation than Arabic. As you say, in English there is a choice of upper or lower case, with occasional changes in meaning. In Arabic, the form of a letter is completely determined by the letter it follows. It's purely a display difference, not a separate character set. There are the diacritics to think about, but outside of the Quran they are simply ignored.

That's still a big problem for a universal language of the internet though, since written Arabic is highly non-phonetic.


A couple of corrections I feel I have to make as an Arab.

> There are the diacritics to think about, but outside of the Quran they are simply ignored.

They are not ignored - when they are present, attention is paid to them. I think what you mean is that Arabic speakers, knowing what the diacritics are, do not bother writing them down. That is not because they ignore them, it is because we have paid such close attention to them when learning Arabic that we no longer need to be reminded of them.

> That's still a big problem for a universal language of the internet though, since written Arabic is highly non-phonetic.

Arabic is highly phonetic, and if being phonetic were a criterion for being the universal language of the internet, English should be disqualified immediately.

On arrival in England at the age of 10, I had no idea how English people knew how to pronounce their words. Now I know that non-Arab speakers may think the same way about Arabic because we do not write down the diacritics by default... but it is easy to buy books that have these diacritics written, and thus to crack the code.

But English seemed designed to trap foreigners into mispronunciations, to the great amusement of my classmates. (Traveling to America after college, it was mostly place names that tripped me up.)


American place names are a constant source of confusion and amusement even among Americans, largely because many of them are adapted from American Indian words. You might be a perfectly normal English-speaking American, but if you've never been to the state of Washington before you won't know how to pronounce "Puyallup" or "Sequim" just by reading them.


Yeah, I'm sorry, I was highly unclear. What you said is what I meant :) I was saying that the Arabic character set is actually simpler, but vocalized Arabic becomes harder again.


Someone more familiar with Arabic may need to correct me, but I was under the impression that the shape of the letters is not at all affected by the letters they follow, but by their position in the word.

Arabic letters look different depending on whether they are in the initial, medial, or final position in a word (as well as having a 4th form when they appear in isolation). However, there are patterns of similarity so it's not as difficult as having to learn 4 completely random shapes for each letter.


That would be me :) No, the preceding letter matters. Some letters (د،ذ،ر،ز،ؤ) have no medial form-- they take the terminal form instead, and the following letter takes the initial form.

There are patterns (17 by my count?) among which letters differ only minorly, so it's not as bad as learning four forms for each letter, but there are some additional gotchas too. For example "ل-ا" ("laa") is always written as a single character "lamalif": لا. (Bonus knowledge: that's also the word for "no". You can see it at the beginning of the Shahada: ...لا إله إلا الله <- "There is no god but God..." etc.)
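For anyone who wants to poke at this: Unicode keeps the positional shapes as compatibility code points in the "Arabic Presentation Forms-B" block, and Python's `unicodedata` exposes which base letter each one maps back to. A small sketch (the helper name `presentation_forms` is mine, not a library function):

```python
import unicodedata

def presentation_forms(letter):
    """Map form name -> presentation-form character for one Arabic letter,
    by scanning the Arabic Presentation Forms-B block (U+FE70..U+FEFF)."""
    forms = {}
    for cp in range(0xFE70, 0xFF00):
        # Decomposition looks like "<initial> 0628"; empty for unassigned code points.
        decomp = unicodedata.decomposition(chr(cp)).split()
        if len(decomp) == 2 and int(decomp[1], 16) == ord(letter):
            forms[decomp[0].strip("<>")] = chr(cp)
    return forms

beh = presentation_forms("\u0628")  # ب has all four shapes
dal = presentation_forms("\u062f")  # د has no initial or medial shape
print(sorted(beh), sorted(dal))

# The lam-alef ligature is a single presentation code point that
# normalizes back to the two letters lam + alef.
lam_alef = unicodedata.normalize("NFKC", "\ufefb")
print([unicodedata.name(c) for c in lam_alef])
```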


Not quite. There are several letters that don't connect to the letter that follows them, so a letter that follows them will appear in the 'initial' form even though it's not at a word boundary.
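A rough sketch of that rule, in case it helps. It is heavily simplified: it assumes the input is a single word of plain Arabic letters with no diacritics, and the set of letters that never connect forward is my own short list rather than the full Unicode joining data:

```python
# Simplified assumption: letters that never connect to the following letter
# (alef variants, dal, thal, reh, zain, waw, waw with hamza, teh marbuta).
NO_FORWARD_JOIN = set(
    "\u0627\u0623\u0625\u0622\u062f\u0630\u0631\u0632\u0648\u0624\u0629"
)

def contextual_forms(word):
    """Return the positional form name chosen for each letter of a word."""
    forms = []
    for i, ch in enumerate(word):
        # Connected to the previous letter only if that letter joins forward.
        joins_prev = i > 0 and word[i - 1] not in NO_FORWARD_JOIN
        # Connected to the next letter only if this letter joins forward.
        joins_next = i < len(word) - 1 and ch not in NO_FORWARD_JOIN
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms

print(contextual_forms("\u0643\u062a\u0627\u0628"))  # kaf, teh, alef, beh
print(contextual_forms("\u0631\u062c\u0644"))        # reh, jeem, lam
```

In the second word the jeem comes out as "initial" even though it is the second letter, because the reh before it does not connect forward.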


There are some differences in the way the characters are laid out in English too, unless you use a monospace font.


Those differences are not to my knowledge as a native English speaker used to convey any sort of meaning, and as demonstrated by the existence of monospaced fonts have no particular importance. They are merely an artifact of the glyph geometry.


They don't convey any sort of meaning in Arabic either - letters just change their shapes depending on where they occur in a word. Cursive English handwriting does the same thing, to a lesser degree, and for the same reasons.


Like where? In cursive, at best what changes is that some ending curve goes up or down a bit more. The existence of cursive computer fonts shows that even in cursive writing, the shape of letters can be the same across a text. Minor changes are caused by the speed of writing the cursive.

On the contrary, and as I understand it, in Arabic there are rules on how to change the shape of letters in certain contexts. In Arabic it's part of the writing system, in Western European languages it's not.


I give you Typographic Ligature: http://en.wikipedia.org/wiki/Typographic_ligature


Exactly, ligatures don't have any grammatical meaning. They're merely that - 'typographic'.


And the Arabic letters don't have any grammatical meaning. They are also merely typographic.

I think we are saying the same thing here.



