
Simplified and traditional characters are in perfect correspondence right? Could maybe translate on the fly?



Chinese Wikipedia https://zh.wikipedia.org automatically converts text into Mainland Simplified, Hong Kong Traditional, Macau Traditional, Malaysia Simplified, Singapore Simplified and Taiwan Traditional. The article text can be in any variant (and is usually a mixture after being edited by people using different standards), and if you visit a page in its unconverted form, you'll be prompted for your preference.

There's no simple one-to-one correspondence for all characters, but Wikipedia has multiple layers of special cases and exceptions that cover most situations (including vocabulary differences). That doesn't mean the text always makes sense after conversion: the article about the Taiwanese township of Shuili mentions that it was originally named 水裡 but renamed 水里 in 1966, at least in the Taiwanese Traditional version ( https://zh.wikipedia.org/zh-tw/%E6%B0%B4%E9%87%8C%E9%84%89 ). If you read the Mainland Simplified one ( https://zh.wikipedia.org/zh-cn/%E6%B0%B4%E9%87%8C%E9%84%89 ), both names get simplified to 水里, so the article claims the name was changed from 水里 to 水里.
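To make the Shuili problem concrete, here is a minimal Python sketch of character-by-character traditional-to-simplified conversion. The `T2S` table is a hypothetical five-entry excerpt written for illustration, not Wikipedia's actual rules; it only shows that once two traditional characters share one simplified form, the conversion can't be undone and distinct names collide.

```python
from collections import defaultdict

# Hypothetical excerpt of a traditional-to-simplified (t2s) table, for illustration only.
# Identity entries are included so collisions with unchanged characters show up.
T2S = {
    "裡": "里",  # "inside"
    "里": "里",  # "li" / village; same form in both scripts
    "後": "后",  # "after", "behind"
    "后": "后",  # "empress"; same form in both scripts
    "鄉": "乡",  # "township"
}


def t2s(text: str) -> str:
    """Convert character by character; characters missing from the table pass through."""
    return "".join(T2S.get(ch, ch) for ch in text)


def collisions(table: dict[str, str]) -> dict[str, set[str]]:
    """Return each target character that more than one source character maps to."""
    sources = defaultdict(set)
    for src, dst in table.items():
        sources[dst].add(src)
    return {dst: srcs for dst, srcs in sources.items() if len(srcs) > 1}


if __name__ == "__main__":
    print(t2s("水裡鄉"), t2s("水里鄉"))  # 水里乡 水里乡
    print(sorted(collisions(T2S)["里"]))  # ['裡', '里']
```

Any many-to-one entry in such a table is a place where the simplified text has lost information that no later step can recover.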


I've never really been convinced we need all those layers of special cases and exceptions. For example, British English and American English have many of the same kinds of differences as simplified and traditional Chinese characters; e.g. British and American English agree on the verb "to curb" but disagree on the noun, "kerb" vs "curb". Yet AFAIK Wikipedia doesn't have a similar system that converts between British and American English and handles all these special cases beyond the ordinary spelling differences.

Taiwanese Mandarin is really not that different from PRC Mandarin. Most college-educated mainland speakers can read works written in traditional characters just fine, although I'm not sure whether Taiwanese readers are equally comfortable with simplified characters. In fact, I think a substantial proportion (most?) of Chinese Wikipedia readers just don't bother changing the character set either way and are fine with switches between traditional and simplified characters throughout an article (that's certainly how I read it).


I wanted to find out whether there are any statistics on how this feature is used, but only found this Phabricator ticket: https://phabricator.wikimedia.org/T227904 It looks like research on user needs has been put in the backlog for now.


> Simplified and traditional characters are in perfect correspondence right?

No, there is no perfect correspondence between simplified and traditional Chinese. Simplified Chinese merges some important distinct characters, such as 後 ("after", "back") and 后 ("empress"), which both become 后, often causing confusion (you can read more here [1]).

[1]: http://pages.ucsd.edu/~dkjordan/chin/SimplifiedCharacters.ht...


Not only that, but there are a couple of characters that are collapsed in traditional but split out in simplified:

"乾" in Simplified refers only to one of the 8 trigrams, as used in the word "乾坤"

"乾" in Traditional can mean either the above OR the Simplified "干" (dry)

"干" also exists in Traditional but doesn't mean "dry"; it's used for e.g. "stem" (as in the Heavenly Stems, 天干) or "to interfere" (干涉)

However, GP's suggestion of translating on the fly is feasible. You don't need a very advanced algorithm to convert simplified<->traditional almost perfectly. Unless you're translating poetry, ambiguous characters can almost always be disambiguated from the surrounding characters.
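A minimal Python sketch of that kind of context-based disambiguation: try the longest multi-character phrase first and fall back to a per-character default only when no phrase matches (greedy forward maximum matching). Both tables, `PHRASES` and `CHARS`, are tiny hypothetical examples, and the default of 干 → 乾 is an arbitrary assumption; dictionary-based converters such as OpenCC rely on far larger phrase lists.

```python
# Hypothetical phrase table: words whose conversion differs from the per-character default.
PHRASES = {
    "干燥": "乾燥",  # dry
    "饼干": "餅乾",  # biscuit
    "干部": "幹部",  # cadre
    "干涉": "干涉",  # to interfere: 干 stays 干
    "皇后": "皇后",  # empress: 后 stays 后
    "以后": "以後",  # afterwards
}
# Hypothetical per-character defaults, used only when no phrase matches.
CHARS = {"干": "乾", "后": "後", "饼": "餅"}
MAX_LEN = max(len(p) for p in PHRASES)


def s2t(text: str) -> str:
    """Simplified-to-traditional via greedy forward maximum matching."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest phrase starting at position i, down to two characters.
        for n in range(min(MAX_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No phrase matched: convert this single character with its default.
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(s2t("饼干很干燥"))  # 餅乾很乾燥
    print(s2t("皇后以后"))    # 皇后以後
```

Greedy matching can still pick the wrong split when phrases overlap, which is one reason real converters keep accumulating exceptions.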

Note, though, that there are lots of actual vocabulary differences between mainland Mandarin and Taiwanese Mandarin. Speakers of one can read the other without much difficulty, maybe occasionally asking a question or two, but it's nice for everyone to have materials available in their native dialect and vocabulary, much as you probably appreciate that your system offers both UK English and US English rather than forcing you to use the one you're not used to.
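One way to handle vocabulary differences like these is a separate layer applied after character conversion. A minimal Python sketch, assuming the input is already in Traditional characters; `CN_TO_TW_TERMS` is a hypothetical four-entry table of well-known mainland/Taiwan pairs, nowhere near a complete list.

```python
import re

# Hypothetical mainland -> Taiwan vocabulary table (input already in Traditional characters).
CN_TO_TW_TERMS = {
    "軟件": "軟體",      # software
    "信息": "資訊",      # information
    "打印機": "印表機",  # printer
    "網絡": "網路",      # network
}

# Longest terms first, so at any position the longest matching term wins.
_PATTERN = re.compile(
    "|".join(re.escape(t) for t in sorted(CN_TO_TW_TERMS, key=len, reverse=True))
)


def localize_tw(text: str) -> str:
    """Swap mainland terms for Taiwanese ones in a single left-to-right pass."""
    return _PATTERN.sub(lambda m: CN_TO_TW_TERMS[m.group(0)], text)


if __name__ == "__main__":
    print(localize_tw("這台打印機的軟件需要網絡"))  # 這台印表機的軟體需要網路
```

Doing all substitutions in one `re.sub` pass means a replacement is never rewritten again by a later rule, which a chain of `str.replace` calls can't guarantee.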


As a non-native English speaker, I actually hate that there is a split.


You should check out Singaporean English or "Singlish", which uses a lot of Chinese grammar and Hokkien and Malay vocabulary in English sentences. It's a wonderfully efficient dialect taking the best from all of these languages.


I heard it spoken when I was in Singapore. It was pretty fun, but I'm glad it isn't given equal status, and that we didn't have to learn any in school.


No, they are not in perfect correspondence. For quite a few characters, some of them commonly used, the mapping is one-to-many or many-to-one. https://en.m.wikipedia.org/wiki/Ambiguities_in_Chinese_chara...



