Ask HN: Why do Sanskrit users use Latin script when they have their own?
4 points by trifit on Jan 31, 2023 | 11 comments
Case in point: https://youtu.be/RBCk1SyC1PA [the video is titled "Main Rang Sharbato Ka", i.e. "I am a blend of juices"]. It could have been written in Urdu or Hindi, so why did they choose Hinglish? Do other languages do that?



You seem to be conflating languages and scripts. I'm not sure what language "Main Rang Sharbato Ka" is (Hindi?), but as you point out, it could be written in another script, I assume Devanagari or Arabic script. It would still be Hindi(?) though.

Though languages are often associated with one primary writing system, often another can be adapted for use.

Look at how Vietnamese was classically written with a form of Chinese characters, but in modern times is written in Latin script using a system adapted from Portuguese.

I think your question is still a good one (which I don't know the answer to) but it's worth being precise with the terminology :)


Yes, the languages are Hindustani (Hindi and Urdu) and English, while the scripts are Latin, Devanagari, and usually Nastaliq. Compare Russian (language) vs. Cyrillic (script). The use of the Latin alphabet to write languages that aren't traditionally written in it is called Romanization, and it exists for all sorts of reasons.

https://en.wikipedia.org/wiki/Hinglish says

> In India, Romanized Hindi is the dominant form of expression online. In an analysis of YouTube comments, Palakodety et al., identified that 52% of comments were in Romanized Hindi, 46% in English, and 1% in Devanagari Hindi.

The most common explanation for use of Romanization online between native speakers of a language that's traditionally written in some other script is limitations of computer systems. Those can include things like

* inconsistent character encoding support (different people or different software systems don't agree on which encoding to use, or some software system actually corrupts or filters out some encodings)

* poor input method support (it's hard to type in some other script on some devices)

* poor rendering support (some software system doesn't display some script well, e.g. failing to render ligatures between connected characters properly)

* legacy of historical software problems (people may have learned to use earlier systems that didn't have good language support, and may have developed workarounds that then persisted because of cultural inertia or individual habits)

* limited literacy (people might not have enough training and practice in a script to be fully comfortable reading and/or writing it, even if they're fluent speakers orally)

Almost every computer system supports ASCII perfectly, and so that can easily become a lowest common denominator for representing various languages. This is not uncommon online with Arabic and Chinese, in addition to Hindustani.
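To make the "lowest common denominator" point concrete, here is a minimal Python sketch. The Devanagari spelling of the title is my own assumption, and `fits_in_ascii` and `mojibake` are hypothetical helper names, not anything from a real system. It shows that the romanized title survives a strict ASCII encode while the Devanagari one does not, and that a UTF-8/Latin-1 mismatch corrupts only the Devanagari version.

```python
# Romanized title from the video, plus an assumed Devanagari spelling of it.
romanized = "Main Rang Sharbato Ka"
devanagari = "मैं रंग शरबतों का"  # assumption: my own spelling, for illustration


def fits_in_ascii(text: str) -> bool:
    """Return True if every character survives a strict ASCII encode."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def mojibake(text: str) -> str:
    """Simulate an encoding mismatch: write UTF-8, read it back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


print(fits_in_ascii(romanized))            # True
print(fits_in_ascii(devanagari))           # False
print(mojibake(romanized) == romanized)    # True: ASCII text passes through intact
print(mojibake(devanagari) == devanagari)  # False: the text comes back garbled
```

This only models the encoding problems from the list above; input methods, rendering, and literacy are not something a snippet like this can capture.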

But I don't actually know the answer here; those are just some common trends related to representation of languages on computers...


Based on my experience dealing with Unicode/non-Unicode issues, this sounds like a matter of history: the Latin character set has been well supported for far longer than the relevant Unicode character set(s) have been available.

A discussion of Unicode in Python: https://realpython.com/python-encodings-guide/
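As a rough illustration of the Unicode side (my own sketch, not taken from the linked guide; `utf8_profile` is a hypothetical helper name), this lists the code point and UTF-8 byte count for a Latin letter and a Devanagari letter:

```python
def utf8_profile(text: str) -> list[tuple[str, str, int]]:
    """List (character, code point, UTF-8 byte count) for each character."""
    return [(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


for row in utf8_profile("kक"):
    print(row)
# ('k', 'U+006B', 1)
# ('क', 'U+0915', 3)
```

The Latin letter sits in the ASCII range and takes one byte, while the Devanagari letter needs three bytes in UTF-8, so any software that assumes one byte per character will mishandle it.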


Thanks, was referring to characters not written natively. Cool, now just need to find a Vietnamese girlfriend to share this with so I can score some points :P


Because of having a lot more character references? :-)


I’m surprised nobody has mentioned IAST: https://en.wikipedia.org/wiki/International_Alphabet_of_Sans...


I think there's a huge trend of people inventing their own informal romanizations even when there is already a reasonable and standard scholarly version.

https://en.wikipedia.org/wiki/Beta_Code - but people (including Greek speakers) still transliterate Greek to English-style, like https://en.wikipedia.org/wiki/Greeklish

https://en.wikipedia.org/wiki/Romanization_of_Hebrew#Compara... - see "Common Israeli" (despite the existence of scholarly options like SBL, and the fact that the informal transliterations conflate tons of different things that are spelled differently but pronounced alike in modern Israeli Hebrew)

https://en.wikipedia.org/wiki/Romanization_of_Arabic#Compari... - as with Hebrew, the https://en.wikipedia.org/wiki/Arabic_chat_alphabet versions will also sometimes elide differences that are preserved in writing in Arabic script

https://en.wikipedia.org/wiki/Pinyin - but people (including some Chinese speakers) sometimes still transliterate Chinese to English-style (and often omit the tones)

My impression is that, of these, native speakers of the mentioned languages are most familiar with Pinyin, probably because it's officially taught and tested in school in many Chinese-speaking countries. I don't think romanization methods for the other languages are taught in school and that might help account for why many people often don't know them, or at least don't know them in detail.
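A small Python sketch of how information gets lost when a scholarly romanization is flattened to plain ASCII. This is only my approximation of what informal typists do, assuming they simply drop the diacritics; `strip_diacritics` is a hypothetical helper name.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("saṃskṛtam"))  # samskrtam (IAST marks removed)
print(strip_diacritics("nǐ hǎo"))     # ni hao (Pinyin tone marks removed)

# Two different Pinyin syllables collapse into the same ASCII string.
print(strip_diacritics("mā") == strip_diacritics("mà"))  # True
```

Real informal romanizations also respell sounds in ad hoc ways, which this sketch does not attempt to model.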


Neat, do you have this enabled as an additional keyboard in your phone?


By the way, Sanskrit is an ancient Indian language and, while it's still extensively used, it's not a main language of day-to-day communication by Indian people.

The writing system that developed with Sanskrit and that's still used for both Sanskrit and several modern languages of India is called Devanagari.

https://en.wikipedia.org/wiki/Devanagari


Yes, interesting, I learnt something new about my own culture today! Apple calls it Hindi, and they’ve got dictation to Devanagari, which is super responsive and accurate: a pleasant surprise.


Hindi might be an appropriate term if they're doing speech-to-text or text-to-speech or machine translation (because there are things there that will be specific to the individual spoken language variety, even involving distinguishing Hindi and Urdu for some purposes).

They could also, for example, have support for Nepali, which is also normally written in the same alphabet but is definitely a different language from Hindi. In that case the language support would be different at a software level, like the vocabulary and/or grammatical patterns that the speech-to-text system is trying to recognize.

The problem (whether for a computer or a human being!) of figuring out what language something is actually written in can be tricky when many languages use identical or related scripts:

https://en.wikipedia.org/wiki/Wikipedia:Language_recognition...
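To show why the script alone doesn't settle the language, here is a hedged Python sketch. It guesses the script from the first word of each letter's Unicode character name, which is a heuristic of mine and not the official Unicode Script property (the standard library doesn't expose that). `dominant_script` is a hypothetical helper, and the Hindi and Nepali sample phrases are my own illustrative choices.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Guess the script from the first word of each letter's Unicode name."""
    counts = Counter(
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()  # skip spaces, punctuation, and combining vowel signs
    )
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


print(dominant_script("मेरा नाम"))               # DEVANAGARI (Hindi sample)
print(dominant_script("मेरो नाम"))               # DEVANAGARI (Nepali sample)
print(dominant_script("Main Rang Sharbato Ka"))  # LATIN (romanized Hindi)
print(dominant_script("I am a blend of juices")) # LATIN (English)
```

Hindi and Nepali come out identical, as do romanized Hindi and English, so telling the languages apart needs vocabulary or statistical cues beyond the script.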



