
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

user:zeman:transliteration-of-urdu-to-latin-script [2010/11/09 13:05]
zeman vytvořeno
user:zeman:transliteration-of-urdu-to-latin-script [2010/11/09 16:14]
zeman Hamza.
Line 22: Line 22:
 Some other notes: //j// is pronounced as in English, not as in Czech or German. //č// and //š// are used in Baltic and Slavic languages (among others) to represent the sounds that are usually written “ch” or “sh”, respectively, in English. Of similar descent is the character //ž//; the corresponding sound is sometimes represented as “zh” in English and corresponds to the French pronunciation of //j//. //x// represents (in accord with phonetic tradition) the same sound as Czech/German/Scottish “ch”. English-oriented transcriptions of Arabic often transcribe this sound as “kh”, a solution that we want to avoid. It would conflict with the aspirated //kh// of Urdu. //ğ// is taken from Turkish and describes the sound that is often transcribed “gh” from Arabic (which we cannot use, again because of the aspirated //gh//).
  
-| **Unicode** | **Character** | **Pronunciation** | **Transliteration** |
+I do not attempt to map the special Semitic guttural consonant //ayin// to a Latin letter based on the pronunciation of any European language, as this sound is unfamiliar to most Europeans. In transcriptions of Arabic, it is sometimes represented by a superscript //c//. We use the IPA symbol ˀ (MODIFIER LETTER GLOTTAL STOP).
 + 
 +The letter ں (NOON GHUNNA) occurs only at the end of the word and marks nasalization of the preceding vowel rather than a real consonant. 
 + 
 +There are two //h// letters: ہ (HEH GOAL) and ھ (HEH DOACHASHMEE). It is not necessary to distinguish them by diacritics as they occur in different positions. The normal consonant //h// is written using ہ (HEH GOAL), which can also appear at the end of the word to mark an (otherwise invisible) word-final short vowel //a// (transcribed //ah//). In contrast, ھ (HEH DOACHASHMEE) is used exclusively after other consonants (such as //k, g, č, j, t, d, b, p//) to form their aspirated counterparts. Thus, بھ is //bh//, پھ is //ph// etc. 
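
The point about the two //h// letters can be illustrated with a minimal sketch. It assumes a plain character-by-character lookup and covers only a handful of consonants from the table; the names `CONSONANTS` and `transliterate_consonants` are hypothetical and not part of the actual script.

```python
# Minimal sketch: both HEH letters map to plain "h", so an aspirated
# consonant falls out of the lookup as a digraph without extra diacritics.
# Only a few consonants from the table are listed here.
CONSONANTS = {
    "\u0628": "b",  # ب BEH
    "\u067E": "p",  # پ PEH
    "\u0645": "m",  # م MEEM
    "\u0646": "n",  # ن NOON
    "\u06C1": "h",  # ہ HEH GOAL
    "\u06BE": "h",  # ھ HEH DOACHASHMEE
}


def transliterate_consonants(text: str) -> str:
    """Map each known consonant to its Latin letter; keep other characters."""
    return "".join(CONSONANTS.get(ch, ch) for ch in text)


print(transliterate_consonants("\u0628\u06BE"))  # بھ -> bh
print(transliterate_consonants("\u067E\u06BE"))  # پھ -> ph
```

Because ھ only ever follows a consonant, the output //bh// or //ph// can be read back unambiguously as an aspirate in this simplified setting.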
 + 
 +^ Unicode ^ Character ^ Pronunciation ^ Transliteration ^
 | 0628 | ب | b | b |
 | 067E | پ | p | p |
Line 54: Line 60:
 | 0645 | م | m | m |
 | 0646 | ن | n | n |
-| 06BA | ں | n | |
+| 06BA | ں | n | ñ |
 | 0648 | و | v | w |
 | 06C1 | ہ | h | h |
 | 06BE | ھ | h | h |
 | 06CC | ی | j | y |
 +
 +===== Vowels =====
 +
 +The consonant (or semi-vowel) و //(w)// is also ambiguously used to represent the long vowels //ū// (pronounced as //oo// in English //fool//) and //o// (pronounced as //oo// in English //door//). I want to distinguish these three pronunciations. In most cases, however, the script can only output //[wūo]// and leave the disambiguation to human judgment:
 +
 +  * In word-initial position, I assume that only consonantal pronunciation is possible and always output //w//.
 +  * Anywhere immediately before ا (ALEF), I assume that only consonantal pronunciation is possible and always output //w//.
 +  * In word-final position, I believe that a vowel reading is more likely, although I am not sure that the consonant can be completely excluded. For now, I output //[ūo]//.
 +  * If it appears immediately before word-final ں (NOON GHUNNA), I consider it part of the plural oblique case suffix and invariably output //o//.
 +  * In all other cases I output //[wūo]//.
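
The positional rules for و listed above can be sketched as a single function. This is only an illustration under stated assumptions: the function name `transliterate_waw` is hypothetical, the word is assumed to be already tokenized, and the rules are checked in the order in which they are listed (so a one-letter word counts as word-initial).

```python
WAW = "\u0648"          # و
ALEF = "\u0627"         # ا
NOON_GHUNNA = "\u06BA"  # ں


def transliterate_waw(word: str, i: int) -> str:
    """Return the candidate reading(s) for the WAW at word[i].

    A bracketed result such as "[wūo]" means the choice is left
    to human judgment.
    """
    if word[i] != WAW:
        raise ValueError("word[i] is not WAW")
    last = len(word) - 1
    if i == 0:                                   # word-initial: consonant
        return "w"
    if i < last and word[i + 1] == ALEF:         # immediately before ALEF
        return "w"
    if i == last:                                # word-final: vowel assumed
        return "[ūo]"
    if i == last - 1 and word[last] == NOON_GHUNNA:
        return "o"                               # plural oblique suffix
    return "[wūo]"                               # everything else: ambiguous


word = "\u0644\u0648\u06AF\u0648\u06BA"  # لوگوں
print(transliterate_waw(word, 1))  # [wūo]
print(transliterate_waw(word, 3))  # o
```

The same shape, with different outputs for the word-final case, would apply to ی.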
 +
 +The consonant (or semi-vowel) ی //(y)// is also ambiguously used to represent the long vowels //ī// (pronounced as //ee// in English //feet//) and //e// (pronounced roughly as //ai// in English //fair//). I want to distinguish these three pronunciations. In most cases, however, the script can only output //[yīe]// and leave the disambiguation to human judgment:
 +
 +  * In word-initial position, I assume that only consonantal pronunciation is possible and always output //y//.
 +  * Anywhere immediately before ا (ALEF), I assume that only consonantal pronunciation is possible and always output //y//.
 +  * In word-final position, I assume that the only possible reading is //ī//.
 +  * In all other cases I output //[yīe]//.
 +
 +The letter ے (YEH BARREE) only appears in word-final position and is transliterated as //e// (which is written in other positions using the ambiguous ی).
 +
 +The letter ا (ALEF) is ambiguous and can lead to many different readings:
 +
 +  * In word-initial position, it merely says that the word begins with a vowel. It could be any of the three short vowels //[aiu]//: افریقہ //afrīqah// “Africa”, اسلام //islām// “Islam”, اردو //urdū// “Urdu”.
 +    * If word-initial ا is followed by و or ی, they together could represent a word-initial long vowel //[ūoīe]//, such as in ایک //ek// “one”. In this case, ا should map to an empty string (because the next character itself will allow for transliteration by the long vowel).
 +  * In word-internal and word-final positions, ا is transliterated to the long vowel //ā// (pronounced as //a// in English //father//).
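
The ALEF cases above could be sketched as follows. The function name `transliterate_alef` is hypothetical, and the sketch follows the rules literally: a word-initial ا before و or ی always becomes the empty string, even though the text only says the pair //could// represent a long vowel.

```python
ALEF = "\u0627"  # ا
WAW = "\u0648"   # و
YEH = "\u06CC"   # ی


def transliterate_alef(word: str, i: int) -> str:
    """Return the candidate reading(s) for the ALEF at word[i]."""
    if word[i] != ALEF:
        raise ValueError("word[i] is not ALEF")
    if i == 0:
        # Word-initial ALEF before WAW/YEH: the next letter carries
        # the long vowel, so ALEF itself contributes nothing.
        if len(word) > 1 and word[1] in (WAW, YEH):
            return ""
        # Otherwise any of the three short vowels is possible.
        return "[aiu]"
    # Word-internal and word-final ALEF is the long vowel.
    return "ā"


print(repr(transliterate_alef("\u0627\u06CC\u06A9", 0)))   # ایک -> ''
print(transliterate_alef("\u0627\u0631\u062F\u0648", 0))    # اردو -> [aiu]
print(transliterate_alef("\u0627\u0633\u0644\u0627\u0645", 3))  # اسلام -> ā
```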
 +
 +The letter آ (ALEF MADDA) only appears in word-initial position and is transliterated as //ā// (which is written in other positions using normal ا).
 +
 +The YEH with the diacritic HAMZA above separates two consecutive vowels, e.g. جائے گا //jāe gā// “will go” or کوئی //koī// “some”.
 +
 +Similarly, the diacritic HAMZA above a و separates it from the preceding vowel as in ہاؤسنگ //hāūsing// “housing”. (In this case, the hamza is a separate character that is placed in the logical sequence after the و.)
 +
 +^ Unicode ^ Character ^ Pronunciation ^ Transliteration ^
 +| 0627 | ا | -, a: | a, i, u, 0, ā |
 +| 0622 | آ | a: | ā |
 +| 0648 | و | v, u:, o: | w, ū, o |
 +| 06CC | ی | j, i:, e: | y, ī, e |
 +| 06D2 | ے | e: | e |
 +| 0626 | ئ | - | 0 |
 +| 0674 | ٔ (high hamza) | - | 0 |
  
