Differences

This shows you the differences between two versions of the page.

--- user:zeman:transliteration-of-urdu-to-latin-script [2010/11/09 15:02]
zeman wy
+++ user:zeman:transliteration-of-urdu-to-latin-script [2010/11/09 16:52]
zeman Vocabulary.
@@ Line 68: / Line 68: @@
 ===== Vowels =====
-The consonant (or semi-vowel) و //(w)// is also ambiguously used to represent the long vowels //ū// (pronounced as //oo// in English //fool//) and //o// (pronounced as //oo// in English //door//). I want to distinguish these three pronunciations. In most cases however, the script can only output //[wūo]// and leave the disambiguation to a human judgment:
+The consonant (or semi-vowel) و //(w)// is also ambiguously used to represent the long vowels //ū// (pronounced as //oo// in English //fool//) and //o// (pronounced as //oo// in English //door//). I want to distinguish these three pronunciations (note that I am not attempting to further distinguish //o// from the slightly different vowel //ao//, pronounced as //au// in English //automatic//; I treat the two as identical). In most cases, however, the script can only output //[wūo]// and leave the disambiguation to human judgment:
   * In word-initial position, I assume that only consonantal pronunciation is possible and always output //w//.
@@ Line 76: / Line 76: @@
   * In all other cases I output //[wūo]//.
-The consonant (or semi-vowel) ی //(y)// is also ambiguously used to represent the long vowels //ī// (pronounced as //ee// in English //feet//) and //e// (pronounced roughly as //ai// in English //fair//). I want to distinguish these three pronunciations. In most cases however, the script can only output //[yīe]// and leave the disambiguation to a human judgment:
+The consonant (or semi-vowel) ی //(y)// is also ambiguously used to represent the long vowels //ī// (pronounced as //ee// in English //feet//) and //e// (pronounced roughly as //ai// in English //fair//). I want to distinguish these three pronunciations (note that I am not attempting to further distinguish //e// from the slightly different, more open vowel //ae//; I treat the two as identical). In most cases, however, the script can only output //[yīe]// and leave the disambiguation to human judgment:
   * In word-initial position, I assume that only consonantal pronunciation is possible and always output //y//.
@@ Line 82: / Line 82: @@
   * In word-final position, I assume that the only possible reading is //ī//.
   * In all other cases I output //[yīe]//.
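The positional rules for و and ی can be sketched in Python as follows. This is an illustrative sketch, not the actual transliteration script: the function name `waw_yeh_reading` is hypothetical, ambiguity classes are returned as strings in the bracket notation used above, and any rules elided between the diff hunks are not modelled (so a word-final و, for instance, simply falls into the //[wūo]// class here).

```python
# Minimal sketch of the positional rules for WAW and YEH listed above.
# Only the rules visible in this diff are encoded; rules elided between
# the diff hunks are NOT reproduced. Ambiguous results use the bracket
# notation of the text, e.g. "[wūo]".

WAW = "\u0648"  # و ARABIC LETTER WAW
YEH = "\u06CC"  # ی ARABIC LETTER FARSI YEH (the Urdu yeh)


def waw_yeh_reading(word: str, i: int) -> str:
    """Return the transliteration, or ambiguity class, of word[i]."""
    ch = word[i]
    if ch == WAW:
        if i == 0:
            return "w"  # word-initial: consonant only
        return "[wūo]"  # all other positions (elided rules are ignored)
    if ch == YEH:
        if i == 0:
            return "y"  # word-initial: consonant only
        if i == len(word) - 1:
            return "ī"  # word-final: long vowel only
        return "[yīe]"  # all other positions
    raise ValueError(f"expected WAW or YEH, got {ch!r}")
```

For ایک //ek//, `waw_yeh_reading("ایک", 1)` returns `"[yīe]"`, which is left to a human; for کوئی //koī//, position 3 yields `"ī"`.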
+The letter ے (YEH BARREE) appears only in word-final position and is transliterated as //e//; elsewhere, //e// is written with the ambiguous ی.
+The letter ا (ALEF) is ambiguous and can lead to many different readings:
+  * In word-initial position, it merely signals that the word begins with a vowel. It can stand for any of the three short vowels //[aiu]//: افریقہ //afrīqah// “Africa”, اسلام //islām// “Islam”, اردو //urdū// “Urdu”.
+    * If word-initial ا is followed by و or ی, the two letters together may represent a word-initial long vowel //[ūoīe]//, as in ایک //ek// “one”. In this case, ا should map to the empty string, because the following letter already supplies the long vowel.
+  * In word-internal and word-final positions, ا is transliterated to the long vowel //ā// (pronounced as //a// in English //father//).
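A similar sketch for ا, again with a hypothetical function name (`alef_reading`). One simplification is assumed: a word-initial ا followed by و or ی is always mapped to the empty string, whereas the text only says the pair //could// form a long vowel.

```python
# Minimal sketch of the ALEF rules above. Simplification: a word-initial
# ALEF followed by WAW or YEH is always mapped to the empty string,
# although the text only says the pair *could* be a long vowel.

ALEF = "\u0627"  # ا
WAW = "\u0648"   # و
YEH = "\u06CC"   # ی


def alef_reading(word: str, i: int) -> str:
    """Return the transliteration, or ambiguity class, of the ALEF at word[i]."""
    if word[i] != ALEF:
        raise ValueError(f"expected ALEF, got {word[i]!r}")
    if i == 0:
        if len(word) > 1 and word[1] in (WAW, YEH):
            return ""  # the following WAW/YEH carries the long vowel
        return "[aiu]"  # one of the three short vowels
    return "ā"  # word-internal or word-final: long ā
```

For اسلام //islām//, `alef_reading(word, 0)` gives `"[aiu]"` and `alef_reading(word, 3)` gives `"ā"`.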
+The letter آ (ALEF MADDA) appears almost exclusively in word-initial position (one exception is قرآن //qurān// “Quran”) and is transliterated as //ā//; elsewhere, //ā// is normally written with a plain ا.
+The letter ئ (YEH WITH HAMZA ABOVE) separates two consecutive vowels, e.g. جائے گا //jāe gā// “will go” or کوئی //koī// “some”.
+Similarly, a HAMZA above و separates the و from the preceding vowel, as in ہاؤسنگ //hāūsing// “housing”. (In this case, the hamza is a separate character that follows the و in logical order.)
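+In the table below, //0// in the Transliteration column stands for the empty string, and //-// in the Pronunciation column means that the letter has no sound of its own.
+^ Unicode (hex) ^ Character ^ Pronunciation ^ Transliteration ^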
+| 0627 | ا | -, a: | a, i, u, 0, ā |
+| 0622 | آ | a: | ā |
+| 0648 | و | v, u:, o: | w, ū, o |
+| 06CC | ی | j, i:, e: | y, ī, e |
+| 06D2 | ے | e: | e |
+| 0626 | ئ | - | 0 |
+| 0674 | ٔ (high hamza) | - | 0 |
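The table can be encoded directly as a lookup from code point to candidate transliterations. The sketch below (names `CANDIDATES`, `count_readings` and `ambiguous_positions` are hypothetical) ignores all positional rules and assumes that characters outside the table, i.e. the consonants, are unambiguous; multiplying the number of candidates per character then gives a rough count of the context-free readings of a word.

```python
from math import prod

# Candidate transliterations per code point, copied from the table above.
# "" stands for the table's "0" (empty string). Positional rules are
# deliberately ignored here.
CANDIDATES = {
    "\u0627": ["a", "i", "u", "", "ā"],  # ا ALEF
    "\u0622": ["ā"],                     # آ ALEF MADDA
    "\u0648": ["w", "ū", "o"],           # و WAW
    "\u06CC": ["y", "ī", "e"],           # ی YEH
    "\u06D2": ["e"],                     # ے YEH BARREE
    "\u0626": [""],                      # ئ YEH WITH HAMZA ABOVE
    "\u0674": [""],                      # ٔ high hamza
}


def count_readings(word: str) -> int:
    """Count the context-free readings of a word.

    Characters not in the table (assumed to be consonants) contribute
    exactly one reading each.
    """
    return prod(len(CANDIDATES[ch]) if ch in CANDIDATES else 1 for ch in word)


def ambiguous_positions(word: str) -> list:
    """Indices of characters that have more than one candidate."""
    return [i for i, ch in enumerate(word) if len(CANDIDATES.get(ch, [ch])) > 1]
```

For اسلام //islām//, the two ALEFs alone yield 5 × 5 = 25 readings; for کوئی //koī//, و and ی give 3 × 3 = 9.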
+The transliteration script should contain a gradually growing vocabulary that helps disambiguate known words. Otherwise, almost every transliterated string would contain a large number of ambiguous positions.
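One way to combine such a vocabulary with the rules is to look a word up first and fall back to the rule-based transliteration only for unknown words. In the sketch below, `VOCABULARY`, `add_to_vocabulary` and `transliterate_word` are hypothetical names, the seed entries are examples quoted on this page, and the rule-based fallback is passed in as a function.

```python
from typing import Callable, Dict

# Hypothetical vocabulary: Urdu word -> disambiguated transliteration.
# The seed entries are examples quoted on this page.
VOCABULARY: Dict[str, str] = {
    "\u0627\u0633\u0644\u0627\u0645": "islām",  # اسلام
    "\u0627\u0631\u062F\u0648": "urdū",         # اردو
    "\u0627\u06CC\u06A9": "ek",                 # ایک
}


def add_to_vocabulary(word: str, transliteration: str) -> None:
    """Record a word once a human has disambiguated it."""
    VOCABULARY[word] = transliteration


def transliterate_word(word: str, fallback: Callable[[str], str]) -> str:
    """Use the vocabulary when the word is known, else the rule-based fallback."""
    known = VOCABULARY.get(word)
    if known is not None:
        return known
    return fallback(word)
```

Exact-match lookup assumes the input uses the same code points as the stored keys (e.g. U+06CC rather than the Arabic yeh U+064A); a real script might need to normalize such variants first.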
+===== Vowel Diacritics =====
+//Warning! This section is under construction. I am still confused about the exact rules for Urdu vowel representation, so errors are more likely here than elsewhere.//
+Although rarely used, Urdu has means to mark the three short vowels as well, using one of three diacritical marks. Long vowels can be disambiguated this way, too: e.g. a consonant with the pesh mark followed by a waw without any diacritic means that the waw is the long vowel //[ūo]//, not the consonant //w//.
+^ Name ^ Unicode ^ Transliteration ^ Example ^
+| pesh | ARABIC DAMMA, 064F | u | کُون //kon// “who” |
+| zabar | ARABIC FATHA, 064E | a | کَون //kawn// |
+| zer | ARABIC KASRA, 0650 | i | |
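Given the warning above, the following is only a tentative sketch of the one rule stated explicitly in this section: a consonant with pesh followed by a bare و makes the و a long vowel. The names are hypothetical; the zabar example کَون //kawn// is not generalized into a rule, and only the three listed marks count as diacritics when checking whether the و is bare.

```python
from typing import Optional

# Short-vowel diacritics from the list above.
DIACRITIC_VOWEL = {
    "\u064F": "u",  # pesh  (ARABIC DAMMA)
    "\u064E": "a",  # zabar (ARABIC FATHA)
    "\u0650": "i",  # zer   (ARABIC KASRA)
}
PESH = "\u064F"
WAW = "\u0648"  # و


def waw_after_pesh(word: str, i: int) -> Optional[str]:
    """Apply the pesh rule to the WAW at word[i].

    In logical order the pesh follows its consonant, so the sequence
    "consonant + pesh + WAW without one of the three marks" means the WAW
    is the long vowel [ūo], not w. Returns None when the rule does not apply.
    """
    if word[i] != WAW:
        raise ValueError(f"expected WAW, got {word[i]!r}")
    waw_is_bare = i + 1 == len(word) or word[i + 1] not in DIACRITIC_VOWEL
    if i >= 2 and word[i - 1] == PESH and waw_is_bare:
        return "[ūo]"
    return None
```

For کُون, `waw_after_pesh(word, 2)` returns `"[ūo]"`; for کَون the rule does not apply and the function returns `None`.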
+Possible further reading:
+  * http://en.wikipedia.org/wiki/Arabic_diacritics
+  * http://users.skynet.be/hugocoolens/newurdu/vowels.html

Institute of Formal and Applied Linguistics Wiki
