Differences
This shows you the differences between two versions of the page.
user:zeman:transliteration-of-urdu-to-latin-script [2010/11/09 16:52] zeman Vocabulary. |
user:zeman:transliteration-of-urdu-to-latin-script [2010/11/10 13:38] zeman Where is the script? |
||
---|---|---|---|
Line 108: | Line 108: | ||
The transliteration script should contain a gradually growing vocabulary that would help disambiguate known words. Otherwise there would be a very high number of ambiguous positions in any transliterated string. | The transliteration script should contain a gradually growing vocabulary that would help disambiguate known words. Otherwise there would be a very high number of ambiguous positions in any transliterated string. | ||
- | ===== Vowel Diacritics ===== | + | ===== Short Vowels and Diacritics ===== |
+ | |||
+ | Without diacritics (the more common case), every consonant that is not followed by a long vowel may or may not be followed by a short vowel. I denote this possibility by the symbol for the neutral vowel, schwa: //ə//. | ||
//Warning! This section is under construction. I am still confused about the exact rules for Urdu vowel representation, | //Warning! This section is under construction. I am still confused about the exact rules for Urdu vowel representation, | ||
Although used rarely, Urdu has means to mark the three short vowels as well. This is done using one of the three diacritical marks. Long vowels can be disambiguated as well, e.g. a consonant with the pesh mark followed by a waw without any diacritic means that the waw is a long vowel //[ūo]// but not the consonant //w//. | Although used rarely, Urdu has means to mark the three short vowels as well. This is done using one of the three diacritical marks. Long vowels can be disambiguated as well, e.g. a consonant with the pesh mark followed by a waw without any diacritic means that the waw is a long vowel //[ūo]// but not the consonant //w//. | ||
+ | |||
+ | ^ Unicode ^ Unicode Name ^ Urdu Name ^ With Beh ^ Transliteration ^ | ||
+ | | 064E | ARABIC FATHA | zabar | بَ | ba | | ||
+ | | 064F | ARABIC DAMMA | pesh | بُ | bu | | ||
+ | | 0650 | ARABIC KASRA | zer | بِ | bi | | ||
pesh (ARABIC DAMMA, 064F) ... u ... کُون //kon// “who” | pesh (ARABIC DAMMA, 064F) ... u ... کُون //kon// “who” | ||
Line 118: | Line 125: | ||
zer (ARABIC KASRA, 0650) ... i ... | zer (ARABIC KASRA, 0650) ... i ... | ||
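The three diacritic mappings listed above can be tried out with a minimal Python sketch. It is only an illustration, not the transliteration script described on this page: the function name ''romanize_marked'' and the one-entry consonant table (beh only, as in the table's example column) are assumptions made for the example.

```python
# Short-vowel diacritics as listed above (code point -> Latin vowel).
SHORT_VOWELS = {
    "\u064e": "a",  # ARABIC FATHA, zabar
    "\u064f": "u",  # ARABIC DAMMA, pesh
    "\u0650": "i",  # ARABIC KASRA, zer
}

# Hypothetical one-entry consonant table, just enough for the beh examples.
CONSONANTS = {
    "\u0628": "b",  # ARABIC LETTER BEH
}


def romanize_marked(text: str) -> str:
    """Map known consonants and short-vowel marks; pass other characters through."""
    out = []
    for ch in text:
        if ch in SHORT_VOWELS:
            out.append(SHORT_VOWELS[ch])
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
        else:
            out.append(ch)  # unknown character: leave it untouched
    return "".join(out)


if __name__ == "__main__":
    for mark in SHORT_VOWELS:
        print(romanize_marked("\u0628" + mark))
```

The sketch deliberately ignores the interaction between a diacritic and a following waw or yeh (the long-vowel case mentioned above), since the exact rules are still open.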
- | Possible further reading: http:// | + | Possible further reading: |
- | http:// | + | * http:// |
+ | | ||
+ | |||
+ | ===== The Transliteration Script ===== | ||
+ | |||
+ | You need two files. All of the transliteration knowledge is encoded in the library '' | ||
+ | |||
+ | <code bash> | ||
+ | |||
+ | If you happen to sit on the ÚFAL network, you will find the script in '' | ||
+ | |||
+ | Here is an example Urdu sentence and the romanized output produced by the script: | ||
+ | * آپ کو پچھلے 182 دنوں میں اپنی بیماری یا معزوری کے سبب مندرجہ ذیل میں سے کوئی ایک ملتا رہا ہے ؟ | ||
+ | * āp ko pəčhəle 182 dənoñ meñ əpənī b[yīe]mārī yā məˀəz[wūo]rī ke səbəb mənədərəjəh ż[yīe]l meñ se koī ek mələtā rəhā he ? |
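The notation in the romanized example (//ə// for a possible short vowel, //[yīe]// and //[wūo]// for letters that may be a consonant or a long vowel) can be reproduced for a few words with the following Python sketch. It is not the script itself. The rule used here is inferred from the sample output only: a schwa is written after a consonant when another consonant follows, and never at the end of a word. The function name ''romanize_unmarked'', the tiny letter tables, and the choice of U+06CC for yeh are assumptions made for the example.

```python
SCHWA = "ə"  # placeholder for a possible short vowel

# Hypothetical mini-table covering only the letters used in the examples below.
CONSONANTS = {
    "\u0628": "b",  # beh
    "\u062f": "d",  # dal
    "\u0630": "ż",  # thal
    "\u0633": "s",  # seen
    "\u0644": "l",  # lam
    "\u0646": "n",  # noon
}

# Letters that may be a consonant or a long vowel without diacritics.
AMBIGUOUS = {
    "\u06cc": "[yīe]",  # yeh (assumed to be U+06CC, ARABIC LETTER FARSI YEH)
    "\u0648": "[wūo]",  # waw
}


def romanize_unmarked(word: str) -> str:
    """Romanize an undiacritized word, marking the ambiguous positions."""
    out = []
    for i, ch in enumerate(word):
        if ch in AMBIGUOUS:
            out.append(AMBIGUOUS[ch])
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = word[i + 1] if i + 1 < len(word) else ""
            # A short vowel is possible only if the next letter is a consonant.
            if nxt in CONSONANTS:
                out.append(SCHWA)
        else:
            out.append(ch)  # letter outside the mini-table: pass through
    return "".join(out)


if __name__ == "__main__":
    print(romanize_unmarked("\u0633\u0628\u0628"))  # seen beh beh
    print(romanize_unmarked("\u0630\u06cc\u0644"))  # thal yeh lam
```

For the two words printed at the end, this yields //səbəb// and //ż[yīe]l//, matching the corresponding words in the example sentence. A growing vocabulary, as described earlier on this page, would then be needed to resolve the remaining ambiguous positions.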