
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

user:zeman:transliteration-of-urdu-to-latin-script [2010/11/10 09:35]
zeman Schwa.
user:zeman:transliteration-of-urdu-to-latin-script [2010/11/10 13:38]
zeman Where is the script?
Line 116:
 Although they are rarely used, Urdu has means to mark the three short vowels as well: each is written with one of three diacritical marks. The marks can also disambiguate long vowels, e.g. a consonant with the pesh mark followed by a waw without any diacritic means that the waw is a long vowel //[ūo]// and not the consonant //w//.
  
-^ Unicode ^ Unicode Name ^ Urdu Name ^ With Alef ^ Transliteration ^
-| 064E | ARABIC FATHA | zabar | َا |
-| 064F | ARABIC DAMMA | pesh | ُا |
-| 0650 | ARABIC KASRA | zer | ِا |
+^ Unicode ^ Unicode Name ^ Urdu Name ^ With Beh ^ Transliteration ^
+| 064E | ARABIC FATHA | zabar | بَ | ba |
+| 064F | ARABIC DAMMA | pesh | بُ | bu |
+| 0650 | ARABIC KASRA | zer | بِ | bi |
  
 pesh (ARABIC DAMMA, 064F) ... u ... کُون //kon// “who”
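
The vowel marks in the table and the pesh-plus-waw rule can be sketched in a few lines. This is a minimal Python sketch written for illustration, not the contents of ''urdund.pm'': the function name ''romanize_marked'', the three-letter consonant table, and the choice to write the undecided long vowel as ''[ūo]'' are all assumptions.

```python
# Illustrative sketch only; the names and the tiny consonant table are hypothetical.
SHORT_VOWELS = {
    "\u064E": "a",  # ARABIC FATHA, Urdu name zabar
    "\u064F": "u",  # ARABIC DAMMA, Urdu name pesh
    "\u0650": "i",  # ARABIC KASRA, Urdu name zer
}
CONSONANTS = {
    "\u0628": "b",  # ARABIC LETTER BEH
    "\u06A9": "k",  # ARABIC LETTER KEHEH
    "\u0646": "n",  # ARABIC LETTER NOON
}
PESH = "\u064F"
WAW = "\u0648"


def romanize_marked(text):
    """Romanize consonants and short-vowel marks, one code point at a time."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        after = text[i + 2] if i + 2 < len(text) else ""
        # Pesh followed by a waw that carries no vowel mark of its own:
        # the waw is a long vowel (ū or o), not the consonant w.
        if ch == PESH and nxt == WAW and after not in SHORT_VOWELS:
            out.append("[ūo]")
            i += 2
            continue
        # Known marks and letters are mapped; anything else passes through.
        out.append(SHORT_VOWELS.get(ch) or CONSONANTS.get(ch) or ch)
        i += 1
    return "".join(out)


print(romanize_marked("\u0628\u064E"))              # beh + zabar
print(romanize_marked("\u06A9\u064F\u0648\u0646"))  # keheh + pesh + waw + noon
```

The sketch leaves every character it does not know untouched, so a waw without a preceding pesh is simply passed through; a real transliterator needs a rule for that case as well.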
Line 129:
   * http://users.skynet.be/hugocoolens/newurdu/vowels.html
  
 +===== The Transliteration Script =====
 +
 +You need two files. All of the transliteration knowledge is encoded in the library ''urdund.pm''. The Perl script ''translit_urdund.pl'' merely reads the standard input, passes it through the library and sends the result to the standard output. It is called like this:
 +
 +<code bash>perl translit_urdund.pl < urdu.txt > latin.txt</code>
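
The split described above, a thin filter script around a library that holds all the rules, can be sketched as follows. This is a Python illustration of the pattern only, not a port of ''translit_urdund.pl''; ''run_filter'' and ''transliterate_line'' are hypothetical names, and the stand-in conversion function returns its input unchanged.

```python
import io


def transliterate_line(line):
    # Hypothetical stand-in for the library call; it returns the line unchanged.
    return line


def run_filter(instream, outstream, convert=transliterate_line):
    """Copy instream to outstream, passing every line through convert."""
    # Reading line by line means large files need not fit in memory.
    for line in instream:
        outstream.write(convert(line))


# Demonstration with in-memory streams instead of real files.
source = io.StringIO("first line\nsecond line\n")
target = io.StringIO()
run_filter(source, target, convert=str.upper)
print(target.getvalue(), end="")
```

Calling ''run_filter(sys.stdin, sys.stdout)'' after ''import sys'' would turn the same function into a command-line filter that can be used with shell redirection as in the command above.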
 +
 +If you are on the ÚFAL network, you will find the script in ''~zeman/projekty/transliterace''. It should be able to find the library by itself; the library is in ''~zeman/lib/translit'' (you will find programs and libraries for other writing systems in these two folders as well).
 +
 +This is an example of an Urdu sentence and the romanized output produced by the script:
 +
 +  * آپ کو پچھلے 182 دنوں میں اپنی بیماری یا معزوری کے سبب مندرجہ ذیل میں سے کوئی ایک ملتا رہا ہے ؟
 +  * āp ko pəčhəle 182 dənoñ meñ əpənī b[yīe]mārī yā məˀəz[wūo]rī ke səbəb mənədərəjəh ż[yīe]l meñ se koī ek mələtā rəhā he ?
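
The romanized output contains bracketed groups such as ''[yīe]'' and ''[wūo]''. Assuming each group lists the alternative readings of one ambiguous letter (as the //[ūo]// notation earlier on the page suggests), the candidate readings of a word can be enumerated. The following Python sketch is an illustration only and is not part of the script; ''expand_readings'' is a hypothetical helper.

```python
import itertools
import re
import unicodedata


def expand_readings(word):
    """List every reading of a word that contains [..] groups of alternatives."""
    # Normalize to NFC so that a letter such as ī is a single code point.
    word = unicodedata.normalize("NFC", word)
    # Splitting on a capturing group keeps the group contents:
    # even positions are literal text, odd positions are alternatives.
    parts = re.split(r"\[([^\]]*)\]", word)
    choices = [[p] if i % 2 == 0 else list(p) for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


for reading in expand_readings("b[yīe]mārī"):
    print(reading)
```

A word with several bracketed groups yields the product of all alternatives, so the list grows quickly; picking the right reading still requires a dictionary or a human reader.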
