[ Skip to the content ]

Institute of Formal and Applied Linguistics Wiki


[ Back to the navigation ]

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Last revision Both sides next revision
user:zeman:transliteration-of-urdu-to-latin-script [2010/11/10 14:18]
zeman New vowels with hamza.
user:zeman:transliteration-of-urdu-to-latin-script [2010/11/16 08:21]
zeman
Line 1: Line 1:
 ====== Transliteration of Urdu to Latin Script ====== ====== Transliteration of Urdu to Latin Script ======
 +
 +Copyright © 2010 by Dan Zeman <zeman@ufal.mff.cuni.cz>
 +License: GNU GPL
 +This research has been supported by the grant of the Czech Ministry of Education no. MSM0021620838. 
  
 ===== Transliteration versus Transcription ===== ===== Transliteration versus Transcription =====
Line 18: Line 22:
 ===== Consonants ===== ===== Consonants =====
  
-Most of the consonants do not pose any serious problem. I decided to represent the retroflex consonants (ٹ, ڈ, ڑ) by a dot below their dental or other counterparts, as is usual across the Indo-Aryan languages. A dot above a letter distinguishes two Arabic letters whose Urdu pronunciation is identical to other letters, from transliteration of those other letters (ث, ذ). Similarly, a cedilla below a letter distinguishes other five letters that occur in words of Arabic descent (ح, ص, ض, ط, ظ).+Most of the consonants do not pose any serious problem. I decided to represent the retroflex consonants (ٹ, ڈ, ڑ) by a dot below their dental or other counterparts, as is usual across the Indo-Aryan languages. A dot above a letter distinguishes two Arabic letters whose Urdu pronunciation is identical to other letters, from transliteration of those other letters (ث, ذ). Similarly, a cedilla below a letter distinguishes five other letters that occur in words of Arabic descent (ح, ص, ض, ط, ظ).
  
 Some other notes: //j// is pronounced as in English, not as in Czech or German. //č// and //š// are used in Baltic and Slavic languages (among others) to represent the sounds that are usually written “ch” or “sh”, respectively, in English. Of similar descent is the character //ž//; the corresponding sound is sometimes represented as “zh” in English and corresponds to the French pronunciation of //j//. //x// represents (in accord with phonetic tradition) the same sound as Czech/German/Scottish “ch”. English-oriented transcriptions of Arabic often transcribe this sound as “kh”, a solution that we want to avoid. It would conflict with the aspirated //kh// of Urdu. //ğ// is taken from Turkish and describes the sound that is often transcribed “gh” from Arabic (which we cannot use, again because of the aspirated //gh//). Some other notes: //j// is pronounced as in English, not as in Czech or German. //č// and //š// are used in Baltic and Slavic languages (among others) to represent the sounds that are usually written “ch” or “sh”, respectively, in English. Of similar descent is the character //ž//; the corresponding sound is sometimes represented as “zh” in English and corresponds to the French pronunciation of //j//. //x// represents (in accord with phonetic tradition) the same sound as Czech/German/Scottish “ch”. English-oriented transcriptions of Arabic often transcribe this sound as “kh”, a solution that we want to avoid. It would conflict with the aspirated //kh// of Urdu. //ğ// is taken from Turkish and describes the sound that is often transcribed “gh” from Arabic (which we cannot use, again because of the aspirated //gh//).
Line 113: Line 117:
 ===== Short Vowels and Diacritics ===== ===== Short Vowels and Diacritics =====
  
-Without diacritics (which is more common), every consonant that is not followed by a long vowel may or may not be followed by a short vowel. I denote this possibility by the character for the neutral character schwa: //ə//.+Without diacritics (which is more common), every consonant that is not followed by a long vowel may or may not be followed by a short vowel. I denote this possibility by the character for the neutral vowel schwa: //ə//.
  
 //Warning! This section is under construction. I am still confused about the exact rules for Urdu vowel representation, so I also expect more errors to occur here.// //Warning! This section is under construction. I am still confused about the exact rules for Urdu vowel representation, so I also expect more errors to occur here.//
Line 153: Line 157:
 <code bash>perl translit_urdund.pl < urdu.txt > latin.txt</code> <code bash>perl translit_urdund.pl < urdu.txt > latin.txt</code>
  
-If you happen to sit on the ÚFAL network, you will find the script in ''~zeman/projekty/transliterace''. It should be able to find the library itself; the library is in ''~zeman/lib/translit'' (you will programs and libraries for other writing systems in these two folders as well).+If you happen to sit on the ÚFAL network, you will find the script in ''~zeman/projekty/transliterace''. It should be able to find the library itself; the library is in ''~zeman/lib/translit'' (you will find programs and libraries for other writing systems in these two folders as well).
  
 I am also attaching the current snapshot of the two folders to this wiki {{:user:zeman:translit.zip|here}}. Note however that it will not be updated regularly. I am also attaching the current snapshot of the two folders to this wiki {{:user:zeman:translit.zip|here}}. Note however that it will not be updated regularly.
Line 161: Line 165:
   * آپ کو پچھلے 182 دنوں میں اپنی بیماری یا معزوری کے سبب مندرجہ ذیل میں سے کوئی ایک ملتا رہا ہے ؟   * آپ کو پچھلے 182 دنوں میں اپنی بیماری یا معزوری کے سبب مندرجہ ذیل میں سے کوئی ایک ملتا رہا ہے ؟
   * āp ko pəčhəle 182 dənoñ meñ əpənī b[yīe]mārī yā məˀəz[wūo]rī ke səbəb mənədərəjəh ż[yīe]l meñ se koī ek mələtā rəhā he ?   * āp ko pəčhəle 182 dənoñ meñ əpənī b[yīe]mārī yā məˀəz[wūo]rī ke səbəb mənədərəjəh ż[yīe]l meñ se koī ek mələtā rəhā he ?
 +
 +Afterwards, a speaker of Urdu is supposed to edit the transliteration and disambiguate all remaining cases:
 +
 +  * āp ko p**ə**čh**ə**le 182 d**ə**noñ meñ **ə**p**ə**nī b**[yīe]**mārī yā m**ə**ˀ**ə**z**[wūo]**rī ke s**ə**b**ə**b m**ə**n**ə**d**ə**r**ə**j**ə**h ż**[yīe]**l meñ se koī ek m**ə**l**ə**tā r**ə**hā he ?
 +    * Of alternatives in brackets, one has to be selected. Sometimes the brackets do not list all possibilities but they are easy to guess. For instance, [yīe] should in fact be [əyə|ī|e].
 +    * The schwa //ə// is a shortcut for [aiu] or an empty string (no vowel here).

[ Back to the navigation ] [ Back to the content ]