Urdu Part-of-Speech Tags
Hassan Sajjad, Center for Research in Urdu Language Processing, National University of Computer & Emerging Sciences, Lahore, Pakistan, 7.12.2007
- Tagset description: http://www.crulp.org/Downloads/langproc/UrduPOStagger/UrduPOStagset.pdf
- POS Tagged Urdu Corpus: http://www.crulp.org/Downloads/ling_resources/parallelcorpus/Urdu%20Tagged%20Corpus%20(100k).zip
- Urdu Stemmer. This is a Windows GUI program. It requires some files to be at a fixed path, but it runs. Its precision is questionable, though: for example, it segments “ناموں” as “نا|موں” (prefix|stem).
- Urdu Finite State Morphological Analyzer. This is a Windows program. I have not been able to run it because it requires Microsoft Visual C++, particularly the mfc42ud.dll library (Unicode debug version). However, there is a text file with the lexicon that could potentially be converted for PC Kimmo.
- Urdu Statistical POS Tagger: http://www.crulp.org/software/langproc/POS_tagger.htm
- English-to-Urdu MT (based on LFG): http://www.crulp.org/software/langproc/E2UMachineTranslationSystem.htm
- Hassan Sajjad, Helmut Schmid: Tagging Urdu Text with Parts of Speech: A Tagger Comparison (EACL 2009 Athens): http://portal.acm.org/citation.cfm?id=1609067.1609144, http://www.aclweb.org/anthology/E/E09/E09-1079.pdf
- Urdu Emille POS Tagset: http://www.lancs.ac.uk/staff/hardiea/cl03_urdu.pdf
- Urdu Tagging Challenges (presentation): http://www.panl10n.net/Presentations/Laos/RegionalConference/CorpusCollection/Tagset_and_Tagging_Urdu_Corpus.pdf