Persian (fa)
Persian Dependency Treebank (پیکرۀ وابستگی)
Versions
- “Preversion” 0.1 (January 2012, 12,200 annotated sentences)
- Final version (expected fall 2012, 30,000 sentences)
Obtaining and License
The treebank is available free of charge under the GNU General Public License (GPL), with the additional requirement that the data be used non-commercially. To obtain it, complete the license form; the data will then be sent to you by e-mail. (You may also contact info(at)dadegan(dot)ir or Mohammad Sadegh Rasooli.) The license in short:
- non-commercial usage
- redistribution permitted under the same license
- citation of publications not explicitly required but it is common courtesy
The Persian Dependency Treebank (PDT) was created by members of the Dadegan Research Group (دادگان, Dādegān), Computer Engineering Department, Iran University of Science and Technology (دانشگاه علم و صنعت ایران), Tehran (تهران, Tehrān), Iran. The copyright lies with the Supreme Council of Information and Communication Technology (SCICT).
References
- Website
- Data
- no separate citation
- Principal publications
- Mohammad Sadegh Rasooli, Amirsaeid Moloodi, Manouchehr Kouhestani, Behrouz Minaei-Bidgoli: A Syntactic Valency Lexicon for Persian Verbs: The First Steps towards Persian Dependency Treebank. In: 5th Language & Technology Conference (LTC): Human Language Technologies as a Challenge for Computer Science and Linguistics, pp. 227-231, Poznań, Poland, 2011.
- Documentation
- Attached to the data distribution: Persian Dependency Treebank Version 0.1, Annotation Manual and User Guide, Dadegan Research Group, Tehran, Iran, 2012.
Domain
Unknown.
Size
12,200 annotated sentences (version 0.1).
Inside
The data are provided in the CoNLL data format. The morphosyntactic annotation includes lemmas. Part-of-speech tags were assigned manually. The text contains no diacritical marks for short vowels, which are not normally written in Persian.
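As a rough illustration of how such a file could be read, here is a minimal sketch. It assumes the ten-column, tab-separated CoNLL-X (2006) layout with blank lines between sentences; the page only says "CoNLL data format", so the column names below are an assumption and should be checked against the Annotation Manual and User Guide. The function name `read_conll` is hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

# Column names of the CoNLL-X (2006) format. ASSUMPTION: the treebank uses
# this ten-column layout; the source only says "CoNLL data format".
CONLL_FIELDS = ["id", "form", "lemma", "cpostag", "postag",
                "feats", "head", "deprel", "phead", "pdeprel"]


def read_conll(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield sentences as lists of token dicts; blank lines separate sentences."""
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence (if any).
            if sentence:
                yield sentence
                sentence = []
            continue
        values = line.split("\t")
        # Pair each tab-separated value with its assumed column name.
        sentence.append(dict(zip(CONLL_FIELDS, values)))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence
```

Any iterable of lines can be passed in, for example a file object opened with `encoding="utf-8"`, which matters for Persian text.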
Sample
Parsing
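No nonprojectivity statistics are given here for PDT; the following figures describe BTB, not the Persian treebank. Nonprojectivities in BTB are rare: only 747 of the 196,151 tokens in its CoNLL 2006 version (0.38%) are attached nonprojectively.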
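A count of nonprojectively attached tokens like the one quoted above could be computed along the following lines. This is only a sketch under one common definition: a token is attached nonprojectively if some token lying strictly between it and its head is not dominated by that head. The page does not say which definition was used. The function name `count_nonprojective` and the input convention (a list of head indices, 0 for the root) are assumptions made for illustration.

```python
from typing import List


def count_nonprojective(heads: List[int]) -> int:
    """Count tokens whose attachment to their head is nonprojective.

    heads[i] is the 1-based index of the head of token i+1; 0 marks the root.
    """
    def dominated_by(node: int, ancestor: int) -> bool:
        # Walk up the tree from node; the step bound guards against cycles
        # in malformed input.
        steps = 0
        while node != 0 and steps <= len(heads):
            if node == ancestor:
                return True
            node = heads[node - 1]
            steps += 1
        return False

    count = 0
    for dep, head in enumerate(heads, start=1):
        if head == 0:
            # Simplification: attachment to the artificial root is
            # treated as projective.
            continue
        lo, hi = min(dep, head), max(dep, head)
        # Nonprojective if any token in the gap is not a descendant of head.
        if any(not dominated_by(k, head) for k in range(lo + 1, hi)):
            count += 1
    return count
```

The proportion reported for a treebank is then the sum of these counts over all sentences divided by the total number of tokens.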
I am not aware of any published results of Persian dependency parsing.