
Institute of Formal and Applied Linguistics Wiki






Persian (fa)

Persian Dependency Treebank (پیکرۀ وابستگی)

Versions

Obtaining and License

The treebank is available free of charge under the GNU General Public License (GPL), with the additional requirement that the data be used non-commercially. To obtain it, complete the license form; the data will then be sent to you by e-mail. (You may also contact info(at)dadegan(dot)ir or Mohammad Sadegh Rasooli.) The license in short:

The Persian Dependency Treebank (PDT) was created by members of the Dadegan Research Group (دادگان, Dādegān), Computer Engineering Department, Iran University of Science and Technology (دانشگاه علم و صنعت ایران), Tehrān (تهران), Iran. The copyright is held by the Supreme Council of Information and Communication Technology (SCICT).

References

Domain

Unknown.

Size

12,200 annotated sentences.

Inside

The data are provided in the CoNLL data format. The morphosyntactic annotation includes lemmas. Morphosyntactic / part-of-speech tags have been assigned manually. The text does not contain the diacritical marks that distinguish short vowels; these are not normally written in Persian text.
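As a rough illustration of what reading such a file involves, here is a minimal Python sketch. It assumes the ten tab-separated columns of the CoNLL-X (2006) layout and blank lines between sentences; the page itself only says "CoNLL data format", so the column list, the function name `read_conll` and the file name in the usage note are assumptions, not part of the treebank distribution.

```python
from typing import Dict, Iterable, Iterator, List

# Column names of the CoNLL-X (2006) layout. This is an assumption:
# the actual distribution may use a different column set.
CONLL_COLUMNS = [
    "id", "form", "lemma", "cpostag", "postag",
    "feats", "head", "deprel", "phead", "pdeprel",
]


def read_conll(lines: Iterable[str]) -> Iterator[List[Dict[str, str]]]:
    """Yield one sentence at a time as a list of token dictionaries."""
    sentence: List[Dict[str, str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        sentence.append(dict(zip(CONLL_COLUMNS, fields)))
    # The last sentence may not be followed by a blank line.
    if sentence:
        yield sentence
```

Typical use would be `with open("train.conll", encoding="utf-8") as f: sentences = list(read_conll(f))`, where `train.conll` is a hypothetical file name. All values, including `id` and `head`, are kept as strings; convert them with `int()` where needed.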

Sample

Parsing

Nonprojectivities in BTB are rare. Only 747 of the 196,151 tokens in the CoNLL 2006 version are attached nonprojectively (0.38%).
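The percentage quoted above is the ratio of nonprojectively attached tokens to all tokens. The following Python sketch shows one way such a count could be obtained; it is not the script used to produce the figure. It assumes a common definition (a token is attached nonprojectively if some token lying between it and its head is not dominated by that head), treats the artificial root 0 as dominating everything, and uses hypothetical function names.

```python
from typing import List


def dominates(ancestor: int, node: int, heads: List[int]) -> bool:
    """True if `ancestor` is `node` itself or lies on its path to the root.

    `heads[i - 1]` is the head of token i (1-based); 0 is the artificial root.
    """
    if ancestor == 0:
        return True  # the artificial root dominates every token
    steps = 0
    while node != 0 and steps <= len(heads):  # step limit guards against cycles
        if node == ancestor:
            return True
        node = heads[node - 1]
        steps += 1
    return False


def count_nonprojective(heads: List[int]) -> int:
    """Count tokens whose attachment to their head is nonprojective."""
    count = 0
    for dep, head in enumerate(heads, start=1):
        lo, hi = sorted((dep, head))
        # Nonprojective if any token strictly between dependent and head
        # is not dominated by the head.
        if any(not dominates(head, k, heads) for k in range(lo + 1, hi)):
            count += 1
    return count


# The ratio quoted in the text: 747 of 196,151 tokens.
share_percent = round(100 * 747 / 196151, 2)
```

For example, in a toy four-token sentence with heads `[3, 4, 0, 3]`, only token 2 (attached to token 4 across token 3, which token 4 does not dominate) counts as nonprojective. Summing `count_nonprojective` over all sentences and dividing by the total number of tokens gives the share.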

I am not aware of any published results for Persian dependency parsing.

