Institute of Formal and Applied Linguistics Wiki

Persian (fa)

Persian Dependency Treebank (پیکرۀ وابستگی)

Versions

Obtaining and License

The treebank is available free of charge under the GNU General Public License (GPL), with the additional requirement that the data be used only for non-commercial purposes. Complete the license form and the Dadegan group will send you the data by e-mail. (You may also contact info(at)dadegan(dot)ir or Mohammad Sadegh Rasooli.) The license in short:

PDT was created by members of the Dadegan Research Group (دادگان, Dādegān), Computer Engineering Department, Iran University of Science and Technology (دانشگاه علم و صنعت ایران), Tehrān تهران, Iran. The copyright lies with the Supreme Council of Information and Communication Technology (SCICT).

References

Domain

Unknown.

Size

12,200 annotated sentences.

Inside

The data are provided in the CoNLL format. The morphosyntactic annotation includes lemmas, and the part-of-speech and morphosyntactic tags were assigned manually. The text does not mark short vowels with diacritics (these are normally not written in Persian).
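As a rough illustration of the CoNLL (CoNLL-X) layout, the sketch below splits one token line into its ten standard columns and breaks FEATS into attribute=value pairs. It assumes the distributed files are tab-separated, as CoNLL-X prescribes (the sample on this page shows spaces instead); the function name `parse_token` is hypothetical. Judging from the sample sentence, verb lemmas such as داد#ده appear to join the past and present stems with `#`.

```python
# Column names of the CoNLL-X (CoNLL 2006) format, in order.
CONLL_COLUMNS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
                 "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]


def parse_token(line: str) -> dict:
    """Parse one tab-separated CoNLL-X token line into a dict (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")  # split on tabs only, never on spaces
    if len(fields) != len(CONLL_COLUMNS):
        raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
    token = dict(zip(CONLL_COLUMNS, fields))
    token["ID"] = int(token["ID"])
    token["HEAD"] = int(token["HEAD"])  # 0 means the artificial root
    # FEATS is either "_" or a "|"-separated list of attribute=value pairs.
    feats = {}
    if token["FEATS"] != "_":
        for pair in token["FEATS"].split("|"):
            key, _, value = pair.partition("=")
            feats[key] = value
    token["FEATS"] = feats
    return token


# Token 26 of the sample sentence, rebuilt with tabs between the columns.
line = "\t".join(["26", "داد", "داد#ده", "V", "ACT",
                  "person=3|attachment=ISO|number=SING|tma=GS|senID=23472",
                  "0", "ROOT", "_", "_"])
tok = parse_token(line)
print(tok["LEMMA"].split("#"))  # ['داد', 'ده']
print(tok["FEATS"]["tma"])      # GS
```

Splitting on tabs only, rather than on any whitespace, keeps a FORM or LEMMA intact should it ever contain a space.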

Sample

The first sentence of the corpus in the CoNLL format:

1 به به PREP PREP attachment=ISO|senID=23472 26 ADV _ _
2 گزارش گزارش N IANM attachment=ISO|number=SING|senID=23472 1 POSDEP _ _
3 خبرنگار خبرنگار N ANM attachment=ISO|number=SING|senID=23472 2 MOZ _ _
4 مهر مهر N IANM attachment=ISO|number=SING|senID=23472 3 MOZ _ _
5 در در PREP PREP attachment=ISO|senID=23472 3 NPP _ _
6 گرگان گرگان N IANM attachment=ISO|number=SING|senID=23472 5 POSDEP _ _
7 ، ، PUNC PUNC attachment=ISO|senID=23472 6 PUNC _ _
8 بر بر PREP PREP attachment=ISO|senID=23472 26 ADV _ _
9 اساس اساس N IANM attachment=ISO|number=SING|senID=23472 8 POSDEP _ _
10 باورهای باور N IANM attachment=ISO|number=PLUR|senID=23472 9 MOZ _ _
11 دینی دینی ADJ AJP attachment=ISO|senID=23472 10 NPOSTMOD _ _
12 ترکمن‌ها ترکمن N ANM attachment=ISO|number=PLUR|senID=23472 10 MOZ _ _
13 در در PREP PREP attachment=ISO|senID=23472 26 ADV _ _
14 این این PREM DEMAJ attachment=ISO|senID=23472 15 NPREMOD _ _
15 روز روز N IANM attachment=ISO|number=SING|senID=23472 13 POSDEP _ _
16 برای برای PREP PREP attachment=ISO|senID=23472 26 NPP _ _
17 پیامبر پیامبر N ANM attachment=ISO|number=SING|senID=23472 16 VPP _ _
18 اکرم اکرم ADJ AJP attachment=ISO|senID=23472 17 NPOSTMOD _ _
19 ( ( PUNC PUNC attachment=ISO|senID=23472 20 PUNC _ _
20 ص ص ADJ AJP attachment=ISO|senID=23472 17 APP _ _
21 ) ) PUNC PUNC attachment=ISO|senID=23472 20 PUNC _ _
22 ناراحتی ناراحتی N IANM attachment=ISO|number=SING|senID=23472 26 SBJ _ _
23 و و CONJ CONJ attachment=ISO|senID=23472 22 NCONJ _ _
24 بیماری بیماری N IANM attachment=ISO|number=SING|senID=23472 23 POSDEP _ _
25 رخ رخ N IANM attachment=ISO|number=SING|senID=23472 26 NVE _ _
26 داد داد#ده V ACT person=3|attachment=ISO|number=SING|tma=GS|senID=23472 0 ROOT _ _
27 که که SUBR SUBR attachment=ISO|senID=23472 26 AJUCL _ _
28 چند چند PREM AMBAJ attachment=ISO|senID=23472 29 NPREMOD _ _
29 روز روز N IANM attachment=ISO|number=SING|senID=23472 39 ADV _ _
30 بعد بعد ADJ AJP attachment=ISO|senID=23472 29 NPOSTMOD _ _
31 با با PREP PREP attachment=ISO|senID=23472 39 ADV _ _
32 رحلت رحلت N IANM attachment=ISO|number=SING|senID=23472 31 POSDEP _ _
33 نبی نبی N ANM attachment=ISO|number=SING|senID=23472 32 MOZ _ _
34 مکرم مکرم ADJ AJP attachment=ISO|senID=23472 33 NPOSTMOD _ _
35 اسلام اسلام N IANM attachment=ISO|number=SING|senID=23472 33 MOZ _ _
36 جهان جهان N IANM attachment=ISO|number=SING|senID=23472 39 SBJ _ _
37 عزادار عزادار ADJ AJP attachment=ISO|senID=23472 39 MOS _ _
38 ماتمش ماتم N IANM attachment=ISO|number=SING|senID=23472 37 MOZ _ _
39 شد کرد#کن V PASS person=3|attachment=ISO|number=SING|tma=GS|senID=23472 27 PRD _ _
40 . . PUNC PUNC attachment=ISO|senID=23472 26 PUNC _ _
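Because each sentence ends with a blank line and HEAD gives the ID of the governing token (0 for the artificial root), the dependency tree of a sentence can be recovered with a few lines of code. A minimal sketch, again assuming tab-separated columns; `read_sentences` and `dependents` are hypothetical helper names, and the two-sentence input is made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def read_sentences(lines: Iterable[str]) -> Iterator[list[list[str]]]:
    """Group CoNLL-X lines into sentences; blank lines separate sentences."""
    sentence: list[list[str]] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:  # the file may not end with a blank line
        yield sentence


def dependents(sentence: list[list[str]]) -> dict[int, list[int]]:
    """Map each head ID (0 = artificial root) to the IDs of its dependents."""
    children: dict[int, list[int]] = defaultdict(list)
    for fields in sentence:
        token_id, head = int(fields[0]), int(fields[6])  # ID and HEAD columns
        children[head].append(token_id)
    return dict(children)


# Hypothetical two-sentence file contents (tabs between columns).
text = "\n".join([
    "1\tA\ta\tN\tN\t_\t2\tSBJ\t_\t_",
    "2\tB\tb\tV\tV\t_\t0\tROOT\t_\t_",
    "",
    "1\tC\tc\tV\tV\t_\t0\tROOT\t_\t_",
])
sents = list(read_sentences(text.splitlines()))
print(len(sents))            # 2
print(dependents(sents[0]))  # {2: [1], 0: [2]}
```

Applied to the sample sentence above, `dependents(...)[26]` would list tokens 1, 8, 13, 16, 22, 25, 27 and 40, i.e. everything attached directly to the root verb داد.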

Parsing

Nonprojectivities in BTB are rare. Only 747 of the 196,151 tokens in the CoNLL 2006 version are attached nonprojectively (0.38%).
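The percentage above is simply 747 / 196,151 ≈ 0.38%. A common definition counts a token as attached nonprojectively when some token lying strictly between it and its head is not dominated by that head. The sketch below implements that check on a HEAD column stored as a dictionary (token ID to head ID); the function names are hypothetical and the tree is assumed to be well-formed.

```python
def dominates(heads: dict[int, int], ancestor: int, node: int) -> bool:
    """True if `ancestor` lies on the HEAD path from `node` up to the root (0)."""
    for _ in range(len(heads) + 1):  # a path longer than this implies a cycle
        if node == 0:
            return False
        node = heads[node]
        if node == ancestor:
            return True
    raise ValueError("the HEAD column contains a cycle")


def nonprojective_tokens(heads: dict[int, int]) -> list[int]:
    """IDs of tokens whose arc to their head is nonprojective.

    An arc head -> dep is projective when every token strictly between
    the two is dominated by `head`; the artificial root 0 dominates everything.
    """
    found = []
    for dep, head in sorted(heads.items()):
        lo, hi = min(dep, head), max(dep, head)
        if any(not dominates(heads, head, k) for k in range(lo + 1, hi)):
            found.append(dep)
    return found


# HEAD column of the 40-token sample sentence above (token ID -> head ID).
sample = dict(enumerate([26, 1, 2, 3, 3, 5, 6, 26, 8, 9, 10, 10, 26, 15, 13,
                         26, 16, 17, 20, 17, 20, 26, 22, 23, 26, 0, 26, 29,
                         39, 29, 39, 31, 32, 33, 33, 39, 39, 37, 27, 26], start=1))
print(nonprojective_tokens(sample))                    # []
print(nonprojective_tokens({1: 2, 2: 0, 3: 1, 4: 2}))  # [3]
print(round(100 * 747 / 196151, 2))                    # 0.38
```

For the sample sentence, all 40 arcs turn out to be projective; in the toy tree, token 2 sits between token 3 and its head 1 without being dominated by 1.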

I am not aware of any published results on Persian dependency parsing.

