Both sides previous revision
Previous revision
Next revision
|
Previous revision
|
user:zeman:treebanks:fa [2012/03/10 11:58] zeman Tokenization. |
user:zeman:treebanks:fa [2015/06/24 15:26] (current) zeman The license form is no longer accessible at the location where it was in 2012. |
==== Obtaining and License ==== | ==== Obtaining and License ==== |
| |
The treebank is available for free under the GNU GPLicense (with the additional requirement that the data be used non-commercially). Complete the [[http://dadegan.ir/en/content/user-agreement-persian-dependency-treebank|license form]] and they will send you the data by e-mail. (You may also contact info(at)dadegan(dot)ir or Mohammad Sadegh Rasooli.) The license in short: | The treebank is available for free under the GNU GPLicense (with the additional requirement that the data be used non-commercially). Contact the Dadegan Research Group using their on-line form at http://dadegan.ir/en/contact-us and ask them for the data. The license in short: |
| |
* non-commercial usage | * non-commercial usage |
* Mohammad Sadegh Rasooli, Amirsaeid Moloodi, Manouchehr Kouhestani, Behrouz Minaei-Bidgoli: [[http://dadegan.ir/sites/default/files/A%20Syntactic%20Valency%20Lexicon%20for%20Persian%20Verbs%20The%20First%20Steps%20towards%20Persian%20Dependency%20Treebank.pdf|A Syntactic Valency Lexicon for Persian Verbs: The First Steps towards Persian Dependency Treebank]]. In: 5th Language & Technology Conference (LTC): Human Language Technologies as a Challenge for Computer Science and Linguistics, pp. 227-231, Poznań, Poland, 2011. | * Mohammad Sadegh Rasooli, Amirsaeid Moloodi, Manouchehr Kouhestani, Behrouz Minaei-Bidgoli: [[http://dadegan.ir/sites/default/files/A%20Syntactic%20Valency%20Lexicon%20for%20Persian%20Verbs%20The%20First%20Steps%20towards%20Persian%20Dependency%20Treebank.pdf|A Syntactic Valency Lexicon for Persian Verbs: The First Steps towards Persian Dependency Treebank]]. In: 5th Language & Technology Conference (LTC): Human Language Technologies as a Challenge for Computer Science and Linguistics, pp. 227-231, Poznań, Poland, 2011. |
* Documentation | * Documentation |
* Attached to the data distribution: {{:user:zeman:treebanks:persian-dependency-treebank-version-0.1-annotation-manual-and-user-guide.pdf|Persian Dependency Treebank Version 0.1, Annotation Manual and User Guide}}, Dadegan Research Group, Tehran, Iran, 2012. | * Attached to the data distribution: {{:user:zeman:treebanks:persian-dependency-treebank-version-0.1-annotation-manual-and-user-guide.pdf|Persian Dependency Treebank Version 0.1, Annotation Manual and User Guide}}, Dadegan Research Group, Tehran, Iran, 2012. (http://dadegan.ir/sites/default/files/Persian%20Dependency%20Treebank%20Version%200.1%20Annotation%20Manual%20and%20User%20Guide.pdf) |
| |
==== Domain ==== | ==== Domain ==== |
==== Parsing ==== | ==== Parsing ==== |
| |
Nonprojectivities in BTB are rare. Only 747 of the 196,151 tokens in the CoNLL 2006 version are attached nonprojectively (0.38%). | Nonprojectivities in PDT are relatively rare. Only 3357 of the 189,572 tokens are attached nonprojectively (1.77%). |
| |
I am not aware of any published results of Persian dependency parsing. | I am not aware of any published results of Persian dependency parsing. Our own experiments gave 86.84% unlabeled attachment score with Malt Parser, the stack-lazy algorithm. |