Differences
This shows you the differences between two versions of the page.
user:zeman:treebanks:fa [2012/01/28 18:45] zeman created |
user:zeman:treebanks:fa [2012/01/29 18:19] zeman Update. I have seen the data! |
||
---|---|---|---|
Line 10: | Line 10: | ||
==== Obtaining and License ==== | ==== Obtaining and License ==== | ||
- | The treebank is available for free after completing | + | The treebank is available for free under the GNU GPLicense (with the additional requirement that the data be used non-commercially). Complete |
- | * non-commercial | + | * non-commercial usage |
- | * redistribution | + | * redistribution |
- | * citation of publications not specified | + | * citation of publications not explicitly required but it is common courtesy |
- | PDT was created by members of the [[http:// | + | PDT was created by members of the [[http:// |
==== References ==== | ==== References ==== | ||
* Website | * Website | ||
- | * http://www.bultreebank.org/indexBTB.html | + | * http://dadegan.ir/en/ |
* Data | * Data | ||
* //no separate citation// | * //no separate citation// | ||
* Principal publications | * Principal publications | ||
- | * Kiril Simov, Petya Osenova, Alexander Simov, Milen Kouylekov: //Design and Implementation of the Bulgarian HPSG-based Treebank.// In: Erhard Hinrichs, Kiril Simov (eds.): Journal of Research on Language and Computation, Special Issue, vol. 2, no. 4, pp. 495–522, Kluwer Academic Publishers, ISSN 1570-7075. 2004. | + | * Mohammad Sadegh Rasooli, Amirsaeid Moloodi, Manouchehr Kouhestani, Behrouz Minaei-Bidgoli: |
* Documentation | * Documentation | ||
- | * Kiril Simov, Petya Osenova, Milena Slavcheva: [[http:// | + | * Attached to the data distribution: {{:user:zeman:treebanks:persian-dependency-treebank-version-0.1-annotation-manual-and-user-guide.pdf|Persian Dependency Treebank Version 0.1, Annotation Manual and User Guide}}, Dadegan Research Group, Tehran, Iran, 2012. |
- | * Petya Osenova, Kiril Simov: [[http:// | + | |
- | * http:// | + | |
==== Domain ==== | ==== Domain ==== | ||
- | Unknown | + | Unknown. |
==== Size ==== | ==== Size ==== | ||
- | The CoNLL 2006 version contains 196,151 tokens in 13221 sentences, yielding 14.84 tokens per sentence on average (CoNLL 2006 data split: 190,217 tokens / 12823 sentences training, 5934 tokens / 398 sentences test). | + | 12200 annotated |
==== Inside ==== | ==== Inside ==== | ||
- | The original morphosyntactic tags have been converted to fit into the three columns (CPOS, POS and FEAT) of the CoNLL format. There //should// be a 1-1 mapping between | + | Provided in the [[:format-conll|CoNLL |
- | + | ||
- | The morphological analysis does not include | + | |
- | + | ||
- | The guidelines for syntactic annotation are documented | + | |
==== Sample ==== | ==== Sample ==== | ||
- | |||
- | The first three sentences of the CoNLL 2006 training data: | ||
- | |||
- | | 1 | Глава | _ | N | Nc | _ | 0 | ROOT | 0 | ROOT | | ||
- | | 2 | трета | _ | M | Mo | gen=f< | ||
- | | |||||||||| | ||
- | | 1 | НАРОДНО | _ | A | An | gen=n< | ||
- | | 2 | СЪБРАНИЕ | _ | N | Nc | gen=n< | ||
- | | |||||||||| | ||
- | | 1 | Народното | _ | A | An | gen=n< | ||
- | | 2 | събрание | _ | N | Nc | gen=n< | ||
- | | 3 | осъществява | _ | V | Vpi | trans=t< | ||
- | | 4 | законодателната | _ | A | Af | gen=f< | ||
- | | 5 | власт | _ | N | Nc | _ | 3 | obj | 3 | obj | | ||
- | | 6 | и | _ | C | Cp | _ | 3 | conj | 3 | conj | | ||
- | | 7 | упражнява | _ | V | Vpi | trans=t< | ||
- | | 8 | парламентарен | _ | A | Am | gen=m< | ||
- | | 9 | контрол | _ | N | Nc | gen=m< | ||
- | | 10 | . | _ | Punct | Punct | _ | 3 | punct | 3 | punct | | ||
- | |||
- | The first three sentences of the CoNLL 2006 test data: | ||
- | |||
- | | 1 | Единственото | _ | A | An | gen=n< | ||
- | | 2 | решение | _ | N | Nc | gen=n< | ||
- | | |||||||||| | ||
- | | 1 | Ерик | _ | N | Np | gen=m< | ||
- | | 2 | Франк | _ | N | Np | gen=m< | ||
- | | 3 | Ръсел | _ | H | Hm | gen=m< | ||
- | | |||||||||| | ||
- | | 1 | Пълен | _ | A | Am | gen=m< | ||
- | | 2 | мрак | _ | N | Nc | gen=m< | ||
- | | 3 | и | _ | C | Cp | _ | 2 | conj | 2 | conj | | ||
- | | 4 | пълна | _ | A | Af | gen=f< | ||
- | | 5 | самота | _ | N | Nc | _ | 2 | conjarg | 2 | conjarg | | ||
- | | 6 | . | _ | Punct | Punct | _ | 2 | punct | 2 | punct | | ||
==== Parsing ==== | ==== Parsing ==== | ||
Line 88: | Line 47: | ||
Nonprojectivities in BTB are rare. Only 747 of the 196,151 tokens in the CoNLL 2006 version are attached nonprojectively (0.38%). | Nonprojectivities in BTB are rare. Only 747 of the 196,151 tokens in the CoNLL 2006 version are attached nonprojectively (0.38%). | ||
- | The results | + | I am not aware of any published results |
- | + | ||
- | ^ Parser (Authors) ^ LAS ^ UAS ^ | + | |
- | | MST (McDonald et al.) | 87.57 | 92.04 | | + | |
- | | Malt (Nivre et al.) | 87.41 | 91.72 | | + | |
- | | Nara (Yuchang Cheng) | 86.34 | 91.30 | | + |