Differences

This shows you the differences between two versions of the page.

--- user:zeman:treebanks:tr [2012/03/22 20:57]
zeman Sample.
+++ user:zeman:treebanks:tr [2012/03/22 21:25]
zeman Parsing.
@@ Line 29: / Line 29: @@
   * Principal publications
     * Kemal Oflazer, Bilge Say, Dilek Zeynep Hakkani-Tür, Gökhan Tür: Building a Turkish Treebank. In: Anne Abeillé (ed.): Building and Exploiting Syntactically Annotated Corpora. Kluwer Academic Publishers, 2003.
-    * Nart B. Atalay, Kemal Oflazer, Bilge Say: The Annotation Process in the Turkish Treebank. In: Proceedings of the EACL Workshop on Linguistically Interpreted Corpora – LINC. Budapest, Hungary, 2003.
+    * Nart B. Atalay, Kemal Oflazer, Bilge Say: [[http://aclweb.org/anthology-new/W/W03/W03-2405.pdf|The Annotation Process in the Turkish Treebank]]. In: Proceedings of the EACL Workshop on Linguistically Interpreted Corpora – LINC. Budapest, Hungary, 2003.
   * Documentation
     * Three PDF files are attached to the CoNLL version in the ''doc'' folder: ttbankkl.pdf (the chapter from Anne Abeillé, contains list of morphological tags), turkishtreebank.pdf (the paper from the EACL workshop) and user_guide.pdf (annotation manual for dependencies, in Turkish).
@@ Line 43: / Line 43: @@
 ==== Inside ====
-The original Szeged Treebank is a phrase-based treebank and it is distributed in XML-based, TEI-compliant format. The CoNLL 2007 version is dependency-based (i.e. the head of each phrase was identified), distributed in the CoNLL 2006/2007 format.
+The original METU-Sabanci Treebank is distributed in an XML-based, TEI-compliant format. The CoNLL 2007 version is distributed in the [[:format-conll|CoNLL 2006/2007 format]].
-Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. The tagset used in SzTB seems to be same or similar to [[http://nl.ijs.si/ME/V4/msd/html/msd-hu.html|Multext-East]]. In the CoNLL version, tags were decomposed into CPOS column, POS column and the list of feature-value pairs in the FEAT column.
+Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually.
-Personal names have been collapsed into one token, using underscore as the joining character (e.g. Torgyán_József).
+There are special derivational nodes. Derived words have been split into several tokens (see also the sample below).
 ==== Sample ====
@@ Line 78: / Line 78: @@
 ==== Parsing ====
-SzTB is a mildly nonprojective treebank. 4032 of the 139,143 tokens of the CoNLL 2007 version are attached nonprojectively (2.9%).
+The nonprojectivity rate in the METU-Sabanci Treebank is relatively high. 3716 of the 69695 tokens of the CoNLL 2007 version are attached nonprojectively (5.33%).
-The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Hungarian:
+The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Turkish:
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| Malt (Nilsson et al.) | 80.27 | 83.55 |
+| Titov et al. | 79.81 | 86.22 |
-| Sagae | 79.53 | 83.51 |
+| Malt (Nilsson et al.) | 79.79 | 85.77 |
-| Nakagawa | 76.74 | 82.49 |
+| Nakagawa | 78.22 | 85.77 |
-| Titov et al. | 77.94 | 82.18 |
+| Keith Hall | 77.42 | 85.18 |
+| Malt (Johan Hall) | 79.24 | 85.04 |
 The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]].
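The nonprojectivity figure quoted in the Parsing hunk (3716 of 69695 tokens, 5.33%) can be reproduced in principle with a short script. The following is only a sketch under stated assumptions: it assumes the usual definition (a token is attached nonprojectively if some token lying strictly between it and its head is not dominated by that head), it reads only the HEAD column (the 7th tab-separated column of the CoNLL 2006/2007 format), and the function names are hypothetical. The wiki does not say which tool produced its counts, so small differences in definition could give slightly different numbers.

```python
def read_conll_heads(lines):
    """Yield one list of HEAD values per sentence from CoNLL 2006/2007 lines."""
    heads = []
    for line in lines:
        if not line.strip():
            # A blank line ends the current sentence.
            if heads:
                yield heads
                heads = []
            continue
        columns = line.rstrip("\n").split("\t")
        heads.append(int(columns[6]))  # HEAD is the 7th column; 0 marks the root
    if heads:
        yield heads


def is_dominated(node, ancestor, heads):
    """Return True if `ancestor` is `node` itself or lies on its head chain."""
    if ancestor == 0:
        return True  # the artificial root dominates every token
    for _ in range(len(heads) + 1):  # bounded walk, guards against cycles
        if node == ancestor:
            return True
        if node == 0:
            return False
        node = heads[node - 1]
    return False


def count_nonprojective(heads):
    """Count tokens whose arc to the head spans a token not dominated by that head."""
    count = 0
    for dependent in range(1, len(heads) + 1):
        head = heads[dependent - 1]
        low, high = sorted((head, dependent))
        if any(not is_dominated(k, head, heads) for k in range(low + 1, high)):
            count += 1
    return count


def nonprojective_rate(sentences):
    """Return (nonprojective tokens, all tokens, percentage) for a treebank."""
    nonprojective = sum(count_nonprojective(heads) for heads in sentences)
    total = sum(len(heads) for heads in sentences)
    return nonprojective, total, 100.0 * nonprojective / total
```

Assuming the treebank is stored in a single CoNLL file (the file name is up to the reader), the functions could be combined as `nonprojective_rate(list(read_conll_heads(open(path, encoding="utf-8"))))`.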

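The LAS and UAS columns in the results table are the labeled and unlabeled attachment scores. As a rough illustration only (this is not the official shared task evaluation script, and the function name and the dependency labels in the usage note are hypothetical), the two scores can be computed over all tokens, punctuation included, as follows:

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) in percent.

    `gold` and `predicted` are equally long lists of (head, deprel) pairs,
    one pair per token. Punctuation tokens are counted like any other token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")
    total = len(gold)
    # UAS: the head alone must match.
    uas_hits = sum(1 for g, p in zip(gold, predicted) if g[0] == p[0])
    # LAS: both the head and the dependency label must match.
    las_hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * las_hits / total, 100.0 * uas_hits / total
```

For example, with `gold = [(2, "A"), (0, "B"), (2, "C"), (2, "D")]` and `predicted = [(2, "A"), (0, "B"), (2, "X"), (3, "D")]`, three of four heads and two of four head-label pairs match, so the function returns an LAS of 50.0 and a UAS of 75.0.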

Institute of Formal and Applied Linguistics Wiki
