
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:tr [2012/03/22 21:11]
zeman Link to the ACL Anthology.
user:zeman:treebanks:tr [2013/06/18 14:51]
zeman Uploaded ttbankkl.pdf.
Line 31: Line 31:
     * Nart B. Atalay, Kemal Oflazer, Bilge Say: [[http://aclweb.org/anthology-new/W/W03/W03-2405.pdf|The Annotation Process in the Turkish Treebank]]. In: Proceedings of the EACL Workshop on Linguistically Interpreted Corpora – LINC. Budapest, Hungary, 2003.     * Nart B. Atalay, Kemal Oflazer, Bilge Say: [[http://aclweb.org/anthology-new/W/W03/W03-2405.pdf|The Annotation Process in the Turkish Treebank]]. In: Proceedings of the EACL Workshop on Linguistically Interpreted Corpora – LINC. Budapest, Hungary, 2003.
   * Documentation   * Documentation
-    * Three PDF files are attached to the CoNLL version in the ''doc'' folder: ttbankkl.pdf (the chapter from Anne Abeillé, contains list of morphological tags), turkishtreebank.pdf (the paper from the EACL workshop) and user_guide.pdf (annotation manual for dependencies, in Turkish).+    * Three PDF files are attached to the CoNLL version in the ''doc'' folder: {{:user:zeman:treebanks:ttbankkl.pdf|ttbankkl.pdf}} (the chapter from Anne Abeillé, contains list of morphological tags), turkishtreebank.pdf (the paper from the EACL workshop) and user_guide.pdf (annotation manual for dependencies, in Turkish).
  
 ==== Domain ==== ==== Domain ====
Line 47: Line 47:
 Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually.
  
-There are special derivational nodes. Derived words have been split into several tokens (see also the sample below).+There are special derivational nodes. Derived words have been split into several tokens (see also the sample below). The typical pattern (possibly the only one, but I have not confirmed that) is as follows: two nodes are connected by a dependency link. The head node corresponds to the surface word. It has the word form, part of speech and morphological features but no lemma (the lemma is '_'). The surface word is the result of a derivational morphological process: it has been derived from another word, often of a different part of speech (e.g. a noun derived from a verb). The dependent node represents the source of the derivation. It has no word form but it has a lemma. Its part-of-speech tag describes the source word and can thus differ from the part-of-speech tag of the head node. The FEAT column says just 'Pos'. The dependent node need not be a leaf: other nodes may depend on it instead of depending on its parent. For example, if a noun is derived from a verb (a verbal node depends on the nominal node) and some dependent fills a verbal valency slot of the derived noun, we can expect that dependent to be attached to the verbal node. 
 + 
 +Occasionally there are derivational chains longer than two nodes. An example occurs in sentence No. 82 of the test data: 
 +lemma azal / Verb -> _ / Verb / Caus -> _ / Verb / Pass|Pos -> azaltılması / Noun / NInf / A3sg|P3sg|Nom 
 +According to Google Translate, //azal// means “to decrease” and //azaltılması// means “reduced”. TRmorph gives the following four analyses: 
 +<code> 
 +analyze> azaltılması 
 +azal<v><caus><pass><vn_ma><p3s> 
 +azal<v><caus><pass><vn_ma><p3s><3s> 
 +azal<v><caus><pass><vn_ma><p3s><3p> 
 +azal<v><caus><pass><cv_ma><p3s> 
 +</code>
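
The derivational pattern described above can be followed programmatically. The following is a minimal sketch, assuming CoNLL-X style columns; the `Token` type, the `source_lemma` helper and the toy token list are hypothetical and hand-built from the chain quoted above, not copied from the treebank files (in particular, the `_` features of the first node are a guess).

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    """A reduced CoNLL-X token: only the columns needed here."""
    id: int
    form: str
    lemma: str
    cpostag: str
    feats: str
    head: int


def source_lemma(tokens: List[Token], surface_id: int) -> Optional[str]:
    """Walk down from a surface word through its derivational dependents
    (nodes whose FORM is '_') until a node with a real lemma is found."""
    by_id = {t.id: t for t in tokens}
    children = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t)

    current = surface_id
    visited = set()
    while current in by_id and current not in visited:
        visited.add(current)
        node = by_id[current]
        if node.lemma != "_":
            return node.lemma
        # Assumption: a derivational dependent is recognizable by FORM == '_'.
        derivational = [c for c in children.get(current, []) if c.form == "_"]
        if not derivational:
            return None
        current = derivational[0].id
    return None


# Illustrative data only, rebuilt from the chain azal -> Caus -> Pass -> azaltılması.
chain = [
    Token(1, "_", "azal", "Verb", "_", 2),
    Token(2, "_", "_", "Verb", "Caus", 3),
    Token(3, "_", "_", "Verb", "Pass|Pos", 4),
    Token(4, "azaltılması", "_", "Noun", "A3sg|P3sg|Nom", 0),
]
print(source_lemma(chain, 4))
```

If the treebank contains derivational patterns other than the one described here, the `FORM == '_'` test would have to be adapted.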
  
 ==== Sample ==== ==== Sample ====
Line 78: Line 89:
 ==== Parsing ==== ==== Parsing ====
  
-SzTB is a mildly nonprojective treebank: 4032 of the 139,143 tokens of the CoNLL 2007 version are attached nonprojectively (2.9%).+Nonprojectivity rate in METU-Sabanci is relatively high: 3716 of the 69695 tokens of the CoNLL 2007 version are attached nonprojectively (5.33%).
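
A rate like this can be reproduced from the HEAD column alone. The sketch below assumes one common definition: a token is attached nonprojectively if some token lying strictly between it and its head is not dominated by that head, with the artificial root placed at position 0. The function names are hypothetical, and the script that produced the figures on this page may have used a slightly different definition.

```python
from typing import List


def dominates(heads: List[int], ancestor: int, node: int) -> bool:
    """True if `ancestor` is `node` itself or lies on its path to the root.
    heads[i] is the head of token i + 1; 0 denotes the artificial root."""
    if ancestor == 0:
        return True  # the artificial root dominates every token
    steps = 0
    while node != 0 and steps <= len(heads):  # step limit guards against cycles
        if node == ancestor:
            return True
        node = heads[node - 1]
        steps += 1
    return False


def nonprojective_tokens(heads: List[int]) -> List[int]:
    """Return the ids of tokens whose incoming arc is nonprojective."""
    result = []
    for dependent, head in enumerate(heads, start=1):
        low, high = min(dependent, head), max(dependent, head)
        if any(not dominates(heads, head, k) for k in range(low + 1, high)):
            result.append(dependent)
    return result


# Toy sentence of four tokens: token 1 depends on token 3, but token 2
# (which lies between them) is the parent of token 3, not its descendant.
heads = [3, 0, 2, 2]
bad = nonprojective_tokens(heads)
print(bad, f"{100 * len(bad) / len(heads):.2f}%")

# The percentage quoted above follows from the two counts:
print(f"{100 * 3716 / 69695:.2f}%")
```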
  
-The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Hungarian:+The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Turkish:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^ ^ Parser (Authors) ^ LAS ^ UAS ^
-| Malt (Nilsson et al.) | 80.27 | 83.55 | +| Titov et al. | 79.81 | 86.22 |
-| Sagae | 79.53 | 83.51 | +| Malt (Nilsson et al.) | 79.79 | 85.77 |
-| Nakagawa | 76.74 | 82.49 | +| Nakagawa | 78.22 | 85.77 |
-| Titov et al. | 77.94 | 82.18 | +| Keith Hall | 77.42 | 85.18 |
 +| Malt (Johan Hall) | 79.24 | 85.04 |
  
 The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]]. The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]].
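
For readers unfamiliar with the LAS and UAS columns in the table above: UAS is the percentage of tokens that receive the correct head, and LAS additionally requires the correct dependency label. The following is a minimal sketch that counts every token, punctuation included, in line with the 2007 evaluation procedure mentioned above; it is not the official evaluation script, and the function name and the labels in the toy data are made up for illustration.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (HEAD, DEPREL) of one token


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (LAS, UAS) in percent over all tokens, punctuation included."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")
    total = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))  # head only
    las_hits = sum(g == p for g, p in zip(gold, predicted))        # head and label
    return 100 * las_hits / total, 100 * uas_hits / total


# Toy example with invented labels: token 3 has a wrong label, token 4 a wrong head.
gold = [(2, "SUBJ"), (0, "ROOT"), (2, "OBJ"), (2, "PUNCT")]
pred = [(2, "SUBJ"), (0, "ROOT"), (2, "MOD"), (3, "PUNCT")]
las, uas = attachment_scores(gold, pred)
print(f"LAS = {las:.2f}, UAS = {uas:.2f}")
```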
  
