
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


user:zeman:treebanks:en [2011/11/20 18:54]
zeman created
user:zeman:treebanks:en [2014/03/18 08:57] (current)
zeman Treebank converter.
Line 37: Line 37:
     * Robert MacIntyre: [[ftp://ftp.cis.upenn.edu/pub/treebank/doc/faq.cd2|NP Heads and Base NPs]] (Treebank FAQ)
     * Richard Johansson, Pierre Nugues: [[http://dspace.utlib.ee/dspace/bitstream/handle/10062/2560/reg-Johansson-10.pdf;jsessionid=BB8432D9BAE4FCF9DD9BD746704E796F?sequence=1|Extended constituent-to-dependency conversion for English]]. In: Proceedings of the 16th Nordic Conference on Computational Linguistics (NODALIDA), pp. 105-112, Tartu, Estonia, 2007.
 +      * The treebank converter that was used to convert the constituent trees of Penn Treebank to dependencies for the CoNLL shared tasks is documented at http://nlp.cs.lth.se/software/treebank_converter/.
  
 ==== Domain ====
Line 245: Line 246:
 ==== Parsing ====
  
-PDT is a mildly nonprojective treebank. 8351 of the 437,020 tokens in the CoNLL 2007 version are attached nonprojectively (1.91%).
+Nonprojectivities in the Penn Treebank are rare. Only 3819 of the 991,535 tokens in the CoNLL 2009 version are attached nonprojectively (0.39%).
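
A figure like the one above can be reproduced by counting, for every token, whether its arc to the head spans a token that the head does not dominate. The following is a minimal sketch, assuming each sentence is given as a list of 1-based head indices with 0 for the artificial root (as in the CoNLL HEAD column). The function name and the decision to test arcs to the root as well are illustrative assumptions, not necessarily the procedure used to obtain the numbers on this page.

```python
def nonprojective_tokens(heads):
    """Return the 1-based positions of tokens attached nonprojectively.

    heads[i] is the head of token i+1; 0 denotes the artificial root.
    A token is counted when some token lying strictly between it and
    its head is not a descendant of that head.
    """
    n = len(heads)

    def is_descendant(node, ancestor):
        # Climb the head chain; the step limit guards against cycles
        # in malformed input.
        steps = 0
        while node != 0 and steps <= n:
            if node == ancestor:
                return True
            node = heads[node - 1]
            steps += 1
        return node == ancestor  # only true when ancestor is the root (0)

    result = []
    for dep in range(1, n + 1):
        head = heads[dep - 1]
        lo, hi = sorted((dep, head))
        if any(not is_descendant(k, head) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


# Token 1 depends on token 3, but token 2 (the root) lies in between
# and is not dominated by token 3, so token 1 is nonprojective.
print(nonprojective_tokens([3, 0, 2, 2]))  # [1]
print(nonprojective_tokens([2, 0, 2]))     # []
```

Summing the lengths of the returned lists over all sentences and dividing by the total token count gives the percentage; for the numbers quoted above, 3819 / 991,535 rounds to 0.39%.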
  
-There is an [[http://ufal.mff.cuni.cz/czech-parsing/|online summary]] of known results in Czech parsing.
-
-The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Czech:
-
-^ Parser (Authors) ^ LAS ^ UAS ^
-| MST (McDonald et al.) | 80.18 | 87.30 |
-| Basis (O'Neil) | 76.60 | 85.58 |
-| Malt (Nivre et al.) | 78.42 | 84.80 |
-| Nara (Yuchang Cheng) | 76.24 | 83.40 |
-
-The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Czech:
+The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for English:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| Nakagawa | 80.19 | 86.28 |
-| Carreras | 78.60 | 85.16 |
-| Titov et al. | 77.94 | 84.19 |
-| Malt (Nilsson et al.) | 77.98 | 83.59 |
-| Attardi et al. | 77.37 | 83.40 |
-| Malt (Hall et al.) | 77.22 | 82.35 |
+| Carreras | 89.61 | 90.63 |
+| Nakagawa | 88.41 | 90.13 |
+| Sagae | 89.01 | 89.87 |
+| Titov et al. | 88.39 | 89.73 |
+| Malt (Nilsson et al.) | 88.11 | 88.93 |
+| Schiehlen | 86.21 | 88.91 |
+| Nguyen | 86.73 | 88.01 |
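
For readers unfamiliar with the LAS and UAS columns, the sketch below shows how labeled and unlabeled attachment scores are typically computed, with a switch for counting or skipping punctuation tokens. It is an illustration under stated assumptions, not the official shared task scorer: the function names and the `(form, head, deprel)` tuple layout are hypothetical, and treating a token as punctuation when all of its characters fall in a Unicode punctuation category is an assumption made here.

```python
import unicodedata


def is_punctuation(form):
    # Assumption: a token is punctuation when every character belongs
    # to a Unicode punctuation category (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    return bool(form) and all(
        unicodedata.category(ch).startswith("P") for ch in form
    )


def attachment_scores(gold, predicted, include_punctuation=True):
    """Return (LAS, UAS) in percent.

    gold and predicted are parallel lists of (form, head, deprel)
    tuples, one per token.
    """
    total = correct_head = correct_both = 0
    for (form, g_head, g_rel), (_, p_head, p_rel) in zip(gold, predicted):
        if not include_punctuation and is_punctuation(form):
            continue
        total += 1
        if g_head == p_head:
            correct_head += 1          # counts towards UAS
            if g_rel == p_rel:
                correct_both += 1      # counts towards LAS
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * correct_both / total, 100.0 * correct_head / total


gold = [("The", 2, "NMOD"), ("cat", 3, "SBJ"), ("sleeps", 0, "ROOT"), (".", 3, "P")]
pred = [("The", 2, "NMOD"), ("cat", 3, "OBJ"), ("sleeps", 0, "ROOT"), (".", 2, "P")]

print(attachment_scores(gold, pred))  # (50.0, 75.0)
print(attachment_scores(gold, pred, include_punctuation=False))
```

In the toy example, skipping the final period raises UAS from 75% to 100%, which is why scores obtained with and without punctuation are not directly comparable.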
  
 The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]].
  
-The results of the CoNLL 2009 shared task are [[http://ufal.mff.cuni.cz/conll2009-st/results/results.php|available online]]. They have been published in [[http://aclweb.org/anthology/W/W09/W09-1201.pdf|(Hajič et al., 2009)]]. Unlabeled attachment score was not published. These are the best results for Czech:
+The results of the CoNLL 2009 shared task are [[http://ufal.mff.cuni.cz/conll2009-st/results/results.php|available online]]. They have been published in [[http://aclweb.org/anthology/W/W09/W09-1201.pdf|(Hajič et al., 2009)]]. Unlabeled attachment score was not published. These are the best results for English:
  
 ^ Parser (Authors) ^ LAS ^
-| Merlo (Gesmundo et al.) | 80.38 |
-| Bohnet | 80.11 |
-| Che et al. | 80.01 |
+| Bohnet | 89.88 |
+| Chen | 89.19 |
+| Merlo (Gesmundo et al.) | 88.79 |
+| Asahara | 88.54 |
+| Che et al. | 88.48 |
  
