Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:en [2011/11/20 18:54]
zeman created
user:zeman:treebanks:en [2014/03/18 08:57]
zeman Treebank converter.
Line 37: Line 37:
     * Robert MacIntyre: [[ftp://ftp.cis.upenn.edu/pub/treebank/doc/faq.cd2|NP Heads and Base NPs]] (Treebank FAQ)
     * Richard Johansson, Pierre Nugues: [[http://dspace.utlib.ee/dspace/bitstream/handle/10062/2560/reg-Johansson-10.pdf;jsessionid=BB8432D9BAE4FCF9DD9BD746704E796F?sequence=1|Extended constituent-to-dependency conversion for English]]. In: Proceedings of the 16th Nordic Conference on Computational Linguistics (NODALIDA), pp. 105-112, Tartu, Estonia, 2007.
+      * The treebank converter that was used to convert the constituent trees of Penn Treebank to dependencies for the CoNLL shared tasks is documented at http://nlp.cs.lth.se/software/treebank_converter/.
  
 ==== Domain ====
Line 245: Line 246:
 ==== Parsing ====
  
-PDT is a mildly nonprojective treebank. 8351 of the 437,020 tokens in the CoNLL 2007 version are attached nonprojectively (1.91%).
+Nonprojectivities in the Penn Treebank are rare. Only 3819 of the 991,535 tokens in the CoNLL 2009 version are attached nonprojectively (0.39%).
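
The nonprojectivity figures in this change are token counts divided by corpus size. The sketch below shows one common way such a count could be obtained, assuming that a token counts as nonprojectively attached when some token lying between it and its head is not a descendant of that head. The page does not say which definition was actually used, and the function names and the four-token example are hypothetical.

```python
def is_dominated_by(node, ancestor, heads):
    """Return True if following head links from `node` reaches `ancestor`.

    Index 0 is the artificial root, which dominates every token.
    Assumes `heads` encodes a well-formed tree (no cycles).
    """
    while node != 0:
        if node == ancestor:
            return True
        node = heads[node]
    return ancestor == 0


def nonprojective_tokens(heads):
    """List the tokens whose attachment to their head is nonprojective.

    `heads[i]` is the head of token i (1-based); `heads[0]` is an unused
    placeholder for the artificial root.
    """
    result = []
    for dep in range(1, len(heads)):
        head = heads[dep]
        lo, hi = sorted((dep, head))
        # The arc is nonprojective if some token strictly between the
        # dependent and its head is not a descendant of that head.
        if any(not is_dominated_by(k, head, heads) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


def percentage(count, total):
    """Share of `count` in `total`, as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Hypothetical four-token sentence: token 1 depends on token 3,
# but token 2 (attached to the root) lies between them.
example_heads = [0, 3, 0, 2, 2]
print(nonprojective_tokens(example_heads))
print(percentage(3819, 991535))
```

Calling `percentage` with the counts quoted for either treebank version reproduces the rounded percentages given in the text.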
  
-There is an [[http://ufal.mff.cuni.cz/czech-parsing/|online summary]] of known results in Czech parsing.
-
-The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Czech:
-
-^ Parser (Authors) ^ LAS ^ UAS ^
-| MST (McDonald et al.) | 80.18 | 87.30 |
-| Basis (O'Neil) | 76.60 | 85.58 |
-| Malt (Nivre et al.) | 78.42 | 84.80 |
-| Nara (Yuchang Cheng) | 76.24 | 83.40 |
-
-The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Czech:
+The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for English:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| Nakagawa | 80.19 | 86.28 |
-| Carreras | 78.60 | 85.16 |
-| Titov et al. | 77.94 | 84.19 |
-| Malt (Nilsson et al.) | 77.98 | 83.59 |
-| Attardi et al. | 77.37 | 83.40 |
-| Malt (Hall et al.) | 77.22 | 82.35 |
+| Carreras | 89.61 | 90.63 |
+| Nakagawa | 88.41 | 90.13 |
+| Sagae | 89.01 | 89.87 |
+| Titov et al. | 88.39 | 89.73 |
+| Malt (Nilsson et al.) | 88.11 | 88.93 |
+| Schiehlen | 86.21 | 88.91 |
+| Nguyen | 86.73 | 88.01 |
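
The LAS and UAS columns are the labeled and unlabeled attachment scores: the percentage of tokens that receive the correct head and dependency label, or the correct head alone. A minimal sketch of how the two scores relate, and of what including or excluding punctuation tokens changes, might look as follows. It is not the official shared task scorer. The function names, the `include_punctuation` flag, the example sentence, and the rule that a token is punctuation when all of its characters are in a Unicode punctuation category are assumptions made for illustration.

```python
import unicodedata


def is_punctuation(form):
    """True if every character of the word form is Unicode punctuation."""
    return bool(form) and all(
        unicodedata.category(ch).startswith("P") for ch in form
    )


def attachment_scores(gold, predicted, include_punctuation=True):
    """Return (LAS, UAS) in percent.

    `gold` is a list of (form, head, deprel) triples; `predicted` is a list
    of (head, deprel) pairs for the same tokens in the same order.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    total = las_hits = uas_hits = 0
    for (form, gold_head, gold_rel), (pred_head, pred_rel) in zip(gold, predicted):
        if not include_punctuation and is_punctuation(form):
            continue  # leave punctuation tokens out of the scoring
        total += 1
        if gold_head == pred_head:
            uas_hits += 1
            if gold_rel == pred_rel:
                las_hits += 1  # head and label are both correct
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * las_hits / total, 100.0 * uas_hits / total


# Hypothetical sentence with one label error and one misattached full stop.
gold = [("The", 2, "NMOD"), ("dog", 3, "SBJ"), ("barks", 0, "ROOT"), (".", 3, "P")]
predicted = [(2, "NMOD"), (3, "OBJ"), (0, "ROOT"), (2, "P")]

print(attachment_scores(gold, predicted))
print(attachment_scores(gold, predicted, include_punctuation=False))
```

Because a label can only be counted as correct when the head is also correct, LAS never exceeds UAS for the same parser output, which matches every row of the table.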
  
 The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]].
  
-The results of the CoNLL 2009 shared task are [[http://ufal.mff.cuni.cz/conll2009-st/results/results.php|available online]]. They have been published in [[http://aclweb.org/anthology/W/W09/W09-1201.pdf|(Hajič et al., 2009)]]. Unlabeled attachment score was not published. These are the best results for Czech:
+The results of the CoNLL 2009 shared task are [[http://ufal.mff.cuni.cz/conll2009-st/results/results.php|available online]]. They have been published in [[http://aclweb.org/anthology/W/W09/W09-1201.pdf|(Hajič et al., 2009)]]. Unlabeled attachment score was not published. These are the best results for English:
  
 ^ Parser (Authors) ^ LAS ^
-| Merlo (Gesmundo et al.) | 80.38 |
-| Bohnet | 80.11 |
-| Che et al. | 80.01 |
+| Bohnet | 89.88 |
+| Chen | 89.19 |
+| Merlo (Gesmundo et al.) | 88.79 |
+| Asahara | 88.54 |
+| Che et al. | 88.48 |
  
