
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:nl [2012/01/10 12:11]
zeman Parsing.
user:zeman:treebanks:nl [2012/01/10 16:59]
zeman Multi-word expressions.
Line 44:
  
 In the CoNLL version, the original POS tags from the Alpino Treebank were replaced by POS tags from the Memory-based part-of-speech tagger, which uses the WOTAN tagset described in the file ''tagset.txt''. The morphological annotation includes lemmas. The syntactic annotation is mostly identical to that of the Corpus Gesproken Nederlands (CGN, Spoken Dutch Corpus), as described in the file ''syn_prot.pdf'' (Dutch only). The file ''diff.pdf'' attempts to describe a number of differences between the CGN and Alpino annotation practice; it is heavily out of date, and the number of differences has since been reduced. Conversion issues: head selection, multi-word units, discourse units.
 +
 +Multi-word expressions have been concatenated into one token, using the underscore as the joining character (e.g. "Economische_en_Monetaire_Unie"). They carry the special part-of-speech tag ''MWU''; their sub-POS tags and features may describe the individual parts of the unit. E.g. "aan_het" has CPOS ''MWU'', (sub)POS ''Prep_Art'' and features ''voor_bep|onzijd|neut''.
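The convention above can be unpacked mechanically. The following is a minimal sketch, not part of the treebank tooling: the function name ''split_mwu'' and the ''MwuParts'' container are hypothetical, and it assumes that the form and the sub-POS are both joined with underscores while features are separated by ''|''. Features are deliberately not aligned to individual words, because their count need not match the number of parts (three features for two words in "aan_het").

```python
from typing import List, NamedTuple, Optional


class MwuParts(NamedTuple):
    """Hypothetical container for the pieces of one multi-word unit."""
    words: List[str]
    pos_tags: List[str]
    features: List[str]


def split_mwu(form: str, cpos: str, pos: str, feats: str) -> Optional[MwuParts]:
    """Split a concatenated multi-word token into its parts.

    Returns None for tokens whose coarse POS tag is not MWU.
    """
    if cpos != "MWU":
        return None
    # Word forms and sub-POS tags are joined with underscores.
    words = form.split("_")
    pos_tags = pos.split("_")
    # Features are separated by "|" only: a feature such as "voor_bep"
    # contains an underscore itself and must not be split further.
    # "_" is treated as an empty feature field (an assumption).
    features = [] if feats in ("", "_") else feats.split("|")
    return MwuParts(words, pos_tags, features)


parts = split_mwu("aan_het", "MWU", "Prep_Art", "voor_bep|onzijd|neut")
print(parts.words)     # ['aan', 'het']
print(parts.pos_tags)  # ['Prep', 'Art']
print(parts.features)  # ['voor_bep', 'onzijd', 'neut']
```

Note that a part which itself contains a literal underscore would be split incorrectly by this sketch; whether such tokens occur in the data has not been checked here.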
  
 ==== Sample ====
