Differences
This shows you the differences between two versions of the page.
user:zeman:treebanks:nl [2012/01/10 12:11] zeman Parsing. |
user:zeman:treebanks:nl [2012/01/10 16:59] zeman Multi-word expressions. |
| |
In the CoNLL version, the original POS tags from the Alpino Treebank were replaced by POS tags from the Memory-based part-of-speech tagger using the WOTAN tagset, which is described in the file ''tagset.txt''. The morphological annotation includes lemmas. The syntactic annotation is mostly identical to that of the Corpus Gesproken Nederlands (CGN, Spoken Dutch Corpus) as described in the file ''syn_prot.pdf'' (Dutch only). An attempt to describe a number of differences between the CGN and Alpino annotation practice is given in the file ''diff.pdf'' (which is heavily out of date, but the number of differences has been reduced). Conversion issues: head selection, multi-word units, discourse units. | In the CoNLL version, the original POS tags from the Alpino Treebank were replaced by POS tags from the Memory-based part-of-speech tagger using the WOTAN tagset, which is described in the file ''tagset.txt''. The morphological annotation includes lemmas. The syntactic annotation is mostly identical to that of the Corpus Gesproken Nederlands (CGN, Spoken Dutch Corpus) as described in the file ''syn_prot.pdf'' (Dutch only). An attempt to describe a number of differences between the CGN and Alpino annotation practice is given in the file ''diff.pdf'' (which is heavily out of date, but the number of differences has been reduced). Conversion issues: head selection, multi-word units, discourse units. |
| |
| Multi-word expressions have been concatenated into one token, using the underscore as the joining character (e.g. "Economische_en_Monetaire_Unie"). They carry the special part-of-speech tag ''MWU''; their (sub)POS tags and features may describe the individual parts of the unit. For example, "aan_het" has CPOS ''MWU'', (sub)POS ''Prep_Art'' and features ''voor_bep|onzijd|neut''. |
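The multi-word convention above can be undone with a few lines of Python. This is only a minimal sketch: the function name ''split_mwu'' is hypothetical, and it assumes that the underscore separates the parts in both the word form and the (sub)POS, that ''|'' separates features, and that ''_'' marks an empty feature field. Since the text only says that the features "may" describe the individual parts, the sketch returns them as one flat list and does not try to assign them to parts.

```python
def split_mwu(form, cpos, pos, feats):
    """Split a concatenated multi-word unit into (word, sub-POS) pairs.

    Assumptions (not confirmed by the treebank documentation):
    - "_" joins the parts of both the form and the (sub)POS,
    - "|" separates features and "_" alone means "no features".
    """
    features = [] if feats == "_" else feats.split("|")

    # Ordinary tokens are returned unchanged as a single pair.
    if cpos != "MWU":
        return [(form, pos)], features

    words = form.split("_")
    tags = pos.split("_")

    # Only pair words with tags when the counts agree; otherwise
    # keep the words and leave the per-word tag unknown (None).
    if len(tags) != len(words):
        tags = [None] * len(words)

    return list(zip(words, tags)), features


parts, features = split_mwu("aan_het", "MWU", "Prep_Art", "voor_bep|onzijd|neut")
print(parts)     # [('aan', 'Prep'), ('het', 'Art')]
print(features)  # ['voor_bep', 'onzijd', 'neut']
```

Pairing is skipped when the number of words and the number of sub-tags differ, so a unit whose (sub)POS does not follow the assumed pattern is still split into words without a guessed tag.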
| |
==== Sample ==== | ==== Sample ==== |