Differences
This shows you the differences between two versions of the page.
user:zeman:interset:drivers [2008/03/25 13:22] zeman PDT CoNLL. |
user:zeman:interset:drivers [2008/03/25 14:10] zeman Lemma features. |
||
---|---|---|---|
Line 26: | Line 26: | ||
Czech PDT tags (over 4000 tags; the core of Interset arose as a by-product while I was working on this) took about 2 days, so let's say 18 hours. I spent another 11:09 hours, | Czech PDT tags (over 4000 tags; the core of Interset arose as a by-product while I was working on this) took about 2 days, so let's say 18 hours. I spent another 11:09 hours, | ||
+ | |||
==== CoNLL (derived from PDT) ==== | ==== CoNLL (derived from PDT) ==== | ||
The CoNLL 2006 and 2007 Czech treebanks are data from PDT converted to the CoNLL format. The PDT morphological tags have been decomposed into a coarse-grained part of speech, a detailed part of speech, and a set of feature values. There should be a one-to-one mapping between the original PDT and the CoNLL tagsets; however, because of the features, the driver cannot be a simple envelope around the driver of the original tagset (as is the case for, e.g., Penn Treebank tags). | The CoNLL 2006 and 2007 Czech treebanks are data from PDT converted to the CoNLL format. The PDT morphological tags have been decomposed into a coarse-grained part of speech, a detailed part of speech, and a set of feature values. There should be a one-to-one mapping between the original PDT and the CoNLL tagsets; however, because of the features, the driver cannot be a simple envelope around the driver of the original tagset (as is the case for, e.g., Penn Treebank tags). | ||
+ | |||
+ | Update: the mapping to the original PDT tags is not one-to-one. Some information, | ||
Work started: 25.3.2008 | Work started: 25.3.2008 |
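To illustrate the decomposition described in the CoNLL section, here is a minimal Python sketch that reads the three CoNLL tag columns (coarse-grained part of speech, detailed part of speech, features) into one structure. It is not the Interset driver itself. The function name `parse_conll_tag` and the sample feature names and values are assumptions made for illustration; the only thing taken from the CoNLL format is that features are `Name=Value` pairs separated by `|`, with `_` standing for an empty set.

```python
def parse_conll_tag(cpos, pos, feats):
    """Combine the CoNLL CPOSTAG, POSTAG and FEATS columns into one dict.

    Hypothetical helper for illustration; not part of Interset.
    """
    features = {}
    # "_" marks an empty feature set in the CoNLL format.
    if feats != "_":
        for item in feats.split("|"):
            # Split each "Name=Value" pair at the first "=".
            name, _, value = item.partition("=")
            features[name] = value
    return {"cpos": cpos, "pos": pos, "feats": features}


# Sample values are illustrative, not copied from the treebank.
tag = parse_conll_tag("N", "N", "Gen=M|Num=S|Cas=1|Neg=A")
print(tag["feats"]["Cas"])
```

A driver has to interpret each feature separately, which is why it cannot simply rebuild the original PDT tag string and pass it to the PDT driver.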