Differences
This shows you the differences between two versions of the page.
user:zeman:interset:drivers [2008/03/25 14:10] zeman Lemma features.
user:zeman:interset:drivers [2008/03/31 22:14] zeman de::conll
Line 26:
Czech PDT tags (over 4000 tags; the core of Interset came about as a by-product while I was working on this): about 2 days, i.e. roughly 18 hours. I spent another 11:09 hours,
==== CoNLL (derived from PDT) ====
- The CoNLL 2006 and 2007 Czech treebanks consist of PDT data converted to the CoNLL format. The PDT morphological tags have been decomposed into a coarse-grained part of speech, a detailed part of speech, and a set of feature values.
+ The CoNLL 2006 and 2007 Czech treebanks consist of PDT data converted to the CoNLL format. The PDT morphological tags have been decomposed into a coarse-grained part of speech, a detailed part of speech, and a set of feature values.
- Update: the mapping to the original PDT tags is not one-to-one. Some information that the PDT encodes in lemmas has been encoded as features
+ The list of tags in this tagset contains equivalents of all original PDT tags. In addition, it contains those tags with the ''
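The decomposition described above can be sketched in a few lines. This is a Python illustration, not the Interset driver code. The feature names (`Gen`, `Num`, `Cas`, `PGe`, `PNu`, `Per`, `Ten`, `Gra`, `Neg`, `Voi`, `Var`) and the rule that a PDT lemma suffix such as `_;G` becomes a `Sem` feature are assumptions and should be checked against the CoNLL Czech data. The function names are hypothetical.

```python
import re

# Assumed CoNLL feature names, keyed by 0-based index into the 15-character
# PDT positional tag. Index 0 (POS) becomes CPOSTAG and index 1 (SubPOS)
# becomes POSTAG; indices 12 and 13 (reserve positions) are not used here.
PDT_FEATURES = {
    2: "Gen", 3: "Num", 4: "Cas", 5: "PGe", 6: "PNu",
    7: "Per", 8: "Ten", 9: "Gra", 10: "Neg", 11: "Voi", 14: "Var",
}


def lemma_sem(lemma):
    """Return the semantic-class letter of a PDT lemma suffix like '_;G', else None."""
    match = re.search(r"_;([A-Za-z])", lemma)
    return match.group(1) if match else None


def pdt_to_conll(tag, lemma=""):
    """Split a PDT positional tag into (CPOSTAG, POSTAG, FEATS) (hypothetical helper)."""
    if len(tag) != 15:
        raise ValueError(f"expected a 15-character PDT tag, got {tag!r}")
    # A '-' at a position means "not applicable" and yields no feature.
    feats = [f"{name}={tag[i]}" for i, name in sorted(PDT_FEATURES.items())
             if tag[i] != "-"]
    # Information that PDT keeps in the lemma becomes an extra feature (assumed).
    sem = lemma_sem(lemma)
    if sem is not None:
        feats.append(f"Sem={sem}")
    return tag[0], tag[1], ("|".join(feats) if feats else "_")
```

For example, `pdt_to_conll("NNFS1-----A----", "Praha_;G")` yields `("N", "N", "Gen=F|Num=S|Cas=1|Neg=A|Sem=G")`. Without the lemma argument the `Sem` part is missing. That illustrates why the resulting CoNLL tag list is larger than the plain PDT tag list.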
Work started: 25.3.2008
+ Work finished: 25.3.2008
+ Total work time: 6:02 h
+
+ More than half of this time went into testing, namely tuning the tags that contain the Sem feature.
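The log does not say how the testing was done. One common check for a tagset driver is a round trip: every tag in the tag list should decode to features and encode back to exactly the same string. The Python sketch below assumes that approach. It uses a hypothetical flat decoder/encoder and a hypothetical fixed feature order. It is not Interset's actual test code.

```python
# Hypothetical canonical order in which the encoder writes features.
FEATURE_ORDER = ["Gen", "Num", "Cas", "PGe", "PNu", "Per",
                 "Ten", "Gra", "Neg", "Voi", "Var", "Sem"]


def decode(tag):
    """Parse a 'CPOS<TAB>POS<TAB>FEATS' string into a flat feature dict."""
    cpos, pos, feats = tag.split("\t")
    fs = {"cpos": cpos, "pos": pos}
    if feats != "_":
        for pair in feats.split("|"):
            name, value = pair.split("=", 1)
            fs[name] = value
    return fs


def encode(fs):
    """Serialize a feature dict in FEATURE_ORDER (names outside it are dropped)."""
    feats = [f"{name}={fs[name]}" for name in FEATURE_ORDER if name in fs]
    return "\t".join([fs["cpos"], fs["pos"], "|".join(feats) or "_"])


def round_trip_failures(tags):
    """Return the tags that do not survive decode followed by encode."""
    return [tag for tag in tags if encode(decode(tag)) != tag]
```

With this sketch, `N\tN\tGen=F|Num=S|Cas=1|Neg=A|Sem=G` round-trips cleanly. A tag listing `Sem` before `Gen` is reported, because the encoder always writes `Sem` last. Mismatches of that kind are what a round-trip check surfaces.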
+
+ ===== German (de) =====
+
+ ==== Stuttgart-Tübingen Tagset (STTS) ====
+
+ This is the tagset used in the Tiger treebank. It is quite syntax-oriented,
+
+ The tags omit inflectional information (number and case of pronouns and articles, degree of comparison of adjectives, tense (Präteritum,
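Because STTS tags carry no inflectional information, a driver can decode each tag to a fixed feature set. When encoding, it has to ignore inflectional features it cannot express. The Python sketch below shows this with a handful of real STTS tags. The feature names and values (`pos`, `nountype`, `synpos`, `prontype`, `verbtype`, `verbform`) are hypothetical stand-ins, not Interset's actual inventory.

```python
# Hypothetical feature structures for a small subset of STTS tags.
STTS_FEATURES = {
    "NN":    {"pos": "noun", "nountype": "com"},
    "NE":    {"pos": "noun", "nountype": "prop"},
    "ADJA":  {"pos": "adj", "synpos": "attr"},
    "ADJD":  {"pos": "adj", "synpos": "pred"},
    "ART":   {"pos": "det", "prontype": "art"},
    "PPER":  {"pos": "noun", "prontype": "prs"},
    "VVFIN": {"pos": "verb", "verbtype": "main", "verbform": "fin"},
    "VAFIN": {"pos": "verb", "verbtype": "aux", "verbform": "fin"},
    "VVINF": {"pos": "verb", "verbtype": "main", "verbform": "inf"},
    "VVPP":  {"pos": "verb", "verbtype": "main", "verbform": "part"},
}

# Features that STTS tags never carry (hypothetical names).
INFLECTIONAL = {"number", "case", "gender", "person", "degree", "tense"}


def decode_stts(tag):
    """Return the features encoded by an STTS tag; never inflectional ones."""
    if tag not in STTS_FEATURES:
        raise ValueError(f"unsupported STTS tag: {tag!r}")
    return dict(STTS_FEATURES[tag])


def encode_stts(features):
    """Pick the STTS tag for a feature set, discarding inflectional features."""
    core = {k: v for k, v in features.items() if k not in INFLECTIONAL}
    for tag, tag_features in STTS_FEATURES.items():
        if tag_features == core:
            return tag
    raise ValueError(f"no STTS tag for {features!r}")
```

Encoding is deliberately lossy. An attributive adjective in the comparative and one in the positive both come out as `ADJA`, and a finite main verb in any tense comes out as `VVFIN`.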
+
+ Work started: 29.3.2008
+ Work finished: 29.3.2008
+ Total work time: 4:00 h
+
+ ==== CoNLL (derived from STTS) ====
+
+ Only a simple wrapper around the STTS driver was needed.
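A wrapper like this only has to split the CoNLL columns and delegate to the existing STTS driver. The Python sketch below assumes that the German CoNLL data repeat the STTS tag in both the CPOSTAG and POSTAG columns and leave FEATS as `_`. Check this layout against the actual data. The three-tag `_STTS` table and all function names are hypothetical stand-ins for a full STTS driver.

```python
# Hypothetical stand-in for a full STTS driver (only three tags).
_STTS = {
    "NN": {"pos": "noun"},
    "ADJA": {"pos": "adj", "synpos": "attr"},
    "VVFIN": {"pos": "verb", "verbform": "fin"},
}


def stts_decode(tag):
    """Decode an STTS tag (raises KeyError for tags outside the stub table)."""
    return dict(_STTS[tag])


def stts_encode(features):
    """Find the STTS tag whose features match exactly."""
    for tag, fs in _STTS.items():
        if fs == features:
            return tag
    raise ValueError(f"no STTS tag for {features!r}")


def conll_decode(conll_tag):
    """Decode 'CPOSTAG<TAB>POSTAG<TAB>FEATS' by passing POSTAG to the STTS driver."""
    _cpos, pos, _feats = conll_tag.split("\t")
    return stts_decode(pos)


def conll_encode(features):
    """Encode via the STTS driver and repeat the tag in both POS columns (assumed layout)."""
    tag = stts_encode(features)
    return f"{tag}\t{tag}\t_"
```

All linguistic knowledge stays in the STTS driver. That is consistent with the short work time logged for this step.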
+
+ Work started: 31.3.2008
+ Work finished: 31.3.2008
+ Total work time: 10 min
===== Time needed for tag set conversion =====