Differences
This shows you the differences between two versions of the page.
Old revision (-): user:zeman:interset:drivers [2009/03/24 11:23] by zeman, summary: CoNLL 2009.
New revision (+): user:zeman:interset:drivers [2014/03/01 12:40] by zeman, summary: Slovenský národný korpus (Slovak National Corpus).
Old line 4 / new line 4:
===== Arabic (ar) =====
+
+ ==== CoNLL 2006 ====
The Arabic CoNLL tags are derived from the tags of the Prague Arabic Dependency Treebank.
Old line 9 / new line 11:
Created in 2006-2007.
Total work time: 13 hours
+
+ ==== CoNLL 2007 ====
+
+ The Arabic tags in CoNLL 2007 differ slightly from those in CoNLL 2006, and there are also new tags. The driver ''
+
+ Created: 23.6.2011
+ Total work time: 2 hours
===== Bulgarian (bg) =====
Old line 62 / new line 71:
The [[:
+
+ The ''
Work started: 24.3.2009
- Work finished:
+ Work finished:
- Total work time:
+ Total work time: 1:10 h
==== Multext ====
Old line 76 / new line 87:
Czech tagsets are notoriously complex. This one maps quite nicely to DZ Interset features. However, the few distinctions that are not (yet) represented in DZ Interset made debugging difficult. Clitic_s and generic numerals represented using the ''
+
+ ==== Prague Spoken Corpus ====
+
+ The Prague Spoken Corpus (Pražský mluvený korpus, PMK) is distributed together with the frequency dictionary of spoken Czech (a book). It uses very strange tags, and very many of them (over 10,000!). An extremely high proportion of the tags has to rely on the ''
+
+ Work started: 26.11.2009
+ Work finished: 4.10.2010
+ Total work time: 57 hours
===== Danish (da) =====
Old line 91 / new line 110:
Total work time: about 3 hours
- ==== CoNLL Tagset (derived from Penn tags) ====
+ ==== CoNLL 2006 ====
The driver is just an envelope around the ''
Total work time: 48 minutes
+
+ ==== CoNLL 2009 ====
+
+ Another envelope around the ''
+
+ Work started: 25.3.2009
+ Work finished: 25.3.2009
+ Total work time: 2:57 h
===== German (de) =====
Old line 109 / new line 136:
Total work time: 4:00 h
- ==== CoNLL (derived from STTS) ====
+ ==== CoNLL 2006 ====
Only a simple envelope around the STTS driver was needed.
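The CoNLL 2006 and CoNLL 2009 entries at old line 91 (whose old heading mentions Penn tags) and the German CoNLL entry above (STTS) are all described as envelopes around an existing driver. The Python code below is a minimal sketch of that idea, not the actual driver. It assumes that a CoNLL-style tag is a tab-separated triple of coarse POS, fine POS and features, and that the envelope only has to pass the fine POS on to a base driver. The names `make_envelope`, `toy_decode`, `toy_encode` and `PENN_TOY`, the three-tag table, and the rule that the coarse tag is the first two characters of the fine tag are all hypothetical.

```python
# Hypothetical sketch of an "envelope" tagset driver (not the DZ Interset API).
# Assumptions: a CoNLL-style tag is "CPOS\tPOS\tFEATS"; the fine POS is handled
# by an existing base driver; the coarse POS is the first two characters of the
# fine tag. PENN_TOY is a toy table, not the real tagset.
from typing import Callable, Dict, Tuple

FeatureStructure = Dict[str, str]
Decoder = Callable[[str], FeatureStructure]
Encoder = Callable[[FeatureStructure], str]

# Toy base driver: a three-tag subset with made-up feature values.
PENN_TOY: Dict[str, FeatureStructure] = {
    "NN": {"pos": "noun", "number": "sing"},
    "NNS": {"pos": "noun", "number": "plur"},
    "VBD": {"pos": "verb", "tense": "past"},
}


def toy_decode(tag: str) -> FeatureStructure:
    """Base driver: look up the features of a fine-grained tag."""
    return dict(PENN_TOY[tag])


def toy_encode(fs: FeatureStructure) -> str:
    """Base driver: find the tag whose features match exactly."""
    for tag, feats in PENN_TOY.items():
        if feats == fs:
            return tag
    raise KeyError(f"no tag for {fs}")


def make_envelope(base_decode: Decoder, base_encode: Encoder) -> Tuple[Decoder, Encoder]:
    """Wrap a base driver so that it reads and writes CoNLL-style triples."""

    def decode(conll_tag: str) -> FeatureStructure:
        # Only the fine POS carries information; CPOS and FEATS are ignored.
        _cpos, pos, _feats = conll_tag.split("\t")
        return base_decode(pos)

    def encode(fs: FeatureStructure) -> str:
        pos = base_encode(fs)
        cpos = pos[:2]  # assumed coarse-tag rule
        return f"{cpos}\t{pos}\t_"

    return decode, encode


if __name__ == "__main__":
    decode, encode = make_envelope(toy_decode, toy_encode)
    print(decode("NN\tNNS\t_"))  # {'pos': 'noun', 'number': 'plur'}
    print(repr(encode({"pos": "verb", "tense": "past"})))  # 'VB\tVBD\t_'
```

The point of this design is that the envelope holds no tag table of its own: all mapping knowledge stays in the base driver, so the wrapper itself stays very short.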
Old line 116 / new line 143:
Work finished: 31.3.2008
Total work time: 10 min
+
+
+ ==== CoNLL 2009 ====
+
+ This tagset is derived from the STTS, too. Unlike CoNLL 2006, it also has morphological features, which required additional processing.
+
+ Work started: 5.4.2009
+ Work finished: 6.4.2009
+ Total work time: 9:39 h
+
+ ===== Polish (pl) =====
+
+ Based on the [[http://
+
+ Work started: 4.9.2009
+ Work finished: 8.9.2009
+ Total work time: 9:54 h
===== Portuguese (pt) =====
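The German CoNLL 2009 entry in the hunk above notes that, unlike CoNLL 2006, this tagset comes with morphological features, which required additional processing. The Python code below is a hedged sketch of what such a step could look like, not the real driver. It assumes that the POS column is handled by an STTS-style base driver and that the feature column is a `|`-separated list of `name=value` pairs, with `_` for an empty column. The `STTS_TOY` table, the feature names and values, and the functions `parse_feats`, `decode_conll2009` and `encode_conll2009` are hypothetical.

```python
# Hypothetical sketch of a CoNLL 2009-style driver that combines a POS tag with
# a morphological feature column (not the DZ Interset API).
# Assumptions: the POS column is decoded by an STTS-style base driver; the
# feature column is "name=value|name=value", or "_" when empty. STTS_TOY and
# all feature names are made up for illustration.
from typing import Dict, Tuple

FeatureStructure = Dict[str, str]

# Toy base driver: two STTS tags with made-up feature values.
STTS_TOY: Dict[str, FeatureStructure] = {
    "NN": {"pos": "noun", "nountype": "common"},
    "ART": {"pos": "det", "prontype": "art"},
}


def parse_feats(feats: str) -> FeatureStructure:
    """Turn 'case=nom|number=sg' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    result: FeatureStructure = {}
    for item in feats.split("|"):
        name, value = item.split("=", 1)
        result[name] = value
    return result


def decode_conll2009(pos: str, feats: str) -> FeatureStructure:
    """Decode the POS via the base driver, then add the morphological features."""
    fs = dict(STTS_TOY[pos])
    fs.update(parse_feats(feats))
    return fs


def encode_conll2009(fs: FeatureStructure) -> Tuple[str, str]:
    """Pick the base tag whose features are all present, serialize the rest."""
    for tag, base in STTS_TOY.items():
        if all(fs.get(name) == value for name, value in base.items()):
            rest = {name: value for name, value in fs.items() if name not in base}
            # Sorting keeps the feature string deterministic.
            feats = "|".join(f"{name}={value}" for name, value in sorted(rest.items()))
            return tag, feats or "_"
    raise KeyError(f"no tag for {fs}")


if __name__ == "__main__":
    fs = decode_conll2009("NN", "case=nom|gender=masc|number=sg")
    print(fs)
    print(encode_conll2009(fs))  # ('NN', 'case=nom|gender=masc|number=sg')
```

Sorting the remaining features on output makes encoding deterministic, so a decode/encode round trip reproduces any feature string that was already in sorted order.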
Old line 280 / new line 324:
| <
| VFIN | noise | há from haver |
+
+ ===== Slovak (sk) =====
+
+ ==== Slovenský národný korpus (SNK) ====
+
+ 1457 structured tags.
+
+ Total work time: 5:32 h
===== Swedish (sv) =====