Differences
This shows you the differences between two versions of the page.
user:zeman:interset:drivers [2008/04/04 09:09] zeman 1st person in Portuguese. |
user:zeman:interset:drivers [2009/02/20 15:10] zeman |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Tag Set Drivers ====== | ====== Tag Set Drivers ====== | ||
- | This is an overview of existing tag set drivers. Issues specific to a tag set or a language are described here. | + | This is an overview of existing tag set drivers. Issues specific to a tag set or a language are described here. I also try to keep track of the work time needed for particular drivers because the original motivation behind DZ Interset was to save time and effort. |
===== Arabic (ar) ===== | ===== Arabic (ar) ===== | ||
Line 58: | Line 58: | ||
More than half of the time was consumed during testing for tuning tags containing the Sem feature. | More than half of the time was consumed during testing for tuning tags containing the Sem feature. | ||
+ | |||
+ | ==== Multext ==== | ||
+ | |||
+ | The tagset of the MULTEXT-EAST project and corpora. The file '' | ||
+ | |||
+ | Work started: 16.2.2009 | ||
+ | Work finished: 18.2.2009 | ||
+ | Total work time: 16:36 h | ||
+ | |||
+ | Czech tagsets are notoriously complex. This one maps quite nicely to DZ Interset features. However, the few distinctions that are not (yet) represented in DZ Interset made debugging difficult. Clitic_s and generic numerals represented using the '' | ||
===== Danish (da) ===== | ===== Danish (da) ===== | ||
Line 98: | Line 108: | ||
Work finished: 31.3.2008 | Work finished: 31.3.2008 | ||
Total work time: 10 min | Total work time: 10 min | ||
- | |||
- | |||
- | |||
- | |||
- | |||
===== Portuguese (pt) ===== | ===== Portuguese (pt) ===== | ||
Line 110: | Line 115: | ||
http:// | http:// | ||
http:// | http:// | ||
+ | |||
+ | Work started: 2.4.2008 | ||
+ | Work finished: 24.4.2008 | ||
+ | Total work time: 28:18 h | ||
+ | |||
+ | The CoNLL version of the Floresta tagset was a real pain. Not only is the tagset complex with many features, some of them strangely overlapping, | ||
| **Feature** | **Explanation** | **Examples** | | | **Feature** | **Explanation** | **Examples** | | ||
Line 249: | Line 260: | ||
| < | | < | ||
| < | | < | ||
- | | R | noise | 2 occurrences | | + | | R | noise; should be PR | 2 occurrences | |
| recohidas> | | recohidas> | ||
| < | | < |