Differences
This shows you the differences between two versions of the page.
user:zeman:interset:drivers [2008/04/03 23:02] zeman Portuguese. |
user:zeman:interset:drivers [2009/02/20 15:10] zeman |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Tag Set Drivers ====== | ====== Tag Set Drivers ====== | ||
- | This is an overview of existing tag set drivers. Issues specific to a tag set or a language are described here. | + | This is an overview of existing tag set drivers. Issues specific to a tag set or a language are described here. I also try to keep track of the work time needed for particular drivers because the original motivation behind DZ Interset was to save time and effort. |
===== Arabic (ar) ===== | ===== Arabic (ar) ===== | ||
Line 58: | Line 58: | ||
More than half of the time was spent during testing on tuning tags containing the Sem feature. | More than half of the time was spent during testing on tuning tags containing the Sem feature. | ||
+ | |||
+ | ==== Multext ==== | ||
+ | |||
+ | The tagset of the MULTEXT-EAST project and corpora. The file '' | ||
+ | |||
+ | Work started: 16.2.2009 | ||
+ | Work finished: 18.2.2009 | ||
+ | Total work time: 16:36 h | ||
+ | |||
+ | Czech tagsets are notoriously complex. This one maps quite nicely to DZ Interset features. However, the few distinctions that are not (yet) represented in DZ Interset made debugging difficult. Clitic_s and generic numerals represented using the '' | ||
===== Danish (da) ===== | ===== Danish (da) ===== | ||
Line 98: | Line 108: | ||
Work finished: 31.3.2008 | Work finished: 31.3.2008 | ||
Total work time: 10 min | Total work time: 10 min | ||
- | |||
- | |||
- | |||
- | |||
===== Portuguese (pt) ===== | ===== Portuguese (pt) ===== | ||
Line 109: | Line 115: | ||
http:// | http:// | ||
http:// | http:// | ||
+ | |||
+ | Work started: 2.4.2008 | ||
+ | Work finished: 24.4.2008 | ||
+ | Total work time: 28:18 h | ||
+ | |||
+ | The CoNLL version of the Floresta tagset was a real pain. Not only is the tagset complex with many features, some of them strangely overlapping, | ||
| **Feature** | **Explanation** | **Examples** | | | **Feature** | **Explanation** | **Examples** | | ||
| _ | no features | prepositions, | | _ | no features | prepositions, | ||
- | | 1 | 1st person | | | ||
| 1/3S | 1st person or 3rd person singular | leia, disse, seria, prefira | | | 1/3S | 1st person or 3rd person singular | leia, disse, seria, prefira | | ||
| 1S | 1st person singular | tenho, tinha, usei, vivo, vou | | | 1S | 1st person singular | tenho, tinha, usei, vivo, vou | | ||
Line 215: | Line 226: | ||
| > | noise; should be ignored | | | | > | noise; should be ignored | | | ||
| 0/1/3S | noise; should probably be 1/3S | | | | 0/1/3S | noise; should probably be 1/3S | | | ||
+ | | 1 | noise; should be 1S | aproveitaria, | ||
| 1S> | noise; should be 1S | meu, meus, minha, minhas | | | 1S> | noise; should be 1S | meu, meus, minha, minhas | | ||
| 1P> | noise; should be 1P | nossa, nossas, nosso, nossos | | | 1P> | noise; should be 1P | nossa, nossas, nosso, nossos | | ||
Line 248: | Line 260: | ||
| < | | < | ||
| < | | < | ||
- | | R | noise | 2 occurrences | | + | | R | noise; should be PR | 2 occurrences | |
| recohidas> | | recohidas> | ||
| < | | < |