Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:interset:drivers [2008/04/03 23:02]
zeman Portuguese.
user:zeman:interset:drivers [2009/02/16 15:57]
zeman Czech Multext.
Line 58: Line 58:
  
 More than half of the time was consumed during testing for tuning tags containing the Sem feature. More than half of the time was consumed during testing for tuning tags containing the Sem feature.
 +
 +==== Multext ====
 +
 +The tagset of the MULTEXT-EAST project and its corpora. The file ''mte-lex/wfl-cs.tbl'' contains 1428 unique tags, although other tags may also be possible. The corpora are stored in a TEI-compliant SGML format, which is easy to read except that non-ASCII characters are encoded as SGML entities.
 +
 +Work started: 16.2.2009
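
The SGML entities mentioned in the Multext note can be turned back into plain characters before the tags and word forms are processed. The following is a minimal Python sketch, not the method used by the driver. It assumes the corpora use the standard ISO entity names (such as ''&ccaron;''), which Python's ''html.unescape'' happens to cover because the HTML5 entity table includes them. The function name ''decode_sgml_entities'' is made up for illustration.

```python
import html


def decode_sgml_entities(text):
    """Replace named and numeric character entities with Unicode characters.

    Assumption: the corpus uses ISO entity names (e.g. &ccaron;, &yacute;),
    which are part of the HTML5 entity table known to html.unescape.
    Entities that are not in that table are generally left as they are.
    """
    return html.unescape(text)


if __name__ == "__main__":
    # A hand-made sample string, not taken from the corpus.
    print(decode_sgml_entities("&Ccaron;esk&yacute;"))
```

If the corpora turn out to use entity names outside the HTML5 table, a custom mapping built from the entity declarations shipped with the corpus would be needed instead.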
  
 ===== Danish (da) ===== ===== Danish (da) =====
Line 98: Line 104:
 Work finished: 31.3.2008 Work finished: 31.3.2008
 Total work time: 10 min Total work time: 10 min
- 
- 
- 
- 
  
 ===== Portuguese (pt) ===== ===== Portuguese (pt) =====
Line 109: Line 111:
 http://visl.sdu.dk/visl/pt/info/symbolset-floresta.html http://visl.sdu.dk/visl/pt/info/symbolset-floresta.html
 http://en.wikipedia.org/wiki/Portuguese_grammar http://en.wikipedia.org/wiki/Portuguese_grammar
 +
 +Work started: 2.4.2008
 +Work finished: 24.4.2008
 +Total work time: 28:18 h
 +
 +The CoNLL version of the Floresta tagset was a real pain. The tagset is complex, with many features, some of them strangely overlapping and some undocumented. The annotation also contained a high proportion of noise: typos and other introduced errors.
  
 | **Feature** | **Explanation** | **Examples** | | **Feature** | **Explanation** | **Examples** |
 | _ | no features | prepositions, punctuation etc. | | _ | no features | prepositions, punctuation etc. |
-| 1 | 1st person | | 
 | 1/3S | 1st person or 3rd person singular | leia, disse, seria, prefira | | 1/3S | 1st person or 3rd person singular | leia, disse, seria, prefira |
 | 1S | 1st person singular | tenho, tinha, usei, vivo, vou | | 1S | 1st person singular | tenho, tinha, usei, vivo, vou |
Line 215: Line 222:
 | > | noise; should be ignored | | | > | noise; should be ignored | |
 | 0/1/3S | noise; should probably be 1/3S | | | 0/1/3S | noise; should probably be 1/3S | |
 +| 1 | noise; should be 1S | aproveitaria, saiba, tinha, vivia |
 | 1S> | noise; should be 1S | meu, meus, minha, minhas | | 1S> | noise; should be 1S | meu, meus, minha, minhas |
 | 1P> | noise; should be 1P | nossa, nossas, nosso, nossos | | 1P> | noise; should be 1P | nossa, nossas, nosso, nossos |
Line 248: Line 256:
 | <prop>M | noise; should be two features | | | <prop>M | noise; should be two features | |
 | <prparg> | noise; should be <co-prparg> | | | <prparg> | noise; should be <co-prparg> | |
-| R | noise | 2 occurrences |+| R | noise; should be PR | 2 occurrences |
 | recohidas> | noise; should be <ALT> | recolhidas | | recohidas> | noise; should be <ALT> | recolhidas |
 | <rel><ks> | noise; should be two features | | | <rel><ks> | noise; should be two features | |
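
The noise rows in the table above lend themselves to a simple lookup-based repair step. Below is a minimal Python sketch, assuming that the features of one token arrive as a ''|''-separated CoNLL FEATS string. The names ''NOISE_FIXES'' and ''normalize_features'' are hypothetical and are not part of the actual driver; only the corrections visible in this excerpt of the table are included.

```python
# Corrections copied from the noise rows shown above.
# Each noisy feature maps to the list of features that should replace it;
# an empty list means the feature is dropped.
NOISE_FIXES = {
    ">": [],                        # noise; should be ignored
    "0/1/3S": ["1/3S"],             # "should probably be 1/3S" (uncertain)
    "1": ["1S"],
    "1S>": ["1S"],
    "1P>": ["1P"],
    "<prop>M": ["<prop>", "M"],     # should be two features
    "<prparg>": ["<co-prparg>"],
    "R": ["PR"],
    "recohidas>": ["<ALT>"],
    "<rel><ks>": ["<rel>", "<ks>"], # should be two features
}


def normalize_features(feats):
    """Split a FEATS string on '|' and repair the known noisy features."""
    if feats == "_":
        # "_" stands for "no features".
        return []
    result = []
    for feature in feats.split("|"):
        # Unknown features pass through unchanged.
        result.extend(NOISE_FIXES.get(feature, [feature]))
    return result


if __name__ == "__main__":
    print(normalize_features("<rel><ks>|1|>"))
```

Keeping the corrections in a plain dictionary makes it easy to add further rows as more noise is discovered during testing.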
