
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


user:zeman:interset:drivers [2008/04/04 09:09]
zeman 1st person in Portuguese.
user:zeman:interset:drivers [2009/02/16 15:57]
zeman Czech Multext.
Line 58: Line 58:
  
 More than half of the time was spent during testing, on tuning the tags that contain the Sem feature.
 +
 +==== Multext ====
 +
 +This is the tagset of the MULTEXT-EAST project and its corpora. The file ''mte-lex/wfl-cs.tbl'' contains 1428 unique tags (other tags may still be possible). The corpora are stored in a TEI-compliant SGML format, which is easy to read, except that non-ASCII characters are encoded as SGML entities.
 +
 +Work started: 16.2.2009
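A minimal sketch of the two chores mentioned for Multext: counting distinct tags in the lexicon and decoding SGML entities in the corpora. It assumes that each lexicon line has the form word-form, lemma, MSD separated by tabs, and that the entity names used (e.g. &scaron;) also exist in the HTML5 entity table, so that Python's ''html.unescape'' can decode them. Neither assumption has been checked against the actual MULTEXT-EAST files, and the function names are hypothetical; this is not the Interset driver code.

```python
import html
from collections import Counter


def count_tags(lines):
    """Count MSD tags, assumed to sit in the third tab-separated column."""
    tags = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed or empty lines instead of failing on them.
        if len(fields) >= 3 and fields[2]:
            tags[fields[2]] += 1
    return tags


def decode_entities(text):
    """Replace SGML entity references such as &scaron; with Unicode characters.

    Works only for entity names that also exist in the HTML5 entity table,
    which is an assumption about the MULTEXT-EAST files.
    """
    return html.unescape(text)


if __name__ == "__main__":
    # Made-up sample lines; the tags are illustrative, not checked
    # against the real Czech MULTEXT-EAST tagset.
    sample = ["hrad\thrad\tNcmsn", "hradu\thrad\tNcmsg", "most\tmost\tNcmsn"]
    print(len(count_tags(sample)))           # 2 distinct tags
    print(decode_entities("&scaron;kola"))   # škola
```

On the real lexicon, ''len(count_tags(...))'' over the lines of ''mte-lex/wfl-cs.tbl'' (opened with whatever encoding the file actually uses) should reproduce the tag count if the column assumption holds.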
  
 ===== Danish (da) =====
Line 98: Line 104:
 Work finished: 31.3.2008
 Total work time: 10 min
- 
- 
- 
- 
- 
  
 ===== Portuguese (pt) =====
Line 110: Line 111:
 http://visl.sdu.dk/visl/pt/info/symbolset-floresta.html
 http://en.wikipedia.org/wiki/Portuguese_grammar
 +
 +Work started: 2.4.2008
 +Work finished: 24.4.2008
 +Total work time: 28:18 h
 +
 +The CoNLL version of the Floresta tagset was a real pain. Not only is the tagset complex, with many features (some of them strangely overlapping, some undocumented), but there was also a terrible proportion of noise in the annotation: typos and other introduced errors.
  
 | **Feature** | **Explanation** | **Examples** |
Line 249: Line 256:
 | <prop>M | noise; should be two features | |
 | <prparg> | noise; should be <co-prparg> | |
-| R | noise | 2 occurrences |
+| R | noise; should be PR | 2 occurrences |
 | recohidas> | noise; should be <ALT> | recolhidas |
 | <rel><ks> | noise; should be two features | |
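The noise rows of the feature table could be cleaned up before the tags are decoded. The following is a minimal sketch in Python (not the Interset driver itself), assuming that features in the CoNLL FEATS column are separated by ''|'' and that ''_'' marks an empty column. The name ''normalize_feats'' is hypothetical, and splitting ''<prop>M'' into ''<prop>'' and ''M'' is a guess, since the table only says it should be two features.

```python
# Hypothetical correction table built from the noise rows of the feature table.
# Splitting "<prop>M" into "<prop>" and "M" is a guess: the table only says
# it should be two features.
CORRECTIONS = {
    "<prparg>": ["<co-prparg>"],
    "R": ["PR"],
    "recohidas>": ["<ALT>"],
    "<prop>M": ["<prop>", "M"],
    "<rel><ks>": ["<rel>", "<ks>"],
}


def normalize_feats(feats):
    """Apply CORRECTIONS to a CoNLL FEATS value such as 'F|S|<rel><ks>'.

    Assumes features are separated by '|' and '_' marks an empty column.
    """
    if feats in ("", "_"):
        return feats
    fixed = []
    for feature in feats.split("|"):
        # Features without a known correction pass through unchanged.
        fixed.extend(CORRECTIONS.get(feature, [feature]))
    return "|".join(fixed)


if __name__ == "__main__":
    # Made-up FEATS value for illustration.
    print(normalize_feats("<rel><ks>|M|P"))  # <rel>|<ks>|M|P
```

Keeping the corrections in a plain lookup table means each newly discovered noisy feature is a one-line addition, and unknown features are left alone rather than silently dropped.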
