Differences

This shows you the differences between two versions of the page.

--- user:zeman:interset:drivers [2008/04/03 23:02]
zeman Portuguese.
+++ user:zeman:interset:drivers [2009/02/16 15:57]
zeman Czech Multext.
@@ Line 58: / Line 58: @@
 More than half of the time was consumed during testing for tuning tags containing the Sem feature.
+==== Multext ====
+The tagset of the MULTEXT-EAST project and its corpora. The file ''mte-lex/wfl-cs.tbl'' contains 1428 unique tags (which does not mean that other tags are impossible). The corpora are stored in a TEI-compliant SGML format, which is easy to read except that non-ASCII characters are encoded as SGML entities.
+Work started: 16.2.2009
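Before the corpora can be used as plain text, the entities have to be turned back into characters. The following is a minimal Python sketch, assuming the files use standard ISO 8879 entity names such as `&ccaron;` or `&scaron;`; most of these are also HTML5 named references and are therefore available in Python's `html.entities.html5` table. The function name `decode_sgml_entities` and the `extra` mapping (for any corpus-specific entities not covered by that table) are hypothetical.

```python
import re
from html.entities import html5

# Matches a named entity reference such as &ccaron; or &Scaron;.
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_sgml_entities(text, extra=None):
    """Replace named entities with Unicode characters.

    Names are looked up first in `extra` (a hypothetical mapping for
    corpus-specific entities), then in the HTML5 table, which covers the
    ISO Latin entity sets. Unknown names are left untouched.
    """
    extra = extra or {}

    def replace(match):
        name = match.group(1)
        if name in extra:
            return extra[name]
        # html5 keys carry the trailing semicolon, e.g. "ccaron;".
        return html5.get(name + ";", match.group(0))

    return ENTITY.sub(replace, text)


print(decode_sgml_entities("&ccaron;esk&yacute; jazyk"))  # český jazyk
```

Calling `html.unescape` on each line would also work for most input, but it additionally applies HTML's lenient rules (for example, references without a trailing semicolon), which an SGML file does not need; the explicit lookup keeps unrecognized entities visible so they can be added to `extra`.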
 ===== Danish (da) =====
@@ Line 98: / Line 104: @@
 Work finished: 31.3.2008
 Total work time: 10 min
 ===== Portuguese (pt) =====
@@ Line 109: / Line 111: @@
 http://visl.sdu.dk/visl/pt/info/symbolset-floresta.html
 http://en.wikipedia.org/wiki/Portuguese_grammar
+Work started: 2.4.2008
+Work finished: 24.4.2008
+Total work time: 28:18 h
+The CoNLL version of the Floresta tagset was a real pain. Not only was the tagset complex, with many features (some of them strangely overlapping, some of them undocumented), but the annotation also contained a terrible proportion of noise: typos and other errors.
 | **Feature** | **Explanation** | **Examples** |
 | _ | no features | prepositions, punctuation etc. |
-| 1 | 1st person | |
 | 1/3S | 1st person or 3rd person singular | leia, disse, seria, prefira |
 | 1S | 1st person singular | tenho, tinha, usei, vivo, vou |
@@ Line 215: / Line 222: @@
 | > | noise; should be ignored | |
 | 0/1/3S | noise; should probably be 1/3S | |
+| 1 | noise; should be 1S | aproveitaria, saiba, tinha, vivia |
 | 1S> | noise; should be 1S | meu, meus, minha, minhas |
 | 1P> | noise; should be 1P | nossa, nossas, nosso, nossos |
@@ Line 248: / Line 256: @@
 | <prop>M | noise; should be two features | |
 | <prparg> | noise; should be <co-prparg> | |
-| R | noise | 2 occurrences |
+| R | noise; should be PR | 2 occurrences |
 | recohidas> | noise; should be <ALT> | recolhidas |
 | <rel><ks> | noise; should be two features | |
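The noise rows in the table above amount to a rewrite map applied to each feature before it is interpreted. The following Python sketch is illustrative only, not the actual driver code: it assumes the features arrive as the pipe-separated FEATS column of the CoNLL file, includes only the corrections listed in the rows shown here, and the names `FIXES`, `FUSED` and `clean_feats` are hypothetical.

```python
import re

# Corrections taken from the noise rows of the table; None means
# "noise; should be ignored". Features not listed pass through.
FIXES = {
    ">": None,
    "0/1/3S": "1/3S",  # the table says "should probably be"
    "1": "1S",
    "1S>": "1S",
    "1P>": "1P",
    "<prparg>": "<co-prparg>",
    "R": "PR",
    "recohidas>": "<ALT>",
}

# Splits a feature that fused two tokens, e.g. "<prop>M" or "<rel><ks>".
FUSED = re.compile(r"<[^>]*>|[^<]+")


def clean_feats(feats: str) -> list[str]:
    """Split a CoNLL FEATS string on '|' and repair known noise."""
    if feats == "_":  # "_" means no features
        return []
    result = []
    for feat in feats.split("|"):
        if feat in FIXES:
            fixed = FIXES[feat]
            if fixed is not None:
                result.append(fixed)
            continue
        # Ordinary features come back unchanged as a single item.
        result.extend(FUSED.findall(feat))
    return result


print(clean_feats("<prop>M|S"))  # ['<prop>', 'M', 'S']
```

Applying the map before splitting fused features matters: values such as `1S>` would otherwise be passed through unchanged instead of being mapped to `1S`.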


Institute of Formal and Applied Linguistics Wiki