Differences
This shows you the differences between two versions of the page.
user:zeman:interset:drivers [2008/04/03 14:49] zeman |
user:zeman:interset:drivers [2009/02/18 23:29] zeman |
||
---|---|---|---|
Line 58: | Line 58: | ||
More than half of the time was consumed during testing for tuning tags containing the Sem feature. | More than half of the time was consumed during testing for tuning tags containing the Sem feature. | ||
+ | |||
+ | ==== Multext ==== | ||
+ | |||
+ | The tagset of the MULTEXT-EAST project and corpora. The file '' | ||
+ | |||
+ | Work started: 16.2.2009 | ||
+ | Work finished: 18.2.2009 | ||
+ | Total work time: 16:36 h | ||
+ | |||
+ | Czech tagsets are notoriously complex. This one maps quite nicely to DZ Interset features. However, the few distinctions that are not (yet) represented in DZ Interset made debugging difficult. Clitic_s and generic numerals represented using the '' | ||
===== Danish (da) ===== | ===== Danish (da) ===== | ||
Line 98: | Line 108: | ||
Work finished: 31.3.2008 | Work finished: 31.3.2008 | ||
Total work time: 10 min | Total work time: 10 min | ||
- | |||
===== Portuguese (pt) ===== | ===== Portuguese (pt) ===== | ||
The Portuguese CoNLL treebank contains tags with 149 different features. Many of them are noise, probably introduced by the conversion procedure from the original Floresta format to the CoNLL format. The driver is designed so that it accepts all incorrect tags on decoding but encodes only corrected tags. Incorrect tags are not on the list of possible tags, so the driver tester will not complain. | The Portuguese CoNLL treebank contains tags with 149 different features. Many of them are noise, probably introduced by the conversion procedure from the original Floresta format to the CoNLL format. The driver is designed so that it accepts all incorrect tags on decoding but encodes only corrected tags. Incorrect tags are not on the list of possible tags, so the driver tester will not complain. | ||
+ | |||
+ | http:// | ||
+ | http:// | ||
+ | |||
+ | Work started: 2.4.2008 | ||
+ | Work finished: 24.4.2008 | ||
+ | Total work time: 28:18 h | ||
+ | |||
+ | The CoNLL version of the Floresta tagset was a real pain. Not only is the tagset complex with many features, some of them strangely overlapping, | ||
| **Feature** | **Explanation** | **Examples** | | | **Feature** | **Explanation** | **Examples** | | ||
| _ | no features | prepositions, | | _ | no features | prepositions, | ||
- | | 1 | 1st person | | | ||
| 1/3S | 1st person or 3rd person singular | leia, disse, seria, prefira | | | 1/3S | 1st person or 3rd person singular | leia, disse, seria, prefira | | ||
| 1S | 1st person singular | tenho, tinha, usei, vivo, vou | | | 1S | 1st person singular | tenho, tinha, usei, vivo, vou | | ||
Line 117: | Line 134: | ||
| ACC | pronoun as direct accusative object | se, te, vos | | | ACC | pronoun as direct accusative object | se, te, vos | | ||
| ACC/DAT | pronouns in accusative or dative | nos, se | | | ACC/DAT | pronouns in accusative or dative | nos, se | | ||
+ | | COND | verb in conditional mood | precisariam, | ||
+ | | DAT | pronoun as dative object | lhe, lhes, me, no, nos, se, vos | | ||
+ | | F | feminine | | | ||
+ | | F/M | feminine or masculine | | | ||
+ | | FUT | future tense of verbs | tenderão, tomará, usará | | ||
+ | | IMP | imperative mood of verbs | chega, move, olha, sê | | ||
+ | | IMPF | imperfect tense of verbs | abandonasse, | ||
+ | | IND | indicative mood of verbs | abafaram, abandonam, abate, abateu | | ||
+ | | M | masculine | açúcar, adepto, adiantado | | ||
+ | | M/F | masculine or feminine | Abidjan, cada, Chaves, especial | | ||
+ | | MQP | pluperfect past tense of verbs | acabara, defendera, existira, foram, quisera, viram | | ||
+ | | NOM | personal pronoun in nominative | ela, elas, ele, eles, eu, nós, vocês, você, vós | | ||
+ | | NOM/PIV | personal pronoun in nominative or prepositional object | ela, elas, ele, eles, nós, você | | ||
+ | | P | plural | 0,92, 14h00, africanos, águas, Amigos_da_Ilha_de_Santos | | ||
+ | | PIV | pronoun in prepositional object | ela, elas, ele, eles, mim, nós, si, ti, vós | | ||
+ | | PR | present tense of verbs | abandonam, abate, abonam, abordo, abra | | ||
+ | | PR/PS | present or past tense of verbs | conhecemos, conseguimos, | ||
+ | | PS | perfect past tense of verbs | abalou, abandonaram, | ||
+ | | PS/MQP | perfect or pluperfect past tense of verbs | abafaram, abriram, acabaram, aceitaram | | ||
+ | | S | singular | 1992, adicional, aditamento, aduaneira | | ||
+ | | S/P | singular or plural | capaz, Chaves, mais | | ||
+ | | SUBJ | subjunctive mood of verbs | abandonasse, | ||
+ | | <ALT> | indicates typo in word | | | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | <SUP> | superlative of adjectives and adverbs | inferior, máximo, melhor, mínimo, ótimo, péssimo, pior | | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | <dem> | demonstrative pronoun or adverb | este, isso, isto, o, os, tais, tal, tão | | ||
+ | | <det> | determiner usage / inflection of adverb | algo, meio, nada, quase, todo, um_tanto | | ||
+ | | < | ||
+ | | < | ||
+ | | <fmc> | verb heading finite main clause | | | ||
+ | | <foc> | focus marker, adverb or pronoun | é_que, foi, fomos, que, são, será | | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | <kc> | conjunctional adverb | agora, aí, bem_como, como, ora, tal_como, todavia | | ||
+ | | <ks> | adverb or preposition used like a subordinating conjunction | como, enquanto, onde, quando, segundo | | ||
+ | | <n> | other word class used as noun, typically as head of noun phrase | anglo-americano, | ||
+ | | <poss | possessive determiner pronoun | meu, meus, minha, minhas, nossa, nossas, nosso, nossos, seu, seus, sua | | ||
+ | | < | ||
+ | | <prp> | other word class used as preposition | como, conforme, consoante, embora, segundo | | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | <rel> | relative pronoun or adverb | à_medida_que, | ||
+ | | < | ||
+ | | < | ||
+ | | <si> | reflexive usage of 3rd person possessive | seu, seus, sua, suas | | ||
+ | | <eg> | undocumented feature | 2 occurrences with cardinal numbers | | ||
+ | | <Eg> | undocumented feature | occurs with numbers, adjectives and pronouns | | ||
+ | | <Em> | undocumented feature | 6 occurrences with adjectives | | ||
+ | | <Es> | undocumented feature | 3 occurrences with adverbs and prepositions | | ||
+ | | <ink> | undocumented feature of finite verbs | está, havia, pode, tentou | | ||
+ | | < | ||
+ | | < | ||
+ | | N | undocumented feature of nouns and articles | 15 occurrences | | ||
+ | | <new> | undocumented feature | | | ||
+ | | <nil> | undocumented feature | | | ||
+ | | <obj> | undocumented feature | se | | ||
+ | | <p> | undocumented feature | 1 occurrence | | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
+ | | < | ||
| > | noise; should be ignored | | | | > | noise; should be ignored | | | ||
| 0/1/3S | noise; should probably be 1/3S | | | | 0/1/3S | noise; should probably be 1/3S | | | ||
+ | | 1 | noise; should be 1S | aproveitaria, | ||
| 1S> | noise; should be 1S | meu, meus, minha, minhas | | | 1S> | noise; should be 1S | meu, meus, minha, minhas | | ||
| 1P> | noise; should be 1P | nossa, nossas, nosso, nossos | | | 1P> | noise; should be 1P | nossa, nossas, nosso, nossos | | ||
Line 126: | Line 234: | ||
| 3S/P> | noise; should be 3S/P | seu, seus, sua | | | 3S/P> | noise; should be 3S/P | seu, seus, sua | | ||
| 3P> | noise; should be 3P | seu, seus, sua | | | 3P> | noise; should be 3P | seu, seus, sua | | ||
+ | | <adv> | noise? | fundo | | ||
+ | | < | ||
+ | | < | ||
+ | | > | ||
+ | | < | ||
+ | | convidado-> | ||
+ | | < | ||
+ | | < | ||
+ | | <corr | noise; should be <ALT> | | | ||
+ | | < | ||
+ | | <Eg>F | noise; should be two features | | | ||
+ | | <Eg>M | noise; should be two features | | | ||
+ | | <F | noise; should be F | | | ||
+ | | GER | noise; redundant gerund marker | 1 occurrence with v-ger | | ||
+ | | < | ||
+ | | INF | noise; redundant infinitive marker | 2 occurrences with < | ||
+ | | 'Maio | noise | Maio | | ||
+ | | MVF | noise; should be MV and F | motivada | | ||
+ | | NUM | noise; redundant numeral marker | 1994 | | ||
+ | | pasando> | noise; should be <ALT> | passando | | ||
+ | | PCP | noise; redundant participle marker | 2 occurrences | | ||
+ | | < | ||
+ | | < | ||
+ | | PROP | noise | 2 occurrences | | ||
+ | | < | ||
+ | | < | ||
+ | | R | noise; should be PR | 2 occurrences | | ||
+ | | recohidas> | ||
+ | | < | ||
+ | | s | noise; should be S | | | ||
+ | | saiem> | noise; should be <ALT> | saem | | ||
+ | | < | ||
+ | | < | ||
+ | | <sc> | noise; should be < | ||
+ | | subordinanda> | ||
+ | | V | noise; redundant verb marker | | | ||
+ | | < | ||
+ | VFIN | noise | há (from haver) | ||
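
The "accept incorrect tags on decoding, encode only corrected tags" behaviour described for the Portuguese driver can be illustrated roughly as follows. This is only a minimal sketch in Python, not the actual driver code: the names ''CORRECTIONS'', ''decode_features'' and ''encode_features'' are hypothetical, the features are assumed to be joined by ''|'' as in the CoNLL FEATS column, and the correction table covers only a handful of the noise rows listed in the table above.

```python
# Hypothetical sketch; these names are illustrative and not part of DZ Interset.
# Each noisy feature maps to the list of corrected features that replace it.
# The entries are copied from the "noise" rows of the feature table.
CORRECTIONS = {
    ">": [],              # noise; should be ignored
    "0/1/3S": ["1/3S"],   # should probably be 1/3S
    "1": ["1S"],
    "1S>": ["1S"],
    "1P>": ["1P"],
    "3S/P>": ["3S/P"],
    "3P>": ["3P"],
    "<F": ["F"],
    "s": ["S"],
    "R": ["PR"],
    "<corr": ["<ALT>"],
    "MVF": ["MV", "F"],   # should be MV and F
    "GER": [], "INF": [], "NUM": [], "PCP": [], "V": [],  # redundant markers
}


def decode_features(feats):
    """Split a FEATS string on '|' and repair known noise (tolerant input)."""
    if feats == "_":          # "_" means "no features"
        return []
    cleaned = []
    for feature in feats.split("|"):
        # Unknown features pass through unchanged; noisy ones are replaced.
        cleaned.extend(CORRECTIONS.get(feature, [feature]))
    return cleaned


def encode_features(features):
    """Join already-corrected features back into a FEATS string (strict output)."""
    return "|".join(features) if features else "_"


if __name__ == "__main__":
    print(encode_features(decode_features("1S>|M|s")))
```

Because the corrections are applied only on the decoding side, a noisy input can never reappear in the encoder's output, which is why the incorrect tags do not need to be on the list of possible tags.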
===== Swedish (sv) ===== | ===== Swedish (sv) ===== |