user:zeman:interset:to-do [2008/04/29 17:48] zeman pos = det removed. |
user:zeman:interset:to-do [2008/05/20 09:36] zeman Pluralia tantum. |
* Normalize processing of pronouns, determiners, interrogative adverbs etc. Old drivers use a different approach from the new ones (beginning with Bulgarian). Pronoun as an independent part of speech will cease to exist. | * Normalize processing of pronouns, determiners, interrogative adverbs etc. Old drivers use a different approach from the new ones (beginning with Bulgarian). Pronoun as an independent part of speech will cease to exist. |
* Remove ''pos="pron"''. Distribute pronouns to nouns, adjectives and adverbs. When encoding into a tagset that distinguishes pronouns, detect pronouns by non-empty ''prontype''. Remove subposes of pronouns (''pers'', ''clit''...) | * Remove ''pos="pron"''. Distribute pronouns to nouns, adjectives and adverbs. When encoding into a tagset that distinguishes pronouns, detect pronouns by non-empty ''prontype''. Remove subposes of pronouns (''pers'', ''clit''...) |
| * Remove ''subpos = pers'' and ''subpos = recip''. These features should now be captured by ''prontype''. |
* Move ''subpos=clit'' to an independent feature so that it is easier to ask whether a pronoun is personal. Or remove the feature. This is connected to the problem of changed processing of pronouns, and of the processing of contracted word forms (see below). | * Move ''subpos=clit'' to an independent feature so that it is easier to ask whether a pronoun is personal. Or remove the feature. This is connected to the problem of changed processing of pronouns, and of the processing of contracted word forms (see below). |
* Find more fine-grained classification of punctuation and symbols. Danish has punctuation proper, symbols (+, $), and strange strings like "U-21". | * Find more fine-grained classification of punctuation and symbols. Danish has punctuation proper, symbols (+, $), and strange strings like "U-21". |
* Rename number = plu to plur? | * Rename number = plu to plur? |
* Remove ''subpos = voc''. So far it is used only for vocalized forms of Czech prepositions in cs::pdt (and in the derived cs::conll; nowhere else). ''variant = long'' could be used instead. Within the adposition classes it currently disrupts the division into prepositions, postpositions and circumpositions. **Problem:** both vocalized and non-vocalized prepositions also occur with ''variant = 1''. I cannot cram both ''long'' and ''1'' into a single feature, and I cannot claim that ''1'' implies vocalization either. | * Remove ''subpos = voc''. So far it is used only for vocalized forms of Czech prepositions in cs::pdt (and in the derived cs::conll; nowhere else). ''variant = long'' could be used instead. Within the adposition classes it currently disrupts the division into prepositions, postpositions and circumpositions. **Problem:** both vocalized and non-vocalized prepositions also occur with ''variant = 1''. I cannot cram both ''long'' and ''1'' into a single feature, and I cannot claim that ''1'' implies vocalization either. |
| * Define new value //pluralia tantum// (''ptan'') of ''number''? It is present in the Bulgarian CoNLL tagset and it could theoretically be present in other languages, including Czech. |
| |
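The pronoun item above (detect pronouns by non-empty ''prontype'' when encoding into a tagset that distinguishes them) could look roughly like the sketch below. This is only an illustration in Python, not the actual driver code: the function names ''is_pronoun'' and ''encode_pos'', the target tags (''PRON'', ''NOUN'', ...) and the ''prontype'' value ''prs'' are all assumptions made up for the example.

```python
# Hypothetical mapping from pos values to a made-up target tagset
# that has a separate pronoun class. Not taken from any real driver.
BASE_TAGS = {"noun": "NOUN", "adj": "ADJ", "adv": "ADV"}


def is_pronoun(features):
    """A word counts as a pronoun when its prontype is set and non-empty."""
    return bool(features.get("prontype"))


def encode_pos(features):
    """Pick the target tag; pronouns are recognized by prontype, not by pos."""
    if is_pronoun(features):
        return "PRON"
    # Fall back to the plain part of speech; "X" marks an unknown pos.
    return BASE_TAGS.get(features.get("pos", ""), "X")
```

With this arrangement a personal pronoun would be stored as ''pos = noun'' plus a non-empty ''prontype'', and only the encoder of a pronoun-aware tagset needs to look at ''prontype''. A real driver may also want to keep interrogative adverbs as adverbs; the sketch does not attempt that.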
===== Specific drivers ===== | ===== Specific drivers ===== |