Differences
This shows you the differences between two versions of the page.
user:zeman:interset:versions [2009/09/08 18:08] zeman Link. |
user:zeman:interset:versions [2014/06/16 21:47] (current) zeman Version 2 is out. |
||
---|---|---|---|
Line 20: | Line 20: | ||
? 1.1 | ? 1.1 | ||
- | ! 8 September 2009. Three new incarnations of Czech, English and German CoNLL tagsets, reflecting the 2009 changes in format. Most interestingly, | + | ! 8 September 2009. Three new incarnations of Czech, English and German CoNLL tagsets, reflecting the 2009 changes in format. Most interestingly, |
+ | |||
+ | ? 1.2 | ||
+ | ! 27 June 2011. New drivers: Prague Spoken Corpus (Pražský mluvený korpus, PMK) long and short tags ('' | ||
+ | |||
+ | ! New test: For every tag in every driver, it must now hold that deleting the value of the '' | ||
+ | |||
+ | ! New usage: Interset in Treex (TectoMT). | ||
+ | |||
+ | ? 2.001 | ||
+ | ! 13 June 2014. Complete rewrite of Interset. The old Perl interface was not object-oriented. The modules resided under the “tagset” namespace (yes, all lowercase). The new modules are object-oriented (using Moose) and the new namespace is [[http:// | ||
+ | * Drivers will be ported gradually, but Interset 2.0 can still work with old drivers that you have installed in '' | ||
+ | * Project development has left our [[https:// | ||
+ | * For the record: The project has also [[http:// | ||
+ | |||
+ | ! Feature changes: | ||
+ | * Several new features were split from the subpos feature: nountype, adjtype, verbtype and conjtype. This is a logical extension of the previously created prontype, advtype etc. | ||
+ | * The features tense and subtense have been merged. Their separation in the early years of Interset was driven by problems with encoding tagsets that lacked specialized tenses; later, however, Interset gained algorithms for strict encoding and feature replacement. There are now other features whose values form a hierarchy, so it seems logical to treat tenses the same way.
+ | |||
+ | ! **For a more detailed list of changes, see either the '' | ||
? Changes since then | ? Changes since then | ||
- | ! - | + | ! I also plan to provide exportable conversion tables that will bring Interset functionality to programming languages other than Perl. |
+ | |||
+ | ! Feature changes: | ||
+ | * I am considering removal of the feature '' | ||
+ | * I am considering further changes to the partitioning of numerals, in the same spirit as with pronouns. Many words that are considered numerals in Czech are tagged as nouns, adjectives, pronouns, determiners or adverbs in other tagsets. I may decide to keep a separate part of speech for cardinal numbers, but I have not arrived at a clear opinion yet. |