Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
user:zeman:interset:versions [2009/02/20 10:37] zeman created |
user:zeman:interset:versions [2014/06/16 21:47] (current) zeman Version 2 is out. |
||
---|---|---|---|
Line 18: | Line 18: | ||
! Various maintenance changes took place, too. Version control has been migrated to a network-accessible (though not publicly accessible) SVN repository, together with a Trac project management interface. The website now includes information on [[License|licensing]], | ! Various maintenance changes took place, too. Version control has been migrated to a network-accessible (though not publicly accessible) SVN repository, together with a Trac project management interface. The website now includes information on [[License|licensing]], | ||
+ | |||
+ | ? 1.1 | ||
+ | ! 8 September 2009. Three new incarnations of Czech, English and German CoNLL tagsets, reflecting the 2009 changes in format. Most interestingly, | ||
+ | |||
+ | ? 1.2 | ||
+ | ! 27 June 2011. New drivers: Prague Spoken Corpus (Pražský mluvený korpus, PMK) long and short tags ('' | ||
+ | |||
+ | ! New test: For all tags in all drivers, it must now hold that deleting the value of the '' | ||
+ | |||
+ | ! New usage: Interset in Treex (TectoMT). | ||
+ | |||
+ | ? 2.001 | ||
+ | ! 13 June 2014. Complete rewrite of Interset. The old Perl interface was not object-oriented. The modules resided under the “tagset” namespace (yes, all lowercase). The new modules are object-oriented (using Moose) and the new namespace is [[http:// | ||
+ | * Drivers will be ported gradually, but Interset 2.0 can still work with old drivers that you have installed in '' | ||
+ | * Project development has left our [[https:// | ||
+ | * For the record: The project has also [[http:// | ||
+ | |||
+ | ! Feature changes: | ||
+ | * Several new features were split from the subpos feature: nountype, adjtype, verbtype and conjtype. This is a logical extension of the previously created prontype, advtype etc. | ||
+ | * The features tense and subtense have been merged. Their separation in the early years of Interset was driven by problems with encoding tagsets that lacked specialized tenses; later on, however, Interset gained algorithms for strict encoding and feature replacement. Other features now have values that form a hierarchy, so it seems logical to treat tenses the same way.
+ | |||
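The reasoning behind merging tense and subtense can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not Interset's actual code or API: the value names (`aor`, `imp`, `pqp`, `past`), the `TENSE_PARENT` table and the `replace_tense` function are all assumptions made for this example. The idea is that a specialized tense value can fall back to a more general one when the target tagset does not permit it, so a single hierarchical feature suffices.

```python
# Illustrative sketch only; not Interset's real data or algorithm.
# Hypothetical hierarchy: each specialized tense value names its more general parent.
TENSE_PARENT = {
    "aor": "past",  # aorist (assumed value name)
    "imp": "past",  # imperfect (assumed value name)
    "pqp": "past",  # pluperfect (assumed value name)
}


def replace_tense(value, permitted):
    """Return `value` if the target tagset permits it; otherwise climb the
    hierarchy to the nearest permitted ancestor. Return "" if none exists."""
    while value and value not in permitted:
        value = TENSE_PARENT.get(value, "")
    return value


# A tagset without specialized tenses receives the general value instead.
print(replace_tense("aor", {"past", "pres", "fut"}))
# A tagset that knows the specialized value keeps it unchanged.
print(replace_tense("aor", {"aor", "past", "pres"}))
```

With such a fallback in place, a tagset lacking specialized tenses no longer needs a separate feature to hide them, which is the motivation given above for the merge.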
+ | ! **For a more detailed list of changes, see either the '' | ||
+ | |||
+ | ? Changes since then | ||
+ | ! I also plan exportable conversion tables that will bring Interset functionality to programming languages other than Perl. | ||
+ | |||
+ | ! Feature changes: | ||
+ | * I am considering removal of the feature '' | ||
+ | * I am considering further changes to the partitioning of numerals, similar in spirit to the treatment of pronouns. Many words that are considered numerals in Czech are tagged as nouns, adjectives, pronouns, determiners or adverbs in other tagsets. I may decide to keep a separate part of speech for cardinal numbers, but I have not arrived at a clear opinion yet. |