Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:interset:versions [2014/06/11 10:52]
zeman Interset 2.0.
user:zeman:interset:versions [2014/06/16 21:47] (current)
zeman Version 2 is out.
Line 28:
  
 ! New usage: Interset in Treex (TectoMT).
 +
 +? 2.001
 +! 13 June 2014. Complete rewrite of Interset. The old Perl interface was not object-oriented, and its modules resided under the “tagset” namespace (yes, all lowercase). The new modules are object-oriented (using Moose) and live in the new namespace [[http://search.cpan.org/search?query=Lingua%3A%3AInterset&mode=all|Lingua::Interset]]. The distribution is available on CPAN.
 +  * Drivers will be ported gradually but Interset 2.0 is still able to work with old drivers that you have installed in ''lib/tagset''. Initially, only the ''en::penn'' driver has been ported.
 +  * Project development has left our [[https://svn.ms.mff.cuni.cz/trac/interset/timeline|SVN server]] and landed on our [[https://redmine.ms.mff.cuni.cz/projects/interset/repository|Redmine server]]. Version control is now performed by Git.
 +  * For the record: The project also has [[http://ufal.mff.cuni.cz/interset|its page on the main ÚFAL website]]. It is pretty much empty at the moment. It may eventually become the main website of the project, but not before the webmaster fixes the HTML entities being damaged by Drupal.
 +
 +! Feature changes:
 +  * Several new features were split off from the ''subpos'' feature: ''nountype'', ''adjtype'', ''verbtype'' and ''conjtype''. This is a logical extension of the previously created ''prontype'', ''advtype'' etc.
 +  * The features ''tense'' and ''subtense'' have been merged. Their separation in the early years of Interset was driven by problems with encoding tagsets that lacked specialized tenses; later on, however, Interset gained algorithms for strict encoding and feature replacement. There are now other features whose values form a hierarchy, so it seems logical to treat tenses the same way.
 +
 +! **For a more detailed list of changes, see either the ''Changes'' file in the distribution, or the revision history in [[https://redmine.ms.mff.cuni.cz/projects/interset/repository|Redmine]].**
  
 ? Changes since then
-I am working on Interset 2.0, to be released in the second half of 2014. It will be a complete rewrite of Interset, using Moose, the object-oriented extension of Perl 5. I also plan exportable conversion tables that will bring Interset functionality to programming languages other than Perl.
+! I also plan exportable conversion tables that will bring Interset functionality to programming languages other than Perl.
  
 ! Feature changes:
-  * The ''prep'' value of the ''pos'' feature (preposition) will be renamed to ''adp'' (adposition) because it covers prepositions, postpositions and circumpositions. 
-  * The ''subpos'' feature will be partially divided in several new features that reflect the main part of speech: ''nountype'', ''adjtype'', ''verbtype'' and ''conjtype''. This is a logical extension of previously created ''prontype'', ''advtype'' etc. I have not yet decided whether ''subpos'' will disappear completely or there will be a small set of values that will remain in ''subpos''. 
   * I am considering removal of the feature ''synpos''. I need to investigate to what extent it is actually used, in which tagsets, and whether or not it overlaps with information stored elsewhere.
-  * The features ''tense'' and ''subtense'' have been merged. Their separation in the early years of Interset was driven by problems with encoding tagsets that lacked specialized tenses; later on however, Interset got the algorithms for strict encoding and feature replacement. Now there are other features whose values form a hierarchy, so it seems logical to treat tenses the same way. 
   * I am considering further changes to the partitioning of numerals, in a spirit similar to the changes made for pronouns. Many words that are considered numerals in Czech are tagged as nouns, adjectives, pronouns, determiners or adverbs in other tagsets. I may decide to keep a separate part of speech for cardinal numbers, but I have not arrived at a clear opinion yet.
