Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:interset:versions [2014/06/11 10:52]
zeman Interset 2.0.
user:zeman:interset:versions [2014/06/16 21:47]
zeman Version 2 is out.
Line 28:
  
 ! New usage: Interset in Treex (TectoMT).
 +
 +? 2.001
 +! 13 June 2014. Complete rewrite of Interset. The old Perl interface was not object-oriented. The modules resided under the “tagset” namespace (yes, all lowercase). The new modules are object-oriented (using Moose) and the new namespace is [[http://search.cpan.org/search?query=Lingua%3A%3AInterset&mode=all|Lingua::Interset]]. The distribution is available on CPAN.
 +  * Drivers will be ported gradually, but Interset 2.0 can still work with old drivers that you have installed in ''lib/tagset''. So far, only the ''en::penn'' driver has been ported.
 +  * Project development has left our [[https://svn.ms.mff.cuni.cz/trac/interset/timeline|SVN server]] and landed on our [[https://redmine.ms.mff.cuni.cz/projects/interset/repository|Redmine server]]. Version control is now handled by Git.
 +  * For the record: The project also has [[http://ufal.mff.cuni.cz/interset|its page on the main ÚFAL website]]. It is pretty much empty at the moment. It may eventually become the main website of the project, but not before the webmaster fixes HTML entities being damaged by Drupal.
 +
 +! Feature changes:
 +  * Several new features were split from the subpos feature: nountype, adjtype, verbtype and conjtype. This is a logical extension of the previously created prontype, advtype etc.
 +  * The features tense and subtense have been merged. Their separation in the early years of Interset was driven by problems with encoding tagsets that lacked specialized tenses; later on, however, Interset acquired algorithms for strict encoding and feature replacement. Other features now also have values that form a hierarchy, so it seems logical to treat tenses the same way.
 +
 +! **For a more detailed list of changes, see either the ''Changes'' file in the distribution, or the revision history in [[https://redmine.ms.mff.cuni.cz/projects/interset/repository|Redmine]].**
  
 ? Changes since then
-I am working on Interset 2.0, to be released in the second half of 2014. It will be a complete rewrite of Interset, using Moose, the object-oriented extension of Perl 5. I also plan exportable conversion tables that will bring Interset functionality to programming languages other than Perl.
+! I also plan exportable conversion tables that will bring Interset functionality to programming languages other than Perl.
  
 ! Feature changes:
-  * The ''prep'' value of the ''pos'' feature (preposition) will be renamed to ''adp'' (adposition) because it covers prepositions, postpositions and circumpositions.
-  * The ''subpos'' feature will be partially divided in several new features that reflect the main part of speech: ''nountype'', ''adjtype'', ''verbtype'' and ''conjtype''. This is a logical extension of previously created ''prontype'', ''advtype'' etc. I have not yet decided whether ''subpos'' will disappear completely or there will be a small set of values that will remain in ''subpos''.
   * I am considering removal of the feature ''synpos''. It needs to be investigated to what extent it is actually used, in which tagsets, and whether or not it overlaps with information stored elsewhere.
-  * The features ''tense'' and ''subtense'' have been merged. Their separation in the early years of Interset was driven by problems with encoding tagsets that lacked specialized tenses; later on however, Interset got the algorithms for strict encoding and feature replacement. Now there are other features whose values form a hierarchy, so it seems logical to treat tenses the same way.
   * I am considering further changes to the partitioning of numerals, in the same spirit as with pronouns. Many words that are considered numerals in Czech are tagged as nouns, adjectives, pronouns, determiners or adverbs in other tagsets. I may decide to keep a separate part of speech for cardinal numbers but I have not arrived at a clear opinion yet.
