
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

Old: user:zeman:interset:versions [2011/06/27 15:56] by zeman: Changes since 1.1.
New: user:zeman:interset:versions [2011/06/27 17:03] by zeman: Version 1.2 released.
Line 22:
 ! 8 September 2009. Three new incarnations of Czech, English and German CoNLL tagsets, reflecting the 2009 changes in format. Most interestingly, German tags now contain morphosyntactic features. Thanks to Saša Rosen, who tries to use DZ Interset together with a multi-language parallel corpus called Intercorp, we also created a driver for the IPI PAN Polish corpus, which in turn caused one systemic change: o-tags (those setting the ''other'' feature) [[how-to-write-a-driver#replacing-and-the-other-feature|can now be ignored]] when the driver is scanning the possible feature-value combinations. And there is a new [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl|web interface]] to DZ Interset.
  
-? Changes since then
+? 1.2
-! New drivers: Prague Spoken Corpus (Pražský mluvený korpus, PMK) long and short tags (''cs::pmkdl'' and ''cs::pmkkr''). Arabic CoNLL 2007 slightly differs from CoNLL 2006, so there is now ''ar::conll2007''.
+! 27 June 2011. New drivers: Prague Spoken Corpus (Pražský mluvený korpus, PMK) long and short tags (''cs::pmkdl'' and ''cs::pmkkr''). Arabic CoNLL 2007 slightly differs from CoNLL 2006, so there is now ''ar::conll2007''.
 ! New test: It must now hold for all tags in all drivers that deleting the value of the ''other'' feature does not lead to an unknown tag. This should greatly improve the chances of finding permitted feature combinations when converting from one tagset to another.
+
 ! New usage: Interset in Treex (TectoMT).
+
+? Changes since then
+
