Differences
This shows you the differences between two versions of the page.
user:zeman:interset:versions [2011/06/27 15:56] zeman Changes since 1.1. |
user:zeman:interset:versions [2011/06/27 17:03] zeman Version 1.2. released. |
! 8 September 2009. Three new incarnations of Czech, English and German CoNLL tagsets, reflecting the 2009 changes in format. Most interestingly, German tags now contain morphosyntactic features. Thanks to Saša Rosen, who tries to use DZ Interset together with a multi-language parallel corpus called Intercorp, we also created a driver for the IPI PAN Polish corpus, which in turn caused one systemic change: o-tags (those setting the ''other'' feature) [[how-to-write-a-driver#replacing-and-the-other-feature|can now be ignored]] when the driver is scanning the possible feature-value combinations. And there is a new [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl|web interface]] to DZ Interset. | ! 8 September 2009. Three new incarnations of Czech, English and German CoNLL tagsets, reflecting the 2009 changes in format. Most interestingly, German tags now contain morphosyntactic features. Thanks to Saša Rosen, who tries to use DZ Interset together with a multi-language parallel corpus called Intercorp, we also created a driver for the IPI PAN Polish corpus, which in turn caused one systemic change: o-tags (those setting the ''other'' feature) [[how-to-write-a-driver#replacing-and-the-other-feature|can now be ignored]] when the driver is scanning the possible feature-value combinations. And there is a new [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl|web interface]] to DZ Interset. |
| |
? Changes since then | ? 1.2 |
! New drivers: Prague Spoken Corpus (Pražský mluvený korpus, PMK) long and short tags (''cs::pmkdl'' and ''cs::pmkkr''). Arabic CoNLL 2007 slightly differs from CoNLL 2006, so there is now ''ar::conll2007''. | ! 27 June 2011. New drivers: Prague Spoken Corpus (Pražský mluvený korpus, PMK) long and short tags (''cs::pmkdl'' and ''cs::pmkkr''). Arabic CoNLL 2007 slightly differs from CoNLL 2006, so there is now ''ar::conll2007''. |
! New test: for every tag in every driver, deleting the value of the ''other'' feature must not lead to an unknown tag. This should greatly improve the chances of finding a permitted feature combination when converting from one tagset to another. | ! New test: for every tag in every driver, deleting the value of the ''other'' feature must not lead to an unknown tag. This should greatly improve the chances of finding a permitted feature combination when converting from one tagset to another. |
| |
! New usage: Interset in Treex (TectoMT). | ! New usage: Interset in Treex (TectoMT). |
| |
| ? Changes since then |
| |
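The new test described under 1.2 (deleting the value of the ''other'' feature must not lead to an unknown tag) can be illustrated with a minimal sketch. This is not the Interset implementation, which is written in Perl; the function names, the toy tag tables and the ''UNKNOWN'' marker below are all hypothetical and only show the shape of the check, assuming a driver can be modelled as a pair of decode and encode functions.

```python
from typing import Callable, Dict, Iterable, List

Features = Dict[str, str]


def tags_broken_without_other(
    tags: Iterable[str],
    decode: Callable[[str], Features],
    encode: Callable[[Features], str],
) -> List[str]:
    """Return the tags whose features, once 'other' is deleted,
    no longer encode to a tag known to the driver."""
    known = set(tags)
    failures = []
    for tag in sorted(known):
        features = dict(decode(tag))   # copy, so the driver's data is untouched
        features.pop("other", None)    # delete the value of the 'other' feature
        if encode(features) not in known:
            failures.append(tag)
    return failures


def make_toy_driver(table: Dict[str, Features]):
    """Build hypothetical decode/encode functions from a tag-to-features table."""

    def decode(tag: str) -> Features:
        return dict(table[tag])

    def encode(features: Features) -> str:
        for tag, tag_features in table.items():
            if tag_features == features:
                return tag
        return "UNKNOWN"  # hypothetical marker for an unknown tag

    return decode, encode


# A toy tagset that passes: "Nx" without 'other' falls back to "N".
GOOD_TAGS = {
    "N": {"pos": "noun"},
    "Nx": {"pos": "noun", "other": "x"},
    "V": {"pos": "verb"},
}

# A toy tagset that fails: "Vy" without 'other' matches no known tag.
BAD_TAGS = dict(GOOD_TAGS)
BAD_TAGS["Vy"] = {"pos": "verb", "mood": "sub", "other": "y"}

good_decode, good_encode = make_toy_driver(GOOD_TAGS)
bad_decode, bad_encode = make_toy_driver(BAD_TAGS)

good_failures = tags_broken_without_other(GOOD_TAGS, good_decode, good_encode)
bad_failures = tags_broken_without_other(BAD_TAGS, bad_decode, bad_encode)
```

In this sketch a driver passes when the returned list is empty; the second toy tagset is reported as failing because the features that remain after dropping ''other'' have no tag of their own.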