Differences

This shows you the differences between two versions of the page.

--- user:zeman:interset:how-to-write-a-driver [2009/09/08 17:46]
zeman Current solution.
+++ user:zeman:interset:how-to-write-a-driver [2009/09/08 17:55]
zeman Corrected number of Chinese tags.
@@ Line 149: / Line 149: @@
 $fs1 = tagset::common::enforce_permitted_joint($fs0, $permitted);
 </code>
@@ Line 161: / Line 162: @@
 The example is a realistic one. O-tags (tags setting the ''other'' feature) are often minor parts of speech. They are used for tokens that hide under broader parts of speech in other tagsets. The specific usage of the o-tags however makes many features of the broader tags unnecessary. Such features are empty in o-tags while they always must be non-empty in corresponding s-tags.
-**A possible solution** would be not to use any o-tags when scanning the possible feature value combinations. This would work for numerous tagset drivers that only resort to ''other'' when dealing with a “strange” tag. One would have to make sure when distinguishing a strange tag from its normal counterpart that only the strange tag has ''other'' set, and that the normal tag has it empty (in other words, we cannot set ''other'' for both, say, ''other = "strange"'' for the former and ''other = "normal"'' for the latter). Nevertheless, there are instances where most or all the tags of a tagset are o-tags. A good example is ''zh::conll'': poorly documented set of 200 or so tags, with most distinctions unrepresentable in DZ Interset. Its decoder only sets ''pos'' and copies the whole tag into ''other''. Excluding o-tags (meaning all tags here) would not work with this tagset.
+**A possible solution** would be not to use any o-tags when scanning the possible feature value combinations. This would work for numerous tagset drivers that only resort to ''other'' when dealing with a “strange” tag. One would have to make sure when distinguishing a strange tag from its normal counterpart that only the strange tag has ''other'' set, and that the normal tag has it empty (in other words, we cannot set ''other'' for both, say, ''other = "strange"'' for the former and ''other = "normal"'' for the latter). Nevertheless, there are instances where most or all the tags of a tagset are o-tags. A good example is ''zh::conll'': poorly documented set of 294 tags, with most distinctions unrepresentable in DZ Interset. Its decoder only sets ''pos'' and copies the whole tag into ''other''. Excluding o-tags (meaning all tags here) would not work with this tagset.
 **Another possible solution** is to implement a new subroutine that returns the list of the tags that can be used for scanning of permitted feature value combinations. By default, the subroutine would return the list of non-o-tags. For tagsets such as ''zh::conll'', it could create a taylored list of tags.

[ Back to the navigation ] [ Back to the content ]

Institute of Formal and Applied Linguistics Wiki

Differences