Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:interset:how-to-write-a-driver [2009/09/08 17:46]
zeman Current solution.
user:zeman:interset:how-to-write-a-driver [2009/09/08 17:55]
zeman Corrected number of Chinese tags.
Line 149: Line 149:
 $fs1 = tagset::common::enforce_permitted_joint($fs0, $permitted);
 </code>
 +
  
  
Line 161: Line 162:
 The example is a realistic one. O-tags (tags setting the ''other'' feature) are often minor parts of speech. They are used for tokens that hide under broader parts of speech in other tagsets. The specific usage of the o-tags however makes many features of the broader tags unnecessary. Such features are empty in o-tags while they always must be non-empty in corresponding s-tags.
  
-**A possible solution** would be not to use any o-tags when scanning the possible feature value combinations. This would work for numerous tagset drivers that only resort to ''other'' when dealing with a “strange” tag. One would have to make sure when distinguishing a strange tag from its normal counterpart that only the strange tag has ''other'' set, and that the normal tag has it empty (in other words, we cannot set ''other'' for both, say, ''other = "strange"'' for the former and ''other = "normal"'' for the latter). Nevertheless, there are instances where most or all the tags of a tagset are o-tags. A good example is ''zh::conll'': poorly documented set of 200 or so tags, with most distinctions unrepresentable in DZ Interset. Its decoder only sets ''pos'' and copies the whole tag into ''other''. Excluding o-tags (meaning all tags here) would not work with this tagset.
+**A possible solution** would be not to use any o-tags when scanning the possible feature value combinations. This would work for numerous tagset drivers that only resort to ''other'' when dealing with a “strange” tag. One would have to make sure when distinguishing a strange tag from its normal counterpart that only the strange tag has ''other'' set, and that the normal tag has it empty (in other words, we cannot set ''other'' for both, say, ''other = "strange"'' for the former and ''other = "normal"'' for the latter). Nevertheless, there are instances where most or all the tags of a tagset are o-tags. A good example is ''zh::conll'': poorly documented set of 294 tags, with most distinctions unrepresentable in DZ Interset. Its decoder only sets ''pos'' and copies the whole tag into ''other''. Excluding o-tags (meaning all tags here) would not work with this tagset.
  
 **Another possible solution** is to implement a new subroutine that returns the list of the tags that can be used for scanning of permitted feature value combinations. By default, the subroutine would return the list of non-o-tags. For tagsets such as ''zh::conll'', it could create a tailored list of tags.
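
The default behaviour of such a subroutine could look roughly like the sketch below. It is only an illustration under stated assumptions: ''decode'' here is a toy stand-in rather than a real driver's decoder, the name ''list_for_scanning'' is hypothetical, and the convention that tags starting with ''X'' are the "strange" ones is invented for the example.

```perl
use strict;
use warnings;

# Toy stand-in for a driver's decoder (an assumption, not the real Interset code).
# It returns a reference to a hash of features. Tags starting with 'X' are
# treated as "strange": the whole tag is copied into the 'other' feature.
sub decode
{
    my $tag = shift;
    my %f = (pos => lc(substr($tag, 0, 1)));
    $f{other} = $tag if($tag =~ m/^X/);
    return \%f;
}

# Hypothetical default: return only the tags whose decoded feature structure
# leaves 'other' empty, i.e. the non-o-tags.
sub list_for_scanning
{
    my @tags = @_;
    return grep
    {
        my $f = decode($_);
        !defined($f->{other}) || $f->{other} eq '';
    }
    @tags;
}

my @tags = ('N1', 'V2', 'Xq', 'A3');
print(join(' ', list_for_scanning(@tags)), "\n");   # prints "N1 V2 A3"
```

A driver in which every tag is an o-tag, as described for ''zh::conll'', would get an empty list from this default and would therefore have to override the subroutine with its own hand-picked list.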
