user:zeman:interset:how-to-write-a-driver [2009/09/08 15:45] zeman Replacing feature values and the other feature. |
user:zeman:interset:how-to-write-a-driver [2017/01/16 13:08] (current) zeman Warning: obsolete. |
====== How to write a driver ====== | ====== How to write a driver (THIS PAGE IS OBSOLETE—IT IS ABOUT INTERSET VERSION 1) ====== |
| |
Drivers are written in Perl. A driver is a simple Perl module (.pm) that should implement the following functions: ''decode()'', ''encode()'', ''list()''. The ''tagset/common.pm'' module contains code you may find useful, so your driver module should start with | Drivers are written in Perl. A driver is a simple Perl module (.pm) that should implement the following functions: ''decode()'', ''encode()'', ''list()''. The ''tagset/common.pm'' module contains code you may find useful, so your driver module should start with |
$fs1 = tagset::common::enforce_permitted_joint($fs0, $permitted); | $fs1 = tagset::common::enforce_permitted_joint($fs0, $permitted); |
</code> | </code> |
| |
| |
| |
===== Replacing and the other feature ===== | ===== Replacing and the other feature ===== |
The example is a realistic one. O-tags (tags setting the ''other'' feature) are often minor parts of speech. They are used for tokens that hide under broader parts of speech in other tagsets. The specific usage of the o-tags, however, makes many features of the broader tags unnecessary. Such features are empty in o-tags, while they must always be non-empty in the corresponding s-tags. | The example is a realistic one. O-tags (tags setting the ''other'' feature) are often minor parts of speech. They are used for tokens that hide under broader parts of speech in other tagsets. The specific usage of the o-tags, however, makes many features of the broader tags unnecessary. Such features are empty in o-tags, while they must always be non-empty in the corresponding s-tags. |
| |
**A possible solution** would be not to use any o-tags when scanning the possible feature value combinations. This would work for the many tagset drivers that only resort to ''other'' when dealing with a “strange” tag. When distinguishing a strange tag from its normal counterpart, one would have to make sure that only the strange tag has ''other'' set and that the normal tag has it empty (in other words, we cannot set ''other'' for both, say, ''other = "strange"'' for the former and ''other = "normal"'' for the latter). Nevertheless, there are instances where most or all of the tags of a tagset are o-tags. A good example is ''zh::conll'': a poorly documented set of 200 or so tags, with most distinctions unrepresentable in DZ Interset. Its decoder only sets ''pos'' and copies the whole tag into ''other''. Excluding o-tags (meaning all tags here) would not work with this tagset. | **A possible solution** would be not to use any o-tags when scanning the possible feature value combinations. This would work for the many tagset drivers that only resort to ''other'' when dealing with a “strange” tag. When distinguishing a strange tag from its normal counterpart, one would have to make sure that only the strange tag has ''other'' set and that the normal tag has it empty (in other words, we cannot set ''other'' for both, say, ''other = "strange"'' for the former and ''other = "normal"'' for the latter). Nevertheless, there are instances where most or all of the tags of a tagset are o-tags. A good example is ''zh::conll'': a poorly documented set of 294 tags, with most distinctions unrepresentable in DZ Interset. Its decoder only sets ''pos'' and copies the whole tag into ''other''. Excluding o-tags (meaning all tags here) would not work with this tagset. |
| |
**Another possible solution** is to implement a new subroutine that returns the list of the tags that can be used for scanning of permitted feature value combinations. By default, the subroutine would return the list of non-o-tags. For tagsets such as ''zh::conll'', it could create a tailored list of tags. | **Another possible solution** is to implement a new subroutine that returns the list of the tags that can be used for scanning of permitted feature value combinations. By default, the subroutine would return the list of non-o-tags. For tagsets such as ''zh::conll'', it could create a tailored list of tags. |
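A minimal sketch of what such a subroutine might look like by default. Everything here is hypothetical and only for illustration: the name ''list_scannable_tags()'', the toy ''decode()'' with its three made-up tags, and the assumption that the tag list is passed around as an array reference. It is not part of ''tagset::common''.

```perl
use strict;
use warnings;

# Toy stand-in for a driver's decode(): maps a tag to a feature structure
# (hash reference). The tags and values are made up for this example.
my %decoded =
(
    'NN'  => { pos => 'noun' },
    'VB'  => { pos => 'verb' },
    'NNX' => { pos => 'noun', other => 'strange' },
);

sub decode
{
    my $tag = shift;
    return $decoded{$tag};
}

# An o-tag is a tag whose decoded feature structure has 'other' set.
sub is_o_tag
{
    my $tag = shift;
    my $fs = decode($tag);
    return defined($fs->{other}) && $fs->{other} ne '';
}

# Hypothetical helper: returns a reference to the list of tags that may be
# used when scanning for permitted feature value combinations.
# Default behavior: all tags except o-tags.
sub list_scannable_tags
{
    my $tags = shift; # reference to the full list of tags
    my @scannable = grep { !is_o_tag($_) } @{$tags};
    return \@scannable;
}

my $scannable = list_scannable_tags(['NN', 'VB', 'NNX']);
print join(' ', @{$scannable}), "\n"; # prints "NN VB"
```

A driver such as ''zh::conll'', where every tag is an o-tag, would override this default and return its own hand-picked list instead of filtering.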
| |
| **Current solution:** A new (third) parameter to ''tagset::common::get_permitted_structures_joint()'', when set to 1 (''true''), says that o-tags shall be ignored. Note that this requires changing the beginning of ''encode()''. We must not enforce permitted combinations if the source tagset is the same as the target (because we would be violating o-tags without knowing their permitted values). Instead, we should just make a simple deep copy of the feature structure to protect it from incidental modifications during encoding. The following example is taken from the ''pl::ipipan'' driver, which was the first place where the solution was applied. |
| |
| <code perl> |
| my $f0 = shift; |
| # Modify the feature structure so that it contains values expected by this |
| # driver. Do not do that if this was also the source tagset (because the |
| # modification would damage tags using 'other'). However, in any case |
| # create a deep copy of the original feature structure so that it is |
| # protected from changes during encoding. |
| my $f; |
| if($f0->{tagset} eq 'pl::ipipan') |
| { |
| $f = tagset::common::duplicate($f0); |
| } |
| else |
| { |
| $f = tagset::common::enforce_permitted_joint($f0, $permitted); |
| } |
| my %f = %{$f}; |
| </code> |
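The following self-contained sketch illustrates why the copy has to be deep. The ''deep_copy()'' subroutine is a hypothetical stand-in written for this example; it is not the actual code of ''tagset::common::duplicate()''. The sketch also assumes that a feature value may itself be a reference (here an array of values), which is the case where a shallow copy would not be enough.

```perl
use strict;
use warnings;

# Hypothetical recursive copy of a feature structure. Nested hashes and
# arrays are copied too, so the copy shares no references with the original.
sub deep_copy
{
    my $x = shift;
    if(ref($x) eq 'HASH')
    {
        my %copy;
        foreach my $key (keys(%{$x}))
        {
            $copy{$key} = deep_copy($x->{$key});
        }
        return \%copy;
    }
    elsif(ref($x) eq 'ARRAY')
    {
        my @copy = map { deep_copy($_) } @{$x};
        return \@copy;
    }
    # Plain scalars are returned as they are.
    return $x;
}

# Made-up feature structure with one array-valued feature.
my $f0 = { tagset => 'pl::ipipan', pos => 'noun', case => ['nom', 'acc'] };
my $f = deep_copy($f0);

# Changes made during "encoding" touch only the copy.
$f->{pos} = 'adj';
push(@{$f->{case}}, 'gen');

print "$f0->{pos} ", scalar(@{$f0->{case}}), "\n"; # prints "noun 2"
```

With a shallow copy such as ''{ %{$f0} }'', the ''push'' would have modified the array inside the original structure as well.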
| |
===== Common problems ===== | ===== Common problems ===== |