Differences
This shows you the differences between two versions of the page.
user:zeman:interset:how-to-write-a-driver [2008/03/14 10:21] zeman Conversion testing. |
user:zeman:interset:how-to-write-a-driver [2009/09/08 15:45] zeman Replacing feature values and the other feature. |
||
---|---|---|---|
Line 16: | Line 16: | ||
If the tagset encodes features separately (e.g., each character is a value of a particular feature): The decoder should be tolerant of unexpected combinations of features (or should be able to tolerate them on request). | If the tagset encodes features separately (e.g., each character is a value of a particular feature): The decoder should be tolerant of unexpected combinations of features (or should be able to tolerate them on request). | ||
- | |||
- | |||
- | |||
===== encode() ===== | ===== encode() ===== | ||
Line 35: | Line 32: | ||
The list is not necessary for the driver to work. However, it can be useful for [[#Test your driver|testing]] the driver. If no list is distributed along with the tagset description, | The list is not necessary for the driver to work. However, it can be useful for [[#Test your driver|testing]] the driver. If no list is distributed along with the tagset description, | ||
- | |||
===== Alternative values ===== | ===== Alternative values ===== | ||
Line 95: | Line 91: | ||
**Note:** This approach cannot encode situations where some combinations of feature values are plausible and some are not! For instance, if positions [2] and [3] in a tag encode gender and number, respectively, | **Note:** This approach cannot encode situations where some combinations of feature values are plausible and some are not! For instance, if positions [2] and [3] in a tag encode gender and number, respectively, | ||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
===== Replacing feature values with defaults ===== | ===== Replacing feature values with defaults ===== | ||
Line 137: | Line 127: | ||
If an array is checked, all member values must be permitted in order for the array to be permitted. Otherwise, the array is pruned and the replacement is a subarray where only permitted values are kept. If no member values are permitted (hence the pruned subarray would be empty), the replacement is a single value, the highest-priority replacement of the first element of the array. If the original array was empty (which should never happen but we ought to be careful anyway), the single empty value is checked and possibly replaced. | If an array is checked, all member values must be permitted in order for the array to be permitted. Otherwise, the array is pruned and the replacement is a subarray where only permitted values are kept. If no member values are permitted (hence the pruned subarray would be empty), the replacement is a single value, the highest-priority replacement of the first element of the array. If the original array was empty (which should never happen but we ought to be careful anyway), the single empty value is checked and possibly replaced. | ||
- | ===== Common problems | + | ===== Replacing whole feature structures with defaults |
- | See [[user: | + | The above technique does not guarantee that the encoder will only see feature // |
+ | As with the replacement of separate values, the encoder can ask the Interset common library to replace the whole structure with one the encoder is used to (i.e., a structure that results from decoding a tag known to the driver). This usually relieves the encoder of the burden of handling exotic features and values. | ||
+ | The correcting function tries to lie as little as possible. There is a priority value associated with every known feature. Feature values are checked (and possibly altered) in the order of feature priorities. In the above example (adjective cannot have case), the part of speech would keep its " | ||
+ | <code perl> | ||
+ | use tagset:: | ||
+ | ... | ||
+ | sub list { ... } | ||
+ | ... | ||
+ | BEGIN | ||
+ | { | ||
+ | # Store the hash reference in a global variable. | ||
+ | $permitted = tagset:: | ||
+ | } | ||
+ | ... | ||
+ | # Give reference to feature structure. Get reference to a new one (deep copy). | ||
+ | $fs1 = tagset:: | ||
+ | </ | ||
+ | ===== Replacing and the other feature ===== | ||
+ | Replacing feature values with defaults has its limitations. It only works with feature values that are known in advance. It does not touch the features '' | ||
+ | The key problem lies in the method we use to obtain permitted combinations of feature values. All tags of the tagset are decoded into feature structures, which subsequently represent the permitted combinations. Values of '' | ||
+ | **Example: | ||
+ | |||
+ | The example is a realistic one. O-tags (tags setting the '' | ||
+ | |||
+ | **A possible solution** would be to exclude all o-tags when scanning the possible feature value combinations. This would work for numerous tagset drivers that only resort to '' | ||
+ | |||
+ | **Another possible solution** is to implement a new subroutine that returns the list of tags that may be used when scanning for permitted feature value combinations. By default, the subroutine would return the list of non-o-tags. For tagsets such as '' | ||
+ | |||
+ | ===== Common problems ===== | ||
+ | |||
+ | See [[user: | ||
===== Test your driver ===== | ===== Test your driver ===== | ||
Line 153: | Line 173: | ||
< | < | ||
- | driver-test.pl -a | ||
driver-test.pl bg::conll cs::pdt | driver-test.pl bg::conll cs::pdt | ||
+ | driver-test.pl -a | ||
driver-test.pl -A</ | driver-test.pl -A</ | ||
- | Running '' | + | Running '' |
Note that only drivers implementing the '' | Note that only drivers implementing the '' |
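
The array rule quoted in the unchanged context of "Replacing feature values with defaults" (keep the array if every member is permitted, otherwise prune it, otherwise fall back to a replacement of the first element) can be sketched in Perl roughly as follows. This is only an illustration under stated assumptions: the subroutine names `check_value` and `check_single`, the shape of the `$permitted` and `$replace` hashes, and the fallback to the empty value are all hypothetical and are not taken from the Interset library.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of the Interset library.
# $value     : a scalar value or a reference to an array of values
# $permitted : hash reference whose keys are the permitted values
# $replace   : hash reference mapping a value to an array reference of
#              replacement candidates, ordered from highest priority
sub check_single
{
    my ($value, $permitted, $replace) = @_;
    return $value if(exists($permitted->{$value}));
    foreach my $candidate (@{$replace->{$value} || []})
    {
        return $candidate if(exists($permitted->{$candidate}));
    }
    # Assumption: if nothing fits, fall back to the empty value.
    return '';
}

sub check_value
{
    my ($value, $permitted, $replace) = @_;
    return check_single($value, $permitted, $replace) unless(ref($value) eq 'ARRAY');
    # An empty array is treated as the single empty value.
    return check_single('', $permitted, $replace) unless(@{$value});
    my @kept = grep { exists($permitted->{$_}) } @{$value};
    # All members are permitted: the array is permitted as it is.
    return $value if(scalar(@kept) == scalar(@{$value}));
    # Some members are permitted: keep only those (pruned subarray).
    return \@kept if(@kept);
    # No member is permitted: use the best replacement of the first element.
    return check_single($value->[0], $permitted, $replace);
}

# Illustrative data only; the values and priorities are made up.
my $permitted = { 'masc' => 1, 'fem' => 1 };
my $replace   = { 'neut' => ['masc', 'fem'], 'com' => ['fem'] };

my $pruned   = check_value(['masc', 'neut'], $permitted, $replace); # ['masc']
my $fallback = check_value(['neut', 'com'],  $permitted, $replace); # 'masc'
```

In this sketch a pruned array with one surviving member stays an array reference; whether the real library collapses it to a scalar is not stated in the text above.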