Differences
This shows you the differences between two versions of the page.
user:zeman:interset:how-to-write-a-driver [2008/03/14 10:21] zeman Conversion testing. |
user:zeman:interset:how-to-write-a-driver [2008/03/14 10:56] zeman Enforcing permitted feature structures. |
||
---|---|---|---|
Line 16: | Line 16: | ||
If the tagset encodes features separately (e.g., each character is a value of a particular feature): The decoder should be tolerant of unexpected combinations of features (or at least able to tolerate them on request). | If the tagset encodes features separately (e.g., each character is a value of a particular feature): The decoder should be tolerant of unexpected combinations of features (or at least able to tolerate them on request). | ||
- | |||
- | |||
- | |||
===== encode() ===== | ===== encode() ===== | ||
Line 35: | Line 32: | ||
The list is not necessary for the driver to work. However, it can be useful for [[#Test your driver|testing]] the driver. If no list is distributed along with the tagset description, | The list is not necessary for the driver to work. However, it can be useful for [[#Test your driver|testing]] the driver. If no list is distributed along with the tagset description, | ||
- | |||
===== Alternative values ===== | ===== Alternative values ===== | ||
Line 95: | Line 91: | ||
**Note:** This approach cannot encode situations where some combinations of feature values are plausible and some are not! For instance, if positions [2] and [3] in a tag encode gender and number, respectively, | **Note:** This approach cannot encode situations where some combinations of feature values are plausible and some are not! For instance, if positions [2] and [3] in a tag encode gender and number, respectively, | ||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
===== Replacing feature values with defaults ===== | ===== Replacing feature values with defaults ===== | ||
Line 136: | Line 126: | ||
If an array is checked, all member values must be permitted in order for the array to be permitted. Otherwise, the array is pruned and the replacement is a subarray where only permitted values are kept. If no member values are permitted (hence the pruned subarray would be empty), the replacement is a single value, the highest-priority replacement of the first element of the array. If the original array was empty (which should never happen but we ought to be careful anyway), the single empty value is checked and possibly replaced. | If an array is checked, all member values must be permitted in order for the array to be permitted. Otherwise, the array is pruned and the replacement is a subarray where only permitted values are kept. If no member values are permitted (hence the pruned subarray would be empty), the replacement is a single value, the highest-priority replacement of the first element of the array. If the original array was empty (which should never happen but we ought to be careful anyway), the single empty value is checked and possibly replaced. | ||
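The pruning rule for arrays described above can be sketched in Perl as follows. This is a minimal illustration only, not the Interset code: the function name `check_value`, the shape of the `$permitted` and `$replace` tables, and the use of the empty value as a last resort are all assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Interset): returns the replacement for a
# feature value that is either a scalar or a reference to an array of values.
#   $value     ... scalar value or array reference
#   $permitted ... hash reference: permitted value => 1
#   $replace   ... hash reference: value => array reference of replacements,
#                  highest priority first
sub check_value
{
    my ($value, $permitted, $replace) = @_;
    if(ref($value) eq 'ARRAY')
    {
        # An empty array should never occur; check the single empty value instead.
        return check_value('', $permitted, $replace) unless(@{$value});
        my @kept = grep {exists($permitted->{$_})} @{$value};
        # All members are permitted: the array is permitted as it is.
        return $value if(scalar(@kept) == scalar(@{$value}));
        # Some members are permitted: prune the array.
        return \@kept if(@kept);
        # No member is permitted: replace the array by a single value,
        # the replacement of its first element.
        return check_value($value->[0], $permitted, $replace);
    }
    return $value if(exists($permitted->{$value}));
    # Take the highest-priority replacement that is itself permitted.
    foreach my $candidate (@{$replace->{$value} || []})
    {
        return $candidate if(exists($permitted->{$candidate}));
    }
    # Assumption of this sketch: the empty value is the last resort.
    return '';
}

# Usage with made-up gender values.
my %permitted = map {$_ => 1} ('masc', 'fem', '');
my %replace = ('neut' => ['masc', 'fem'], 'com' => ['fem']);
my $pruned = check_value(['masc', 'neut'], \%permitted, \%replace); # reference to ('masc')
my $single = check_value(['neut', 'com'], \%permitted, \%replace);  # 'masc'
```

Note that a partially permitted array is pruned rather than replaced, so the result may be an array reference or a plain scalar; callers of such a helper have to handle both.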
+ | |||
+ | ===== Replacing whole feature structures with defaults ===== | ||
+ | |||
+ | The above technique does not guarantee that the encoder will only see feature // | ||
+ | |||
+ | As with the replacement of separate values, the encoder can ask the Interset common library to replace the whole structure with one the encoder is used to (i.e., a structure that results from decoding a tag known to the driver). This usually relieves the encoder of the burden of thinking about exotic features and values. | ||
+ | |||
+ | The correcting function tries to lie as little as possible. There is a priority value associated with every known feature. Feature values are checked (and possibly altered) in the order of feature priorities. In the above example (adjective cannot have case), the part of speech would keep its " | ||
+ | |||
+ | <code perl> | ||
+ | use tagset:: | ||
+ | ... | ||
+ | sub list { ... } | ||
+ | ... | ||
+ | BEGIN | ||
+ | { | ||
+ | # Store the hash reference in a global variable. | ||
+ | $permitted = tagset:: | ||
+ | } | ||
+ | ... | ||
+ | # Give reference to feature structure. Get reference to a new one (deep copy). | ||
+ | $fs1 = tagset:: | ||
+ | </ | ||
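The idea of checking feature values in priority order against the set of known structures can be illustrated with a small self-contained sketch. It is not the Interset implementation and does not use its API: the names `enforce_permitted` and `value_of`, the representation of structures as flat hashes, and the rule that a forbidden value is replaced by the value of the first remaining candidate are all assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: value of a feature in a structure, empty if not set.
sub value_of
{
    my ($fs, $feature) = @_;
    return defined($fs->{$feature}) ? $fs->{$feature} : '';
}

# Hypothetical sketch (not the Interset implementation).
#   $fs       ... hash reference: feature => value (the structure to correct)
#   $known    ... array reference of permitted structures (hash references)
#   $features ... array reference of feature names, highest priority first
# Returns a reference to a new structure; the input is left untouched.
sub enforce_permitted
{
    my ($fs, $known, $features) = @_;
    # Without any known structure there is nothing to enforce.
    return {%{$fs}} unless(@{$known});
    my @candidates = @{$known};
    my %result;
    foreach my $feature (@{$features})
    {
        my $value = value_of($fs, $feature);
        my @matching = grep {value_of($_, $feature) eq $value} @candidates;
        unless(@matching)
        {
            # Assumption of this sketch: fall back to the value found in the
            # first structure still compatible with the earlier decisions.
            $value = value_of($candidates[0], $feature);
            @matching = grep {value_of($_, $feature) eq $value} @candidates;
        }
        $result{$feature} = $value;
        @candidates = @matching;
    }
    return \%result;
}

# Usage: adjectives have no case in this made-up tagset.
my @known =
(
    {'pos' => 'noun', 'case' => 'nom'},
    {'pos' => 'noun', 'case' => 'acc'},
    {'pos' => 'adj'}
);
my $fixed = enforce_permitted({'pos' => 'adj', 'case' => 'nom'}, \@known, ['pos', 'case']);
# $fixed->{pos} is 'adj' and $fixed->{case} is '' (the case is dropped).
```

Because every decision narrows the candidate list to structures that agree with all earlier decisions, the result always equals one of the known structures on the listed features, and features with higher priority are the least likely to be altered.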
===== Common problems ===== | ===== Common problems ===== |