user:zeman:interset:how-to-write-a-driver [2007/10/01 13:53] zeman use tagset::common; |
user:zeman:interset:how-to-write-a-driver [2007/10/01 14:29] zeman Replacing values. |
| |
**Note:** This approach cannot encode situations where some combinations of feature values are plausible and others are not! For instance, if positions [2] and [3] in a tag encode gender and number, respectively, and if ''NNQW'' means a logical disjunction of the tags ''NNFS'' and ''NNNP'', then you cannot encode the situation in DZ Interset precisely. If you do not want to discard either ''NNFS'' or ''NNNP'' (by storing only the other), you can say that gender = ''F'' or ''N'' and number = ''S'' or ''P'', but in doing so you also introduce ''NNFP'' and ''NNNS'' as possibilities. The approach may be revised in the future.
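
The over-generation described in the note can be made concrete with a small sketch. The ''%features'' hash below is a hypothetical stand-in for a feature structure that stores alternative values as lists; it is not the actual DZ Interset representation. Expanding the two independent lists yields all four tags, not just the two that were meant:

```perl
use strict;
use warnings;

# Hypothetical representation: each feature holds a list of alternative values.
my %features =
(
    gender => ['F', 'N'],
    number => ['S', 'P'],
);

# Expand the independent value lists into every tag they jointly admit.
my @tags;
foreach my $g (@{$features{gender}})
{
    foreach my $n (@{$features{number}})
    {
        push(@tags, "NN$g$n");
    }
}

# Only NNFS and NNNP were intended; NNFP and NNNS appear as side effects.
print(join(' ', @tags), "\n");
```

Because gender and number are stored independently, there is no place to record that ''F'' goes only with ''S'' and ''N'' only with ''P''.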
| |
| ===== Replacing feature values with defaults ===== |
| |
| The encoder's problem is that its input may contain more feature values than the target tag set can express. If a value cannot be encoded, the encoder must replace it with a suitable default. Although the encoder can handle the replacement entirely on its own (e.g. with a chain of ''if''-''else'' statements), a central system of defaults can take care of it instead. The central system, however, needs the following: |
| |
| - A table of replacement values for each value, ordered by precedence. There is a default table in ''tagset::common''. A driver can supply its own, if needed. |
| - The list of all tags in the tag set (implemented by the ''list()'' driver function). Then the central system will return the highest-priority //permitted// value. A value is permitted if the tag set contains a tag that yields the value when decoded. |
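
The interplay of the two ingredients above can be sketched as follows. This is only an illustration under assumptions: ''pick_permitted()'', the ''%defaults'' table, the ''%permitted'' hash and the feature values in them are all hypothetical and do not reproduce the actual tables or code of ''tagset::common''.

```perl
use strict;
use warnings;

# Hypothetical replacement table: for each feature and value,
# the fallback values ordered by precedence.
my %defaults =
(
    number => { dual => ['plu', 'sing'] },
);

# Hypothetical permitted values, as they might be collected from the tag set.
my %permitted =
(
    number => ['sing', 'plu'],
);

# Return $value if it is permitted, otherwise the first permitted fallback,
# otherwise undef.
sub pick_permitted
{
    my ($feature, $value, $defaults, $permitted) = @_;
    my %ok;
    $ok{$_} = 1 foreach @{$permitted->{$feature} || []};
    return $value if $ok{$value};
    foreach my $candidate (@{$defaults->{$feature}{$value} || []})
    {
        return $candidate if $ok{$candidate};
    }
    return;
}

my $replacement = pick_permitted('number', 'dual', \%defaults, \%permitted);
print("$replacement\n");
```

In this toy setting the tag set has no dual, so the first fallback that the tag set does permit is chosen.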
| |
| Building the list of permitted values is expensive (all tags must be decoded!), so you should do it only once, when your driver initializes. In your ''BEGIN'' block, call ''tagset::common::get_permitted_values()'' and store the hash reference it returns. The hash (of arrays) contains a list of permitted values for every feature. When you later need to check a value and replace it if necessary, pass the hash reference back to ''tagset::common'': |
| |
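| <code perl> |
| use tagset::common; |
| BEGIN |
| { |
| # Decode all tags once. BEGIN runs at compile time, so list() must already be defined here. |
| # Store the hash reference in a global variable. |
| $permitvals = tagset::common::get_permitted_values(list()); |
| } |
| ... |
| # Later, when encoding: get a value that the tag set can actually express. |
| $replacement = tagset::common::check_value($feature, $value, $permitvals); |
| </code> |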
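
To show what kind of structure the stored hash reference points to, here is a minimal sketch of collecting permitted values from a toy tag set. The functions ''toy_list()'', ''toy_decode()'' and ''collect_permitted()'' are hypothetical and only mimic the idea; the real ''get_permitted_values()'' may work differently.

```perl
use strict;
use warnings;

# Hypothetical toy tag set; a real driver would supply its tags via list().
sub toy_list
{
    return ['NNFS', 'NNNP'];
}

# Hypothetical toy decoder: position [2] is gender, position [3] is number.
sub toy_decode
{
    my ($tag) = @_;
    return { gender => substr($tag, 2, 1), number => substr($tag, 3, 1) };
}

# Decode every tag once and collect the values seen for each feature.
sub collect_permitted
{
    my ($tags) = @_;
    my %seen;
    foreach my $tag (@{$tags})
    {
        my $fs = toy_decode($tag);
        $seen{$_}{$fs->{$_}} = 1 foreach keys %{$fs};
    }
    # Convert to a hash of arrays: one sorted list of values per feature.
    my %permitted;
    foreach my $feature (keys %seen)
    {
        $permitted{$feature} = [sort keys %{$seen{$feature}}];
    }
    return \%permitted;
}

my $permitvals = collect_permitted(toy_list());
print(join(',', @{$permitvals->{gender}}), "\n");
```

Every tag is decoded exactly once here, which is why the result is worth caching at initialization instead of being rebuilt for each value check.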
| |
===== Common problems =====