Differences
This shows you the differences between two versions of the page.
user:zeman:interset:how-to-write-a-driver [2007/03/07 10:33] zeman |
user:zeman:interset:how-to-write-a-driver [2007/10/01 14:29] zeman Replacing values. |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== How to write a driver ====== | ====== How to write a driver ====== | ||
- | Drivers are written in Perl. A driver is a simple Perl module (.pm) that should implement the following functions: decode(), encode(), list(). | + | Drivers are written in Perl. A driver is a simple Perl module (.pm) that should implement the following functions: |
- | Input/output tag can be any string. If the information is stored in several kinds of tags, they can be passed in one string, using some unique delimiters. We recommend " | + | <code perl>use tagset:: |
+ | |||
+ | The input/output tag can be any string. If the information is stored in several kinds of tags, they can be passed in one string, using some unique delimiters. We recommend " | ||
Empty feature value means " | Empty feature value means " | ||
Line 11: | Line 13: | ||
This function has one string argument, the tag. The function returns a reference to a hash of features (feature names are hash keys to the feature values). | This function has one string argument, the tag. The function returns a reference to a hash of features (feature names are hash keys to the feature values). | ||
- | The decoder is not obliged to set any feature. If the decoder decides to set a feature, it should be one of the pre-defined values. This can be checked by a central procedure. However, it is not mandatory, so if the appropriate value is not available, you can use your own, but please do **[[zeman@ufal.mff.cuni.cz|let me know]]** so I can update the central value pool accordingly. | + | The decoder is not obliged to set any feature. If the decoder decides to set a feature, it should be one of the pre-defined values. This can be checked by a central procedure. However, it is not mandatory, so if the appropriate value is not available, you can use your own, but please do **[[zeman@ufal.mff.cuni.cz|let me know]]** so I can update the [[features|central value pool]] accordingly. |
If the tagset encodes features separately (e.g., each character is a value of a particular feature), the decoder should tolerate unexpected combinations of features (or be able to tolerate them when asked to). | If the tagset encodes features separately (e.g., each character is a value of a particular feature), the decoder should tolerate unexpected combinations of features (or be able to tolerate them when asked to). | ||
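As a rough illustration of the decode() contract described above (one tag string in, a reference to a hash of features out), here is a minimal sketch. It assumes a made-up three-character positional tagset; the feature names and values (''pos'', ''gender'', ''number'', ''noun'', ''fem'', ...) are assumptions chosen for the example and are not taken from the central value pool.

```perl
use strict;
use warnings;

# Hypothetical positional tagset: [0] part of speech, [1] gender, [2] number.
# All names and values below are assumptions made for this sketch.
my %pos    = ('N' => 'noun', 'V' => 'verb');
my %gender = ('M' => 'masc', 'F' => 'fem');
my %number = ('S' => 'sing', 'P' => 'plu');

sub decode
{
    my $tag = shift;
    my %f; # features; anything that cannot be decoded simply stays unset
    my @chars = split(//, $tag);
    # Every position is looked up on its own, so an unexpected combination
    # of values (or an unknown character) never makes decoding fail.
    $f{pos}    = $pos{$chars[0]}    if(defined($chars[0]) && exists($pos{$chars[0]}));
    $f{gender} = $gender{$chars[1]} if(defined($chars[1]) && exists($gender{$chars[1]}));
    $f{number} = $number{$chars[2]} if(defined($chars[2]) && exists($number{$chars[2]}));
    return \%f;
}

my $f = decode('NFS');
print(join(', ', map {"$_=$f->{$_}"} sort(keys(%{$f}))), "\n");
# prints: gender=fem, number=sing, pos=noun
```

Because each character is decoded independently, a tag such as ''N-X'' still yields ''pos=noun'' and leaves the other two features unset, which is one way to read the tolerance requirement above.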
+ | |||
+ | |||
===== encode() ===== | ===== encode() ===== | ||
Line 19: | Line 23: | ||
This function has one argument, a reference to a hash of features (feature names are hash keys to the feature values). The function returns a string - the tag. | This function has one argument, a reference to a hash of features (feature names are hash keys to the feature values). The function returns a string - the tag. | ||
- | The encoder should be able to process all possible values from the central pool. If the tagset does not recognize a value, the most appropriate substitute should be chosen. | + | The encoder should be able to process all possible values from the [[features|central pool]]. If the tagset does not recognize a value, the most appropriate substitute should be chosen. |
- | Since any feature can in theory have an array of values instead of a single value, the encoder should either be prepared for arrays (more precisely: array references) anywhere, or call tagset:: | + | Since any feature can in theory have an array of values instead of a single value, the encoder should either be prepared for arrays (more precisely: array references) anywhere, or call '' |
- | **WARNING: | + | **WARNING: |
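The encode() contract described in this section (a reference to a hash of features in, a tag string out, with array-valued features tolerated) might be sketched as follows. The tagset, the feature values and the ''-'' placeholder are assumptions made for the example, and the local ''single_value'' helper is hypothetical: it only stands in for whatever routine a real driver would use to reduce an array to a single value.

```perl
use strict;
use warnings;

# Hypothetical mapping from feature values to tag characters.
my %pos    = ('noun' => 'N', 'verb' => 'V');
my %gender = ('masc' => 'M', 'fem' => 'F');
my %number = ('sing' => 'S', 'plu' => 'P');

# Hypothetical helper: reduce an array-valued feature to its first value.
sub single_value
{
    my $value = shift;
    return ref($value) eq 'ARRAY' ? $value->[0] : $value;
}

sub encode
{
    my $f = shift; # reference to a hash of features
    my $tag = '';
    foreach my $pair (['pos', \%pos], ['gender', \%gender], ['number', \%number])
    {
        my ($feature, $map) = @{$pair};
        my $value = single_value($f->{$feature});
        # '-' means "not specified" in this made-up tagset; a real driver
        # should pick the most appropriate substitute value instead.
        $tag .= (defined($value) && exists($map->{$value})) ? $map->{$value} : '-';
    }
    return $tag;
}

print(encode({'pos' => 'noun', 'gender' => ['fem', 'masc'], 'number' => 'plu'}), "\n");
# prints: NFP
```

Taking the first array element is the simplest possible policy and loses information; it is shown only to make the "be prepared for array references anywhere" requirement concrete.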
===== list() ===== | ===== list() ===== | ||
Line 89: | Line 93: | ||
**Note:** This approach cannot encode situations where some combinations of feature values are plausible and some are not! For instance, if positions [2] and [3] in a tag encode gender and number, respectively, | **Note:** This approach cannot encode situations where some combinations of feature values are plausible and some are not! For instance, if positions [2] and [3] in a tag encode gender and number, respectively, | ||
+ | |||
+ | ===== Replacing feature values with defaults ===== | ||
+ | |||
+ | The encoder' | ||
+ | |||
+ | - A table of replacement values for each value, ordered by precedence. There is a default table in '' | ||
+ | - The list of all tags in the tag set (implemented by the '' | ||
+ | |||
+ | Building the list of permitted values is expensive (all tags must be decoded!) and you should do it only once when your driver initializes. In your '' | ||
+ | |||
+ | <code perl> | ||
+ | use tagset:: | ||
+ | BEGIN | ||
+ | { | ||
+ | # Store the hash reference in a global variable. | ||
+ | $permitvals = tagset:: | ||
+ | } | ||
+ | ... | ||
+ | $replacement = tagset:: | ||
+ | </ | ||
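To make it clearer why building the list of permitted values is expensive, the following sketch decodes every tag once and records which values each feature actually takes. It does not reproduce the library routine used above; ''collect_permitted_values'' and the stub ''list()'' and ''decode()'' are hypothetical stand-ins, and the assumption that ''list()'' returns an array reference is made only for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a driver's list() and decode().
sub list
{
    return ['NFS', 'NMS', 'NFP'];
}

sub decode
{
    my @c = split(//, shift);
    my %g = ('F' => 'fem', 'M' => 'masc');
    my %n = ('S' => 'sing', 'P' => 'plu');
    return {'pos' => 'noun', 'gender' => $g{$c[1]}, 'number' => $n{$c[2]}};
}

# Decode every tag once and remember which values each feature takes.
sub collect_permitted_values
{
    my %permitted;
    foreach my $tag (@{list()})
    {
        my $f = decode($tag);
        foreach my $feature (keys(%{$f}))
        {
            my $v = $f->{$feature};
            next unless(defined($v));
            # A feature may hold a single value or a reference to an array.
            my @values = ref($v) eq 'ARRAY' ? @{$v} : ($v);
            $permitted{$feature}{$_}++ foreach(@values);
        }
    }
    return \%permitted;
}

# Computed once and then reused, as recommended above.
my $permitvals = collect_permitted_values();
print(join(' ', sort(keys(%{$permitvals->{gender}}))), "\n");
# prints: fem masc
```

The cost grows with the number of tags returned by ''list()'', which is why the result is worth storing in a variable that lives as long as the driver does.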
===== Common problems ===== | ===== Common problems ===== | ||
Line 101: | Line 125: | ||
To perform the test, run the script '' | To perform the test, run the script '' | ||
- |