Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:interset:how-to-write-a-driver [2008/03/14 10:06]
zeman %f was not defined here.
user:zeman:interset:how-to-write-a-driver [2008/03/14 10:15]
zeman Mutual positions of list() and BEGIN.
Line 35: Line 35:
  
 The list is not necessary for the driver to work. However, it can be useful for [[#Test your driver|testing]] the driver. If no list is distributed along with the tagset description, you may still be able to acquire a partial list from a corpus.
 +
  
 ===== Alternative values =====
Line 65: Line 66:
 Now, what do you do with features where you want to encode arrays? You should first check whether the value is an array or not. If it is an array, you may want to ''grep'' your values rather than trying exact match, because you do not know what is going to come from other drivers, and the ordering or additional values may not be what matters.
  
-If the arrays turns out to be incompatible with what you expect, you should pick one value (we suggest you  take the first one) and proceed with default single-value processing.
+If the array turns out to be incompatible with what you expect, you should pick one value (we suggest you  take the first one) and proceed with default single-value processing.
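
As a minimal sketch of the check described above, the array test, the ''grep'' and the fallback to the first value could look like the following. The sub name ''encode_gender'', the feature values ''masc'', ''fem'', ''neut'' and the tag characters are assumptions made for illustration; they are not taken from any real driver.

<code perl>
use strict;
use warnings;

# Hypothetical encoder for one tag position. $value is either a plain
# string or a reference to an array of alternative values.
sub encode_gender
{
    my $value = shift;
    if(ref($value) eq 'ARRAY')
    {
        # grep in scalar context counts matches, so neither the order
        # nor any additional values in the array matter here.
        my $has_fem = grep { $_ eq 'fem' } @{$value};
        my $has_neut = grep { $_ eq 'neut' } @{$value};
        return 'Q' if($has_fem && $has_neut);
        # Incompatible array: keep the first value only and continue
        # with the single-value processing below.
        $value = $value->[0];
    }
    # Invented mapping from feature values to tag characters.
    my %map = ('masc' => 'M', 'fem' => 'F', 'neut' => 'N');
    return defined($value) && exists($map{$value}) ? $map{$value} : '-';
}
</code>

With these assumptions, ''encode_gender(['neut', 'fem'])'' yields ''Q'', while ''encode_gender(['masc', 'fem'])'' falls back to the first value and yields ''M''.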
  
 <code perl>
Line 94: Line 95:
  
 **Note:** This approach cannot encode situations where some combinations of feature values are plausible and some are not! For instance, if positions [2] and [3] in a tag encode gender and number, respectively, and if ''NNQW'' means a logical disjunction of the tags ''NNFS'' and ''NNNP'', then you cannot encode the situation in DZ Interset precisely. If you do not want to discard either ''NNFS'' or ''NNNP'' (by storing the other only), you can say that gender = ''F'' or ''N'' and number = ''S'' or ''P'' but by that you have also introduced ''NNFP'' and ''NNNS'' as possibilities. The approach may be revised in future.
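
The over-generation in the note can be seen by expanding the two value arrays independently. This is only an illustration: ''expand_tags'' is a hypothetical helper, not a function of ''tagset::common''.

<code perl>
use strict;
use warnings;

# Hypothetical helper: combines every gender with every number, which is
# all that two independent arrays of alternative values can express.
sub expand_tags
{
    my ($genders, $numbers) = @_;
    my @tags;
    foreach my $g (@{$genders})
    {
        foreach my $n (@{$numbers})
        {
            push(@tags, 'NN'.$g.$n);
        }
    }
    return @tags;
}

# Prints four tags although NNQW stood for only two of them.
print(join(' ', expand_tags(['F', 'N'], ['S', 'P'])), "\n");
</code>

The output is ''NNFS NNFP NNNS NNNP'': the two intended tags plus the two unwanted combinations.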
 +
  
  
Line 108: Line 110:
  
 Building the list of permitted values is expensive (all tags must be decoded!) and you should do it only once when your driver initializes. In your ''BEGIN'' block, you should call ''tagset::common::get_permitted_values()'' and store the hash reference it returns. The hash (of arrays) will contain a list of permitted values for every feature. When you later need to check a value and replace it if necessary, you pass the hash reference back to ''tagset::common'':
 +
 +(Note that the ''list()'' function must be defined before the ''BEGIN'' block that uses it.)
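
The reason for this ordering is that Perl runs a ''BEGIN'' block as soon as it has parsed it, so only subs that appear earlier in the file exist at that moment. A self-contained sketch, with a stand-in ''list()'' and a hypothetical ''list_later()'' that do not come from a real driver:

<code perl>
use strict;
use warnings;

our ($tags, $error);

# Stand-in for the driver's list(); a real one returns all known tags.
sub list
{
    return ['NNFS', 'NNNP'];
}

BEGIN
{
    # Works: list() was compiled before this block started to run.
    $tags = list();
    # Fails: list_later() is defined further down and does not exist yet.
    eval { list_later() };
    $error = $@;
}

# Hypothetical sub placed after the BEGIN block on purpose.
sub list_later
{
    return ['NNFS', 'NNNP'];
}
</code>

After compilation, ''$tags'' holds the list, while ''$error'' holds an "Undefined subroutine" message from the failed call.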
  
 <code perl>
 use tagset::common;
 +...
 +sub list { ... }
 +...
 BEGIN
 {
