
Institute of Formal and Applied Linguistics Wiki



user:zeman:interset:how-to-write-a-driver [2008/03/08 11:06] zeman: Testing all drivers.
(Previous revision: [2007/10/01 14:34] zeman: Arrays and replacements (a proposal).)
  
**Note:** This approach cannot encode situations where some combinations of feature values are plausible and others are not! For instance, if positions [2] and [3] in a tag encode gender and number, respectively, and if ''NNQW'' means a logical disjunction of the tags ''NNFS'' and ''NNNP'', then you cannot encode the situation in DZ Interset precisely. If you do not want to discard either ''NNFS'' or ''NNNP'' (by storing only the other one), you can say that gender = ''F'' or ''N'' and number = ''S'' or ''P'', but by doing so you also introduce ''NNFP'' and ''NNNS'' as possibilities. The approach may be revised in the future.
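To see why the disjunction cannot be stored exactly, the following sketch expands a feature structure whose gender and number are arrays into every tag it allows. It assumes that alternative values are stored as array references (gender = ''F'' or ''N'' becomes ''['F', 'N']''). The helper ''expand'' is hypothetical and not part of DZ Interset.

<code perl>
use strict;
use warnings;

# Hypothetical decoding of NNQW: array references stand for alternatives.
my %fs = (gender => ['F', 'N'], number => ['S', 'P']);

# Hypothetical helper: return all value strings allowed by the listed
# features, one character per feature, in the order the features are given.
sub expand {
    my ($fs, @features) = @_;
    return ('') unless @features;
    my ($first, @rest) = @features;
    my $value  = $fs->{$first};
    my @values = ref($value) eq 'ARRAY' ? @$value : ($value);
    my @tails  = expand($fs, @rest);
    return map { my $v = $_; map { $v . $_ } @tails } @values;
}

# Positions [0] and [1] are always 'NN' in this example.
my @tags = map { "NN$_" } expand(\%fs, 'gender', 'number');
print "@tags\n";    # NNFS NNFP NNNS NNNP
</code>

Of the four resulting tags, only ''NNFS'' and ''NNNP'' were meant by ''NNQW''. ''NNFP'' and ''NNNS'' are the spurious combinations introduced by storing the two features independently.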
  
  
{
    # Store the hash reference in a global variable.
    $permitvals = tagset::common::get_permitted_values(list(), \&decode);
}
...
</code>
  
Alternatively, the following call checks **and replaces** the values of all features in a feature structure:

<code perl>
tagset::common::enforce_permitted_values($fstruct, $permitvals);
</code>

If an array is a permitted value, all its member values are permitted as well.

If an array is checked, all its member values must be permitted for the array to be permitted. Otherwise, the array is pruned and replaced by the subarray that keeps only the permitted values. If none of the member values is permitted (so the pruned subarray would be empty), the replacement is a single value: the highest-priority replacement of the first element of the array. If the original array was empty (which should never happen, but we ought to be careful anyway), the single empty value is checked and possibly replaced.
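The array rules above can be sketched as follows. This is a hypothetical illustration, not the code of ''tagset::common::enforce_permitted_values''. It makes three assumptions: the permitted values of one feature are a hash of values, the replacement candidates are a hash mapping each value to a list of fallbacks in decreasing priority, and the empty value is the fallback when no replacement is permitted.

<code perl>
use strict;
use warnings;

# Hypothetical sketch of the array rules; NOT the code of tagset::common.
# $permitted:    hash ref, value => 1 for every permitted value of one feature.
# $replacements: hash ref, value => [fallback values in decreasing priority]
#                (an assumed structure, used only for illustration).

# Check a single value and replace it if it is not permitted.
sub check_single {
    my ($value, $permitted, $replacements) = @_;
    return $value if $permitted->{$value};
    foreach my $candidate (@{ $replacements->{$value} || [] }) {
        return $candidate if $permitted->{$candidate};
    }
    return '';    # assumption: the empty value is always permitted
}

# Check a single value or an array of values.
sub check_value {
    my ($value, $permitted, $replacements) = @_;
    return check_single($value, $permitted, $replacements) if ref($value) ne 'ARRAY';
    # An empty array: check (and possibly replace) the single empty value.
    return check_single('', $permitted, $replacements) if !@$value;
    # Keep the permitted members; if all are permitted, the content is unchanged.
    my @kept = grep { $permitted->{$_} } @$value;
    return \@kept if @kept;
    # Nothing permitted: highest-priority replacement of the first element.
    return check_single($value->[0], $permitted, $replacements);
}

my %permitted    = map { $_ => 1 } ('', 'M', 'F');
my %replacements = (N => ['M', 'F'], C => ['F', 'M']);    # made-up priorities

my $r1 = check_value(['F', 'N'], \%permitted, \%replacements);
my $r2 = check_value(['N', 'C'], \%permitted, \%replacements);
my $r3 = check_value([], \%permitted, \%replacements);
print join(',', @$r1), "\n";    # F
print "$r2\n";                  # M
print "[$r3]\n";                # []
</code>

With the made-up priorities, ''['F', 'N']'' is pruned to ''['F']''. ''['N', 'C']'' becomes the single value ''M'', the first permitted replacement of ''N''. An empty array becomes the empty value.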
  
===== Common problems =====
  
See [[user:zeman:interset:Common Problems]] for suggestions on phenomena that are difficult to match between tagsets and the Interset.
  
===== Test your driver =====
  
When you have written a driver for a new tagset, you should test it. The driver package contains a test script called ''driver-test.pl''. When running it, give the driver name as an argument, without the ''tagset::'' prefix. You can also use the ''-d'' option to turn on debug messages (the list of tags being tested).

<code>driver-test.pl ar::conll
driver-test.pl -a</code>

Running ''driver-test.pl'' without arguments lists the drivers available on the system. Running it with the ''-a'' option tests all drivers.

Note that only drivers implementing the ''list()'' function can be tested. Most of the testing involves generating the list of all possible tags and testing the driver on each tag separately.
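Below is a hypothetical sketch of the bookkeeping these rules imply. The driver name from the command line becomes a Perl module name by adding the ''tagset::'' prefix, and also maps to the file name that ''require'' would look for. A driver is testable only if its package defines ''list()''. The package ''tagset::toy::demo'' and the helper names are made up for illustration; this is not the source of ''driver-test.pl''.

<code perl>
use strict;
use warnings;

# Made-up stand-in for a driver; a real driver lives in its own .pm file.
package tagset::toy::demo;
sub list { return ['NS', 'NP', 'VS', 'VP'] }

package main;

# Driver name as given on the command line, e.g. 'ar::conll'.
sub driver_module {
    my ($name) = @_;
    return "tagset::$name";    # add the tagset:: prefix
}

# File name that require would search for in @INC, e.g. tagset/ar/conll.pm.
sub driver_file {
    my ($name) = @_;
    (my $file = driver_module($name)) =~ s{::}{/}g;
    return "$file.pm";
}

# A driver can be tested only if its package defines list().
# Assumption: the module has already been loaded (e.g. by require).
sub has_list {
    my ($name) = @_;
    my $code = driver_module($name)->can('list');
    return defined($code) ? 1 : 0;
}

my $module   = driver_module('ar::conll');
my $file     = driver_file('ar::conll');
my $testable = has_list('toy::demo');
my $missing  = has_list('ar::conll');    # not loaded in this sketch
print "$module\n";                # tagset::ar::conll
print "$file\n";                  # tagset/ar/conll.pm
print "$testable $missing\n";     # 1 0
</code>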

The following tests are performed:
  
  * Decode each tag and check that only known features and values are set. In addition to the values in a built-in list, every feature can have an empty value, and the features ''tagset'' and ''other'' can have any value.
  * Check for each tag that ''encode(decode($tag)) eq $tag''. While it can sometimes be annoying to preserve obscure information hidden in the tags, this test can also reveal many bugs. Besides, you should preserve the information of your own tagset, because people may want to use your driver merely to //access// the tags instead of //converting// them.
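The two tests can be sketched on a toy driver as follows. Everything here is hypothetical: the two-letter toy tagset, its ''toy_decode''/''toy_encode''/''toy_list'' functions, and the small table of known features and values stand in for a real driver and for the built-in list used by ''driver-test.pl''.

<code perl>
use strict;
use warnings;

# Made-up toy driver: two-letter tags, part of speech + number.
my %pos_d = (N => 'noun', V => 'verb');
my %num_d = (S => 'sing', P => 'plu');
my %pos_e = reverse %pos_d;
my %num_e = reverse %num_d;

sub toy_decode {
    my ($tag) = @_;
    my ($p, $n) = split //, $tag;
    return { tagset => 'toy', pos => $pos_d{$p}, number => $num_d{$n} };
}

sub toy_encode {
    my ($fs) = @_;
    return $pos_e{ $fs->{pos} } . $num_e{ $fs->{number} };
}

sub toy_list { return ('NS', 'NP', 'VS', 'VP') }

# Made-up excerpt of a table of known features and values.
my %known = (pos => [qw(noun verb adj)], number => [qw(sing plu)]);

# Test 1: report features or values that are not known.
sub unknown_features {
    my ($fs) = @_;
    my @errors;
    foreach my $feature (sort keys %$fs) {
        # tagset and other may hold any value.
        next if $feature eq 'tagset' || $feature eq 'other';
        if (!exists $known{$feature}) {
            push @errors, "unknown feature '$feature'";
            next;
        }
        my $value  = $fs->{$feature};
        my @values = ref($value) eq 'ARRAY' ? @$value : ($value);
        foreach my $v (@values) {
            next if $v eq '';    # the empty value is always allowed
            push @errors, "unknown value $feature=$v"
                unless grep { $_ eq $v } @{ $known{$feature} };
        }
    }
    return @errors;
}

# Run both tests on every tag from the list; return the number of errors.
sub test_driver {
    my ($decode, $encode, $list) = @_;
    my $n_errors = 0;
    foreach my $tag ($list->()) {
        my $fs     = $decode->($tag);
        my @errors = unknown_features($fs);
        # Test 2: encode(decode($tag)) must give back the same tag.
        my $back = $encode->($fs);
        push @errors, "encode(decode('$tag')) returned '$back'" unless $back eq $tag;
        print "$tag: $_\n" foreach @errors;
        $n_errors += @errors;
    }
    return $n_errors;
}

my $errors = test_driver(\&toy_decode, \&toy_encode, \&toy_list);
print "$errors errors\n";    # 0 errors
</code>

A real test would load the driver module and call its own ''decode'', ''encode'' and ''list'' functions instead of the toy ones.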
  
