Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:interset:how-to-write-a-driver [2008/03/07 23:24]
zeman Test that only known features and values are set.
user:zeman:interset:how-to-write-a-driver [2008/03/10 13:19]
zeman All features are relevant.
Line 7: Line 7:
 The input/output tag can be any string. If the information is stored in several kinds of tags, they can be passed in one string, separated by a unique delimiter. We recommend "\t" (horizontal tab, ASCII 9) as the delimiter. If desired, the input/output tag can even be multi-line XML!
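
The tab-delimiter recommendation can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not Interset code; the function names and the sample tag parts are hypothetical.

```python
DELIMITER = "\t"  # horizontal tab, ASCII 9, as recommended in the text


def join_tag_parts(parts):
    """Pack several tag strings into one input/output tag string."""
    for part in parts:
        # The delimiter must be unique, so it may not occur inside a part.
        if DELIMITER in part:
            raise ValueError(f"tag part contains the delimiter: {part!r}")
    return DELIMITER.join(parts)


def split_tag_parts(tag):
    """Unpack a combined tag string back into its parts."""
    return tag.split(DELIMITER)


# Hypothetical example: a POS tag and a feature string passed as one tag.
combined = join_tag_parts(["N", "gen=M|num=S"])
parts = split_tag_parts(combined)
```

Rejecting parts that already contain the delimiter keeps the split unambiguous, which is the point of choosing a delimiter that does not occur in the tags themselves.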
  
-An empty feature value means "unknown": it is not even known whether the feature would be relevant. If we know that a feature is irrelevant, we can set it to "n/a" (not applicable; although not mentioned explicitly, this value is allowed for all features). However, I am not sure whether this should be used at all. While something can be irrelevant in one tagset, we can hardly say that it is not relevant in any tagset. So, since we are setting a value in a universal "tagset", we had probably better leave the value empty or even set it to an appropriate default.
+An empty feature value means "unknown": it is not even known whether the feature would be relevant. Some tagsets distinguish between unknown values and irrelevant features. This is not the case in Interset. While something can be irrelevant in one tagset, we can hardly say that it is not relevant in any tagset. So, since we are setting a value in a universal "tagset", we had probably better leave the value empty or even set it to an appropriate default.
  
 ===== decode() =====
Line 132: Line 132:
  
 See [[user:zeman:interset:Common Problems]] for a list of suggestions for phenomena that are difficult to match between tagsets and Interset.
 +
  
  
Line 142: Line 143:
 When you have written a driver for a new tagset, you should test it. The driver package contains a test script called ''driver-test.pl''. When running it, give the driver name as an argument, without the ''tagset::'' prefix. You can also use the ''-d'' option to turn on debug messages (a list of the tags being tested).
  
-<code>driver-test.pl ar::conll</code>
+<code>driver-test.pl ar::conll
+driver-test.pl -a</code>
  
-Running ''driver-test.pl'' without arguments will list the drivers available on the system.
+Running ''driver-test.pl'' without arguments will list the drivers available on the system. Running it with the ''-a'' option will test all the drivers.
  
 Note that only drivers implementing the ''list()'' function can be tested. Most testing involves generating the list of all possible tags and testing the driver on each tag separately.
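
The idea of testing a driver on each tag from ''list()'' separately can be sketched as follows. This is a Python illustration only, not the actual logic of ''driver-test.pl''; the ''ToyDriver'' class, the ''KNOWN'' feature table, the ''encode()'' method and the specific checks are all assumptions made for the example.

```python
# Hypothetical table of known features and their allowed values.
KNOWN = {
    "pos": {"noun", "verb"},
    "number": {"sing", "plu"},
}


class ToyDriver:
    """Hypothetical stand-in for a tagset driver; not a real Interset driver."""

    _TABLE = {
        "NS": {"pos": "noun", "number": "sing"},
        "NP": {"pos": "noun", "number": "plu"},
        "V": {"pos": "verb"},
    }

    def list(self):
        """Return every tag the driver knows."""
        return sorted(self._TABLE)

    def decode(self, tag):
        """Map a tag to a feature dictionary."""
        return dict(self._TABLE[tag])

    def encode(self, features):
        """Map a feature dictionary back to a tag, or None if none matches."""
        for tag, feats in self._TABLE.items():
            if feats == features:
                return tag
        return None


def check_driver(driver):
    """Test the driver on each listed tag separately; return error messages."""
    errors = []
    for tag in driver.list():
        features = driver.decode(tag)
        # Only known features and known values may be set.
        for name, value in features.items():
            if name not in KNOWN:
                errors.append(f"{tag}: unknown feature {name!r}")
            elif value not in KNOWN[name]:
                errors.append(f"{tag}: unknown value {value!r} for {name!r}")
        # Decoding and encoding again should give back the original tag.
        if driver.encode(features) != tag:
            errors.append(f"{tag}: round trip failed")
    return errors


errors = check_driver(ToyDriver())
```

Collecting the errors instead of stopping at the first one means a single run reports every problematic tag, which is convenient when a tagset has thousands of tags.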