Differences
user:zeman:interset:how-to-use [2008/03/10 14:25] zeman: How to get the list of drivers.
user:zeman:interset:how-to-use [2008/03/13 17:46] zeman: Correction.
Line 34:
There is also the driver testing script, ''
Line 64 (old) / line 66 (new):
Note the two-step replacement of the original tag. I avoid using the original tag in a regular expression because the tag could contain characters that are special in regular expressions.
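The two-step replacement can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the code from this page: the ''<t>...</t>'' markup, the ''__TAG__'' placeholder and the lookup table standing in for the drivers are all hypothetical, and the placeholder is assumed never to occur in the data. The last lines show what goes wrong when a tag such as ''NN+PL'' is used directly as a pattern.

<code perl>
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the decode/encode driver pair: a fixed lookup table.
my %translation = ('NN+PL' => 'NNS', 'VB(fin)' => 'VBZ');

# Replace the tag inside a hypothetical <t>...</t> element in two steps,
# so that the original tag itself is never used as a regular expression.
sub retag
{
    my $line = shift;
    # Step 1: capture the original tag and put a placeholder in its place.
    if($line =~ s/<t>(.*?)<\/t>/<t>__TAG__<\/t>/)
    {
        my $tag0 = $1;
        my $tag1 = exists($translation{$tag0}) ? $translation{$tag0} : $tag0;
        # Step 2: the placeholder is plain text we chose, so it is safe in a pattern.
        $line =~ s/__TAG__/$tag1/;
    }
    return $line;
}

print retag("dogs <t>NN+PL</t>\n");    # prints "dogs <t>NNS</t>"
print retag("runs <t>VB(fin)</t>\n");  # prints "runs <t>VBZ</t>"

# For comparison: '+' is a regex metacharacter, so the pattern NN+PL
# does not match the literal text "NN+PL".
my $naive_line  = "dogs <t>NN+PL</t>";
my $pattern_tag = 'NN+PL';
print(($naive_line =~ /$pattern_tag/) ? "matched\n" : "no match\n");          # prints "no match"
print(($naive_line =~ /\Q$pattern_tag\E/) ? "matched\n" : "no match\n");      # prints "matched"
</code>

Escaping the tag with ''\Q...\E'' (or ''quotemeta'') is an alternative; the placeholder approach simply never puts the tag into a pattern at all.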

Some operations performed by the drivers (especially encoding) are not trivial. You may not notice long processing times on toy inputs, but they can matter once you convert millions of tags in a big corpus. A tagset usually has only tens to thousands of distinct tags, so you can exploit this and cache their translations, as in the following example:

<code perl>
use tagset::
use tagset::

my %cache; # original tag => translated tag

while(<>)
{
    if(s/<
    {
        my $tag0 = $1;
        my $tag1;
        if(exists($cache{$tag0}))
        {
            # Tag seen before: reuse the stored translation.
            $tag1 = $cache{$tag0};
        }
        else
        {
            # First occurrence: run the drivers once and remember the result.
            my $features = tagset::
            $tag1 = tagset::
            $cache{$tag0} = $tag1;
        }
        s/<
    }
    print;
}
</code>
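Because the example above depends on the driver modules, here is a minimal runnable sketch of the same caching idea in isolation. The ''convert'' function is a hypothetical stand-in for the decode/encode pair (it only lowercases the tag) and counts how often it is called; the tag list is made-up input, not corpus data.

<code perl>
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the expensive decode + encode drivers.
# It only lowercases the tag, but counts how often it runs.
my $calls = 0;
sub convert
{
    my $tag = shift;
    $calls++;
    return lc($tag);
}

my %cache; # original tag => converted tag

# Translate one tag, running the stand-in drivers only on the first occurrence.
sub cached_convert
{
    my $tag0 = shift;
    $cache{$tag0} = convert($tag0) unless exists($cache{$tag0});
    return $cache{$tag0};
}

my @corpus_tags = qw(NN VB NN NN VB NN);
my @converted = map { cached_convert($_) } @corpus_tags;
print "@converted\n";            # prints "nn vb nn nn vb nn"
print "driver calls: $calls\n";  # prints "driver calls: 2"
</code>

Six tokens but only two distinct tags means the stand-in drivers run twice. Perl's core ''Memoize'' module (''memoize('convert')'') gives a similar effect without an explicit hash; the explicit hash keeps the cache visible and easy to inspect.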