Differences
This shows you the differences between two versions of the page.
user:zeman:interset:how-to-use [2008/03/10 12:54] zeman New location. |
user:zeman:interset:how-to-use [2009/02/20 15:04] zeman Path correction. |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ===== Manual ===== | + | ====== Manual |
- | ==== Installation ==== | + | ===== Installation |
- | If you are on the ÚFAL network, you can directly use Dan's version here. Otherwise, you need to [[mailto: | + | If you are on the ÚFAL network, you can directly use Dan's version here. Otherwise, you need to [[download]] a zipped package of the currently available drivers. Unzip it to a convenient place; below, we assume it is in ''/ |
- | + | ||
- | **Note:** I decided to put the whole thing under version control. At the same time, I moved it from my lib folder to my project folder. So the current ÚFAL location is ''/ | + | |
**Contributions welcome!** If you write your own driver, please share it with others! If you send it to me, I will add it to the package for download here. | **Contributions welcome!** If you write your own driver, please share it with others! If you send it to me, I will add it to the package for download here. | ||
- | === Existing drivers === | + | ==== Existing drivers |
- | * tagset:: | + | Note: This list may not be up-to-date. To see what drivers are currently available on your system, call '' |
- | * tagset:: | + | |
- | * tagset:: | + | |
- | * tagset:: | + | |
- | * tagset:: | + | |
- | * tagset:: | + | |
- | * tagset:: | + | |
- | * tagset:: | + | |
- | * tagset:: | + | |
- | === Directory Structure === | + | - tagset:: |
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
+ | - tagset:: | ||
- | The drivers are Perl modules and must be somewhere under '' | + | ==== Directory Structure ==== |
+ | |||
+ | The drivers are Perl modules and must be somewhere under '' | ||
<code perl> | <code perl> | ||
Line 31: | Line 38: | ||
Besides drivers, there is a library of useful functions that can be called from within drivers: '' | Besides drivers, there is a library of useful functions that can be called from within drivers: '' | ||
- | There is also the driver testing script, '' | + | There is also the driver testing script, '' |
- | ==== How to use the Interset ==== | ||
- | You can write your own tag conversion Perl script, and use the Interset | + | ===== How to use the Interset |
- | < | + | You can write your own tag conversion Perl script, and use the Interset driver library. You have to tell Perl where to find the drivers (the following commands work in '' |
+ | |||
+ | < | ||
+ | setenv PATH / | ||
Once the variable is set, writing a conversion script is very easy. For instance, my '' | Once the variable is set, writing a conversion script is very easy. For instance, my '' | ||
Line 59: | Line 68: | ||
Note the two-step replacement of the original tag. I do not dare to use the original tag in a regular expression because there could be special characters in the tag. | Note the two-step replacement of the original tag. I do not dare to use the original tag in a regular expression because there could be special characters in the tag. | ||
+ | |||
+ | Some operations performed by the drivers (especially when encoding) are not trivial. While you may not notice long processing times in toy runs, the cost can matter once you start converting millions of tags in a big corpus. In that case you may want to take advantage of the fact that a tagset contains only tens to thousands of distinct tags, and cache their translations as in the following example:
+ | |||
+ | <code perl> | ||
+ | use tagset:: | ||
+ | use tagset:: | ||
+ | |||
+ | while(<> | ||
+ | { | ||
+ | if(s/< | ||
+ | { | ||
+ | my $tag0 = $1; | ||
+ | my $tag1; | ||
+ | if(exists($cache{$tag0})) | ||
+ | { | ||
+ | $tag1 = $cache{$tag0}; | ||
+ | } | ||
+ | else | ||
+ | { | ||
+ | my $features = tagset:: | ||
+ | $tag1 = tagset:: | ||
+ | $cache{$tag0} = $tag1; | ||
+ | } | ||
+ | s/< | ||
+ | } | ||
+ | print; | ||
+ | } | ||
+ | </ | ||
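
The caching idea can also be tried out in isolation, without any driver installed. The following is only a minimal sketch: ''decode_tag'' and ''encode_tag'' are hypothetical stand-ins invented for this illustration (they are not functions of the Interset package), and the toy tags are made up. The sketch counts how often the stand-in "driver" is actually called, to show that each distinct tag is translated only once.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-ins for a driver's decode/encode pair.
# They are NOT part of the Interset package; a real script would call
# the decode and encode functions of the chosen tagset:: drivers instead.
my $calls = 0;
sub decode_tag
{
    my $tag = shift;
    $calls++;    # count how many times the "expensive" step really runs
    # Toy feature structure: the first character stands for the part of speech.
    return { pos => substr($tag, 0, 1) };
}
sub encode_tag
{
    my $features = shift;
    return uc($features->{pos});
}

# Cache of already translated tags: source tag => target tag.
my %cache;
sub convert_tag
{
    my $tag0 = shift;
    # Translate each distinct tag only once; later occurrences hit the cache.
    $cache{$tag0} = encode_tag(decode_tag($tag0)) unless exists $cache{$tag0};
    return $cache{$tag0};
}

# Six tokens, but only three distinct tags.
my @converted = map { convert_tag($_) } qw(nn vb nn nn vb jj);
print join(' ', @converted), "\n";    # N V N N V J
print "driver calls: $calls\n";       # driver calls: 3
```

With six input tags of three distinct types, the stand-in driver runs three times. On a real corpus the saving grows with the ratio of tokens to distinct tags, at the price of keeping one hash entry per distinct tag in memory.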