Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
user:zeman:interset:how-to-use [2007/03/01 12:20] zeman |
user:zeman:interset:how-to-use [2017/01/16 13:06] (current) zeman |
||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== DZ Interset Manual ====== | ||
- | ===== Manual | + | ===== Installation |
- | ==== How to use the Interset | + | If you are on the ÚFAL network and use Perl from PerlBrew, you probably already have '' |
- | You can write your own tag conversion Perl script, and use the Interset driver library. You have to tell Perl where to find the drivers: | + | <code bash> |
- | < | + | **Contributions welcome!** If you write your own driver, please share it with others! If you send it to me, I will add it to the package on CPAN. |
- | Once the variable is set, writing a conversion script is very easy. For instance, my '' | + | ==== Existing drivers ==== |
+ | |||
+ | Use the tool '' | ||
+ | |||
+ | ==== Directory Structure ==== | ||
+ | |||
+ | The drivers are Perl modules | ||
<code perl> | <code perl> | ||
- | use tagset::cs::pdt; | + | use Lingua::Interset::Tagset::LL::Xxx; |
- | use tagset::en::penn; | + | </ |
+ | but usually it is more convenient to just call the main module and then refer to the tagset using the lowercased identifier: | ||
+ | |||
+ | <code perl> | ||
+ | use Lingua:: | ||
+ | my $fs = decode(' | ||
+ | </ | ||
+ | |||
+ | The main object in Interset is of the class '' | ||
+ | |||
+ | There is also the driver testing script, '' | ||
+ | |||
+ | |||
+ | ===== How to use the Interset ===== | ||
+ | |||
+ | You can write your own Perl script to convert tags, and use the Interset driver library. You may have to tell Perl where to find Interset (the following commands work in '' | ||
+ | |||
+ | < | ||
+ | setenv PATH / | ||
+ | |||
+ | Once the variable is set, writing a conversion script is very easy. Here is an example (note that in CoNLL-X files we often merge the contents of the CPOS, POS and FEATS columns to create one long string that will be seen by Interset as one “tag”): | ||
+ | |||
+ | <code perl> | ||
+ | use Lingua:: | ||
+ | |||
+ | my $c = new Lingua:: | ||
+ | |||
+ | # Read the CoNLL-X file from STDIN or from files given as arguments. | ||
while(<> | while(<> | ||
{ | { | ||
- | | + | |
{ | { | ||
- | my $tag0 = $1; | + | |
- | my $features | + | my @f = split(/\t/, $_); |
- | my $tag1 = tagset:: | + | |
- | | + | my $utag = $c-> |
+ | my ($upos, $ufeat) | ||
+ | $f[3] = $upos; | ||
+ | $f[5] = $ufeat; | ||
+ | $_ = join(" | ||
} | } | ||
- | print; | + | |
+ | | ||
} | } | ||
</ | </ | ||
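The text above mentions merging the CPOS, POS and FEATS columns of a CoNLL-X line into one long string that Interset treats as a single tag. The following is a minimal sketch of that merging step only, in plain Perl and without calling Interset. The helper name `merge_conll_tag`, the sample line, and the separator are assumptions made for illustration; check the documentation of the driver you use for the exact format it expects.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Interset): join the CPOS, POS and FEATS
# columns (fields 4 to 6 of a CoNLL-X line) into one string.
# The default tab separator is an assumption; pass another one if needed.
sub merge_conll_tag
{
    my ($line, $separator) = @_;
    $separator = "\t" unless defined $separator;
    chomp $line;
    # The limit -1 keeps trailing empty fields instead of dropping them.
    my @f = split(/\t/, $line, -1);
    return join($separator, @f[3..5]);
}

# Made-up sample line with the ten CoNLL-X columns.
my $line = "1\tdog\tdog\tN\tNN\tnum=sg\t2\tSBJ\t_\t_\n";
print merge_conll_tag($line, '|'), "\n";   # prints N|NN|num=sg
```

The merged string is what a conversion script would hand to the driver; the converted result can then be split again and written back to the same columns, as the script above does with `$f[3]` and `$f[5]`.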
- | |||
- | Note the two-step replacement of the original tag. I avoid using the original tag directly in a regular expression because the tag may contain characters that are special in regular expressions.
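The concern about special characters can be illustrated in plain Perl. This sketch is not the two-step replacement from the older revision; it shows an alternative that uses Perl's `\Q...\E` escape (equivalent to `quotemeta`) so that the tag is matched literally. The helper name `replace_tag` and the sample tags are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first literal occurrence of $old in $text.
sub replace_tag
{
    my ($text, $old, $new) = @_;
    # \Q...\E escapes regex metacharacters in $old, so characters such as
    # '.', '+', '(' and ')' are matched as themselves.
    $text =~ s/\Q$old\E/$new/;
    return $text;
}

# A made-up tag full of regex metacharacters.
my $line = "word\tN.S+(x)\trest";
print replace_tag($line, 'N.S+(x)', 'NOUN'), "\n";   # prints word<TAB>NOUN<TAB>rest
```

Without the escape, the `.` in a tag like `N.S` would also match `NxS`, and an unbalanced parenthesis in a tag would make the pattern fail to compile.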