Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:interset:how-to-use [2007/09/27 18:21]
zeman Directory structure.
user:zeman:interset:how-to-use [2008/03/10 14:25]
zeman How to get the list of drivers.
Line 4: Line 4:
  
 If you are on the ÚFAL network, you can use Dan's version directly. Otherwise, you need to [[mailto:zeman@ufal.mff.cuni.cz|ask Dan]] for a zipped package of the currently existing drivers. (I intend to make it available for download here at some later time.) Unzip it to a convenient place; below, we assume it is in ''/home/zeman/lib/perl''.
 +
 +**Note:** I decided to put the whole thing under version control. At the same time, I moved it from my lib folder to my project folder. So the current ÚFAL location is ''/home/zeman/projekty/interset/lib''.
  
 **Contributions welcome!** If you write your own driver, please share it with others! If you send it to me, I will add it to the package for download here.
  
 === Existing drivers ===
 +
 +Note: This list may not be up-to-date. To see what drivers are currently available on your system, call ''driver-test.pl'' without arguments.
  
   * tagset::ar::conll - Arabic CoNLL treebank (coarse, fine and feat fields in one string, delimited by tabs)
Line 29: Line 33:
 Besides drivers, there is a library of useful functions that can be called from within drivers: ''tagset/common.pm''.
  
-There is also the driver testing script, ''driver-test.pl''. In the distribution package, this script is in the ''tagset'' folder. However, since ''tagset'' is going to live under one of your ''lib''s, you may prefer to move the script under one of your ''bin''s, e.g. ''~/bin''. The distribution may contain some sample conversion scripts as well; however, these depend much more on the file format than on the tagset drivers, and thus you'll probably need to write your own anyway.
+There is also the driver testing script, ''bin/driver-test.pl''. The distribution may contain some sample conversion scripts as well; however, these depend much more on the file format than on the tagset drivers, and thus you'll probably need to write your own anyway.
 + 
  
 ==== How to use the Interset ====
Line 35: Line 41:
 You can write your own tag conversion Perl script, and use the Interset driver library. You have to tell Perl where to find the drivers:
  
-<code>setenv PERLLIB /home/zeman/lib/perl:$PERLLIB</code>
+<code>setenv PERLLIB /home/zeman/projekty/interset/lib:$PERLLIB
+setenv PATH /home/zeman/projekty/interset/bin:$PATH</code>
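
The ''setenv'' commands above are csh/tcsh syntax. For Bourne-style shells (sh, bash), a rough equivalent might look like the sketch below. Note that ''INTERSET_HOME'' is a hypothetical helper variable introduced only for this example; it is not part of the Interset package, and the default path is simply the ÚFAL location mentioned above. Also keep in mind that Perl ignores ''PERLLIB'' when ''PERL5LIB'' is defined.

```shell
# Assumption: INTERSET_HOME is a helper variable made up for this sketch.
# Point it at the directory where you unzipped (or checked out) the package.
INTERSET_HOME="${INTERSET_HOME:-/home/zeman/projekty/interset}"

# Prepend the driver library to Perl's search path.
# The ${PERLLIB:+...} form avoids a trailing colon when PERLLIB was empty.
export PERLLIB="$INTERSET_HOME/lib${PERLLIB:+:$PERLLIB}"

# Make driver-test.pl and other scripts in bin/ callable by name.
export PATH="$INTERSET_HOME/bin:$PATH"
```

Put these lines into your shell startup file if you want the setting to persist across sessions.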
  
 Once the variable is set, writing a conversion script is very easy. For instance, my ''csts-cs-pdt-en-penn.pl'' script (meaning "read and write [[:Formát CSTS|CSTS format]], read Czech PDT tags, write English Penn tags") essentially looks like this: