user:zeman:interset:how-to-use [2008/03/14 09:56] zeman Hyperlink to ISO 639-1. |
user:zeman:interset:how-to-use [2008/03/31 22:15] zeman de::conll |
===== Manual ===== | ===== Manual ===== |
| |
| |
==== Installation ==== | ==== Installation ==== |
Note: This list may not be up-to-date. To see what drivers are currently available on your system, call ''driver-test.pl'' without arguments. | Note: This list may not be up-to-date. To see what drivers are currently available on your system, call ''driver-test.pl'' without arguments. |
| |
* tagset::ar::conll - Arabic CoNLL treebank (coarse, fine and feat fields in one string, delimited by tabs) | - tagset::ar::conll - Arabic CoNLL treebank (coarse, fine and feat fields in one string, delimited by tabs) |
* tagset::bg::conll - Bulgarian CoNLL treebank | - tagset::bg::conll - Bulgarian CoNLL treebank |
* tagset::cs::pdt - Czech positional tags of the Prague Dependency Treebank | - tagset::cs::conll - Czech CoNLL treebank, based on the Prague Dependency Treebank |
* tagset::da::conll - Danish CoNLL treebank | - tagset::cs::pdt - Czech positional tags of the Prague Dependency Treebank |
* tagset::en::conll - English CoNLL treebank (one-to-one mapping to en::penn) | - tagset::da::conll - Danish CoNLL treebank |
* tagset::en::penn - English Penn Treebank | - tagset::de::conll - German CoNLL treebank (one-to-one mapping to de::stts) |
* tagset::sv::conll - Swedish CoNLL treebank (one-to-one mapping to sv::mamba) | - tagset::de::stts - German: Stuttgart-Tübingen Tagset (Tiger treebank) |
* tagset::sv::hajic - Tags output by Swedish tagger by Jan Hajič | - tagset::en::conll - English CoNLL treebank (one-to-one mapping to en::penn) |
* tagset::sv::mamba - Swedish Mamba tags from Talbanken05 (CoNLL treebank) | - tagset::en::penn - English Penn Treebank |
* tagset::sv::svdahybrid - Dan's tagset, aiming at making distribution of tags from sv::hajic and da::conll as close as possible | - tagset::sv::conll - Swedish CoNLL treebank (one-to-one mapping to sv::mamba) |
* tagset::zh::conll - Chinese CoNLL treebank | - tagset::sv::hajic - Tags output by Swedish tagger by Jan Hajič |
| - tagset::sv::mamba - Swedish Mamba tags from Talbanken05 (CoNLL treebank) |
| - tagset::sv::svdahybrid - Dan's tagset, aiming at making distribution of tags from sv::hajic and da::conll as close as possible |
| - tagset::zh::conll - Chinese CoNLL treebank |
| |
=== Directory Structure === | === Directory Structure === |
| |
There is also the driver testing script, ''bin/driver-test.pl''. The distribution may contain some sample conversion scripts as well; however, these depend much more on the file format than on the tagset drivers, and thus you'll probably need to write your own anyway. | There is also the driver testing script, ''bin/driver-test.pl''. The distribution may contain some sample conversion scripts as well; however, these depend much more on the file format than on the tagset drivers, and thus you'll probably need to write your own anyway. |
| |
| |
| |
==== How to use the Interset ==== | ==== How to use the Interset ==== |
| |
You can write your own tag conversion Perl script, and use the Interset driver library. You have to tell Perl where to find the drivers: | You can write your own tag conversion Perl script, and use the Interset driver library. You have to tell Perl where to find the drivers (the following commands work in ''csh''; you have to use different syntax under ''bash'' or on the Windows command line): |
| |
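For ''bash'' users, the ''csh'' setting has a close counterpart. The lines here are only a minimal sketch: the variable name ''INTERSET_LIB'' is a hypothetical helper introduced for illustration, and the path is simply the one from the ''csh'' example, so substitute your own installation directory.

```shell
# Assumption: the Interset drivers live at the path used in the csh example;
# replace it with the lib directory of your own installation.
# INTERSET_LIB is a hypothetical helper variable, not part of Interset.
INTERSET_LIB=/home/zeman/projekty/interset/lib

# Prepend the driver directory to PERLLIB, keeping any existing value.
# ${PERLLIB:+:$PERLLIB} adds the separating colon only if PERLLIB is already set.
export PERLLIB="$INTERSET_LIB${PERLLIB:+:$PERLLIB}"
```

Putting the directory first means Perl searches the Interset drivers before any other entries already in ''PERLLIB''. The original ''csh'' form follows.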
<code>setenv PERLLIB /home/zeman/projekty/interset/lib:$PERLLIB | <code>setenv PERLLIB /home/zeman/projekty/interset/lib:$PERLLIB |