user:zeman:interset:how-to-use [2008/03/13 17:46] zeman Correction.
user:zeman:interset:how-to-use [2008/03/14 09:56] zeman Hyperlink to ISO 639-1.
===== Manual =====

==== Installation ====

If you are on the ÚFAL network, you can use Dan's installation directly. Otherwise, you need to [[mailto:zeman@ufal.mff.cuni.cz|ask Dan]] for a zipped package of the currently existing drivers. (I intend to make it available for download here later.) Unzip it to a convenient place; below, we assume it is in ''/home/zeman/interset''.

**Contributions welcome!** If you write your own driver, please share it with others! If you send it to me, I will add it to the package for download here.
  * tagset::cs::pdt - Czech positional tags of the Prague Dependency Treebank
  * tagset::da::conll - Danish CoNLL treebank
  * tagset::en::conll - English CoNLL treebank (one-to-one mapping to en::penn)
  * tagset::en::penn - English Penn Treebank
  * tagset::sv::conll - Swedish CoNLL treebank (one-to-one mapping to sv::mamba)
  * tagset::sv::hajic - Tags output by Jan Hajič's Swedish tagger
  * tagset::sv::mamba - Swedish Mamba tags from Talbanken05 (CoNLL treebank)
  * tagset::sv::svdahybrid - Dan's tagset, designed to make the tag distributions of sv::hajic and da::conll as close as possible
  * tagset::zh::conll - Chinese CoNLL treebank

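The drivers listed above are ordinary Perl modules, so whether a particular one is usable on your machine depends only on whether Perl can find it. The script below is a minimal sketch (not part of the Interset package) that walks ''@INC'' and lists every driver it finds, assuming the layout described under Directory Structure (''tagset/ll/xxx.pm'' below some directory in ''@INC''). The ''use lib'' path is an assumption taken from the installation example above; change it to wherever you unpacked the drivers.

<code perl>
#!/usr/bin/perl
# Sketch: list the tagset drivers that Perl can find through @INC.
# Assumes the layout <dir in @INC>/tagset/<ll>/<xxx>.pm described in this manual.
use strict;
use warnings;

# Assumption: the package was unzipped here (see Installation); adjust as needed.
use lib '/home/zeman/interset';

my %drivers;    # module name => file that "use" would load
foreach my $dir (@INC)
{
    # @INC may contain code references (hooks); only plain paths are scanned.
    next if ref($dir);
    my $root = "$dir/tagset";
    next unless -d $root;
    opendir(my $lh, $root) or next;
    # Language folders are named by two-letter ISO 639-1 codes.
    my @langs = grep { /^[a-z]{2}$/ && -d "$root/$_" } readdir($lh);
    closedir($lh);
    foreach my $lang (@langs)
    {
        opendir(my $dh, "$root/$lang") or next;
        my @files = grep { /\.pm$/ && -f "$root/$lang/$_" } readdir($dh);
        closedir($dh);
        foreach my $file (@files)
        {
            (my $tagset = $file) =~ s/\.pm$//;
            my $module = "tagset::${lang}::$tagset";
            # Earlier @INC entries take precedence, as they do for "use".
            $drivers{$module} = "$root/$lang/$file" unless exists $drivers{$module};
        }
    }
}
print "$_\t$drivers{$_}\n" foreach sort keys %drivers;
</code>

Saved as, say, ''list-drivers.pl'' (the name is arbitrary), it prints one line per driver: the module name (e.g. ''tagset::cs::pdt''), a tab, and the file Perl would load for it.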
=== Directory Structure ===

The drivers are Perl modules and must be located somewhere under ''$PERLLIB'' (i.e., in a directory listed in ''@INC''). Their root folder is ''tagset'' (this is what separates the tagset drivers from other Perl libraries). The subfolders of ''tagset'' are named by two-letter language codes ([[http://en.wikipedia.org/wiki/ISO_639-1|ISO 639-1]]). Some tagsets may be designed for more than one language, but most are language-specific. The .pm files in the language folders are the drivers. Each driver is named xxx.pm, where xxx is the code name of the tagset. The driver xxx.pm for language ll should be accessible from Perl via

<code perl>