Differences
This shows you the differences between two versions of the page.
external:tectomt:tutorial [2009/01/20 15:25] kravalova |
external:tectomt:tutorial [2009/01/20 17:57] popel |
||
---|---|---|---|
Line 17: | Line 17: | ||
* Your shell is bash | * Your shell is bash | ||
* You have basic experience with bash and you can read Perl | * You have basic experience with bash and you can read Perl | ||
+ | |||
Line 27: | Line 28: | ||
==== Installation and setup ==== | ==== Installation and setup ==== | ||
- | * Checkout SVN repository. If you are running this installation in computer lab in Prague, you have checkout the repository into directory /home/BIG (because data quotas don't apply here): | + | * Checkout SVN repository. If you are running this installation in a computer lab in Prague, you have to checkout the repository into directory |
<code bash> | <code bash> | ||
Line 67: | Line 68: | ||
===== TectoMT Architecture ===== | ===== TectoMT Architecture ===== | ||
+ | |||
Line 74: | Line 76: | ||
In TectoMT, there is the following hierarchy of processing units (software components that process data): | In TectoMT, there is the following hierarchy of processing units (software components that process data): | ||
- | * The basic units are blocks. They serve for some very limited, well defined, and often linguistically interpretable tasks (e.g., tokenization, | + | * The basic units are blocks. They serve for some very limited, well defined, and often linguistically interpretable tasks (e.g., tokenization, |
* To solve a more complex task, selected blocks can be chained into a block sequence, also called a scenario. Technically, | * To solve a more complex task, selected blocks can be chained into a block sequence, also called a scenario. Technically, | ||
- | * The highest unit is called application. Applications correspond to end-to-end tasks, be they real end-user applications (such as machine translation), | + | * The highest unit is called application. Applications correspond to end-to-end tasks, be they real end-user applications (such as machine translation), |
This tutorial itself has its blocks in '' | This tutorial itself has its blocks in '' | ||
+ | |||
+ | |||
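The hierarchy described in this hunk can be pictured with a small, language-neutral sketch. It is not TectoMT code (the tutorial's real blocks are Perl modules), and the names `tokenize`, `lowercase` and `run_scenario` are hypothetical. It only illustrates the idea that a scenario is an ordered list of blocks, each of which modifies the same document in turn.

```python
from typing import Callable, List

# In this sketch a "document" is a plain dict and a "block" is any
# function that takes a document and returns the (modified) document.
Document = dict
Block = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    """Hypothetical block: split the raw text into tokens."""
    doc["tokens"] = doc["text"].split()
    return doc


def lowercase(doc: Document) -> Document:
    """Hypothetical block: lowercase the tokens produced by an earlier block."""
    doc["tokens"] = [token.lower() for token in doc["tokens"]]
    return doc


def run_scenario(scenario: List[Block], doc: Document) -> Document:
    """Apply the blocks of a scenario one after another, in the given order."""
    for block in scenario:
        doc = block(doc)
    return doc


result = run_scenario([tokenize, lowercase], {"text": "John loves Mary"})
print(result["tokens"])
```

Order matters in such a chain: `lowercase` relies on the `tokens` key that `tokenize` creates, much as the later blocks of a scenario rely on the output of the earlier ones.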
Line 90: | Line 94: | ||
TectoMT blocks repository is saved in '' | TectoMT blocks repository is saved in '' | ||
- | Thus, the set of TectoMT layers is Cartesian product {S,T} x {English, | + | Thus, the set of TectoMT layers is a Cartesian product {S,T} x {English, |
* {S,T} distinguishes whether the data was created by analysis or transfer/ | * {S,T} distinguishes whether the data was created by analysis or transfer/ | ||
Line 96: | Line 100: | ||
* {W, | * {W, | ||
- | // | + | // |
+ | |||
+ | There are also other directories for other purpose blocks, for example blocks which only print out some information go to '' | ||
- | There are also other directories for other purpose blocks, for example blocks which only print out some information go to '' | ||
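The Cartesian product of layer-name parts can be enumerated as a quick illustration. The full sets are truncated on this page, so the sketch below assumes only the values that are visible in the block names quoted in the scenario hunks (`English`, and the levels `W`, `M`, `A`); the real sets may be larger.

```python
from itertools import product

origins = ["S", "T"]        # analysis vs. transfer side, as stated in the text
languages = ["English"]     # assumption: the only language visible on this page
levels = ["W", "M", "A"]    # assumption: the only levels visible in block names

# Every layer name is one element of origins x languages x levels.
layers = ["".join(parts) for parts in product(origins, languages, levels)]
print(layers)

# A block prefix such as "SEnglishW_to_SEnglishM" names an input and an output layer.
source_layer, target_layer = "SEnglishW_to_SEnglishM".split("_to_")
print(source_layer, "->", target_layer)
```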
Line 107: | Line 112: | ||
===== First application ===== | ===== First application ===== | ||
- | Once you have TectoMT installed on your machine, you can find this tutorial in '' | + | Once you have TectoMT installed on your machine, you can find this tutorial in '' |
- | Most applications are defined in Makefiles, which describe sequence of blocks to be applied on our data. In our particular '' | + | Most applications are defined in Makefiles, which describe sequence of blocks to be applied on our data. In our particular '' |
We can run the application: | We can run the application: | ||
Line 117: | Line 122: | ||
</ | </ | ||
- | Our plain text data '' | + | Our plain text data '' |
- | * One physical file corresponds to one document. | + | * One physical |
* A document consists of a sequence of bundles (''< | * A document consists of a sequence of bundles (''< | ||
* Each bundle contains tree-shaped sentence representations on various linguistic layers. In our example '' | * Each bundle contains tree-shaped sentence representations on various linguistic layers. In our example '' | ||
* Trees are formed by nodes and edges. Attributes can be attached only to nodes. Edge's attributes must be equivalently stored as the lower node's attributes. Tree's attributes must be stored as attributes of the root node. | * Trees are formed by nodes and edges. Attributes can be attached only to nodes. Edge's attributes must be equivalently stored as the lower node's attributes. Tree's attributes must be stored as attributes of the root node. | ||
+ | |||
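The containment rules in the list above (document, bundles, trees, nodes, with attributes living only on nodes) can be sketched as a toy data structure. This is an illustration only, not the TectoMT file format or its Perl API; the class names and the attribute names `sentence`, `form` and `deprel` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    # Attributes can be attached only to nodes.
    attrs: Dict[str, str] = field(default_factory=dict)
    # The parent link is excluded from repr/compare to avoid cyclic traversal.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, **attrs: str) -> "Node":
        """Create a child node carrying the given attributes and attach it."""
        child = Node(attrs=dict(attrs), parent=self)
        self.children.append(child)
        return child


@dataclass
class Bundle:
    # One tree per linguistic layer, keyed by layer name and stored by its root.
    trees: Dict[str, Node] = field(default_factory=dict)


@dataclass
class Document:
    # One physical file corresponds to one document: a sequence of bundles.
    bundles: List[Bundle] = field(default_factory=list)


# A tree-level attribute is stored on the root node.
root = Node(attrs={"sentence": "John loves Mary"})
verb = root.add_child(form="loves")
# An edge-level attribute (here a hypothetical "deprel") is stored on the lower node.
subject = verb.add_child(form="John", deprel="subject")

doc = Document(bundles=[Bundle(trees={"SEnglishA": root})])
print(doc.bundles[0].trees["SEnglishA"].children[0].attrs["form"])
```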
Line 144: | Line 150: | ||
===== Changing the scenario ===== | ===== Changing the scenario ===== | ||
- | We'll now add syntax analysis to our scenario by adding four more blocks. Instead of | + | We'll now add a syntax analysis |
<code bash> | <code bash> | ||
Line 152: | Line 158: | ||
SEnglishW_to_SEnglishM:: | SEnglishW_to_SEnglishM:: | ||
SEnglishW_to_SEnglishM:: | SEnglishW_to_SEnglishM:: | ||
- | SEnglishW_to_SEnglishM:: | + | SEnglishW_to_SEnglishM:: |
+ | | ||
</ | </ | ||
Line 163: | Line 170: | ||
SEnglishW_to_SEnglishM:: | SEnglishW_to_SEnglishM:: | ||
SEnglishW_to_SEnglishM:: | SEnglishW_to_SEnglishM:: | ||
- | SEnglishW_to_SEnglishM:: | + | SEnglishW_to_SEnglishM:: |
SEnglishM_to_SEnglishA:: | SEnglishM_to_SEnglishA:: | ||
SEnglishM_to_SEnglishA:: | SEnglishM_to_SEnglishA:: | ||
- | SEnglishM_to_SEnglishA:: | + | SEnglishM_to_SEnglishA:: |
+ | | ||
</ | </ | ||
Line 184: | Line 192: | ||
tmttred sample.tmt | tmttred sample.tmt | ||
</ | </ | ||
+ | |||
Line 231: | Line 240: | ||
This block illustrates some of the most common methods for accessing objects: | This block illustrates some of the most common methods for accessing objects: | ||
- | * '' | + | * '' |
* '' | * '' | ||
- | * '' | + | * '' |
- | * '' | + | * '' |
- | * '' | + | * '' |
- | * '' | + | * '' |
Attributes of documents, bundles or nodes can be accessed by attribute getters and setters, for example: | Attributes of documents, bundles or nodes can be accessed by attribute getters and setters, for example: | ||
Line 242: | Line 251: | ||
* '' | * '' | ||
- | Our tutorial block '' | + | Our tutorial block '' |
<code bash> | <code bash> | ||
Line 255: | Line 264: | ||
</ | </ | ||
- | Try to change the block so that it prints out the information only for verbs. (You need to look at attribute '' | + | Try to change the block so that it prints out the information only for verbs. (You need to look at an attribute '' |
Line 332: | Line 341: | ||
- | ==== SVO typology ==== | ||
- | **Motivation**: | + | |
+ | |||
+ | |||
+ | |||
+ | ==== SVO to SOV ==== | ||
+ | |||
+ | **Motivation**: | ||
+ | |||
+ | **Task**: Change the word order from SVO to SOV. | ||
+ | |||
+ | **Instructions**: | ||
+ | |||
+ | * To find an object to a verb, look for objects among effective children of a verb ('' | ||
+ | * Once you have node '' | ||
+ | * For debugging, a method returning word order of a node is useful: '' | ||
+ | |||
+ | |||
+ | |||
+ | |||
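The reordering idea behind this task can be tried out on a toy sentence before writing the real block. The sketch below is an assumption-laden simplification: it does not use the TectoMT methods (whose names are truncated on this page), the `Word` class and its `role` and `head` fields are hypothetical, and it moves only the object word itself rather than its whole subtree.

```python
class Word:
    """Toy word: surface form, a link to its head word, and a syntactic role."""

    def __init__(self, form, head=None, role=None):
        self.form = form
        self.head = head
        self.role = role


def svo_to_sov(words):
    """Return a new word order in which each object directly precedes its verb."""
    order = list(words)
    for word in words:
        if word.role != "object":
            continue
        verb = word.head
        # Move the object only if it currently follows its governing verb.
        if order.index(word) > order.index(verb):
            order.remove(word)
            order.insert(order.index(verb), word)
    return order


loves = Word("loves")
john = Word("John", head=loves, role="subject")
mary = Word("Mary", head=loves, role="object")

reordered = svo_to_sov([john, loves, mary])
print(" ".join(word.form for word in reordered))
```

In the real block the object's dependents would have to move together with it, which is why a subtree-shifting method is the more natural tool there.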
Line 352: | Line 379: | ||
==== Prepositions ==== | ==== Prepositions ==== | ||
- | In dependency approach a question "where to hang prepositions" | + | **Motivation**: |
TODO picture
- | The task is to rehang all prepositions as indicated at the picture. You may assume that prepositions have at most 1 child. | + | **Task**: |
+ | |||
+ | **Instructions**: |
You are going to need these new methods: | You are going to need these new methods: | ||
Line 364: | Line 393: | ||
// | // | ||
- | * On analytical layer, you can use this test to recognize prepositions: | + | * On analytical layer, you can use this test to recognize prepositions: |
* You can use block template in '' | * You can use block template in '' | ||