external:tectomt:tutorial [2009/01/21 11:36] kravalova |
external:tectomt:tutorial [2009/01/21 12:16] kravalova |
| |
This tutorial itself has its blocks in ''libs/blocks/Tutorial'' and the application in ''applications/tutorial''. | This tutorial itself has its blocks in ''libs/blocks/Tutorial'' and the application in ''applications/tutorial''. |
| |
| |
| |
| |
| |
| |
{{ external:tectomt:pyramid.gif?300x190|MT pyramid in terms of PDT layers}} | {{ external:tectomt:pyramid.gif?300x190|MT pyramid in terms of PDT layers}} |
| |
TectoMT blocks repository is saved in ''libs/blocks/''. In correspondence with ..., the blocks are located in directories describing their purpose. | The notion of 'layer' has a combinatorial nature in TectoMT. It corresponds not only to the layer of language description as used e.g. in the Prague Dependency Treebank, but it is also specific to a given language (e.g., possible values of morphological tags are typically different for different languages) and even to how the data on the given layer were created (whether by analysis from the lower layer or by synthesis/transfer). |
| |
Thus, the set of TectoMT layers is a Cartesian product {S,T} x {English,Czech,...} x {W,M,P,A,T}, in which: | Thus, the set of TectoMT layers is a Cartesian product {S,T} x {English,Czech,...} x {W,M,P,A,T}, in which: |
* {English,Czech...} represents the language in question | * {English,Czech...} represents the language in question |
* {W,M,P,A,T...} represents the layer of description in terms of PDT 2.0 (W - word layer, M - morphological layer, A - analytical layer, T - tectogrammatical layer) or extensions (P - phrase-structure layer). | * {W,M,P,A,T...} represents the layer of description in terms of PDT 2.0 (W - word layer, M - morphological layer, A - analytical layer, T - tectogrammatical layer) or extensions (P - phrase-structure layer). |
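
The Cartesian product above can be enumerated mechanically. The following is a minimal sketch in Python (not TectoMT code); it assumes that a layer name is simply the three components written one after another, as in the names ''SEnglishM'' and ''SCzechW'' used in this tutorial. The function name ''layer_names'' and the constant names are hypothetical, and the language list is cut off at the two languages the text names.

```python
from itertools import product

# Assumption: layer names are formed by plain concatenation, as suggested by
# names such as SEnglishM and SCzechW that appear in the tutorial.
PREFIXES = ["S", "T"]                # first component of the product
LANGUAGES = ["English", "Czech"]     # the text lists these two, then "..."
LEVELS = ["W", "M", "P", "A", "T"]   # word, morphological, phrase-structure,
                                     # analytical, tectogrammatical


def layer_names(prefixes=PREFIXES, languages=LANGUAGES, levels=LEVELS):
    """Return every prefix + language + level combination as one string."""
    return ["".join(parts) for parts in product(prefixes, languages, levels)]


names = layer_names()
print(len(names))            # 2 * 2 * 5 combinations
print("SEnglishM" in names)
```

With two prefixes, two languages and five levels this yields 20 names; each additional language adds ten more.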
| |
| Blocks in the block repository ''libs/blocks'' are located in directories indicating their purpose in machine translation. |
| |
//Example//: Block adding Czech morphological tags (pos, case, gender, etc.) can be found in ''libs/blocks/SCzechW_to_SCzechM/Simple_tagger.pm''. | //Example//: Block adding Czech morphological tags (pos, case, gender, etc.) can be found in ''libs/blocks/SCzechW_to_SCzechM/Simple_tagger.pm''. |
| |
There are also directories for blocks with other purposes; for example, blocks which only print out some information go to ''libs/Print''. Our tutorial blocks are in ''libs/blocks/Tutorial''. | There are also directories for blocks with other purposes; for example, blocks which only print out some information go to ''libs/Print''. Our tutorial blocks are in ''libs/blocks/Tutorial''. |
| |
| |
| |
| |
</code> | </code> |
| |
Our plain text data ''sample.txt'' has been transformed into ''tmt'', an internal TectoMT format, and saved into ''sample.tmt''. Then, all four blocks have been loaded and our data has been processed. We can now examine ''sample.tmt'' using a regular text editor. We'll now stop and describe data structure in TectoMT. | Our plain text data ''sample.txt'' has been transformed into ''tmt'', an internal TectoMT format, and saved into ''sample.tmt''. Then, all four blocks have been loaded and our data has been processed. We can now examine ''sample.tmt'' with a text editor (vi, emacs, etc.). |
| |
* One physical ''tmt'' file corresponds to one document. | * One physical ''tmt'' file corresponds to one document. |
* Each bundle contains tree-shaped sentence representations on various linguistic layers. In our example ''sample.tmt'' we have a morphological tree (''SEnglishM'') in each bundle. Later on, an analytical tree (''SEnglishA'') will also appear in each bundle as we proceed with our analysis. | * Each bundle contains tree-shaped sentence representations on various linguistic layers. In our example ''sample.tmt'' we have a morphological tree (''SEnglishM'') in each bundle. Later on, an analytical tree (''SEnglishA'') will also appear in each bundle as we proceed with our analysis. |
* Trees are formed by nodes and edges. Attributes can be attached only to nodes. Attributes of an edge must therefore be stored as attributes of its lower node, and attributes of a whole tree must be stored as attributes of its root node. | * Trees are formed by nodes and edges. Attributes can be attached only to nodes. Attributes of an edge must therefore be stored as attributes of its lower node, and attributes of a whole tree must be stored as attributes of its root node. |
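
The data structure described in the list above can be pictured with a small Python model. This is only an illustrative sketch, not the TectoMT API: the class names ''Document'', ''Bundle'' and ''Node'', the helper ''add_child'', and the attribute names ''sentence'', ''form'' and ''edge_label'' are all hypothetical. It also assumes that a document is a sequence of bundles, which the list implies but does not state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A tree node; the only place where attributes can be stored."""
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)
    # Excluded from repr/eq to avoid infinite recursion through the parent link.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add_child(self, **attrs: str) -> "Node":
        child = Node(attrs=dict(attrs), parent=self)
        self.children.append(child)
        return child


@dataclass
class Bundle:
    """One sentence: maps a layer name (e.g. SEnglishM) to a tree root."""
    trees: Dict[str, Node] = field(default_factory=dict)


@dataclass
class Document:
    """Assumed to correspond to one tmt file: a sequence of bundles."""
    bundles: List[Bundle] = field(default_factory=list)


# Tree-level information lives on the root node.
root = Node(attrs={"sentence": "John loves Mary."})
# Information about the edge root -> child lives on the lower (child) node.
child = root.add_child(form="loves", edge_label="hypothetical-label")

doc = Document(bundles=[Bundle(trees={"SEnglishM": root})])
print(list(doc.bundles[0].trees))
print(child.attrs["edge_label"])
```

Keeping all attributes on nodes means an edge needs no object of its own: it is fully identified by its lower node.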
| |
| |
| |
===== Changing the scenario ===== | ===== Changing the scenario ===== |
| |
We'll now add a syntax analysis (dependency parsing) to our scenario by adding four more blocks. Instead of | We'll now add a syntax analysis (dependency parsing) to our scenario by adding three more blocks. Instead of |
| |
<code bash> | <code bash> |
tmttred sample.tmt | tmttred sample.tmt |
</code> | </code> |
| |
| |
| |
| |
==== Task ==== | ==== Task ==== |
Write a block which, given an analytical tree (''SEnglishA''), fills each ''a-node'' with a boolean attribute ''is_clause_head'', set to ''1'' if the ''a-node'' corresponds to a finite verb and to ''0'' otherwise. | Write a block which, given an analytical tree (''SEnglishA''), fills each ''a-node'' with a boolean attribute ''is_clause_head'', set to ''1'' if the ''a-node'' corresponds to a finite verb and to ''0'' otherwise. |
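
Before writing the block itself, the core decision can be tried out in isolation. The sketch below is plain Python, not a TectoMT block (real blocks are Perl modules). It assumes that each node carries a Penn Treebank style tag and treats ''VBD'', ''VBP'' and ''VBZ'' as finite verbs; both the tagset and the ''tag'' key are assumptions, and modal verbs are deliberately left out of this simplification. The function name ''mark_clause_heads'' is hypothetical.

```python
import re

# Assumption: Penn Treebank tags, where VBD/VBP/VBZ mark finite verb forms.
FINITE_VERB_TAG = re.compile(r"^VB[DPZ]$")


def mark_clause_heads(a_nodes):
    """Set is_clause_head to 1 for finite verbs and to 0 for all other nodes."""
    for node in a_nodes:
        is_finite = FINITE_VERB_TAG.match(node.get("tag", "")) is not None
        node["is_clause_head"] = 1 if is_finite else 0
    return a_nodes


# Toy stand-ins for a-nodes; a real block would read the tag from the tree.
nodes = [
    {"form": "John", "tag": "NNP"},
    {"form": "loves", "tag": "VBZ"},
    {"form": "Mary", "tag": "NNP"},
]
marked = mark_clause_heads(nodes)
print([(n["form"], n["is_clause_head"]) for n in marked])
```

Every node receives the attribute, including non-verbs, so later blocks can rely on ''is_clause_head'' always being defined.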
| |
| |
| |