external:tectomt:tutorial [2009/01/22 11:50] kravalova |
external:tectomt:tutorial [2009/01/22 12:50] zabokrtsky |
* Your shell is bash | * Your shell is bash |
* You have basic experience with bash and can read basic Perl | * You have basic experience with bash and can read basic Perl |
| |
| |
| |
| |
==== Installation and setup ==== | ==== Installation and setup ==== |
| |
* Check out the SVN repository. If you are running this installation in the computer lab in Prague, you have to check out the repository into the directory ''/home/BIG'' (because a bigger disk quota applies there): | * Check out the SVN repository. If you are running this installation in the computer lab in Prague, you have to check out the repository into the directory ''~/BIG'' (because a bigger disk quota applies there): |
| |
<code bash> | <code bash> |
</code> | </code> |
| |
* In your ''.bashrc'' file, add the following line (or source this file every time before experimenting with TectoMT): | * In your ''.bashrc'' file, add the following line (or source the specified file every time before experimenting with TectoMT): |
| |
<code bash> | <code bash> |
| |
===== TectoMT Architecture ===== | ===== TectoMT Architecture ===== |
| |
| |
| |
| |
* The basic units are blocks. Each block performs a very limited, well-defined, and often linguistically interpretable task (e.g., tokenization, tagging, parsing). Technically, blocks are Perl classes inherited from ''TectoMT::Block'', each saved in a separate file. The blocks repository is in ''libs/blocks/''. | * The basic units are blocks. Each block performs a very limited, well-defined, and often linguistically interpretable task (e.g., tokenization, tagging, parsing). Technically, blocks are Perl classes inherited from ''TectoMT::Block'', each saved in a separate file. The blocks repository is in ''libs/blocks/''. |
* To solve a more complex task, selected blocks can be chained into a block sequence, also called a scenario. Technically, scenarios are instances of the ''TectoMT::Scenario'' class, but in some situations (e.g. on the command line) it is sufficient to specify the scenario simply by listing block names separated with spaces. | * To solve a more complex task, selected blocks can be chained into a block sequence, also called a scenario. Technically, scenarios are instances of the ''TectoMT::Scenario'' class, but in some situations (e.g. on the command line) it is sufficient to specify the scenario simply by listing block names separated by spaces. |
* The highest-level unit is called an application. Applications correspond to end-to-end tasks, be they real end-user applications (such as machine translation), or 'only' NLP-related experiments. Technically, applications are often implemented as ''Makefiles'', which merely glue together the components existing in TectoMT. Some demo applications can be found in ''applications''. | * The highest-level unit is called an application. Applications correspond to end-to-end tasks, be they real end-user applications (such as machine translation), or 'only' NLP-related experiments. Technically, applications are often implemented as ''Makefiles'', which merely glue together the components existing in TectoMT. Some demo applications can be found in ''applications''. |
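The relation between blocks and scenarios described above can be illustrated with a minimal, self-contained sketch. It does not use the real TectoMT classes: the packages ''Toy::Block'', ''Toy::Tokenize'' and ''Toy::Lowercase'', the method name ''process_document'' and the helper ''run_scenario'' are all hypothetical names chosen for this illustration. The sketch only shows the idea of small single-purpose classes chained by listing their names separated by spaces.

```perl
use strict;
use warnings;

# Hypothetical base class; it stands in for TectoMT::Block and is NOT its real API.
package Toy::Block;
sub new { my ($class) = @_; return bless {}, $class; }
sub process_document { die "subclasses must implement process_document\n"; }

# A block for one narrow task: whitespace tokenization.
package Toy::Tokenize;
our @ISA = ('Toy::Block');
sub process_document {
    my ($self, $doc) = @_;
    $doc->{tokens} = [ split ' ', $doc->{text} ];
    return;
}

# Another narrow task: lowercasing the tokens produced by the previous block.
package Toy::Lowercase;
our @ISA = ('Toy::Block');
sub process_document {
    my ($self, $doc) = @_;
    $doc->{tokens} = [ map { lc $_ } @{ $doc->{tokens} } ];
    return;
}

package main;

# A toy scenario: block names separated by spaces, applied in the given order.
sub run_scenario {
    my ($scenario, $doc) = @_;
    for my $block_name (split ' ', $scenario) {
        $block_name->new->process_document($doc);
    }
    return $doc;
}

my $doc = run_scenario('Toy::Tokenize Toy::Lowercase',
                       { text => 'Mr. Brown has urged MPs.' });
print join(' ', @{ $doc->{tokens} }), "\n";   # prints: mr. brown has urged mps.
```

Because each toy block only reads and writes the shared document, the order of names in the scenario string determines the processing order, which mirrors the chaining idea described in the list above.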
| |
This tutorial itself has its blocks in ''libs/blocks/Tutorial'' and the application in ''applications/tutorial''. | This tutorial itself has its blocks in ''libs/blocks/Tutorial'' and the application in ''applications/tutorial''. |
| |
| |
| |
Blocks in the block repository ''libs/blocks'' are located in directories indicating their purpose in machine translation. | Blocks in the block repository ''libs/blocks'' are located in directories indicating their purpose in machine translation. |
| |
//Example//: Block adding Czech morphological tags (pos, case, gender, etc.) can be found in ''libs/blocks/SCzechW_to_SCzechM/Simple_tagger.pm''. | //Example//: A block adding Czech morphological tags (pos, case, gender, etc.) can be found in ''libs/blocks/SCzechW_to_SCzechM/Simple_tagger.pm''. |
| |
There are also directories for blocks serving other purposes; for example, blocks that only print out some information go to ''libs/Print''. Our tutorial blocks are in ''libs/blocks/Tutorial/''. | There are also directories for blocks serving other purposes; for example, blocks that only print out some information go to ''libs/Print''. Our tutorial blocks are in ''libs/blocks/Tutorial/''. |
| |
===== Advanced block: finite clauses ===== | ===== Advanced block: finite clauses ===== |
| |
| |
| |
==== Motivation ==== | ==== Motivation ==== |
| |
It is assumed that finite clauses can be translated independently, which would reduce computational complexity or make parallel translation possible. We could even use hybrid translation - each finite clause could be translated by the most self-confident translation system. In this task, we are going to split the sentence into finite clauses. | It is assumed that finite clauses can be translated independently, which would reduce combinatorial complexity or make parallel translation possible. We could even use hybrid translation - each finite clause could be translated by the most self-confident translation system. In this task, we are going to split the sentence into finite clauses. |
| |
| |
| |
===== Your turn: more tasks ===== | ===== Your turn: more tasks ===== |
| |
| |
| |
| |
==== SVO to SOV ==== | ==== SVO to SOV ==== |
| |
**Motivation**: During translation from an SVO-based language (e.g. English) to an SOV-based language (e.g. Korean) we might need to change the word order from SVO to SOV. | **Motivation**: During translation from an SVO-based language (e.g. English) to an SOV-based language (e.g. Korean), we might need to change the word order from SVO to SOV. |
| |
**Task**: Change the word order from SVO to SOV. | **Task**: Change the word order from SVO to SOV. |
| |
* You can use the block template in ''libs/blocks/BlockTemplate.pm''. | * You can use the block template in ''libs/blocks/BlockTemplate.pm''. |
* To find an object to a verb, look for objects among the effective children of the verb (''$child<nowiki>-></nowiki>get_attr('afun') eq 'Obj' ''). That implies working on the analytical layer. | * To find an object of a verb, look for objects among the effective children of the verb (''$child<nowiki>-></nowiki>get_attr('afun') eq 'Obj' ''). That implies working on the analytical layer. |
* For debugging, it is useful to read the surface word order of a node: ''$node<nowiki>-></nowiki>get_attr('ord')''. It can be used to print out the nodes sorted by the attribute ''ord''. | * For debugging, it is useful to read the surface word order of a node: ''$node<nowiki>-></nowiki>get_attr('ord')''. It can be used to print out the nodes sorted by the attribute ''ord''. |
* Once you have node ''$object'' and node ''$verb'', use method ''$object<nowiki>-></nowiki>shift_before_node($verb)''. This method takes the whole subtree under node ''$object'' and re-counts the attributes ''ord'' (surface word order) so that all nodes in subtree under ''$object'' have smaller ''ord'' than ''$verb''. That is, the method rearranges the surface word order from VO to OV. | * Once you have the node ''$object'' and the node ''$verb'', use the method ''$object<nowiki>-></nowiki>shift_before_node($verb)''. This method takes the whole subtree under the node ''$object'' and recalculates the attributes ''ord'' (surface word order) so that all the nodes in the subtree under ''$object'' have a smaller ''ord'' than ''$verb''. That is, the method rearranges the surface word order from VO to OV. |
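The renumbering of ''ord'' described in the last hint can be tried out on a toy tree without TectoMT. The sketch below is only an illustration under stated assumptions: nodes are plain hashes, the helpers ''make_node'', ''add_child'', ''subtree'' and ''toy_shift_before_node'' are hypothetical, and direct children are used in place of effective children. It is not the implementation of ''shift_before_node''; it only reproduces the effect described above.

```perl
use strict;
use warnings;

# Toy nodes are plain hashes; this is NOT the TectoMT node class.
sub make_node {
    my ($form, $ord, $afun) = @_;
    return { form => $form, ord => $ord, afun => $afun, children => [] };
}

sub add_child {
    my ($parent, $child) = @_;
    push @{ $parent->{children} }, $child;
    return $child;
}

# All nodes of a subtree, including its root.
sub subtree {
    my ($node) = @_;
    return ($node, map { subtree($_) } @{ $node->{children} });
}

# Toy counterpart of shift_before_node: place the whole subtree of $moved
# directly in front of $target and renumber 'ord' starting from 1.
sub toy_shift_before_node {
    my ($root, $moved, $target) = @_;
    my @moved_nodes = subtree($moved);
    my %is_moved    = map { ($_ => 1) } @moved_nodes;
    my @all_nodes   = subtree($root);
    my @moved = sort { $a->{ord} <=> $b->{ord} } @moved_nodes;
    my @rest  = sort { $a->{ord} <=> $b->{ord} } grep { !$is_moved{$_} } @all_nodes;
    my @new_order;
    for my $node (@rest) {
        push @new_order, @moved if $node == $target;   # compare references
        push @new_order, $node;
    }
    my $ord = 1;
    $_->{ord} = $ord++ for @new_order;
    return;
}

# "Mr. Brown has urged MPs" (final punctuation left out for brevity).
my $urged = make_node('urged', 4, 'Pred');
my $brown = add_child($urged, make_node('Brown', 2, 'Sb'));
add_child($brown, make_node('Mr.', 1, 'Atr'));
add_child($urged, make_node('has', 3, 'AuxV'));
add_child($urged, make_node('MPs', 5, 'Obj'));

# Shift every object of the verb in front of the verb.
for my $child (@{ $urged->{children} }) {
    toy_shift_before_node($urged, $child, $urged) if $child->{afun} eq 'Obj';
}

my @sorted = sort { $a->{ord} <=> $b->{ord} } subtree($urged);
my $sentence = join(' ', map { $_->{form} } @sorted);
print "$sentence\n";   # prints: Mr. Brown has MPs urged
```

In a real block, the same loop would go over the effective children of the verb and call ''shift_before_node'' on the actual node objects instead of the toy helper.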
| |
**Advanced version**: This solution shifts the object (or several objects) of a verb directly in front of that verb node. For example, //Mr. Brown has urged MPs.// changes to //Mr. Brown has MPs urged.// You can try to change this solution so that the final sentence is //Mr. Brown MPs has urged.// You may need the method ''$node<nowiki>-></nowiki>shift_after_subtree($root_of_that_subtree)''. Subjects should have the attribute '''afun' eq 'Sb'''. | **Advanced version**: This solution shifts the object (or several objects) of a verb directly in front of that verb node. For example, //Mr. Brown has urged MPs.// changes to //Mr. Brown has MPs urged.// You can try to change this solution so that the final sentence is //Mr. Brown MPs has urged.// You may need the method ''$node<nowiki>-></nowiki>shift_after_subtree($root_of_that_subtree)''. Subjects should have the attribute '''afun' eq 'Sb'''. |
| |
| |
| |
{{ external:tectomt:preps.png?200x80|Prepositions example}} | {{ external:tectomt:preps.png?200x80|Prepositions example}} |
| |
**Motivation**: In the dependency approach, a question "where to hang prepositions" arises. In Prague style (PDT), prepositions are heads of the subtree and the noun/pronoun is dependent on the preposition. However, another ordering might be preferable: the noun/pronoun might be the head of the subtree, while the preposition would take the role of a modifier. | **Motivation**: In the dependency approach, the question "where to hang prepositions" arises. In the Prague style (PDT), prepositions are heads of the subtree and the noun/pronoun is dependent on the preposition. However, another ordering might be preferable: the noun/pronoun might be the head of the subtree, while the preposition would take the role of a modifier. |
| |
**Task**: The task is to rehang all prepositions as shown in the picture. You may assume that prepositions have at most one child. | **Task**: The task is to rehang all prepositions as shown in the picture. You may assume that prepositions have at most one child. |
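One possible reading of this task can be sketched on toy data. The sketch assumes that the intended result is the one described in the motivation (the noun/pronoun becomes the head and the preposition its modifier), that prepositions carry the PDT analytical function ''AuxP'', and that nodes are plain hashes with a ''parent'' reference. The helper ''rehang_prepositions'' and the example sentence are hypothetical; a real block would use the TectoMT node methods instead.

```perl
use strict;
use warnings;

# Toy nodes for "He lives in Prague"; plain hashes, NOT the TectoMT node class.
my $lives  = { form => 'lives',  afun => 'Pred', parent => undef };
my $he     = { form => 'He',     afun => 'Sb',   parent => $lives };
my $in     = { form => 'in',     afun => 'AuxP', parent => $lives };
my $prague = { form => 'Prague', afun => 'Adv',  parent => $in };
my @nodes  = ($he, $lives, $in, $prague);

# Swap every preposition with its only child: the child takes over the
# preposition's parent, and the preposition is hung under the child.
sub rehang_prepositions {
    my (@all) = @_;
    for my $prep (grep { $_->{afun} eq 'AuxP' } @all) {
        my @children = grep { defined($_->{parent}) && $_->{parent} == $prep } @all;
        next if @children != 1;   # nothing to rehang, or outside the stated assumption
        my ($child) = @children;
        $child->{parent} = $prep->{parent};   # the noun takes the preposition's place
        $prep->{parent}  = $child;            # the preposition becomes its modifier
    }
    return;
}

rehang_prepositions(@nodes);

# Print each node with its new head.
for my $node (@nodes) {
    my $head = defined($node->{parent}) ? $node->{parent}{form} : 'ROOT';
    print "$node->{form} <- $head\n";
}
```

After the call, ''Prague'' depends on ''lives'' and ''in'' depends on ''Prague'', while the other nodes keep their heads.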