
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


external:tectomt:tutorial [2009/01/21 11:09]
kravalova
external:tectomt:tutorial [2009/01/21 11:59]
kravalova
Line 8 → Line 8:
  
 TectoMT is a highly modular NLP (Natural Language Processing) software system implemented in the Perl programming language under Linux. It is primarily aimed at Machine Translation, making use of the ideas and technology created during the Prague Dependency Treebank project. At the same time, it is also hoped that it will significantly facilitate and accelerate the development of software solutions for many other NLP tasks, especially due to the reusability of its numerous integrated processing modules (called blocks), which are equipped with uniform object-oriented interfaces.
 +
 +
 +
  
  
Line 16 → Line 19:
   * Your system is Linux
   * Your shell is bash
-  * You have basic experience bash and you can read Perl
+  * You have basic experience with bash and can read basic Perl 
  
  
Line 29 → Line 33:
 ==== Installation and setup ====
 
-  * Checkout SVN repository. If you are running this installation in computer lab in Prague, you have to checkout the repository into directory ''/home/BIG'' (because data quotas don't apply here):
+  * Checkout SVN repository. If you are running this installation in computer lab in Prague, you have to checkout the repository into directory ''/home/BIG'' (because data quotas don't apply here):
 
 <code bash>
Line 82 → Line 86:
 
 This tutorial itself has its blocks in ''libs/blocks/Tutorial'' and the application in ''applications/tutorial''.
 +
 +
 +
  
  
Line 93 → Line 100:
 {{ external:tectomt:pyramid.gif?300x190|MT pyramid in terms of PDT layers}}
  
-TectoMT blocks repository is saved in ''libs/blocks/''. In correspondence with ..., the blocks are located in directories describing their purpose.
+The notion of 'layer' has a combinatorial nature in TectoMT. It corresponds not only to the layer of language description as used e.g. in the Prague Dependency Treebank, but it is also specific to a given language (e.g., possible values of morphological tags are typically different for different languages) and even to how the data on the given layer were created (whether by analysis from the lower layer or by synthesis/transfer).
  
 Thus, the set of TectoMT layers is a Cartesian product {S,T} x {English,Czech,...} x {W,M,P,A,T}, in which:
Line 100 → Line 107:
   * {English,Czech...} represents the language in question
   * {W,M,P,A,T...} represents the layer of description in terms of PDT 2.0 (W - word layer, M - morphological layer, A - analytical layer, T - tectogrammatical layer) or extensions (P - phrase-structure layer).
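To make the Cartesian product concrete, here is a small Perl sketch that enumerates it. How the three components are glued into one layer name (e.g. ''SCzechM'') is an assumption, inferred from the block directory name ''SCzechW_to_SCzechM'' used in this tutorial; the language set is cut down to English and Czech for illustration, and ''layer_names'' is a hypothetical helper, not part of TectoMT.

```perl
use strict;
use warnings;

# Components of a TectoMT layer name, as in the Cartesian product
# {S,T} x {English,Czech,...} x {W,M,P,A,T}; only two languages are listed here.
my @sides     = qw(S T);
my @languages = qw(English Czech);
my @levels    = qw(W M P A T);    # word, morphological, phrase-structure, analytical, tectogrammatical

# Hypothetical helper. Assumption: a layer name is the plain concatenation
# of the three parts, e.g. 'SCzechM' (consistent with SCzechW_to_SCzechM).
sub layer_names {
    my @names;
    for my $side (@sides) {
        for my $lang (@languages) {
            push @names, map { "$side$lang$_" } @levels;
        }
    }
    return @names;
}

my @all = layer_names();
printf "%d layers, e.g. %s\n", scalar @all, join(', ', @all[0 .. 4]);
```

With two languages the product has 2 x 2 x 5 = 20 members; every further language adds another 10 layer names.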
 +
 +Blocks in the block repository ''libs/blocks'' are located in directories indicating their purpose in machine translation.
  
 //Example//: The block adding Czech morphological tags (POS, case, gender, etc.) can be found in ''libs/blocks/SCzechW_to_SCzechM/Simple_tagger.pm''.
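The directory name in this example appears to encode the processing step as ''<source layer>_to_<target layer>''. The Perl sketch below splits such a name into its parts. Both the naming pattern and the helpers ''parse_layer'' and ''parse_block_dir'' are hypothetical, inferred only from ''SCzechW_to_SCzechM''; they are not TectoMT functions.

```perl
use strict;
use warnings;

# Hypothetical helper: split a layer name such as 'SCzechW' into its parts,
# assuming the shape S|T + capitalised language name + one of W, M, P, A, T.
sub parse_layer {
    my ($layer) = @_;
    return unless $layer =~ /^([ST])([A-Z][a-z]+)([WMPAT])$/;
    return { side => $1, language => $2, level => $3 };
}

# Hypothetical helper: split a block directory name such as
# 'SCzechW_to_SCzechM' into (source layer, target layer).
# Returns an empty list if the name does not follow the assumed pattern.
sub parse_block_dir {
    my ($dir) = @_;
    my ($from, $to) = $dir =~ /^(\w+?)_to_(\w+)$/ or return;
    my $source = parse_layer($from) or return;
    my $target = parse_layer($to)   or return;
    return ($source, $target);
}

my ($src, $tgt) = parse_block_dir('SCzechW_to_SCzechM');
printf "%s: %s -> %s\n", $src->{language}, $src->{level}, $tgt->{level};    # Czech: W -> M
```

A directory that does not follow the pattern (such as ''Tutorial'') simply yields an empty list, so the helper can be used to filter a directory listing.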
Line 344 → Line 353:
 
 ===== Your turn: more tasks =====
 +
 +
  
  
Line 355 → Line 366:
 ==== SVO to SOV ====
  
-**Motivation**: During translation from an SVO based language (English) to an SOV based language (Korean) we might need to change the word order from SVO to SOV. 
+**Motivation**: During translation from an SVO based language (e.g. English) to an SOV based language (e.g. Korean) we might need to change the word order from SVO to SOV. 
  
 **Task**: Change the word order from SVO to SOV.
Line 363 → Line 374:
   * To find the object of a verb, look for objects among the effective children of the verb (''$child<nowiki>-></nowiki>get_attr('afun') eq 'Obj' ''). This implies working on the analytical layer.
   * For debugging, a method returning the surface word order of a node is useful: ''$node<nowiki>-></nowiki>get_attr('ord')''. It can be used to print out nodes sorted by the attribute ''ord''.
-  * Once you have node ''$object'' and node ''$verb'', use method ''$object<nowiki>-></nowiki>shift_before_node($verb)''. This method takes the whole subtree under node ''$object'' and counts the attributes ''ord'' (surface word order) so that all nodes in subtree under ''$object'' have smaller ''ord'' than ''$verb''. That is, the method rearranges the surface word order from VO to OV.
+  * Once you have node ''$object'' and node ''$verb'', use method ''$object<nowiki>-></nowiki>shift_before_node($verb)''. This method takes the whole subtree under node ''$object'' and re-counts the attributes ''ord'' (surface word order) so that all nodes in subtree under ''$object'' have smaller ''ord'' than ''$verb''. That is, the method rearranges the surface word order from VO to OV. 
 + 
 + 
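The hints above can be combined into a standalone Perl sketch. Since the real TectoMT node class is not available outside TectoMT, ''ToyNode'' below is a hypothetical stand-in that only mimics the method names quoted in the hints (''get_attr'', ''get_children'', ''get_parent'', ''set_parent'', ''shift_before_node''). Its ''shift_before_node'' is a guess at the behaviour described above, not TectoMT's implementation; it uses plain children instead of effective children, and the example tree is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an analytical-layer node (NOT the TectoMT class).
package ToyNode;

sub new {
    my ($class, %attr) = @_;
    return bless { attr => {%attr}, children => [], parent => undef }, $class;
}
sub get_attr     { my ($self, $name) = @_; return $self->{attr}{$name} }
sub set_attr     { my ($self, $name, $value) = @_; $self->{attr}{$name} = $value; return }
sub get_children { my ($self) = @_; return @{ $self->{children} } }
sub get_parent   { my ($self) = @_; return $self->{parent} }

sub set_parent {
    my ($self, $parent) = @_;
    # Detach from the old parent (if any), then attach under the new one.
    if (my $old = $self->{parent}) {
        $old->{children} = [ grep { $_ != $self } @{ $old->{children} } ];
    }
    $self->{parent} = $parent;
    push @{ $parent->{children} }, $self;
    return;
}

# The node itself plus all its descendants (in no particular order).
sub subtree { my ($self) = @_; return ($self, map { $_->subtree } $self->get_children) }
sub root    { my ($self) = @_; my $n = $self; $n = $n->{parent} while $n->{parent}; return $n }
sub by_ord  { return sort { $a->get_attr('ord') <=> $b->get_attr('ord') } @_ }

# Guess at the described behaviour: place the whole subtree of $self right
# before $node and renumber 'ord' as 0..n (the technical root keeps 0).
# Assumes $node lies outside the subtree of $self.
sub shift_before_node {
    my ($self, $node) = @_;
    my @moved  = by_ord($self->subtree);
    my %moving = map { $_ => 1 } @moved;
    my @order;
    for my $n (by_ord($self->root->subtree)) {
        next if $moving{$n};
        push @order, @moved if $n == $node;
        push @order, $n;
    }
    my $ord = 0;
    $_->set_attr(ord => $ord++) for @order;
    return;
}

package main;

sub node {
    my ($parent, %attr) = @_;
    my $n = ToyNode->new(%attr);
    $n->set_parent($parent);
    return $n;
}

# Surface string: all nodes except the technical root, sorted by 'ord'.
sub sentence {
    my ($root) = @_;
    return join ' ', map { $_->get_attr('form') }
        ToyNode::by_ord(grep { $_ != $root } $root->subtree);
}

# SVO -> SOV: move every object (afun 'Obj') in front of its governing verb.
# Simplification: plain children, so coordinated objects are not handled.
sub svo_to_sov {
    my ($root) = @_;
    for my $object (grep { ($_->get_attr('afun') // '') eq 'Obj' } $root->subtree) {
        $object->shift_before_node($object->get_parent);
    }
    return;
}

my $root = ToyNode->new(ord => 0, form => '#');
my $verb = node($root, ord => 2, form => 'reads', afun => 'Pred');
node($verb, ord => 1, form => 'John', afun => 'Sb');
my $book = node($verb, ord => 4, form => 'book', afun => 'Obj');
node($book, ord => 3, form => 'a', afun => 'Atr');

print sentence($root), "\n";    # John reads a book
svo_to_sov($root);
print sentence($root), "\n";    # John a book reads
```

Printing the sentence before and after the change, as in the debugging hint, is a quick way to check that the article ''a'' travels together with its object.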
  
  
Line 393 → Line 407:
 ==== Prepositions ====
  
-{{ external:tectomt:preps.png?200x80|}}
+{{ external:tectomt:preps.png?200x80|Prepositions example}}
  
 **Motivation**: In the dependency approach, the question of "where to hang prepositions" arises. In the Prague style (PDT), prepositions are the heads of their subtrees and the noun/pronoun depends on the preposition. However, another arrangement might be preferable: the noun/pronoun might be the head of the subtree, while the preposition takes the role of a modifier.
Line 402 → Line 416:
 
 You are going to need these new methods:
-  * ''my @children = $node<nowiki>-></nowiki>get_children'' 
+  * ''my @children = $node<nowiki>-></nowiki>get_children()'' 
-  * ''my $parent = $node<nowiki>-></nowiki>get_parent''
+  * ''my $parent = $node<nowiki>-></nowiki>get_parent()''
   * ''$node<nowiki>-></nowiki>set_parent($parent)''
 
 //Hint//: 
   * On the analytical layer, you can use this test to recognize prepositions: ''$node<nowiki>-></nowiki>get_attr('afun') eq 'AuxP' '' 
-  * You can use block template in ''libs/blocks/BlockTemplate.pm''. To see the results, you can again use TrEd (''tmttred sample.tmt'')
+  * You can use block template in ''libs/blocks/BlockTemplate.pm''
 +  * To see the results, you can again use TrEd (''tmttred sample.tmt'')
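The rehanging step can be sketched in Perl on a hypothetical ''ToyNode'' stand-in (not the real TectoMT node class) that offers the three methods listed above. The sketch assumes the common case in which the preposition governs a single noun, taken here to be its first child in ''ord'' order; multiword prepositions and coordination are not handled, and ''rehang_prepositions'' is an illustrative name, not a TectoMT block.

```perl
use strict;
use warnings;

# Hypothetical minimal tree node (NOT TectoMT's class); method names follow the tutorial.
package ToyNode;

sub new {
    my ($class, %attr) = @_;
    return bless { attr => {%attr}, children => [], parent => undef }, $class;
}
sub get_attr     { my ($self, $name) = @_; return $self->{attr}{$name} }
sub get_children { my ($self) = @_; return @{ $self->{children} } }
sub get_parent   { my ($self) = @_; return $self->{parent} }

sub set_parent {
    my ($self, $parent) = @_;
    # Detach from the old parent (if any), then attach under the new one.
    if (my $old = $self->{parent}) {
        $old->{children} = [ grep { $_ != $self } @{ $old->{children} } ];
    }
    $self->{parent} = $parent;
    push @{ $parent->{children} }, $self;
    return;
}

sub subtree { my ($self) = @_; return ($self, map { $_->subtree } $self->get_children) }

package main;

# Rehang every preposition (afun 'AuxP') below its noun: the noun takes the
# preposition's place in the tree and the preposition becomes its dependent.
sub rehang_prepositions {
    my ($root) = @_;
    my @preps = grep { ($_->get_attr('afun') // '') eq 'AuxP' } $root->subtree;
    for my $prep (@preps) {
        my @children = sort { $a->get_attr('ord') <=> $b->get_attr('ord') } $prep->get_children();
        next unless @children;
        my ($noun, @others) = @children;
        $noun->set_parent($prep->get_parent());
        $_->set_parent($noun) for @others;    # keep any remaining dependents under the noun
        $prep->set_parent($noun);
    }
    return;
}

sub node {
    my ($parent, %attr) = @_;
    my $n = ToyNode->new(%attr);
    $n->set_parent($parent);
    return $n;
}

# "sat on the mat": in PDT style 'on' (AuxP) governs 'mat'.
my $root = ToyNode->new(ord => 0, form => '#');
my $sat  = node($root, ord => 1, form => 'sat', afun => 'Pred');
my $on   = node($sat,  ord => 2, form => 'on',  afun => 'AuxP');
my $mat  = node($on,   ord => 4, form => 'mat', afun => 'Adv');
node($mat, ord => 3, form => 'the', afun => 'Atr');

rehang_prepositions($root);
printf "%s <- %s\n", $_->get_attr('form'), $_->get_parent->get_attr('form')
    for grep { $_ != $root } $root->subtree;
```

After the call, ''mat'' hangs directly under ''sat'', while ''on'' and ''the'' both hang under ''mat''. Note that only the tree structure changes; the surface order (''ord'') stays the same.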
  
  
