Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

external:tectomt:tutorial [2009/01/20 17:57]
popel
external:tectomt:tutorial [2009/01/21 11:26]
kravalova
Line 8: Line 8:
  
 TectoMT is a highly modular NLP (Natural Language Processing) software system implemented in the Perl programming language under Linux. It is primarily aimed at Machine Translation, making use of the ideas and technology created during the Prague Dependency Treebank project. At the same time, it is hoped to significantly facilitate and accelerate the development of software solutions for many other NLP tasks, especially thanks to the re-usability of its numerous integrated processing modules (called blocks), which are equipped with uniform object-oriented interfaces.  TectoMT is a highly modular NLP (Natural Language Processing) software system implemented in the Perl programming language under Linux. It is primarily aimed at Machine Translation, making use of the ideas and technology created during the Prague Dependency Treebank project. At the same time, it is hoped to significantly facilitate and accelerate the development of software solutions for many other NLP tasks, especially thanks to the re-usability of its numerous integrated processing modules (called blocks), which are equipped with uniform object-oriented interfaces. 
 +
 +
  
  
Line 16: Line 18:
   * Your system is Linux   * Your system is Linux
   * Your shell is bash   * Your shell is bash
-  * You have basic experience bash and you can read Perl+  * You have basic experience with bash and you can read basic Perl 
  
  
Line 35: Line 38:
 </code> </code>
  
-  * In ''tectomt/install/'' run ./install.sh:+  * In ''tectomt/install/'' run ''./install.sh'':
  
 <code bash> <code bash>
Line 240: Line 243:
 This block illustrates some of the most common methods for accessing objects: This block illustrates some of the most common methods for accessing objects:
  
-  * ''my @bundles = $document->get_bundles()'' - an array of bundles contained in the document +  * ''my @bundles = $document<nowiki>-></nowiki>get_bundles()'' - an array of bundles contained in the document 
-  * ''my $root_node = $bundle->get_tree($layer_name);'' - the root node of the tree of the given type in the given bundle +  * ''my $root_node = $bundle<nowiki>-></nowiki>get_tree($layer_name);'' - the root node of the tree of the given type in the given bundle 
-  * ''my @children = $node->get_children();'' - array of the node's children +  * ''my @children = $node<nowiki>-></nowiki>get_children();'' - array of the node's children 
-  * ''my @descendants = $node->get_descendants();'' - array of the node's children and their children and children of their children ... +  * ''my @descendants = $node<nowiki>-></nowiki>get_descendants();'' - array of the node's children and their children and children of their children ... 
-  * ''my $parent = $node->get_parent();'' - parent node of the given node, or undef for root +  * ''my $parent = $node<nowiki>-></nowiki>get_parent();'' - parent node of the given node, or undef for root 
-  * ''my $root_node = $node->get_root();'' - the root node of the tree into which the node belongs+  * ''my $root_node = $node<nowiki>-></nowiki>get_root();'' - the root node of the tree into which the node belongs
  
 Attributes of documents, bundles or nodes can be accessed by attribute getters and setters, for example:  Attributes of documents, bundles or nodes can be accessed by attribute getters and setters, for example: 
-  * ''$node->get_attr($attr_name);'' +  * ''$node<nowiki>-></nowiki>get_attr($attr_name);'' 
-  * ''$node->set_attr($attr_name, $attr_value);''+  * ''$node<nowiki>-></nowiki>set_attr($attr_name, $attr_value);''
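
The access methods listed above can be tried out without a TectoMT installation. The following is only a minimal sketch: ''MockNode'' is a hypothetical stand-in class written for this example, and only the method names (''get_children'', ''get_descendants'', ''get_parent'', ''get_root'', ''get_attr'', ''set_attr'') come from the tutorial. The real TectoMT implementation may behave differently in details.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# MockNode is a hypothetical stand-in for a TectoMT node. Only the method
# names are taken from the tutorial; the implementation is an assumption.
package MockNode;

sub new {
    my ($class, $parent, %attrs) = @_;
    my $self = bless { attrs => {%attrs}, parent => $parent, children => [] }, $class;
    push @{ $parent->{children} }, $self if defined $parent;
    return $self;
}

sub get_attr     { my ($self, $name) = @_; return $self->{attrs}{$name}; }
sub set_attr     { my ($self, $name, $value) = @_; $self->{attrs}{$name} = $value; return; }
sub get_children { my ($self) = @_; return @{ $self->{children} }; }
sub get_parent   { my ($self) = @_; return $self->{parent}; }

# Children, their children, and so on (depth-first, pre-order).
sub get_descendants {
    my ($self) = @_;
    return map { ($_, $_->get_descendants()) } $self->get_children();
}

# Climb parent links until a node without a parent is reached.
sub get_root {
    my ($self) = @_;
    my $node = $self;
    while (defined(my $parent = $node->get_parent())) {
        $node = $parent;
    }
    return $node;
}

package main;

# Toy tree for "John sleeps": ROOT -> sleeps -> John
my $root = MockNode->new(undef, form => 'ROOT',   ord => 0);
my $verb = MockNode->new($root, form => 'sleeps', ord => 2);
my $noun = MockNode->new($verb, form => 'John',   ord => 1);

my @forms = map { $_->get_attr('form') } $root->get_descendants();
print "descendants of root: @forms\n";    # sleeps John

my $root_form = $noun->get_root()->get_attr('form');
print "root of 'John': $root_form\n";     # ROOT

$noun->set_attr('afun', 'Sb');            # attach a new attribute
my $afun = $noun->get_attr('afun');
print "afun of 'John': $afun\n";          # Sb
```

In a real block, the nodes would come from ''$bundle<nowiki>-></nowiki>get_tree($layer_name)'' instead of being built by hand.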
  
 Our tutorial block ''Print_node_info.pm'' is ready to use. You only need to add this block to our scenario, e.g. as a new Makefile target: Our tutorial block ''Print_node_info.pm'' is ready to use. You only need to add this block to our scenario, e.g. as a new Makefile target:
Line 281: Line 284:
  
 It is assumed that finite clauses can be translated independently, which would reduce computational complexity or make parallel translation possible. We could even use hybrid translation - each finite clause could be translated by the most self-confident translation system. In this task, we are going to split the sentence into finite clauses. It is assumed that finite clauses can be translated independently, which would reduce computational complexity or make parallel translation possible. We could even use hybrid translation - each finite clause could be translated by the most self-confident translation system. In this task, we are going to split the sentence into finite clauses.
 +
  
  
Line 286: Line 290:
  
 ==== Task ==== ==== Task ====
-A block which, given an analytical tree (''SEnglishA''), fills each ''a-node'' with boolean attribute ''is_head'' which is set to ''1'' if the ''a-node'' corresponds to a finite verb, and to ''0'' otherwise.+A block which, given an analytical tree (''SEnglishA''), fills each ''a-node'' with boolean attribute ''is_clause_head'' which is set to ''1'' if the ''a-node'' corresponds to a finite verb, and to ''0'' otherwise. 
 + 
 + 
  
  
Line 313: Line 320:
 ==== Instructions ==== ==== Instructions ====
  
-There is a block template with hints in ''libs/blocks/Tutorial/Mark_heads.pm''. You should edit the block so that the ouput of this block is the same a-tree, in addition with attribute ''is_head'' attached to each ''a-node''. There is also a printing block ''libs/blocks/Print_finite_clauses.pm'' which will print out the ''a-nodes'' grouped by clauses:+There is a block template with hints in ''libs/blocks/Tutorial/Mark_heads.pm''. Edit the block so that its output is the same a-tree with the attribute ''is_clause_head'' additionally attached to each ''a-node''. There is also a printing block ''libs/blocks/Print_finite_clauses.pm'', which prints the ''a-nodes'' grouped by clauses:
  
 <code bash> <code bash>
Line 319: Line 326:
         brunblocks -S -o \         brunblocks -S -o \
                 Tutorial::Mark_heads \                 Tutorial::Mark_heads \
-                Tutorial::Print_finite_clauses -- sample.tmt+                Tutorial::Print_finite_clauses \
 +        -- sample.tmt
 </code> </code>
  
 You are going to need these methods: You are going to need these methods:
  
-  * ''my root = $bundle->get_tree('tree_name')'' +  * ''my $root = $bundle<nowiki>-></nowiki>get_tree('tree_name')'' 
-  * ''my $attr = $node->get_attr('attr_name')'' +  * ''my $attr = $node<nowiki>-></nowiki>get_attr('attr_name')'' 
-  * ''$node->set_attr('attr_name',$attr_value)'' +  * ''$node<nowiki>-></nowiki>set_attr('attr_name',$attr_value)'' 
-  * ''my @eff_children = $node->get_eff_children()''+  * ''my @eff_children = $node<nowiki>-></nowiki>get_eff_children()''
  
-//Note//: ''get_children'' returns topological node children in a tree, while ''get_eff_children'' returns node children in a linguistic sense. Mostly, these do not differ.+//Note//: ''get_children()'' returns topological node children in a tree, while ''get_eff_children()'' returns node children in a linguistic sense. Mostly, these do not differ.
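
One possible shape of the marking loop is sketched below. It is not the content of ''Mark_heads.pm'': ''MockNode'' is a hypothetical stand-in class, the attribute name ''tag'' is an assumption, and the helper ''is_finite_verb'' simply treats the Penn Treebank tags VBD, VBP and VBZ as finite, ignoring modal verbs, coordination and other special cases.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# MockNode is a hypothetical stand-in for a TectoMT a-node. Only the method
# names are taken from the tutorial; the implementation is an assumption.
package MockNode;

sub new {
    my ($class, $parent, %attrs) = @_;
    my $self = bless { attrs => {%attrs}, children => [] }, $class;
    push @{ $parent->{children} }, $self if defined $parent;
    return $self;
}

sub get_attr     { my ($self, $name) = @_; return $self->{attrs}{$name}; }
sub set_attr     { my ($self, $name, $value) = @_; $self->{attrs}{$name} = $value; return; }
sub get_children { my ($self) = @_; return @{ $self->{children} }; }

sub get_descendants {
    my ($self) = @_;
    return map { ($_, $_->get_descendants()) } $self->get_children();
}

package main;

# Hypothetical helper: the attribute name 'tag' and the choice of tags
# (VBD, VBP, VBZ) are simplifying assumptions made for this sketch.
sub is_finite_verb {
    my ($node) = @_;
    my $tag = $node->get_attr('tag');
    return ( defined $tag && $tag =~ /^VB[DPZ]$/ ) ? 1 : 0;
}

# Toy tree for "John said Mary sleeps"
my $root   = MockNode->new(undef, form => 'ROOT', ord => 0);
my $said   = MockNode->new($root, form => 'said', tag => 'VBD', ord => 2);
MockNode->new($said, form => 'John', tag => 'NNP', ord => 1);
my $sleeps = MockNode->new($said, form => 'sleeps', tag => 'VBZ', ord => 4);
MockNode->new($sleeps, form => 'Mary', tag => 'NNP', ord => 3);

# Fill every node with the boolean attribute is_clause_head.
for my $node ( $root->get_descendants() ) {
    $node->set_attr( 'is_clause_head', is_finite_verb($node) );
}

my @heads =
    map  { $_->get_attr('form') }
    sort { $a->get_attr('ord') <=> $b->get_attr('ord') }
    grep { $_->get_attr('is_clause_head') } $root->get_descendants();
print "clause heads: @heads\n";    # said sleeps
```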
  
  
  
-//Advanced version//: The output of our block might still be incorrect in special cases - we don't solve coordination and subordinate conjunctions.+//Advanced version//: The output of our block might still be incorrect in special cases - we don't solve coordination (see the second sentence in sample.txt) and subordinate conjunctions.
  
  
  
 ===== Your turn: more tasks ===== ===== Your turn: more tasks =====
 +
  
  
Line 354: Line 363:
 **Instructions**:  **Instructions**: 
  
-  * To find an object to a verb, look for objects among effective children of a verb (''$child->get_attr('afun') eq 'Obj' ''). That implies working on analytical layer. +  * To find the object of a verb, look for objects among the verb's effective children (''$child<nowiki>-></nowiki>get_attr('afun') eq 'Obj' ''). This implies working on the analytical layer. 
-  * Once you have node ''$object'' and node ''$verb'', use method TODO  +  * For debugging, a method returning surface word order of a node is useful: ''$node<nowiki>-></nowiki>get_attr('ord')''. It can be used to print out nodes sorted by attribute ''ord''. 
-  * For debugging, a method returning word order of a node is useful: ''$node->get_attr('ord')''. It can be used to print out nodes sorted by attribute ''ord''.+  * Once you have node ''$object'' and node ''$verb'', use the method ''$object<nowiki>-></nowiki>shift_before_node($verb)''. It takes the whole subtree under ''$object'' and recomputes the ''ord'' attributes (surface word order) so that all nodes in that subtree have a smaller ''ord'' than ''$verb''. In other words, it rearranges the surface word order from VO to OV. 
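
To make the effect of ''shift_before_node'' concrete, here is a small sketch of the reordering it is described to perform. It does not use the TectoMT implementation: ''MockNode'' and the helper ''shift_subtree_before'' are hypothetical names written for this example, and plain ''get_children'' stands in for ''get_eff_children''.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# MockNode is a hypothetical stand-in for a TectoMT a-node. Only the method
# names are taken from the tutorial; the implementation is an assumption.
package MockNode;

sub new {
    my ($class, $parent, %attrs) = @_;
    my $self = bless { attrs => {%attrs}, children => [] }, $class;
    push @{ $parent->{children} }, $self if defined $parent;
    return $self;
}

sub get_attr     { my ($self, $name) = @_; return $self->{attrs}{$name}; }
sub set_attr     { my ($self, $name, $value) = @_; $self->{attrs}{$name} = $value; return; }
sub get_children { my ($self) = @_; return @{ $self->{children} }; }

sub get_descendants {
    my ($self) = @_;
    return map { ($_, $_->get_descendants()) } $self->get_children();
}

package main;

# Hypothetical helper mimicking the described behaviour: renumber 'ord' so
# that the whole subtree of $object directly precedes $verb.
sub shift_subtree_before {
    my ($all_nodes, $object, $verb) = @_;
    my %is_moved = map { ($_ => 1) } ( $object, $object->get_descendants() );
    my @sorted   = sort { $a->get_attr('ord') <=> $b->get_attr('ord') } @{$all_nodes};
    my @moved    = grep { $is_moved{$_} } @sorted;
    my @new_order;
    for my $node ( grep { !$is_moved{$_} } @sorted ) {
        push @new_order, @moved if $node == $verb;    # subtree goes right before the verb
        push @new_order, $node;
    }
    my $ord = 1;
    $_->set_attr( 'ord', $ord++ ) for @new_order;
    return;
}

# Print word forms sorted by the 'ord' attribute (useful for debugging).
sub surface_string {
    my (@nodes) = @_;
    return join ' ', map { $_->get_attr('form') }
        sort { $a->get_attr('ord') <=> $b->get_attr('ord') } @nodes;
}

# Toy tree for "John saw the dog"
my $root = MockNode->new(undef, form => 'ROOT', afun => 'AuxS', ord => 0);
my $verb = MockNode->new($root, form => 'saw',  afun => 'Pred', ord => 2);
MockNode->new($verb, form => 'John', afun => 'Sb', ord => 1);
my $dog = MockNode->new($verb, form => 'dog', afun => 'Obj', ord => 4);
MockNode->new($dog, form => 'the', afun => 'AuxA', ord => 3);

my @nodes  = $root->get_descendants();
my $before = surface_string(@nodes);

my ($object) = grep { $_->get_attr('afun') eq 'Obj' } $verb->get_children();
shift_subtree_before( \@nodes, $object, $verb );
my $after = surface_string(@nodes);

print "before: $before\n";    # John saw the dog
print "after:  $after\n";     # John the dog saw
```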
 + 
 + 
 + 
 + 
 + 
 + 
  
  
Line 378: Line 394:
  
 ==== Prepositions ==== ==== Prepositions ====
 +
 +{{ external:tectomt:preps.png?200x80|}}
  
 **Motivation**: In the dependency approach, the question of "where to hang prepositions" arises. In the Prague style (PDT), prepositions are heads of the subtree and the noun/pronoun depends on the preposition. However, another arrangement might be preferable: the noun/pronoun might be the head of the subtree, while the preposition would take the role of a modifier. **Motivation**: In the dependency approach, the question of "where to hang prepositions" arises. In the Prague style (PDT), prepositions are heads of the subtree and the noun/pronoun depends on the preposition. However, another arrangement might be preferable: the noun/pronoun might be the head of the subtree, while the preposition would take the role of a modifier.
- 
-TODO picture
  
 **Task**: The task is to rehang all prepositions as indicated in the picture. You may assume that prepositions have at most 1 child. **Task**: The task is to rehang all prepositions as indicated in the picture. You may assume that prepositions have at most 1 child.
Line 388: Line 404:
  
 You are going to need these new methods: You are going to need these new methods:
-  * ''my @children = $node->get_children'' +  * ''my @children = $node<nowiki>-></nowiki>get_children'' 
-  * ''my $parent = $node->get_parent'' +  * ''my $parent = $node<nowiki>-></nowiki>get_parent'' 
-  * ''$node->set_parent($parent)''+  * ''$node<nowiki>-></nowiki>set_parent($parent)''
  
 //Hint//:  //Hint//: 
-  * On analytical layer, you can use this test to recognize prepositions: ''$node->get_attr('afun') eq 'AuxP' '' +  * On analytical layer, you can use this test to recognize prepositions: ''$node<nowiki>-></nowiki>get_attr('afun') eq 'AuxP' '' 
   * You can use block template in ''libs/blocks/BlockTemplate.pm''. To see the results, you can again use TrEd (''tmttred sample.tmt'')   * You can use block template in ''libs/blocks/BlockTemplate.pm''. To see the results, you can again use TrEd (''tmttred sample.tmt'')
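
The rehanging step itself can be sketched as follows. This is an illustration under stated assumptions, not the reference solution: ''MockNode'' is a hypothetical stand-in class whose ''set_parent'' simply moves a node between child lists, and multiword prepositions are not handled.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# MockNode is a hypothetical stand-in for a TectoMT a-node. Only the method
# names are taken from the tutorial; the implementation is an assumption.
package MockNode;

sub new {
    my ($class, $parent, %attrs) = @_;
    my $self = bless { attrs => {%attrs}, parent => undef, children => [] }, $class;
    $self->set_parent($parent) if defined $parent;
    return $self;
}

sub get_attr     { my ($self, $name) = @_; return $self->{attrs}{$name}; }
sub get_children { my ($self) = @_; return @{ $self->{children} }; }
sub get_parent   { my ($self) = @_; return $self->{parent}; }

# Detach the node from its old parent (if any) and attach it to the new one.
sub set_parent {
    my ($self, $new_parent) = @_;
    if ( my $old = $self->{parent} ) {
        @{ $old->{children} } = grep { $_ != $self } @{ $old->{children} };
    }
    $self->{parent} = $new_parent;
    push @{ $new_parent->{children} }, $self;
    return;
}

sub get_descendants {
    my ($self) = @_;
    return map { ($_, $_->get_descendants()) } $self->get_children();
}

package main;

# Toy tree for "He sits on the chair" in the Prague style:
# sits -> on (AuxP) -> chair -> the
my $root  = MockNode->new(undef, form => 'ROOT');
my $sits  = MockNode->new($root, form => 'sits',  afun => 'Pred');
my $he    = MockNode->new($sits, form => 'He',    afun => 'Sb');
my $on    = MockNode->new($sits, form => 'on',    afun => 'AuxP');
my $chair = MockNode->new($on,   form => 'chair', afun => 'Adv');
my $the   = MockNode->new($chair, form => 'the',  afun => 'AuxA');

# The list of prepositions is collected before the tree is modified.
for my $prep ( grep { $_->get_attr('afun') eq 'AuxP' } $root->get_descendants() ) {
    my ($noun) = $prep->get_children();    # at most one child is assumed
    next if !defined $noun;
    $noun->set_parent( $prep->get_parent() );    # noun takes the preposition's place
    $prep->set_parent($noun);                    # preposition becomes a modifier
}

for my $node ( $root->get_descendants() ) {
    printf "%-6s <- %s\n", $node->get_attr('form'), $node->get_parent()->get_attr('form');
}
```

The noun is moved first so that the preposition no longer has it as a child when it is rehung; doing the two ''set_parent'' calls in the opposite order would create a cycle.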
  
  
 //Advanced version//: What happens in case of multiword prepositions? For example, ''because of'', ''instead of''. Can you handle it? //Advanced version//: What happens in case of multiword prepositions? For example, ''because of'', ''instead of''. Can you handle it?
- 
- 
- 
- 
- 
  
  
