
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

external:tectomt:tutorial [2009/01/20 17:10]
popel
external:tectomt:tutorial [2009/01/21 10:47]
kravalova
Line 128: Line 128:
   * Each bundle contains tree-shaped sentence representations on various linguistic layers. In our example ''sample.tmt'' we have a morphological tree (''SEnglishM'') in each bundle. Later on, an analytical layer (''SEnglishA'') will also appear in each bundle as we proceed with our analysis.    * Each bundle contains tree-shaped sentence representations on various linguistic layers. In our example ''sample.tmt'' we have a morphological tree (''SEnglishM'') in each bundle. Later on, an analytical layer (''SEnglishA'') will also appear in each bundle as we proceed with our analysis. 
   * Trees are formed by nodes and edges. Attributes can be attached only to nodes. An edge's attributes must therefore be stored as attributes of its lower node, and a tree's attributes as attributes of the root node.   * Trees are formed by nodes and edges. Attributes can be attached only to nodes. An edge's attributes must therefore be stored as attributes of its lower node, and a tree's attributes as attributes of the root node.
 +
  
  
Line 149: Line 150:
 ===== Changing the scenario ===== ===== Changing the scenario =====
  
-We'll now add syntax analysis to our scenario by adding four more blocks. Instead of +We'll now add syntax analysis (dependency parsing) to our scenario by adding four more blocks. Instead of 
  
 <code bash> <code bash>
Line 157: Line 158:
                 SEnglishW_to_SEnglishM::Penn_style_tokenization \                 SEnglishW_to_SEnglishM::Penn_style_tokenization \
                 SEnglishW_to_SEnglishM::TagTnT \                 SEnglishW_to_SEnglishM::TagTnT \
-                SEnglishW_to_SEnglishM::Lemmatize_mtree -- sample.tmt+                SEnglishW_to_SEnglishM::Lemmatize_mtree 
 +        -- sample.tmt
 </code> </code>
  
Line 168: Line 170:
                 SEnglishW_to_SEnglishM::Penn_style_tokenization \                 SEnglishW_to_SEnglishM::Penn_style_tokenization \
                 SEnglishW_to_SEnglishM::TagTnT \                 SEnglishW_to_SEnglishM::TagTnT \
-                SEnglishW_to_SEnglishM::Lemmatize_mtree  \+                SEnglishW_to_SEnglishM::Lemmatize_mtree \
                 SEnglishM_to_SEnglishA::McD_parser_local \                 SEnglishM_to_SEnglishA::McD_parser_local \
                 SEnglishM_to_SEnglishA::Fix_McD_Tree \                 SEnglishM_to_SEnglishA::Fix_McD_Tree \
-                SEnglishM_to_SEnglishA::Fill_afun_after_McD -- sample.tmt+                SEnglishM_to_SEnglishA::Fill_afun_after_McD 
 +        -- sample.tmt
 </code> </code>
  
Line 189: Line 192:
 tmttred sample.tmt tmttred sample.tmt
 </code> </code>
 +
  
  
Line 236: Line 240:
 This block illustrates some of the most common methods for accessing objects: This block illustrates some of the most common methods for accessing objects:
  
-  * ''my @bundles = $document->get_bundles'' - an array of bundles contained in the document+  * ''my @bundles = $document->get_bundles()'' - an array of bundles contained in the document
   * ''my $root_node = $bundle->get_tree($layer_name);'' - the root node of the tree of the given type in the given bundle   * ''my $root_node = $bundle->get_tree($layer_name);'' - the root node of the tree of the given type in the given bundle
-  * ''my @children = $node->get_children;'' - array of the node's children +  * ''my @children = $node->get_children();'' - array of the node's children 
-  * ''my @descendants = $node->get_descendants;'' - array of the node's children and their children and children of their children ... +  * ''my @descendants = $node->get_descendants();'' - array of the node's children and their children and children of their children ... 
-  * ''my $parent = $node->get_parent;'' - parent node of the given node, or undef for root +  * ''my $parent = $node->get_parent();'' - parent node of the given node, or undef for root 
-  * ''my $root_node = $node->get_root;'' - the root node of the tree into which the node belongs+  * ''my $root_node = $node->get_root();'' - the root node of the tree into which the node belongs
  
 Attributes of documents, bundles or nodes can be accessed by attribute getters and setters, for example:  Attributes of documents, bundles or nodes can be accessed by attribute getters and setters, for example: 
Line 247: Line 251:
   * ''$node->set_attr($attr_name, $attr_value);''   * ''$node->set_attr($attr_name, $attr_value);''
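The accessors and the attribute getter/setter listed above can be tried out on a toy tree. The following is only a sketch: ''MockNode'' and its ''add_child'' helper are hypothetical stand-ins written for this example (they are not TectoMT classes), and the attribute name ''form'' is an assumption.

```perl
use strict;
use warnings;

# MockNode is a hypothetical stand-in for a TectoMT node, defined here only so
# that the sketch runs on its own; it mimics the accessors named in the tutorial.
package MockNode;

sub new {
    my ( $class, %attrs ) = @_;
    return bless { attrs => {%attrs}, parent => undef, children => [] }, $class;
}

# Helper for building the toy tree (an assumption, not a TectoMT method).
sub add_child {
    my ( $self, $child ) = @_;
    $child->{parent} = $self;
    push @{ $self->{children} }, $child;
    return $child;
}

sub get_attr     { my ( $self, $name ) = @_; return $self->{attrs}{$name}; }
sub set_attr     { my ( $self, $name, $value ) = @_; $self->{attrs}{$name} = $value; return; }
sub get_parent   { my ($self) = @_; return $self->{parent}; }
sub get_children { my ($self) = @_; return @{ $self->{children} }; }

# Children, their children, and so on (depth-first).
sub get_descendants {
    my ($self) = @_;
    return map { ( $_, $_->get_descendants() ) } $self->get_children();
}

# Climb parent links until a node without a parent is reached.
sub get_root {
    my ($self) = @_;
    my $node = $self;
    $node = $node->get_parent() while defined $node->get_parent();
    return $node;
}

package main;

# Toy tree for "John loves Mary": technical root -> loves -> (John, Mary).
my $root  = MockNode->new();
my $loves = $root->add_child( MockNode->new( form => 'loves', ord => 2 ) );
my $john  = $loves->add_child( MockNode->new( form => 'John', ord => 1 ) );
my $mary  = $loves->add_child( MockNode->new( form => 'Mary', ord => 3 ) );

# Walk all nodes below the root and record each node together with its parent.
my @report;
for my $node ( $root->get_descendants() ) {
    my $parent_form = $node->get_parent()->get_attr('form') // '<root>';
    push @report, $node->get_attr('form') . ' <- ' . $parent_form;
}
print "$_\n" for @report;

# A setter changes an attribute in place.
$mary->set_attr( 'form', 'Maria' );
```

In a real block the nodes come from ''$bundle->get_tree($layer_name)'' rather than from hand-built objects; the mock only makes the traversal pattern visible.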
  
-Our tutorial block ''Print_node_info.pm'' is ready to use. You only need to add this block to our scenario:+Our tutorial block ''Print_node_info.pm'' is ready to use. You only need to add this block to our scenario, e.g. as a new Makefile target:
  
 <code bash> <code bash>
Line 260: Line 264:
 </code> </code>
  
-Try to change the block so that it prints out the information only for verbs. (You need to look at attribute ''tag'' at the ''m'' level). The tagset used is Penn Treebank Tagset.+Try to change the block so that it prints out the information only for verbs. (You need to look at an attribute ''tag'' at the ''m'' level). The tagset used is Penn Treebank Tagset.
  
  
Line 277: Line 281:
  
 It is assumed that finite clauses can be translated independently, which would reduce computational complexity or make parallel translation possible. We could even use hybrid translation - each finite clause could be translated by the most self-confident translation system. In this task, we are going to split the sentence into finite clauses. It is assumed that finite clauses can be translated independently, which would reduce computational complexity or make parallel translation possible. We could even use hybrid translation - each finite clause could be translated by the most self-confident translation system. In this task, we are going to split the sentence into finite clauses.
 +
  
  
Line 282: Line 287:
  
 ==== Task ==== ==== Task ====
-A block which, given an analytical tree (''SEnglishA''), fills each ''a-node'' with boolean attribute ''is_head'' which is set to ''1'' if the ''a-node'' corresponds to a finite verb, and to ''0'' otherwise.+A block which, given an analytical tree (''SEnglishA''), fills each ''a-node'' with boolean attribute ''is_clause_head'' which is set to ''1'' if the ''a-node'' corresponds to a finite verb, and to ''0'' otherwise. 
 + 
  
  
Line 309: Line 316:
 ==== Instructions ==== ==== Instructions ====
  
-There is a block template with hints in ''libs/blocks/Tutorial/Mark_heads.pm''. You should edit the block so that the ouput of this block is the same a-tree, in addition with attribute ''is_head'' attached to each ''a-node''. There is also a printing block ''libs/blocks/Print_finite_clauses.pm'' which will print out the ''a-nodes'' grouped by clauses:+There is a block template with hints in ''libs/blocks/Tutorial/Mark_heads.pm''. You should edit the block so that the output of this block is the same a-tree, in addition with attribute ''is_clause_head'' attached to each ''a-node''. There is also a printing block ''libs/blocks/Print_finite_clauses.pm'' which will print out the ''a-nodes'' grouped by clauses:
  
 <code bash> <code bash>
Line 315: Line 322:
         brunblocks -S -o \         brunblocks -S -o \
                 Tutorial::Mark_heads \                 Tutorial::Mark_heads \
-                Tutorial::Print_finite_clauses -- sample.tmt+                Tutorial::Print_finite_clauses 
 +        -- sample.tmt
 </code> </code>
  
 You are going to need these methods: You are going to need these methods:
  
-  * ''my root = $bundle->get_tree('tree_name')''+  * ''my $root = $bundle->get_tree('tree_name')''
   * ''my $attr = $node->get_attr('attr_name')''   * ''my $attr = $node->get_attr('attr_name')''
   * ''$node->set_attr('attr_name',$attr_value)''   * ''$node->set_attr('attr_name',$attr_value)''
   * ''my @eff_children = $node->get_eff_children()''   * ''my @eff_children = $node->get_eff_children()''
  
-//Note//: ''get_children'' returns topological node children in a tree, while ''get_eff_children'' returns node children in a linguistic sense. Mostly, these do not differ.+//Note//: ''get_children()'' returns topological node children in a tree, while ''get_eff_children()'' returns node children in a linguistic sense. Mostly, these do not differ.
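One possible shape of the marking step is sketched below. It is not the intended solution of ''Mark_heads.pm'': ''MockNode'' is a hypothetical stand-in class, the attribute name ''m/tag'' is an assumption about where the Penn tag is stored, and treating exactly the tags ''VBD'', ''VBP'' and ''VBZ'' as finite is a simplification that leaves out modals (''MD'') and auxiliary constructions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an a-node, defined only to keep the sketch runnable.
package MockNode;

sub new {
    my ( $class, %attrs ) = @_;
    return bless { attrs => {%attrs}, children => [] }, $class;
}

sub add_child {    # toy-tree helper, not a TectoMT method
    my ( $self, $child ) = @_;
    push @{ $self->{children} }, $child;
    return $child;
}

sub get_attr     { my ( $self, $name ) = @_; return $self->{attrs}{$name}; }
sub set_attr     { my ( $self, $name, $value ) = @_; $self->{attrs}{$name} = $value; return; }
sub get_children { my ($self) = @_; return @{ $self->{children} }; }

sub get_descendants {
    my ($self) = @_;
    return map { ( $_, $_->get_descendants() ) } $self->get_children();
}

package main;

# Assumption: the Penn Treebank tag is readable under this attribute name.
my $TAG_ATTR = 'm/tag';

# Simplifying assumption: only these Penn tags are treated as finite verb forms.
my %FINITE_TAG = map { $_ => 1 } qw(VBD VBP VBZ);

# Set is_clause_head to 1 for finite verbs and to 0 for every other node.
sub mark_clause_heads {
    my ($root) = @_;
    for my $node ( $root->get_descendants() ) {
        my $tag = $node->get_attr($TAG_ATTR) // '';
        $node->set_attr( 'is_clause_head', $FINITE_TAG{$tag} ? 1 : 0 );
    }
    return;
}

# Toy tree for "She said he left": root -> said -> (She, left -> he).
my $root = MockNode->new();
my $said = $root->add_child( MockNode->new( form => 'said', 'm/tag' => 'VBD' ) );
my $she  = $said->add_child( MockNode->new( form => 'She',  'm/tag' => 'PRP' ) );
my $left = $said->add_child( MockNode->new( form => 'left', 'm/tag' => 'VBD' ) );
my $he   = $left->add_child( MockNode->new( form => 'he',   'm/tag' => 'PRP' ) );

mark_clause_heads($root);

for my $node ( $root->get_descendants() ) {
    print $node->get_attr('form'), "\t", $node->get_attr('is_clause_head'), "\n";
}
```

Writing an explicit ''0'' for non-heads, rather than leaving the attribute unset, matches the task description, which asks for the attribute on every ''a-node''.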
  
  
  
-//Advanced version//: The output of our block might still be incorrect in special cases - we don't solve coordination and subordinate conjunctions.+//Advanced version//: The output of our block might still be incorrect in special cases - we don't solve coordination (see the second sentence in sample.txt) and subordinate conjunctions.
  
  
Line 353: Line 361:
   * Once you have node ''$object'' and node ''$verb'', use method TODO    * Once you have node ''$object'' and node ''$verb'', use method TODO 
   * For debugging, a method returning word order of a node is useful: ''$node->get_attr('ord')''. It can be used to print out nodes sorted by attribute ''ord''.   * For debugging, a method returning word order of a node is useful: ''$node->get_attr('ord')''. It can be used to print out nodes sorted by attribute ''ord''.
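Sorting by ''ord'' for debugging can be sketched as follows. ''MockNode'' is again a hypothetical stand-in class and the attribute name ''form'' is an assumption; only ''get_attr('ord')'' and ''get_descendants()'' are taken from the tutorial.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tree node, defined only to keep the sketch runnable.
package MockNode;

sub new {
    my ( $class, %attrs ) = @_;
    return bless { attrs => {%attrs}, children => [] }, $class;
}

sub add_child {    # toy-tree helper, not a TectoMT method
    my ( $self, $child ) = @_;
    push @{ $self->{children} }, $child;
    return $child;
}

sub get_attr     { my ( $self, $name ) = @_; return $self->{attrs}{$name}; }
sub get_children { my ($self) = @_; return @{ $self->{children} }; }

sub get_descendants {
    my ($self) = @_;
    return map { ( $_, $_->get_descendants() ) } $self->get_children();
}

package main;

# Toy tree for "John loves Mary"; the tree order differs from the word order.
my $root  = MockNode->new();
my $loves = $root->add_child( MockNode->new( form => 'loves', ord => 2 ) );
$loves->add_child( MockNode->new( form => 'Mary', ord => 3 ) );
$loves->add_child( MockNode->new( form => 'John', ord => 1 ) );

# Numeric sort on the word-order attribute restores the surface order.
my @sorted = sort { $a->get_attr('ord') <=> $b->get_attr('ord') } $root->get_descendants();

my $sentence = join ' ', map { $_->get_attr('form') } @sorted;
print "$sentence\n";
```

The numeric comparison operator ''<=>'' matters here: a string comparison with ''cmp'' would place ''10'' before ''2'' in longer sentences.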
 +
 +
 +
 +
 +
  
  
Line 374: Line 387:
  
 ==== Prepositions ==== ==== Prepositions ====
 +
 +{{ external:tectomt:preps.png|}}
  
 **Motivation**: In the dependency approach, the question of "where to hang prepositions" arises. In the Prague style (PDT), a preposition is the head of its subtree and the noun/pronoun depends on the preposition. However, another arrangement might be preferable: the noun/pronoun might be the head of the subtree, while the preposition would take the role of a modifier. **Motivation**: In the dependency approach, the question of "where to hang prepositions" arises. In the Prague style (PDT), a preposition is the head of its subtree and the noun/pronoun depends on the preposition. However, another arrangement might be preferable: the noun/pronoun might be the head of the subtree, while the preposition would take the role of a modifier.
- 
-TODO picture
  
 **Task**: Rehang all prepositions as indicated in the picture. You may assume that each preposition has at most one child. **Task**: Rehang all prepositions as indicated in the picture. You may assume that each preposition has at most one child.
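The rehanging itself can be sketched on a toy tree. This is an illustration under several assumptions, not the tutorial's solution: ''MockNode'' is a hypothetical stand-in class, ''set_parent'' is assumed here as the name of the re-attachment method, the tag is assumed to be stored under ''m/tag'', and a preposition is recognised simply by the Penn tag ''IN'' (which Penn also uses for subordinating conjunctions).

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tree node, defined only to keep the sketch runnable.
package MockNode;

sub new {
    my ( $class, %attrs ) = @_;
    return bless { attrs => {%attrs}, parent => undef, children => [] }, $class;
}

sub get_attr     { my ( $self, $name ) = @_; return $self->{attrs}{$name}; }
sub get_parent   { my ($self) = @_; return $self->{parent}; }
sub get_children { my ($self) = @_; return @{ $self->{children} }; }

sub get_descendants {
    my ($self) = @_;
    return map { ( $_, $_->get_descendants() ) } $self->get_children();
}

# Assumed re-attachment method: detach from the old parent, attach to the new one.
sub set_parent {
    my ( $self, $new_parent ) = @_;
    if ( my $old = $self->{parent} ) {
        # Numeric comparison of references tests object identity.
        @{ $old->{children} } = grep { $_ != $self } @{ $old->{children} };
    }
    $self->{parent} = $new_parent;
    push @{ $new_parent->{children} }, $self if defined $new_parent;
    return;
}

package main;

my $TAG_ATTR = 'm/tag';    # assumption: where the Penn tag is stored

sub rehang_prepositions {
    my ($root) = @_;

    # Collect the prepositions first so that the traversal is not disturbed
    # by the structural changes made below.
    my @preps = grep { ( $_->get_attr($TAG_ATTR) // '' ) eq 'IN' } $root->get_descendants();

    for my $prep (@preps) {
        my @children = $prep->get_children();
        next if @children != 1;    # nothing to rehang
        my $noun = $children[0];
        $noun->set_parent( $prep->get_parent() );    # noun takes the preposition's place
        $prep->set_parent($noun);                    # preposition becomes its modifier
    }
    return;
}

# Toy tree for "cat sat on mat": root -> sat -> (cat, on -> mat).
my $root = MockNode->new();
my $sat  = MockNode->new( form => 'sat', 'm/tag' => 'VBD' );
my $cat  = MockNode->new( form => 'cat', 'm/tag' => 'NN' );
my $on   = MockNode->new( form => 'on',  'm/tag' => 'IN' );
my $mat  = MockNode->new( form => 'mat', 'm/tag' => 'NN' );
$sat->set_parent($root);
$cat->set_parent($sat);
$on->set_parent($sat);
$mat->set_parent($on);

rehang_prepositions($root);

print 'parent of mat: ', $mat->get_parent()->get_attr('form'), "\n";
print 'parent of on: ',  $on->get_parent()->get_attr('form'),  "\n";
```

The order of the two ''set_parent'' calls is deliberate: the noun is moved up before the preposition is hung below it, so at no point does a node become its own ancestor.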
Line 394: Line 407:
  
 //Advanced version//: What happens in case of multiword prepositions? For example, ''because of'', ''instead of''. Can you handle it? //Advanced version//: What happens in case of multiword prepositions? For example, ''because of'', ''instead of''. Can you handle it?
- 
- 
- 
- 
- 
  
  
