Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

external:tectomt:tutorial [2009/01/20 16:58]
popel
external:tectomt:tutorial [2009/01/20 18:10]
popel
Line 81: Line 81:
  
 This tutorial itself has its blocks in ''libs/blocks/Tutorial'' and the application in ''applications/tutorial''.
+
  
  
Line 101: Line 102:
 //Example//: Block adding Czech morphological tags (pos, case, gender, etc.) can be found in ''libs/blocks/SCzechW_to_SCzechM/Simple_tagger.pm''.
 
-There are also other directories for other purpose blocks, for example blocks which only print out some information go to ''libs/Print''. Our tutorial blocks are in ''libs/Tutorial''.
+There are also other directories for other purpose blocks, for example blocks which only print out some information go to ''libs/Print''. Our tutorial blocks are in ''libs/blocks/Tutorial''.
  
  
Line 110: Line 112:
 ===== First application =====
 
-Once you have TectoMT installed on your machine, you can find this tutorial in ''applications/tutorial/''. After you cd in to this directory, you can see our plain text sample data in ''sample.txt''
+Once you have TectoMT installed on your machine, you can find this tutorial in ''applications/tutorial/''. After you ''cd'' into this directory, you can see our plain text sample data in ''sample.txt''
 
-Most applications are defined in Makefiles, which describe sequence of blocks to be applied on our data. In our particular ''Makefile'', four blocks are going to be applied on our sample text: sentence segmentation, tokenization, tagging and lemmatization. Since we have our input text in plain text format, the file is going to be converted into ''tmt'' format beforehand (the ''in'' section).
+Most applications are defined in Makefiles, which describe sequence of blocks to be applied on our data. In our particular ''Makefile'', four blocks are going to be applied on our sample text: sentence segmentation, tokenization, tagging and lemmatization. Since we have our input text in plain text format, the file is going to be converted into ''tmt'' format beforehand (the ''in'' target in the Makefile).
 
 We can run the application:
Line 120: Line 122:
 </code>
 
-Our plain text data ''sample.txt'' have been transformed into ''tmt'', internal TectoMT format, and saved into ''sample.tmt''. Then, all four blocks have been loaded and our data has been processed. We can now examine ''sample.tmt'' using a regular text editor. We'll now stop and describe data structure in TectoMT.
+Our plain text data ''sample.txt'' have been transformed into ''tmt'', an internal TectoMT format, and saved into ''sample.tmt''. Then, all four blocks have been loaded and our data has been processed. We can now examine ''sample.tmt'' using a regular text editor. We'll now stop and describe data structure in TectoMT.
 
-  * One physical file corresponds to one document.
+  * One physical ''tmt'' file corresponds to one document.
   * A document consists of a sequence of bundles (''<bundle>''), mirroring a sequence of natural language sentences originating from the text. So, for one sentence we have one ''<bundle>''.
   * Each bundle contains tree shaped sentence representations on various linguistic layers. In our example ''sample.tmt'' we have morphological tree (''SEnglishM'') in each bundle. Later on, also an analytical layer (''SEnglishA'') will appear in each bundle as we proceed with our analysis.
   * Trees are formed by nodes and edges. Attributes can be attached only to nodes. Edge's attributes must be equivalently stored as the lower node's attributes. Tree's attributes must be stored as attributes of the root node.
+
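The document, bundle, tree and node hierarchy described in this hunk can be pictured with a small Perl sketch. It uses plain hashes rather than the TectoMT classes, and the helper ''make_node'', the attribute names ''form'' and ''edge_label'', and the flat shape of the morphological tree are all assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of one document: plain hashes, NOT the TectoMT classes.
# A node keeps its own attributes; an edge attribute (the made-up 'edge_label')
# is stored on the lower (child) node, as the last bullet of the list describes.
sub make_node {
    my ($attrs, @children) = @_;
    return { attrs => {%$attrs}, children => [@children] };
}

my $document = {
    bundles => [
        {   # one bundle per sentence
            trees => {
                # one tree per linguistic layer, keyed by layer name
                SEnglishM => make_node(
                    { id => 'root' },    # tree-level attributes live on the root node
                    make_node({ form => 'John',   edge_label => 'example' }),
                    make_node({ form => 'sleeps', edge_label => 'example' }),
                ),
            },
        },
    ],
};

# Count a node together with all nodes below it.
sub count_nodes {
    my ($node) = @_;
    my $count = 1;
    $count += count_nodes($_) for @{ $node->{children} };
    return $count;
}

for my $bundle (@{ $document->{bundles} }) {
    for my $layer (sort keys %{ $bundle->{trees} }) {
        printf "%s: %d nodes\n", $layer, count_nodes($bundle->{trees}{$layer});
    }
}
```

Adding an analytical layer later would, in this toy model, amount to adding a second key such as ''SEnglishA'' next to ''SEnglishM'' in the same bundle.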
  
  
Line 147: Line 150:
 ===== Changing the scenario =====
 
-We'll now add syntax analysis to our scenario by adding four more blocks. Instead of 
+We'll now add syntax analysis (dependency parsing) to our scenario by adding four more blocks. Instead of 
 
 <code bash>
Line 155: Line 158:
                 SEnglishW_to_SEnglishM::Penn_style_tokenization \
                 SEnglishW_to_SEnglishM::TagTnT \
-                SEnglishW_to_SEnglishM::Lemmatize_mtree -- sample.tmt
+                SEnglishW_to_SEnglishM::Lemmatize_mtree 
+        -- sample.tmt
 </code>
  
Line 166: Line 170:
                 SEnglishW_to_SEnglishM::Penn_style_tokenization \
                 SEnglishW_to_SEnglishM::TagTnT \
-                SEnglishW_to_SEnglishM::Lemmatize_mtree  \
+                SEnglishW_to_SEnglishM::Lemmatize_mtree \
                 SEnglishM_to_SEnglishA::McD_parser_local \
                 SEnglishM_to_SEnglishA::Fix_McD_Tree \
-                SEnglishM_to_SEnglishA::Fill_afun_after_McD -- sample.tmt
+                SEnglishM_to_SEnglishA::Fill_afun_after_McD 
+        -- sample.tmt
 </code>
  
Line 187: Line 192:
 tmttred sample.tmt
 </code>
+
  
  
Line 234: Line 240:
 This block illustrates some of the most common methods for accessing objects:
 
-  * ''my @bundles = $document->get_bundles'' - an array of bundles contained in the document
+  * ''my @bundles = $document->get_bundles()'' - an array of bundles contained in the document
   * ''my $root_node = $bundle->get_tree($layer_name);'' - the root node of the tree of the given type in the given bundle
-  * ''my @children = $node->get_children;'' - array of the node's children 
+  * ''my @children = $node->get_children();'' - array of the node's children 
-  * ''my @descendants = $node->get_descendants;'' - array of the node's children and their children and children of their children ... 
+  * ''my @descendants = $node->get_descendants();'' - array of the node's children and their children and children of their children ... 
-  * ''my $parent = $node->get_parent;'' - parent node of the given node, or undef for root 
+  * ''my $parent = $node->get_parent();'' - parent node of the given node, or undef for root 
-  * ''my $root_node = $node->get_root;'' - the root node of the tree into which the node belongs
+  * ''my $root_node = $node->get_root();'' - the root node of the tree into which the node belongs
 
 Attributes of documents, bundles or nodes can be accessed by attribute getters and setters, for example:
Line 245: Line 251:
   * ''$node->set_attr($attr_name, $attr_value);''
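To make the semantics of these accessors concrete, here is a minimal sketch built on a mock class. ''MockNode'' and its ''add_child'' helper are hypothetical and exist only for this example; they are not the TectoMT node classes. Only the method names ''get_children'', ''get_descendants'', ''get_parent'', ''get_root'', ''get_attr'' and ''set_attr'' are taken from the lists in this diff, and the order in which the mock returns descendants is an assumption.

```perl
use strict;
use warnings;

# Mock node class for illustration only; NOT the real TectoMT implementation.
package MockNode;
use Scalar::Util qw(weaken);

sub new {
    my ($class, %attrs) = @_;
    return bless { attrs => {%attrs}, parent => undef, children => [] }, $class;
}

# Hypothetical helper, used only to build the toy tree below.
sub add_child {
    my ($self, $child) = @_;
    $child->{parent} = $self;
    weaken($child->{parent});    # avoid a parent <-> child reference cycle
    push @{ $self->{children} }, $child;
    return $child;
}

sub get_children { return @{ $_[0]{children} } }
sub get_parent   { return $_[0]{parent} }            # undef for the root
sub get_attr     { return $_[0]{attrs}{ $_[1] } }
sub set_attr     { $_[0]{attrs}{ $_[1] } = $_[2]; return }

# Children, their children, and so on (depth-first in this mock).
sub get_descendants {
    my ($self) = @_;
    return map { ($_, $_->get_descendants()) } $self->get_children();
}

# Climb parent links until a node without a parent is reached.
sub get_root {
    my ($self) = @_;
    my $node = $self;
    $node = $node->get_parent() while defined $node->get_parent();
    return $node;
}

package main;

my $root = MockNode->new(form => 'ROOT');
my $verb = $root->add_child(MockNode->new(form => 'sleeps'));
my $noun = $verb->add_child(MockNode->new(form => 'John'));

$noun->set_attr('tag', 'NNP');
print join(' ', map { $_->get_attr('form') } $root->get_descendants()), "\n";
print $noun->get_root()->get_attr('form'), "\n";
```

Run as a script, the sketch prints ''sleeps John'' and then ''ROOT'': the descendants of the root exclude the root itself, and ''get_root'' climbs from any node to the top of its tree.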
  
-Our tutorial block ''Print_node_info.pm'' is ready to use. You only need to add this block to our scenario:
+Our tutorial block ''Print_node_info.pm'' is ready to use. You only need to add this block to our scenario, e.g. as a new Makefile target:
 
 <code bash>
Line 258: Line 264:
 </code>
 
-Try to change the block so that it prints out the information only for verbs. (You need to look at attribute ''tag'' at the ''m'' level). The tagset used is Penn Treebank Tagset.
+Try to change the block so that it prints out the information only for verbs. (You need to look at an attribute ''tag'' at the ''m'' level). The tagset used is Penn Treebank Tagset.
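One possible way to approach the verb test is sketched below. In the Penn Treebank tagset the verb tags are ''VB'', ''VBD'', ''VBG'', ''VBN'', ''VBP'' and ''VBZ'', so a prefix match on ''VB'' covers them. The function name ''is_verb_tag'' is hypothetical, and the sketch works on bare tag strings; how the tag is read from a node inside ''Print_node_info.pm'' is not shown here.

```perl
use strict;
use warnings;

# Penn Treebank verb tags are VB, VBD, VBG, VBN, VBP and VBZ, so a prefix test suffices.
# Modal verbs carry the separate tag MD and are deliberately not matched here.
sub is_verb_tag {
    my ($tag) = @_;
    return (defined($tag) && $tag =~ /^VB/) ? 1 : 0;
}

for my $tag (qw(NNP VBZ VBG MD DT)) {
    printf "%-3s %s\n", $tag, is_verb_tag($tag) ? 'verb' : 'other';
}
```

Whether modals should count as verbs for this exercise is a design choice; adding ''MD'' to the pattern would include them.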
  
  
Line 281: Line 287:
 ==== Task ====
 A block which, given an analytical tree (''SEnglishA''), fills each ''a-node'' with boolean attribute ''is_head'' which is set to ''1'' if the ''a-node'' corresponds to a finite verb, and to ''0'' otherwise.
+
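The task can be sketched roughly as follows, under two loudly stated assumptions: that "finite" is approximated by the Penn tags ''VBD'', ''VBP'' and ''VBZ'' plus modals (''MD''), and that the tag is reachable under a plain attribute called ''tag''. ''ToyNode'' and ''mark_heads'' are hypothetical stand-ins, not the tutorial's actual ''Mark_heads'' block, and the attribute path on a real ''a-node'' may differ.

```perl
use strict;
use warnings;

# Assumption: finite verbs are approximated by these Penn Treebank tags.
my %FINITE_TAG = map { $_ => 1 } qw(VBD VBP VBZ MD);

# Minimal stand-in for a node, offering only get_attr/set_attr; not a TectoMT class.
package ToyNode;
sub new      { my ($class, %attrs) = @_; return bless {%attrs}, $class }
sub get_attr { return $_[0]{ $_[1] } }
sub set_attr { $_[0]{ $_[1] } = $_[2]; return }

package main;

# Set is_head to 1 for nodes with a finite-verb tag and to 0 for all others.
sub mark_heads {
    my (@nodes) = @_;
    for my $node (@nodes) {
        my $tag = $node->get_attr('tag') // '';
        $node->set_attr('is_head', $FINITE_TAG{$tag} ? 1 : 0);
    }
    return;
}

my @nodes = map { ToyNode->new(form => $_->[0], tag => $_->[1]) }
    ([ 'John', 'NNP' ], [ 'has', 'VBZ' ], [ 'slept', 'VBN' ]);
mark_heads(@nodes);
printf "%-6s is_head=%d\n", $_->get_attr('form'), $_->get_attr('is_head') for @nodes;
```

In a real block the list of nodes would come from the analytical tree of each bundle rather than from a hand-built array.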
  
  
Line 313: Line 320:
         brunblocks -S -o \
                 Tutorial::Mark_heads \
-                Tutorial::Print_finite_clauses -- sample.tmt
+                Tutorial::Print_finite_clauses 
+        -- sample.tmt
 </code>
 
 You are going to need these methods:
 
-  * ''my root = $bundle->get_tree('tree_name')''
+  * ''my $root = $bundle->get_tree('tree_name')''
   * ''my $attr = $node->get_attr('attr_name')''
   * ''$node->set_attr('attr_name',$attr_value)''
   * ''my @eff_children = $node->get_eff_children()''
 
-//Note//: ''get_children'' returns topological node children in a tree, while ''get_eff_children'' returns node children in a linguistic sense. Mostly, these do not differ.
+//Note//: ''get_children()'' returns topological node children in a tree, while ''get_eff_children()'' returns node children in a linguistic sense. Mostly, these do not differ.
  
  
