
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

external:tectomt:tutorial [2009/01/19 13:08]
kravalova
external:tectomt:tutorial [2009/01/20 15:31]
kravalova
Line 2: Line 2:
  
 Welcome to the TectoMT Tutorial. This tutorial should take about 2 hours.
 +
  
  
Line 7: Line 8:
  
 TectoMT is a highly modular NLP (Natural Language Processing) software system implemented in the Perl programming language under Linux. It is primarily aimed at Machine Translation, making use of the ideas and technology created during the Prague Dependency Treebank project. At the same time, it is also hoped to significantly facilitate and accelerate the development of software solutions for many other NLP tasks, especially due to the re-usability of the numerous integrated processing modules (called blocks), which are equipped with uniform object-oriented interfaces.
 +
  
 ===== Prerequisites =====
 +
 +In this tutorial, we assume that:
 +
 +  * Your system is Linux
 +  * Your shell is bash
 +  * You have basic experience with bash and you can read Perl
 +
 +
 +
 +
 +
  
  
Line 14: Line 27:
 ==== Installation and setup ====
  
-TODO describe installation
 +  * Check out the SVN repository. If you are running this installation in the computer lab in Prague, you have to check out the repository into the directory /home/BIG (because data quotas don't apply there):

-Before running any experiments with TectoMT, you must set up your environment by running
 +<code bash>
 +    cd ~/BIG
 +    svn --username <username> co https://svn.ms.mff.cuni.cz/svn/tectomt_devel/trunk tectomt
 +</code>
 +
 +  * In ''tectomt/install/'' run ''./install.sh'':

 <code bash>
-source config/init_devel_environ.sh
 +    cd tectomt/install
 +    ./install.sh
 </code>
 +
 +  * In your ''.bashrc'' file, add the following line (or source this file every time before experimenting with TectoMT):
 +
 +<code bash>
 +    source ~/BIG/tectomt/config/init_devel_environ.sh
 +</code>
 +
 +
 +
 +
  
  
  
  
-==== Theoretical background ==== 
  
-TODO picture
  
  
Line 34: Line 61:
  
  
-==== TrEd ==== 
  
-TODO a little about TrEd and a picture
  
  
Line 54: Line 79:
  
 This tutorial itself has its blocks in ''libs/blocks/Tutorial'' and the application in ''applications/tutorial''.
 +
  
  
Line 59: Line 85:
  
 ==== Layers of Linguistic Structures ====
 +
 +{{ external:tectomt:pyramid.gif?300x190|MT pyramid in terms of PDT layers}}
  
 The TectoMT block repository is stored in ''libs/blocks/''. In correspondence with ..., the blocks are located in directories describing their purpose.
Line 71: Line 99:
  
 There are also directories for blocks with other purposes; for example, blocks which only print out some information go to ''libs/Print''. Our tutorial blocks are in ''libs/Tutorial''.
 +
  
  
Line 78: Line 107:
 ===== First application =====
  
-Once you have TectoMT installed on your machine, you can find this tutorial in ''devel/applications/tutorial/''. After you cd in to this directory, you can see our plain text sample data in ''sample.txt''
 +Once you have TectoMT installed on your machine, you can find this tutorial in ''applications/tutorial/''. After you cd into this directory, you can see our plain text sample data in ''sample.txt''.
  
 Most applications are defined in Makefiles, which describe the sequence of blocks to be applied to our data. In our particular ''Makefile'', four blocks are going to be applied to our sample text: sentence segmentation, tokenization, tagging and lemmatization. Since we have our input text in plain text format, the file is going to be converted into ''tmt'' format beforehand (the ''in'' section).
Line 155: Line 184:
 tmttred sample.tmt
 </code>
 +
 +
 +
 +
 +
 +
 +
  
  
Line 206: Line 242:
   * ''$node->set_attr($attr_name, $attr_value);''
  
-Our tutorial block ''Print_node_info.pm'' is ready to use:
 +Our tutorial block ''Print_node_info.pm'' is ready to use. You only need to add this block to our scenario:

-  * Copy the block to the right place in blocks repository ''devel/libs/blocks/Print'' (because it is a printing block)
-    <code bash>
-    cp libs/blocks/Tutorial/Print_node_info.pm devel/libs/blocks/Print/Print_node_info.pm
-    </code>
-  * in copied file ''Print/Print_node_info.pm'', edit the block package from ''package Tutorial::Print_node_info.pm'' to ''package Print::Print_node_info.pm''
-  * Add this block to our scenario:
-    <code bash>
-    print_afun:
-            brunblocks -S -o Print::Print_node_info -- sample.tmt
-    </code>
 +<code bash>
 +print_info:
 +        brunblocks -S -o Tutorial::Print_node_info -- sample.tmt
 +</code>
  
 We can observe our new block's behaviour:

 <code bash>
-make print_afun
 +make print_info
 </code>
  
 +Try to change the block so that it prints out the information only for verbs. (You need to look at the attribute ''tag'' at the ''m'' level.) The tagset used is the Penn Treebank tagset.
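
The verb test in the exercise above could be sketched as follows. This is only an illustration: ''MockNode'' is a hypothetical stand-in for a real TectoMT node, and the attribute names ''m/tag'' and ''m/form'' are assumptions about how the ''m''-level attributes are addressed. The tag test relies on the fact that Penn Treebank verb tags (VB, VBD, VBG, VBN, VBP, VBZ) all start with ''VB''.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for a TectoMT node (not the real class):
# it only stores attributes and returns them via get_attr.
package MockNode;
sub new      { my ($class, %attrs) = @_; return bless {%attrs}, $class; }
sub get_attr { my ($self, $name) = @_; return $self->{$name}; }

package main;

# Returns 1 for Penn Treebank verb tags (VB, VBD, VBG, VBN, VBP, VBZ), else 0.
sub is_verb_tag {
    my ($tag) = @_;
    return ( defined $tag && $tag =~ /^VB/ ) ? 1 : 0;
}

# 'm/form' and 'm/tag' are assumed attribute names for this sketch.
my @nodes = (
    MockNode->new( 'm/form' => 'John',   'm/tag' => 'NNP' ),
    MockNode->new( 'm/form' => 'loves',  'm/tag' => 'VBZ' ),
    MockNode->new( 'm/form' => 'Mary',   'm/tag' => 'NNP' ),
    MockNode->new( 'm/form' => 'dearly', 'm/tag' => 'RB' ),
);

# Print form and tag for verbs only; all other nodes are skipped.
for my $node (@nodes) {
    next unless is_verb_tag( $node->get_attr('m/tag') );
    print join( "\t", $node->get_attr('m/form'), $node->get_attr('m/tag') ), "\n";
}
```

Whether modal verbs (tag ''MD'') should also count as verbs is a design choice; the pattern above deliberately leaves them out.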
  
  
  
  
 +
 +
 +===== Advanced block: finite clauses =====
  
  
  
  
-===== Advanced block: finite clauses ===== 
  
  
 ==== Motivation ====
 +
 +It is assumed that finite clauses can be translated independently, which would reduce computational complexity or make parallel translation possible. We could even use hybrid translation - each finite clause could be translated by the most self-confident translation system. In this task, we are going to split the sentence into finite clauses.
 +
 +
  
  
 ==== Task ==== ==== Task ====
-A block which, given an analytical tree (''SEnglishA''), fills each ''a-node'' with nummerical attribute ''<clause>'' so that nodes in the same finite clause are marked with the same number of clause.
 +A block which, given an analytical tree (''SEnglishA''), fills each ''a-node'' with a boolean attribute ''is_head'', which is set to ''1'' if the ''a-node'' corresponds to a finite verb, and to ''0'' otherwise.
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
  
  
  
-==== Algorithm ==== 
  
  
Line 252: Line 304:
 ==== Instructions ====
  
-There is a block template with hints in ''devel/libs/blocks/Tutorial/Tutorial_fill_finite_clauses.pm''. Copy the file to ''SEnglishA_to_SEnglishT'' and edit this file using the hints in it. Also, don't forget to change the name of package (to ''SEnglishA_to_SEnglishT::Tutorial_fill_finite_clauses''). The ouput of this block should be the same a-tree with nummerical value ''<clause>'' attached to each ''a-node''. There is also a printing block ''devel/libs/blocks/Tutorial_print_finite_clauses.pm'' which will print out the ''a-nodes'' grouped by clauses:
 +There is a block template with hints in ''libs/blocks/Tutorial/Mark_heads.pm''. You should edit the block so that its output is the same a-tree, with the attribute ''is_head'' additionally attached to each ''a-node''. There is also a printing block ''libs/blocks/Print_finite_clauses.pm'' which will print out the ''a-nodes'' grouped by clauses:
  
 <code bash>
 finite_clauses:
         brunblocks -S -o \
-                SEnglishA_to_SEnglishT::Tutorial_fill_finite_clauses
-                Print::Tutorial_print_finite_clauses -- sample.tmt
 +                Tutorial::Mark_heads \
 +                Tutorial::Print_finite_clauses -- sample.tmt
 </code>
  
 You are going to need these methods:
  
-  * ''$bundle->get_tree($tree_name)''
-  * ''$node->get_attr($attr_name)''
 +  * ''my $root = $bundle->get_tree('tree_name')''
 +  * ''my $attr = $node->get_attr('attr_name')''
   * ''$node->set_attr('attr_name',$attr_value)''
-  * ''$node->get_eff_children()''
-  * ''$node->get_children()''
 +  * ''my @eff_children = $node->get_eff_children()''
  
 +//Note//: ''get_children'' returns topological node children in a tree, while ''get_eff_children'' returns node children in a linguistic sense. Mostly, these do not differ.
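
The core of such a block might look like the sketch below. It is not the tutorial's solution: ''MockNode'' is a hypothetical stand-in for a real a-node, ''m/tag'' and ''m/form'' are assumed attribute names, and treating exactly the Penn Treebank tags VBD, VBP, VBZ and MD as finite is a simplifying assumption (for example, an imperative tagged VB would be missed).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for a TectoMT a-node (not the real class).
package MockNode;
sub new      { my ($class, %attrs) = @_; return bless {%attrs}, $class; }
sub get_attr { my ($self, $name) = @_; return $self->{$name}; }
sub set_attr { my ($self, $name, $value) = @_; $self->{$name} = $value; return; }

package main;

# Assumption: these Penn Treebank tags denote finite verb forms.
my %FINITE_TAG = map { $_ => 1 } qw(VBD VBP VBZ MD);

# Sets is_head to 1 on nodes with a finite verb tag and to 0 on all others.
sub mark_heads {
    my (@nodes) = @_;
    for my $node (@nodes) {
        my $tag = $node->get_attr('m/tag');
        my $is_head = ( defined $tag && $FINITE_TAG{$tag} ) ? 1 : 0;
        $node->set_attr( 'is_head', $is_head );
    }
    return;
}

# "She said he was sleeping": two finite verbs, hence two clause heads.
my @nodes = map { MockNode->new( 'm/form' => $_->[0], 'm/tag' => $_->[1] ) } (
    [ 'She',      'PRP' ],
    [ 'said',     'VBD' ],
    [ 'he',       'PRP' ],
    [ 'was',      'VBD' ],
    [ 'sleeping', 'VBG' ],
);

mark_heads(@nodes);
print join( ' ', map { $_->get_attr('m/form') . '/' . $_->get_attr('is_head') } @nodes ), "\n";
```

In a real block the nodes would come from the a-tree obtained with ''get_tree'' rather than from a hand-built list.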
  
  
 +
 +//Advanced version//: The output of our block might still be incorrect in special cases - we don't solve coordination and subordinate conjunctions.
  
  
Line 279: Line 333:
  
  
-==== Coordination ==== 
  
-This time ...  
  
-You can use block template in ''devel/libs/blocks/BlockTemplate.pm''. To see the results, you can again use TrEd (''tmttred sample.tmt'')
 +
 +==== SVO to SOV ==== 
 + 
 +**Motivation**: During translation from an SVO-based language (English) to an SOV-based language (Korean), we might need to change the word order from SVO to SOV.
 +
 +**Task**: On the analytical layer, change the word order from SVO to SOV.
 +
 +**Instructions**: To find the object of a verb, look for ''$afun eq 'Obj' '' among the effective children of the verb.
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 + 
 +==== Prepositions ==== 
 + 
 +**Motivation**: In the dependency approach, the question of where to hang prepositions arises. In the Prague style (PDT), the preposition is the head of the subtree and the noun/pronoun depends on the preposition. However, another arrangement might be preferable: the noun/pronoun might be the head of the subtree, while the preposition would take the role of a modifier.
 +
 +TODO picture
 +
 +**Task**: The task is to rehang all prepositions as indicated in the picture. You may assume that prepositions have at most 1 child.
 + 
 +**Instructions**:
 + 
 +You are going to need these new methods: 
 +  * ''my @children = $node->get_children'' 
 +  * ''my $parent = $node->get_parent'' 
 +  * ''$node->set_parent($parent)'' 
 + 
 +//Hint//:  
 +  * On the analytical layer, you can use this test to recognize prepositions: ''$afun eq 'AuxP' ''
 +  * You can use the block template in ''libs/blocks/BlockTemplate.pm''. To see the results, you can again use TrEd (''tmttred sample.tmt'')
 + 
 + 
 +//Advanced version//: What happens in case of multiword prepositions? For example, ''because of'', ''instead of''. Can you handle it? 
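
The rehanging step for the basic (single-word) case could be sketched as follows. ''MockNode'' is a hypothetical stand-in that only mimics the three tree methods listed above plus ''get_attr''; the real TectoMT node class may behave differently, for example by refusing a ''set_parent'' call that would create a cycle. The order of the two ''set_parent'' calls below is chosen so that no cycle exists at any point.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical stand-in for a TectoMT a-node (not the real class).
package MockNode;

sub new {
    my ($class, %attrs) = @_;
    return bless { attrs => {%attrs}, parent => undef, children => [] }, $class;
}
sub get_attr     { my ($self, $name) = @_; return $self->{attrs}{$name}; }
sub get_parent   { my ($self) = @_; return $self->{parent}; }
sub get_children { my ($self) = @_; return @{ $self->{children} }; }

# Detach the node from its old parent (if any) and attach it to the new one.
sub set_parent {
    my ($self, $new_parent) = @_;
    if ( my $old = $self->{parent} ) {
        $old->{children} = [ grep { $_ != $self } @{ $old->{children} } ];
    }
    $self->{parent} = $new_parent;
    push @{ $new_parent->{children} }, $self if $new_parent;
    return;
}

package main;

# All nodes below $node, in depth-first order.
sub descendants {
    my ($node) = @_;
    return map { ( $_, descendants($_) ) } $node->get_children;
}

# Make the preposition's only child the head and the preposition its modifier.
# Returns 1 if the node was rehanged, 0 if it does not have exactly one child.
sub rehang_preposition {
    my ($prep) = @_;
    my @children = $prep->get_children;
    return 0 unless @children == 1;
    my $noun = $children[0];
    $noun->set_parent( $prep->get_parent );    # noun takes the preposition's place
    $prep->set_parent($noun);                  # preposition now hangs under the noun
    return 1;
}

# "sits on chair" in the Prague style: sits -> on -> chair
my $verb = MockNode->new( form => 'sits',  afun => 'Pred' );
my $prep = MockNode->new( form => 'on',    afun => 'AuxP' );
my $noun = MockNode->new( form => 'chair', afun => 'Adv' );
$prep->set_parent($verb);
$noun->set_parent($prep);

# Collect the prepositions first, then modify the tree.
my @preps = grep { ( $_->get_attr('afun') // '' ) eq 'AuxP' } descendants($verb);
rehang_preposition($_) for @preps;

for my $node ( $prep, $noun ) {
    printf "%s -> %s\n", $node->get_attr('form'), $node->get_parent->get_attr('form');
}
```

Collecting the prepositions before changing any parent keeps the traversal independent of the modifications made during rehanging.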
 + 
 + 
  
  
Line 291: Line 391:
   * [[http://ufallab2.ms.mff.cuni.cz/~bojar/cruise_control_tmt/last_doc/generated/guide/guidelines.html|TectoMT Developer's Guide]] - obsolete
   * Questions? Ask ''kravalova'' at ''ufal.mff.cuni.cz''
-  * Solutions to
 +  * Solutions to the tasks in this tutorial are in ''libs/blocks/Tutorial/*solution.pm''
 +  * [[http://ufal.mff.cuni.cz/~pajas/tred/|TrEd]] - tree editor
  
  
  
  
