
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

external:tectomt:tutorial [2010/04/14 01:30] (previous revision), popel: using tutorial.scen; there is no element <bundle> (it is called <LM>)
external:tectomt:tutorial [2010/11/10 16:39] (current revision), popel: SEnglishM_to_SEnglishA::Clone_MTree is needed now
Line 2: Line 2:
  
Welcome to the TectoMT Tutorial. This tutorial should take about 3 hours.
  
  
Line 9: Line 7:
  
TectoMT is a highly modular NLP (Natural Language Processing) software system implemented in the Perl programming language under Linux. It is primarily aimed at Machine Translation, making use of the ideas and technology created during the Prague Dependency Treebank project. At the same time, it is also hoped to facilitate and significantly accelerate the development of software solutions for many other NLP tasks, especially thanks to the reusability of the numerous integrated processing modules (called blocks), which are equipped with uniform object-oriented interfaces.
  
  
Line 21: Line 16:
  * Your shell is bash
  * You have basic experience with bash and can read basic Perl
  
  
Line 68: Line 47:
    source .bashrc
</code>
  
  
===== TectoMT Architecture =====
  
==== Blocks, scenarios and applications ====
Line 106: Line 60:
  
This tutorial itself has its blocks in ''libs/blocks/Tutorial'' and the application in ''applications/tutorial''.
  
  
Line 136: Line 79:
  
There are also directories for blocks with other purposes; for example, blocks which only print out some information go to ''libs/Print''. Our tutorial blocks are in ''libs/blocks/Tutorial/''.
  
  
Line 166: Line 99:
  * Each bundle contains tree-shaped sentence representations on various linguistic layers. In our example ''sample.tmt'' we have a morphological tree (''SEnglishM'') in each bundle (actually, it is a flat tree: one technical root whose children are the tokens). Later on, an analytical layer (''SEnglishA'') will also appear in each bundle as we proceed with our analysis.
  * Trees are formed by nodes and edges. Attributes can be attached only to nodes. An edge's attributes must therefore be stored as attributes of its lower node, and a tree's attributes as attributes of its root node.
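To make these storage rules concrete, here is a toy model in bash (version 4 or newer, for the associative array). It is not how TectoMT stores trees, and the analysis of the example sentence (taken from the exercises later on this page) is assumed for illustration: node 0 plays the technical root, the flat m-tree attaches every token to it, the a-tree's edge labels (''afun'' values) live on the dependent node, and a tree-level attribute (the made-up ''sentence_id'') lives on the root.

```bash
#!/usr/bin/env bash
# Toy illustration only, not the TectoMT API: one sentence stored as
# parallel arrays indexed by node number; node 0 is the technical root.
form=("#root" "Mr." "Brown" "has" "urged" "MPs" ".")

# A tree-level attribute (hypothetical name) is kept on the root node.
declare -A root_attr=([sentence_id]="s1")

# m-layer (SEnglishM): a flat tree, every token is a child of the root.
m_parent=(-1 0 0 0 0 0 0)

# a-layer (SEnglishA): a dependency tree (analysis assumed for the example).
# The label of the edge from a node to its parent is stored on the lower node.
a_parent=(-1 2 4 4 0 4 0)
afun=("" "Atr" "Sb" "AuxV" "Pred" "Obj" "AuxK")

# Print every a-layer edge as "dependent -> parent [label]".
print_edges() {
  local i
  for ((i = 1; i < ${#form[@]}; i++)); do
    printf '%s -> %s [%s]\n' "${form[i]}" "${form[a_parent[i]]}" "${afun[i]}"
  done
}

print_edges
```

Because the edge label sits on the dependent, every non-root node carries exactly one label, and no separate edge objects are needed.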
  
  
===== Changing the scenario =====
  
-We'll now add a syntax analysis (dependency parsing) to our scenario by adding three more blocks. Instead of
+We'll now add syntactic analysis (dependency parsing) to our scenario by adding six more blocks to ''tutorial.scen''. Instead of
  
-<code bash>
-analyze:
-        brunblocks -S -o \
-                SEnglishW_to_SEnglishM::Sentence_segmentation_simple \
-                SEnglishW_to_SEnglishM::Penn_style_tokenization \
-                SEnglishW_to_SEnglishM::TagMxPost \
-                SEnglishW_to_SEnglishM::Lemmatize_mtree
-        -- sample.tmt
+<code>
+SEnglishW_to_SEnglishM::Sentence_segmentation_simple
+SEnglishW_to_SEnglishM::Tokenization
+SEnglishW_to_SEnglishM::TagMxPost
+SEnglishW_to_SEnglishM::Lemmatize_mtree
</code>
  
Line 215: Line 115:
  
<code bash>
-analyze:
-        brunblocks -S -o \
-                SEnglishW_to_SEnglishM::Sentence_segmentation_simple \
-                SEnglishW_to_SEnglishM::Penn_style_tokenization \
-                SEnglishW_to_SEnglishM::TagMxPost \
-                SEnglishW_to_SEnglishM::Lemmatize_mtree \
-                SEnglishM_to_SEnglishA::McD_parser_local \
-                SEnglishM_to_SEnglishA::Fix_McD_Tree \
-                SEnglishM_to_SEnglishA::Fill_afun_after_McD \
-        -- sample.tmt
+SEnglishW_to_SEnglishM::Sentence_segmentation_simple
+SEnglishW_to_SEnglishM::Tokenization
+SEnglishW_to_SEnglishM::TagMxPost
+SEnglishW_to_SEnglishM::Lemmatize_mtree
+SEnglishM_to_SEnglishA::Clone_MTree
+SEnglishM_to_SEnglishA::McD_parser
+SEnglishM_to_SEnglishA::Fill_is_member_from_deprel
+SEnglishM_to_SEnglishA::Fix_McD_topology
+SEnglishM_to_SEnglishA::Fill_afun_AuxCP_Coord
+SEnglishM_to_SEnglishA::Fill_afun
</code>
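In the current revision the scenario is a plain list in ''tutorial.scen'', one block per line, so lines copied from the old Makefile variant must lose their trailing backslashes and the ''brunblocks'' options. The bash sketch below writes the ten blocks of the current list into a scenario file and checks that every non-empty line looks like a block name, optionally followed by ''NAME=value'' parameters as in the ''TMT_PARAM_MCD_EN_MODEL'' example on this page. The function names and the checking pattern are assumptions made for illustration; the commented ''brunblocks'' call is the one given in the previous revision of this page and may need adjusting.

```bash
#!/usr/bin/env bash
# Write the current tutorial scenario to file "$1": one block per line.
write_scen() {
  cat > "$1" <<'EOF'
SEnglishW_to_SEnglishM::Sentence_segmentation_simple
SEnglishW_to_SEnglishM::Tokenization
SEnglishW_to_SEnglishM::TagMxPost
SEnglishW_to_SEnglishM::Lemmatize_mtree
SEnglishM_to_SEnglishA::Clone_MTree
SEnglishM_to_SEnglishA::McD_parser
SEnglishM_to_SEnglishA::Fill_is_member_from_deprel
SEnglishM_to_SEnglishA::Fix_McD_topology
SEnglishM_to_SEnglishA::Fill_afun_AuxCP_Coord
SEnglishM_to_SEnglishA::Fill_afun
EOF
}

# Check that each non-empty line of "$1" is "Package::Block", optionally
# followed by NAME=value parameters. Prints the number of entries and
# returns non-zero if any line (e.g. one ending in "\") does not match.
check_scen() {
  local re='^[A-Za-z0-9_]+(::[A-Za-z0-9_]+)+( [A-Za-z0-9_]+=[^[:space:]]+)*$'
  local line n=0 bad=0
  while IFS= read -r line || [[ -n $line ]]; do
    [[ -z $line ]] && continue
    n=$((n + 1))
    if ! [[ $line =~ $re ]]; then
      echo "entry $n is not a block name: $line" >&2
      bad=1
    fi
  done < "$1"
  echo "$n blocks"
  return "$bad"
}

# Usage:
#   write_scen tutorial.scen && check_scen tutorial.scen
#   brunblocks -S -o --scen tutorial.scen -- sample.tmt   # previous revision
```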
-//Note//: ''Makefiles'' use tabulators to mark command lines. Make sure your lines start with a tabulator (or two tabulators) and not, for example, with 4 spaces. 
  
After running
Line 240: Line 138:
  
<code bash>
-SEnglishM_to_SEnglishA::McD_parser_local \
+SEnglishM_to_SEnglishA::McD_parser
</code>
  
Line 246: Line 144:
  
<code bash>
-SEnglishM_to_SEnglishA::McD_parser_local TMT_PARAM_MCD_EN_MODEL=conll_mcd_order2_0.1.model \
+SEnglishM_to_SEnglishA::McD_parser TMT_PARAM_MCD_EN_MODEL=conll_mcd_order2_0.1.model
</code>
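Assuming the change asked for here is to replace the plain parser line in ''tutorial.scen'' with the parametrised one, it can also be made from the shell. The sketch uses ''sed'' writing to a temporary file instead of ''sed -i'', whose syntax differs between GNU and BSD sed. The function name is made up, and a model name containing ''/'' would need escaping.

```bash
#!/usr/bin/env bash
# Append a TMT_PARAM_MCD_EN_MODEL parameter to the McD_parser line of the
# scenario file "$1"; only a line consisting of the bare block name changes.
set_parser_model() {
  local scen=$1 model=$2
  sed "s/^SEnglishM_to_SEnglishA::McD_parser\$/& TMT_PARAM_MCD_EN_MODEL=${model}/" \
    "$scen" > "$scen.tmp" && mv "$scen.tmp" "$scen"
}

# Usage:
#   set_parser_model tutorial.scen conll_mcd_order2_0.1.model
```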
  
Line 259: Line 157:
//Note//: For more information about the tree editor TrEd, see [[http://ufal.mff.cuni.cz/~pajas/tred/ar01-toc.html|TrEd User's Manual]].
  
-If you are not familiar with ''Makefile'' syntax, another way of running a scenario in TectoMT is using ''.scen'' file (see ''applications/tutorial.scen''). This file lists the blocks to be run - one block on a single line.
-
-<code bash>
-$TMT_ROOT/tools/format_convertors/plaintext_to_tmt/plaintext_to_tmt.pl English sample.txt
-brunblocks -S -o --scen tutorial.scen -- sample.tmt
-</code>
-
-Finally, yet another way is to use a simple ''bash'' script (see ''applications/tutorial/run_all.sh''):
+If you are not familiar with ''Makefile'' syntax, you can run the scenario with a simple ''bash'' script (see ''applications/tutorial/run_all.sh''):
  
<code bash>
./run_all.sh
</code>
  
  
Line 339: Line 205:
<code bash>
print_info:
-        brunblocks -S -o Tutorial::Print_node_info -- sample.tmt
+        brunblocks -o Tutorial::Print_node_info -- sample.tmt
</code>
  
Line 349: Line 215:
  
Try to change the block so that it prints out the information only for verbs. (You need to look at the attribute ''tag'' at the ''m'' level.) The tagset used is the Penn Treebank tagset.
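The condition itself is small: all Penn Treebank verb tags start with ''VB''. Here is a bash sketch of just that test, applied to hypothetical ''form/tag'' pairs; in the Perl block you would apply the same check to the ''tag'' attribute of each m-node. The modal tag ''MD'' is deliberately left out, which may or may not be what you want.

```bash
#!/usr/bin/env bash
# Succeed if "$1" is a Penn Treebank verb tag: base form, past tense,
# gerund/present participle, past participle, non-3sg present, 3sg present.
is_verb_tag() {
  case $1 in
    VB | VBD | VBG | VBN | VBP | VBZ) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the form of every "form/tag" argument whose tag is a verb tag.
print_verbs() {
  local tok
  for tok in "$@"; do
    is_verb_tag "${tok##*/}" && echo "${tok%/*}"
  done
  return 0
}

# Example: prints "has" and "urged" on separate lines.
print_verbs Mr./NNP Brown/NNP has/VBZ urged/VBN MPs/NNS ./.
```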
  
  
===== Advanced block: finite clauses =====
  
==== Motivation ====
  
It is assumed that finite clauses can be translated independently, which would reduce combinatorial complexity or make parallel translation possible. We could even use hybrid translation: each finite clause could be translated by the most self-confident translation system. In this task, we are going to split the sentence into finite clauses.
  
==== Task ====
A block which, given an analytical tree (''SEnglishA''), fills each ''a-node'' with the boolean attribute ''is_clause_head'', set to ''1'' if the ''a-node'' corresponds to a finite verb and to ''0'' otherwise.
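Because the tags come from the Penn Treebank tagset, the core decision can be sketched on the tag alone. This bash function only illustrates the condition (in the real block you would read ''tag'' from the corresponding m-node in Perl). It assumes that ''VBD'', ''VBP'', ''VBZ'' and the modal ''MD'' count as finite, and it treats ''VB'' as non-finite although imperatives are also tagged ''VB''. Note also that in an analytical tree a finite auxiliary such as //has// may depend on the main verb, so deciding which node heads the clause is still part of the exercise.

```bash
#!/usr/bin/env bash
# Print the is_clause_head value (1 or 0) for a Penn Treebank tag.
clause_head_value() {
  case $1 in
    VBD | VBP | VBZ | MD) echo 1 ;;   # past, present (non-3sg, 3sg), modal
    *) echo 0 ;;                      # VB, VBG, VBN and all non-verbs
  esac
}

# Example: prints "has 1" and "urged 0".
for tok in has/VBZ urged/VBN; do
  printf '%s %s\n' "${tok%/*}" "$(clause_head_value "${tok##*/}")"
done
```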
  
==== Instructions ====
Line 414: Line 232:
<code bash>
finite_clauses:
-        brunblocks -S -o
-                Tutorial::Mark_heads
-                Tutorial::Print_finite_clauses
-        -- sample.tmt
+        brunblocks -S -o Tutorial::Mark_heads Tutorial::Print_finite_clauses -- sample.tmt
</code>
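The ''print_info'' and ''finite_clauses'' rules are Makefile targets, and ''make'' requires their command lines to start with a tab character (unless GNU make's ''.RECIPEPREFIX'' is changed). Text copied from a web page may arrive with spaces instead. Here is a bash sketch that lists the space-indented lines and turns leading spaces into one tab; it is a heuristic that assumes every such line in your Makefile is a command line, which fits a small tutorial Makefile but not Makefiles in general. The function names are made up.

```bash
#!/usr/bin/env bash
# List the lines of Makefile "$1" that start with a space (with line numbers).
show_space_indented() {
  grep -n '^ ' "$1" || true
}

# Replace the leading spaces of every such line with a single tab.
# Heuristic: assumes all space-indented lines are recipe (command) lines.
fix_recipe_indent() {
  local makefile=$1 tab
  tab=$(printf '\t')
  sed -E "s/^ +/${tab}/" "$makefile" > "$makefile.tmp" && mv "$makefile.tmp" "$makefile"
}

# Usage:
#   show_space_indented Makefile && fix_recipe_indent Makefile
```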
  
Line 434: Line 249:
  
The output of our block might still be incorrect in special cases: we don't handle coordination (see the second sentence in ''sample.txt'') or subordinating conjunctions.
  
  
===== Your turn: more tasks =====
  
==== SVO to SOV ====
Line 469: Line 267:
  
**Advanced version**: This solution shifts the object (or objects) of a verb just in front of that verb node. For example, //Mr. Brown has urged MPs.// changes to //Mr. Brown has MPs urged.// You can try to change this solution so that the final sentence is //Mr. Brown MPs has urged.// You may need the method ''$node<nowiki>-></nowiki>shift_after_subtree($root_of_that_subtree)''. Subjects are marked with '''afun' eq 'Sb'''.
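The difference between the two versions can be tried out on a flat list of words before touching the tree. The bash sketch below is a toy model, not the TectoMT block: each word is written as ''form/afun'', the analytical functions of the example sentence are assumed (''Obj'' for the object, ''Pred'' for the main verb), objects are assumed to be single words, and subtrees are ignored, which is exactly what ''shift_after_subtree'' takes care of in the real tree.

```bash
#!/usr/bin/env bash
# reorder MODE form/afun...
#   MODE=simple:   put the objects right before the Pred verb
#   MODE=advanced: put the objects right after the subject (Sb)
reorder() {
  local mode=$1; shift
  local -a objs=() out=()
  local tok form afun
  # First pass: collect the objects in their original order.
  for tok in "$@"; do
    [[ ${tok##*/} == Obj ]] && objs+=("${tok%/*}")
  done
  # Second pass: rebuild the sentence without the objects and re-insert them.
  for tok in "$@"; do
    form=${tok%/*}; afun=${tok##*/}
    case $afun in
      Obj) continue ;;
      Pred) [[ $mode == simple ]] && out+=("${objs[@]}"); out+=("$form") ;;
      Sb) out+=("$form"); [[ $mode == advanced ]] && out+=("${objs[@]}") ;;
      *) out+=("$form") ;;
    esac
  done
  echo "${out[*]}"
}

# Example: prints "Mr. Brown has MPs urged ."
reorder simple Mr./Atr Brown/Sb has/AuxV urged/Pred MPs/Obj ./AuxK
```

With ''advanced'' instead of ''simple'' the same call prints //Mr. Brown MPs has urged .//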
  
  
Line 524: Line 289:
  
**Advanced version**: What happens in the case of multiword prepositions, for example ''because of'' or ''instead of''? Can you handle them?
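A possible first step is to recognise the multiword preposition at all: check each word together with its right neighbour against a list. This bash sketch (bash 4 or newer, for lowercasing) works on a plain token list with a hypothetical list containing only the two examples above; in the tree you would additionally have to decide which node represents the whole preposition.

```bash
#!/usr/bin/env bash
# Hypothetical list: only the two examples from the task.
MULTIWORD_PREPS=("because of" "instead of")

# Print "<index><TAB><preposition>" for every multiword preposition found
# in the token arguments; the index (0-based) is that of its first token.
find_multiword_preps() {
  local -a toks=("$@")
  local i mwe
  for ((i = 0; i + 1 < ${#toks[@]}; i++)); do
    for mwe in "${MULTIWORD_PREPS[@]}"; do
      if [[ "${toks[i],,} ${toks[i+1],,}" == "$mwe" ]]; then
        printf '%d\t%s\n' "$i" "$mwe"
      fi
    done
  done
}

# Example: prints "2", a tab, and "because of".
find_multiword_preps He left because of the rain
```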
  
  
===== Further information =====
-  * [[http://ufallab2.ms.mff.cuni.cz/~bojar/cruise_control_tmt/last_doc/generated/guide/guidelines.html|TectoMT Developer's Guide]]
+  * [[http://ufal.mff.cuni.cz/tectomt|TectoMT Homepage]]
  * Questions? Ask ''kravalova'' at ''ufal.mff.cuni.cz''
  * Solutions to the tasks in this tutorial are in ''libs/blocks/Tutorial/*solution*.pm''.
  * [[http://ufal.mff.cuni.cz/~pajas/tred/|TrEd]], [[http://ufal.mff.cuni.cz/~pajas/tred/ar01-toc.html|TrEd User's Manual]] - tree editor
  
+If you are missing some files from //share//, you can download them from [[http://ufallab.ms.mff.cuni.cz/tectomt/share/]].
