Differences
This shows you the differences between two versions of the page.
external:tectomt:tutorial [2009/01/22 10:50] kravalova |
external:tectomt:tutorial [2010/11/10 16:39] (current) popel SEnglishM_to_SEnglishA::Clone_MTree is needed now |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== TectoMT Tutorial ====== | ====== TectoMT Tutorial ====== | ||
- | Welcome | + | Welcome |
===== What is TectoMT ===== | ===== What is TectoMT ===== | ||
- | TectoMT is a highly modular NLP (Natural Language Processing) software system implemented in the Perl programming language under Linux. It is primarily aimed at Machine Translation, | + | TectoMT is a highly modular NLP (Natural Language Processing) software system implemented in the Perl programming language under Linux. It is primarily aimed at Machine Translation, |
Line 20: | Line 16: | ||
* Your shell is bash | * Your shell is bash | ||
* You have basic experience with bash and can read basic Perl | * You have basic experience with bash and can read basic Perl | ||
==== Installation and setup ==== | ==== Installation and setup ==== | ||
- | * Check out the SVN repository. If you are running this installation in the computer lab in Prague, you have to check out the repository into the directory '' | + | * Check out the SVN repository. If you are running this installation in the computer lab in Prague, you have to check out the repository into the directory '' |
<code bash> | <code bash> | ||
cd ~/BIG | cd ~/BIG | ||
- | svn --username | + | svn --username |
</ | </ | ||
+ | |||
+ | * Accept the certificate and provide the password, which is the same as the username, i.e. //public// | ||
* In '' | * In '' | ||
Line 48: | Line 36: | ||
</ | </ | ||
- | * In your '' | + | * In your '' |
<code bash> | <code bash> | ||
Line 59: | Line 47: | ||
source .bashrc | source .bashrc | ||
</ | </ | ||
===== TectoMT Architecture ===== | ===== TectoMT Architecture ===== | ||
==== Blocks, scenarios and applications ==== | ==== Blocks, scenarios and applications ==== | ||
Line 90: | Line 55: | ||
In TectoMT, there is the following hierarchy of processing units (software components that process data): | In TectoMT, there is the following hierarchy of processing units (software components that process data): | ||
- | * The basic units are blocks. They perform very limited, well-defined, and often linguistically interpretable tasks (e.g., tokenization, | + | * The basic units are **blocks**. They perform very limited, well-defined, and often linguistically interpretable tasks (e.g., tokenization, |
- | * To solve a more complex task, selected blocks can be chained into a block sequence, called | + | * To solve a more complex task, selected blocks can be chained into a block sequence, called |
- | * The highest unit is called application. Applications correspond to end-to-end tasks, be they real end-user applications (such as machine translation), | + | * The highest unit is called |
This tutorial itself has its blocks in '' | This tutorial itself has its blocks in '' | ||
Line 121: | Line 76: | ||
Blocks in block repository '' | Blocks in block repository '' | ||
- | // | + | // |
There are also directories for blocks with other purposes; for example, blocks which only print out some information go to '' | There are also directories for blocks with other purposes; for example, blocks which only print out some information go to '' | ||
Line 137: | Line 85: | ||
Once you have TectoMT installed on your machine, you can find this tutorial in '' | Once you have TectoMT installed on your machine, you can find this tutorial in '' | ||
- | Most applications are defined in Makefiles, which describe sequence of blocks to be applied on our data. In our particular | + | Most applications are defined in '' |
We can run the application: | We can run the application: | ||
Line 148: | Line 96: | ||
* One physical '' | * One physical '' | ||
- | * A document consists of a sequence of bundles (''< | + | * A document consists of a sequence of bundles (element |
- | * Each bundle contains tree-shaped sentence representations on various linguistic layers. In our example '' | + | * Each bundle contains tree-shaped sentence representations on various linguistic layers. In our example '' |
- | * Trees are formed by nodes and edges. Attributes can be attached only to nodes. Edge's attributes must be equivalently | + | * Trees are formed by nodes and edges. Attributes can be attached only to nodes. The attributes of an edge must be stored as attributes of its lower node. The attributes of a tree must be stored as attributes of its root node. |
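The containment hierarchy described in the list above can be pictured with a small, language-neutral sketch. It is written in Python for brevity and is not the TectoMT Perl API: the class names ''Document'', ''Bundle'' and ''Node'', the layer key ''"a"'' and the attribute names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """A tree node. All attributes live here, including those of the edge above it."""
    attrs: Dict[str, str] = field(default_factory=dict)
    # Excluded from repr/compare so that parent <-> child links do not recurse forever.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, **attrs: str) -> "Node":
        """Create a child node, link it in both directions, and return it."""
        child = Node(attrs=dict(attrs), parent=self)
        self.children.append(child)
        return child


@dataclass
class Bundle:
    """One sentence: maps a (hypothetical) layer name to the root of that layer's tree."""
    trees: Dict[str, Node] = field(default_factory=dict)


@dataclass
class Document:
    """One file: an ordered sequence of bundles."""
    bundles: List[Bundle] = field(default_factory=list)


# Build a one-sentence document with a tiny dependency tree.
root = Node(attrs={"sentence": "Dogs bark."})    # tree-level attribute kept on the root
verb = root.add_child(form="bark", afun="Pred")  # edge label kept on the lower node
noun = verb.add_child(form="Dogs", afun="Sb")
doc = Document(bundles=[Bundle(trees={"a": root})])
```

Note that there is no separate edge or tree object: the dependency label of an edge is just an attribute of the node below it, and the sentence text is an attribute of the root.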
===== Changing the scenario ===== | ===== Changing the scenario ===== | ||
- | We'll now add syntactic analysis (dependency parsing) to our scenario by adding | + | We'll now add syntactic analysis (dependency parsing) to our scenario by adding |
- | < | + | < |
- | analyze: | + | SEnglishW_to_SEnglishM:: |
- | brunblocks -S -o \ | + | SEnglishW_to_SEnglishM:: |
- | | + | SEnglishW_to_SEnglishM:: |
- | SEnglishW_to_SEnglishM:: | + | SEnglishW_to_SEnglishM:: |
- | SEnglishW_to_SEnglishM:: | + | |
- | SEnglishW_to_SEnglishM:: | + | |
- | -- sample.tmt | + | |
</ | </ | ||
Line 194: | Line 115: | ||
<code bash> | <code bash> | ||
- | analyze: | + | SEnglishW_to_SEnglishM:: |
- | brunblocks -S -o \ | + | SEnglishW_to_SEnglishM:: |
- | | + | SEnglishW_to_SEnglishM:: |
- | SEnglishW_to_SEnglishM:: | + | SEnglishW_to_SEnglishM:: |
- | SEnglishW_to_SEnglishM:: | + | SEnglishM_to_SEnglishA:: |
- | SEnglishW_to_SEnglishM:: | + | SEnglishM_to_SEnglishA:: |
- | SEnglishM_to_SEnglishA:: | + | SEnglishM_to_SEnglishA:: |
- | SEnglishM_to_SEnglishA:: | + | SEnglishM_to_SEnglishA:: |
- | SEnglishM_to_SEnglishA:: | + | SEnglishM_to_SEnglishA:: |
- | -- sample.tmt | + | SEnglishM_to_SEnglishA:: |
</ | </ | ||
- | |||
- | //Note//: Makefiles use tab characters to mark command lines. Make sure your lines start with a tab (or two tabs) and not, for example, with 4 spaces. | ||
After running | After running | ||
Line 219: | Line 138: | ||
<code bash> | <code bash> | ||
- | SEnglishM_to_SEnglishA:: | + | SEnglishM_to_SEnglishA:: |
</ | </ | ||
Line 225: | Line 144: | ||
<code bash> | <code bash> | ||
- | SEnglishM_to_SEnglishA:: | + | SEnglishM_to_SEnglishA:: |
</ | </ | ||
Line 236: | Line 155: | ||
Try to click on some nodes to see their parameters (tag, lemma, form, analytical function etc). | Try to click on some nodes to see their parameters (tag, lemma, form, analytical function etc). | ||
+ | //Note//: For more information about tree editor TrEd, see [[http:// | ||
+ | If you are not familiar with '' | ||
- | + | <code bash> | |
- | + | ./ | |
- | + | </ | |
Line 290: | Line 191: | ||
Attributes of documents, bundles or nodes can be accessed by attribute getters and setters, for example: | Attributes of documents, bundles or nodes can be accessed by attribute getters and setters, for example: | ||
+ | |||
* '' | * '' | ||
* '' | * '' | ||
- | Our tutorial block '' | + | Some interesting attributes on the morphological layer are '' |
+ | |||
+ | <code bash> | ||
+ | tmttred sample.tmt | ||
+ | </ | ||
+ | |||
+ | Our tutorial block '' | ||
<code bash> | <code bash> | ||
print_info: | print_info: | ||
- | brunblocks | + | brunblocks -o Tutorial:: |
</ | </ | ||
Line 307: | Line 215: | ||
Try to change the block so that it prints out the information only for verbs. (You need to look at an attribute '' | Try to change the block so that it prints out the information only for verbs. (You need to look at an attribute '' | ||
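The filtering idea behind this exercise can be sketched independently of the TectoMT API. The snippet below is only an illustration in Python: it assumes Penn Treebank-style tags, in which verb tags begin with ''VB'', and it represents nodes as plain dictionaries with hypothetical ''form'' and ''tag'' keys. The real block would read the corresponding node attribute through the Perl getters instead.

```python
from typing import Dict, Iterable, List


def verbs_only(nodes: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only nodes whose part-of-speech tag marks a verb (assumed prefix "VB")."""
    return [n for n in nodes if n.get("tag", "").startswith("VB")]


# Hand-tagged sample sentence; the tags are assumptions, not tagger output.
sample = [
    {"form": "Mr.", "tag": "NNP"},
    {"form": "Brown", "tag": "NNP"},
    {"form": "has", "tag": "VBZ"},
    {"form": "urged", "tag": "VBN"},
    {"form": "MPs", "tag": "NNS"},
    {"form": ".", "tag": "."},
]

for node in verbs_only(sample):
    print(node["form"], node["tag"])
```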
===== Advanced block: finite clauses ===== | ===== Advanced block: finite clauses ===== | ||
==== Motivation ==== | ==== Motivation ==== | ||
- | It is assumed that finite clauses can be translated independently, | + | It is assumed that finite clauses can be translated independently, |
==== Task ==== | ==== Task ==== | ||
A block which, given an analytical tree ('' | A block which, given an analytical tree ('' | ||
==== Instructions ==== | ==== Instructions ==== | ||
Line 365: | Line 232: | ||
<code bash> | <code bash> | ||
finite_clauses: | finite_clauses: | ||
- | brunblocks -S -o \ | + | brunblocks -S -o Tutorial:: |
- | | + | |
- | | + | |
- | | + | |
</ | </ | ||
Line 378: | Line 242: | ||
* '' | * '' | ||
- | //Note//: '' | + | //Note//: '' |
+ | |||
+ | //Hint//: Finite clauses in English usually require a grammatical subject to be present. | ||
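One possible reading of the hint, sketched in Python rather than as a TectoMT block: treat a verb as the head of a finite clause if one of its children is a subject. The sketch assumes a flat list of node records with a ''parent'' index, Penn Treebank-style verb tags starting with ''VB'', and the analytical function ''Sb'' for subjects. The record layout and the function name ''mark_clause_heads'' are hypothetical.

```python
from typing import Any, Dict, List

ANode = Dict[str, Any]  # hypothetical node record: form, tag, afun, parent index


def mark_clause_heads(nodes: List[ANode]) -> List[bool]:
    """Flag verbs that govern a subject as heads of finite clauses."""
    # Indices of nodes that have at least one child labelled as a subject.
    governs_subject = {n["parent"] for n in nodes if n["afun"] == "Sb"}
    return [
        n["tag"].startswith("VB") and i in governs_subject
        for i, n in enumerate(nodes)
    ]


# "Mr. Brown has urged MPs" as a hand-built dependency tree (parent = list index).
sentence = [
    {"form": "Mr.", "tag": "NNP", "afun": "Atr", "parent": 1},
    {"form": "Brown", "tag": "NNP", "afun": "Sb", "parent": 3},
    {"form": "has", "tag": "VBZ", "afun": "AuxV", "parent": 3},
    {"form": "urged", "tag": "VBN", "afun": "Pred", "parent": None},
    {"form": "MPs", "tag": "NNS", "afun": "Obj", "parent": 3},
]
flags = mark_clause_heads(sentence)
```

This heuristic is deliberately naive: it ignores imperatives, which have no overt subject.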
==== Advanced version ==== | ==== Advanced version ==== | ||
The output of our block might still be incorrect in special cases: we don't handle coordination (see the second sentence in sample.txt) or subordinating conjunctions. | The output of our block might still be incorrect in special cases: we don't handle coordination (see the second sentence in sample.txt) or subordinating conjunctions. | ||
- | |||
===== Your turn: more tasks ===== | ===== Your turn: more tasks ===== | ||
==== SVO to SOV ==== | ==== SVO to SOV ==== | ||
- | **Motivation**: | + | **Motivation**: |
**Task**: Change the word order from SVO to SOV. | **Task**: Change the word order from SVO to SOV. | ||
Line 409: | Line 261: | ||
**Instructions**: | **Instructions**: | ||
- | * To find an object | + | |
+ | | ||
* For debugging, a method returning the surface word order of a node is useful: '' | * For debugging, a method returning the surface word order of a node is useful: '' | ||
- | * Once you have node '' | + | * Once you have the node '' |
- | + | ||
- | + | **Advanced version**: This solution shifts the object (or several objects) of a verb to just in front of that verb node. For example, //Mr. Brown has urged MPs.// changes to //Mr. Brown has MPs urged.// Try changing this solution so that the final sentence is //Mr. Brown MPs has urged.// You may need a method '' | ||
+ | **Advanced version**: This solution shifts the object (or several objects) of a verb to just in front of that verb node. For example, //Mr. Brown has urged MPs.// changes to //Mr. Brown has MPs urged.// Try changing this solution so that the final sentence is //Mr. Brown MPs has urged.// You may need a method '' | ||
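The basic reordering (object moved to just in front of its verb) can be sketched on a flat list of node records. This is a Python illustration under stated assumptions, not the tutorial's Perl solution: objects are recognized by the analytical function ''Obj'', the ''parent'' field is a list index, and only the object node itself is moved, not its whole subtree.

```python
from typing import Any, Dict, List

ANode = Dict[str, Any]  # hypothetical node record: form, afun, parent index


def svo_to_sov(nodes: List[ANode]) -> List[str]:
    """Return the word forms with each object placed directly before its parent."""
    order = list(range(len(nodes)))  # current surface order, as node indices
    for i, node in enumerate(nodes):
        if node["afun"] == "Obj" and node["parent"] is not None:
            order.remove(i)
            # Re-insert the object immediately in front of its governing verb.
            order.insert(order.index(node["parent"]), i)
    return [nodes[i]["form"] for i in order]


sentence = [
    {"form": "Mr.", "afun": "Atr", "parent": 1},
    {"form": "Brown", "afun": "Sb", "parent": 3},
    {"form": "has", "afun": "AuxV", "parent": 3},
    {"form": "urged", "afun": "Pred", "parent": None},
    {"form": "MPs", "afun": "Obj", "parent": 3},
    {"form": ".", "afun": "AuxK", "parent": None},
]
reordered = svo_to_sov(sentence)
print(" ".join(reordered))
```

Several objects of the same verb keep their relative order, because each one is inserted directly before the verb in turn.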
Line 451: | Line 273: | ||
{{ external: | {{ external: | ||
- | **Motivation**: | + | **Motivation**: |
**Task**: The task is to rehang all prepositions as indicated in the picture. You may assume that prepositions have at most 1 child. | **Task**: The task is to rehang all prepositions as indicated in the picture. You may assume that prepositions have at most 1 child. | ||
Line 464: | Line 286: | ||
// | // | ||
* On analytical layer, you can use this test to recognize prepositions: | * On analytical layer, you can use this test to recognize prepositions: | ||
- | * You can use block template in '' | ||
* To see the results, you can again use TrEd ('' | * To see the results, you can again use TrEd ('' | ||
**Advanced version**: What happens in case of multiword prepositions? | **Advanced version**: What happens in case of multiword prepositions? | ||
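A minimal sketch of one way to rehang, in Python rather than as a TectoMT block. It assumes that the picture shows the preposition moving below its single child, so that the child takes over the preposition's former parent. It further assumes that prepositions carry the analytical function ''AuxP'' and that ''parent'' is a list index. All names here are hypothetical.

```python
from typing import Any, Dict, List

ANode = Dict[str, Any]  # hypothetical node record: form, afun, parent index


def rehang_prepositions(nodes: List[ANode]) -> None:
    """Swap every preposition with its only child, modifying the tree in place."""
    for p, node in enumerate(nodes):
        if node["afun"] != "AuxP":
            continue
        children = [i for i, n in enumerate(nodes) if n["parent"] == p]
        if len(children) != 1:
            continue  # childless or multi-child prepositions are left untouched
        child = children[0]
        nodes[child]["parent"] = node["parent"]  # child takes the preposition's place
        node["parent"] = child                   # preposition now hangs under it


# "sat on chairs": before rehanging, sat -> on -> chairs.
tree = [
    {"form": "sat", "afun": "Pred", "parent": None},
    {"form": "on", "afun": "AuxP", "parent": 0},
    {"form": "chairs", "afun": "Adv", "parent": 1},
]
rehang_prepositions(tree)
```

With a chain of two prepositions, only the upper one is swapped here: after the swap the lower one has two children and is skipped, which is one reason the multiword case needs separate treatment.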
- | |||
===== Further information ===== | ===== Further information ===== | ||
- | * [[http://ufallab2.ms.mff.cuni.cz/ | + | * [[http://ufal.mff.cuni.cz/ |
* Questions? Ask '' | * Questions? Ask '' | ||
* Solutions to the tasks in this tutorial are in '' | * Solutions to the tasks in this tutorial are in '' | ||
- | * [[http:// | + | * [[http:// |
+ | If you are missing some files from //share//, you can download them from [[http:// |