Differences
This shows you the differences between two versions of the page.
--- external:tectomt:tutorial [2009/01/19 16:09] kravalova
+++ external:tectomt:tutorial [2009/01/22 11:51] kravalova
Line 1: | Line 1:
====== TectoMT Tutorial ======
- Welcome to the TectoMT Tutorial. This tutorial should take about 2 hours.
+ Welcome to the TectoMT Tutorial. This tutorial should take about 3 hours.
Line 7: | Line 8:
===== What is TectoMT =====
- TectoMT is a highly modular NLP (Natural Language Processing) software system implemented in the Perl programming language under Linux. It is primarily aimed at Machine Translation,
+ TectoMT is a highly modular NLP (Natural Language Processing) software system implemented in the Perl programming language under Linux. It is primarily aimed at Machine Translation,
===== Prerequisites =====
+ In this tutorial, we assume that:
+ * Your system is Linux
+ * Your shell is bash
+ * You have basic experience with bash and can read basic Perl
Line 15: | Line 38:
==== Installation and setup ====
- TODO: describe the installation
+ * Check out the SVN repository. If you are running this installation in the computer lab in Prague, you have to check out the repository into the directory ''/
- Before running any experiments with TectoMT, you must set up your environment by running
+ <code bash>
+ cd ~/BIG
+ svn --username mtm co https://
+ </code>
+ * In ''
<code bash>
- source config/init_devel_environ.sh
+ cd tectomt/install
+ ./install.sh
</code>
+ * In your ''
+ <code bash>
+ source ~/
+ </code>
+ * In your ''
+ <code bash>
+ source ~/.bashrc
+ </code>
- ==== Theoretical background ====
- TODO: picture
Line 35: | Line 80:
- ==== TrEd ====
- TODO: a little about TrEd, and a picture
Line 43: | Line 86:
===== TectoMT Architecture =====
Line 50: | Line 94:
In TectoMT, there is the following hierarchy of processing units (software components that process data):
- * The basic units are blocks. They are used for very limited, well-defined, and often linguistically interpretable tasks (e.g., tokenization,
+ * The basic units are blocks. They are used for very limited, well-defined, and often linguistically interpretable tasks (e.g., tokenization,
* To solve a more complex task, selected blocks can be chained into a block sequence, also called a scenario. Technically,
- * The highest unit is called an application. Applications correspond to end-to-end tasks, be they real end-user applications (such as machine translation),
+ * The highest unit is called an application. Applications correspond to end-to-end tasks, be they real end-user applications (such as machine translation),
This tutorial itself has its blocks in ''
Line 61: | Line 113:
==== Layers of Linguistic Structures ====
- The TectoMT block repository is stored in ''
+ {{ external:
- Thus, the set of TectoMT layers is Cartesian product {S,T} x {English,
+ The notion of '
+ Thus, the set of TectoMT layers is a Cartesian product {S,T} x {English,
* {S,T} distinguishes whether the data was created by analysis or transfer/
Line 69: | Line 123:
* {W,
- //
+ Blocks in the block repository ''
+ //
+ There are also directories for blocks with other purposes; for example, blocks which only print out some information go to ''
- There are also directories for blocks with other purposes; for example, blocks which only print out some information go to ''
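The layer set described in this section is a Cartesian product, and bash brace expansion computes exactly such products, so it offers a quick way to enumerate layer names. This is a minimal sketch under stated assumptions: the language and layer sets in the text are cut off, so the code uses only English plus Czech (Czech is an assumption here) and only the W, M and A layers that occur in block names of this tutorial such as SEnglishW_to_SEnglishM.

<code bash>
#!/usr/bin/env bash
# Enumerate layer names as the Cartesian product {S,T} x {language} x {layer}.
# Assumption (illustration only): the languages are English and Czech,
# and only the W, M and A layers used in this tutorial's block names are included.
layers=( {S,T}{English,Czech}{W,M,A} )

printf '%s\n' "${layers[@]}"
echo "number of layers: ${#layers[@]}"

# Block names in this tutorial join a source and a target layer,
# e.g. the first two generated names give SEnglishW_to_SEnglishM.
echo "${layers[0]}_to_${layers[1]}"
</code>

Under these assumptions the product has 2 x 2 x 3 = 12 members; adding a language or a layer to the brace lists extends it accordingly.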
Line 80: | Line 141:
===== First application =====
- Once you have TectoMT installed on your machine, you can find this tutorial in ''
+ Once you have TectoMT installed on your machine, you can find this tutorial in ''
- Most applications are defined in Makefiles, which describe the sequence of blocks to be applied to our data. In our particular ''
+ Most applications are defined in Makefiles, which describe the sequence of blocks to be applied to our data. In our particular ''
We can run the application:
Line 90: | Line 151:
</code>
- Our plain text data ''
+ Our plain text data ''
- * One physical file corresponds to one document.
+ * One physical
* A document consists of a sequence of bundles (''<
* Each bundle contains tree-shaped sentence representations on various linguistic layers. In our example ''
- * Trees are formed by nodes and edges. Attributes can be attached only to nodes. An edge's attributes must be equivalently
+ * Trees are formed by nodes and edges. Attributes can be attached only to nodes. An edge's attributes must be stored as attributes of its lower node, and a tree's attributes must be stored as attributes of its root node.
Line 117: | Line 187:
===== Changing the scenario =====
- We'll now add syntax analysis to our scenario by adding
+ We'll now add a syntax analysis
<code bash>
Line 124: | Line 194:
SEnglishW_to_SEnglishM::
SEnglishW_to_SEnglishM::
- SEnglishW_to_SEnglishM::
+ SEnglishW_to_SEnglishM::
- SEnglishW_to_SEnglishM::
+ SEnglishW_to_SEnglishM::
+
</code>
Line 135: | Line 206:
SEnglishW_to_SEnglishM::
SEnglishW_to_SEnglishM::
- SEnglishW_to_SEnglishM::
+ SEnglishW_to_SEnglishM::
- SEnglishW_to_SEnglishM::
+ SEnglishW_to_SEnglishM::
SEnglishM_to_SEnglishA::
SEnglishM_to_SEnglishA::
- SEnglishM_to_SEnglishA::
+ SEnglishM_to_SEnglishA::
+
</code>
- //Note//: Makefiles use tabs to mark command lines. Make sure your lines start with tab (or two tabs) and not, for example, with 4 spaces.
+ //Note//: Makefiles use tabs to mark command lines. Make sure your lines start with a tab (or two tabs) and not, for example, with 4 spaces.
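A quick way to catch space-indented lines before make complains is to search for them. The helper below, ''check_makefile_indent'', is a hypothetical name for a small grep wrapper and not part of TectoMT. It only flags candidates: continuation lines of variable definitions may legitimately start with spaces. With GNU coreutils, ''cat -A Makefile'' is an alternative that shows every tab as ''^I''.

<code bash>
#!/usr/bin/env bash
# Print the lines of a Makefile that begin with spaces instead of a tab.
# When such a line is meant to be a command line, make typically stops
# with a "missing separator" error. Continuation lines of variable
# definitions may legitimately start with spaces, so inspect every hit.
# Usage: check_makefile_indent Makefile
check_makefile_indent() {
  local file="$1"
  # -n: prefix each hit with its line number; -E: extended regular expression
  if grep -nE '^ +[^[:space:]]' "$file"; then
    echo "warning: space-indented lines in $file (command lines need a tab)" >&2
    return 1
  fi
  return 0
}
</code>

A return status of 1 means at least one suspicious line was printed; 0 means none was found.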
After running
Line 151: | Line 223:
we can examine our ''
+ Blocks can also be parametrized. For the syntax parser, we might want to use a smaller but faster model. To achieve this, replace the line
+ <code bash>
+ SEnglishM_to_SEnglishA::
+ </code>
+ with
+ <code bash>
+ SEnglishM_to_SEnglishA::
+ </code>
You can view the trees in ''
Line 157: | Line 241:
tmttred sample.tmt
</code>
+ Try clicking on some nodes to see their attributes (tag, lemma, form, analytical function, etc.).
+ //Note//: For more information about the tree editor TrEd, see [[http://
Line 201: | Line 294:
This block illustrates some of the most common methods for accessing objects:
- * ''
+ * ''
- * ''
+ * ''
- * ''
+ * ''
- * ''
+ * ''
- * ''
+ * ''
- * ''
+ * ''
Attributes of documents, bundles or nodes can be accessed by attribute getters and setters, for example:
- * ''
- * ''
- Our tutorial block ''
+ * ''
+ * ''
+ Some interesting attributes on the morphological layer are ''
+ <code bash>
+ tmttred sample.tmt
+ </code>
+ Our tutorial block ''
<code bash>
Line 225: | Line 325:
</code>
- Try to change the block so that it prints out the information only for verbs. (You need to look at attribute ''
+ Try to change the block so that it prints out the information only for verbs. (You need to look at an attribute ''
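The test for verbs can be tried out independently of TectoMT. Assuming the tags follow the Penn Treebank tagset (an assumption here; compare with the tags you saw in TrEd), every verb tag starts with VB, so a prefix test is enough. The sketch applies that test with awk to a hypothetical listing with one form and one tag per line, separated by a tab. ''verbs_only'' is an illustrative name; inside the Perl block, a comparable test would presumably be applied to the node's tag.

<code bash>
#!/usr/bin/env bash
# Keep only verbs from a hypothetical listing with one "form<TAB>tag" pair per line.
# Assumption: Penn Treebank tags, where every verb tag starts with VB
# (VB, VBD, VBG, VBN, VBP, VBZ). Modal verbs are tagged MD and are not matched.
verbs_only() {
  awk -F '\t' '$2 ~ /^VB/ { print $1 "\t" $2 }'
}

# Example sentence from this tutorial, tagged by hand for illustration.
printf 'Mr.\tNNP\nBrown\tNNP\nhas\tVBZ\nurged\tVBN\nMPs\tNNS\n.\t.\n' | verbs_only
</code>

For the example sentence this prints the lines for //has// (VBZ) and //urged// (VBN). Modals such as //may// would need an extra test if they should count as verbs.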
Line 233: | Line 333:
===== Advanced block: finite clauses =====
Line 238: | Line 341:
==== Motivation ====
- TODO: translate the motivation
+ It is assumed that finite clauses can be translated independently,
==== Task ====
- A block which, given an analytical tree (''
+ A block which, given an analytical tree (''
- ==== Algorithm ====
- TODO: picture and explanation of the algorithm
Line 258: | Line 383:
==== Instructions ====
- There is a block template with hints in ''
+ There is a block template with hints in ''
<code bash>
Line 264: | Line 389:
brunblocks -S -o \
Tutorial::
- Tutorial::
+ Tutorial::
+
</code>
You are going to need these methods:
- * ''
+ * ''
- * ''
+ * ''
- * ''
+ * ''
- * ''
+ * ''
- * ''
+ //Note//: ''
+ //Hint//: Finite clauses in English usually require a grammatical subject to be present.
+ ==== Advanced version ====
- ==== Is it perfect? ====
+ The output of our block might still be incorrect in special cases: we don't handle coordination
- The output of our block is not ideal: we don't handle coordination and subordinating conjunctions. But for the time being, we can move on to some other tasks.
Line 287: | Line 413:
- ==== SVO typology ====
- TODO
- ==== Coordination ====
- This time TODO: explanation and picture
- You can use the block template in ''
Line 305: | Line 426:
+ ==== SVO to SOV ====
- ===== Further information =====
+ **Motivation**: When translating from an SVO-based language (e.g., English) to an SOV-based language (e.g., Korean), we might need to change the word order from SVO to SOV.
- * [[http://
+ **Task**: Change the word order from SVO to SOV.
+ **Instructions**:
+ * You can use the block template in ''
+ * To find the objects of a verb, look among the verb's effective children (''
+ * For debugging, a method returning the surface word order of a node is useful: ''
+ * Once you have node ''
+ **Advanced version**: This solution shifts the object (or objects) of a verb just in front of that verb node. For example, //Mr. Brown has urged MPs.// changes to //Mr. Brown has MPs urged.// You can try to change this solution so that the final sentence is //Mr. Brown MPs has urged.// You may need a method ''
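The reordering can be tried out on a flat token list before working with trees. This is a toy sketch, not the TectoMT API: ''svo_to_sov'' is a hypothetical helper that takes the 0-based positions of the verb and of a single-token object and moves the object directly in front of the verb, mirroring the basic task. In a real tree the object's whole subtree would typically have to move with it.

<code bash>
#!/usr/bin/env bash
# Toy SVO -> SOV reordering on a flat token list (hypothetical helper, not TectoMT).
# Arguments: position of the verb, position of the object (both 0-based), then the tokens.
# Assumption: the verb and the object are single tokens whose positions are already known.
svo_to_sov() {
  local verb="$1" obj="$2"
  shift 2
  local -a toks=("$@") out=()
  local i
  for i in "${!toks[@]}"; do
    if (( i == obj )); then
      continue                      # drop the object from its original position
    fi
    if (( i == verb )); then
      out+=("${toks[obj]}")         # re-insert the object right before the verb
    fi
    out+=("${toks[i]}")
  done
  echo "${out[*]}"
}

svo_to_sov 3 4 Mr. Brown has urged MPs .
# prints: Mr. Brown has MPs urged .
</code>

Passing the position of the auxiliary //has// (2) instead of the verb gives the order asked for in the advanced version: ''svo_to_sov 2 4 Mr. Brown has urged MPs .'' prints ''Mr. Brown MPs has urged .''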
+ ==== Prepositions ====
+ {{ external:
+ **Motivation**:
+ **Task**: Rehang all prepositions as indicated in the picture. You may assume that prepositions have at most one child.
+ **Instructions**:
+ You are going to need these new methods:
+ * ''
+ * ''
+ * ''
+ //
+ * On the analytical layer, you can use this test to recognize prepositions:
+ * To see the results, you can again use TrEd (''
+ **Advanced version**: What happens in the case of multiword prepositions?
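Because the picture is not reproduced here, the following toy sketch assumes that the intended change is a swap: a preposition with exactly one child exchanges places with it, so the child takes over the preposition's parent and the preposition hangs below the child. It uses plain bash arrays instead of the TectoMT API; ''rehang_prepositions'', ''parent'' and ''is_prep'' are hypothetical names.

<code bash>
#!/usr/bin/env bash
# Toy illustration of rehanging prepositions (not the TectoMT API).
# Assumption: a preposition with exactly one child swaps places with it:
# the child is attached to the preposition's former parent, and the
# preposition is attached below the child.
# Trees are encoded in two global arrays indexed by node number (1..n):
#   parent[i]  = number of the parent of node i (0 stands for the technical root)
#   is_prep[i] = 1 if node i is a preposition, 0 otherwise
rehang_prepositions() {
  local p n child count
  for p in "${!parent[@]}"; do
    (( is_prep[p] )) || continue
    count=0
    for n in "${!parent[@]}"; do
      if (( n != p && parent[n] == p )); then
        child=$n
        count=$((count + 1))
      fi
    done
    (( count == 1 )) || continue   # skip prepositions without exactly one child
    parent[child]=${parent[p]}     # the child takes over the preposition's position
    parent[p]=$child               # the preposition now depends on its former child
  done
}

# "went to school": 1=went (under the root), 2=to (under went), 3=school (under to)
parent=([1]=0 [2]=1 [3]=2)
is_prep=([1]=0 [2]=1 [3]=0)
rehang_prepositions
echo "${parent[@]}"
# prints: 0 3 1
</code>

This sketch looks at each preposition node separately, so for multiword prepositions the result depends on how their words are attached in the tree.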
+ ===== Further information =====
+ * [[http://
+ * Questions? Ask ''
+ * Solutions to the tasks in this tutorial are in ''
+ * [[http://