external:tectomt:tutorial [2009/01/15 13:25] kravalova
external:tectomt:tutorial [2009/01/20 15:29] kravalova
Welcome to the TectoMT Tutorial. This tutorial should take about 2 hours.

TectoMT is a highly modular NLP (Natural Language Processing) software system implemented in the Perl programming language under Linux. It is primarily aimed at Machine Translation,

===== Prerequisites =====

In this tutorial, we assume that:
  * your system is Linux,
  * your shell is bash,
  * you have basic experience with bash and can read Perl.

==== Installation and setup ====

  * Check out the SVN repository. If you are running
<code bash>
cd ~/BIG
svn --username <
</code>

  * In ''

<code bash>
cd tectomt/
./
</code>

  * In your ''

<code bash>
source ~/
</code>
===== TectoMT Architecture =====

==== Blocks, scenarios and applications ====

In TectoMT, there is the following hierarchy of processing units (software components that process data):
  * The basic units are blocks. They serve very limited, well-defined, and often linguistically interpretable tasks (e.g., tokenization,
  * To solve a more complex task, selected blocks can be chained into a block sequence, also called a scenario. Technically,
  * The highest unit is called an application. Applications correspond to end-to-end tasks, be they real end-user applications (such as machine translation),

This tutorial
==== Layers of Linguistic Structures ====

{{ external:

The TectoMT blocks repository is saved in ''

Thus, the set of TectoMT layers is the Cartesian product {S,T} x {English,

  * {S,T} distinguishes whether the data was created by analysis or transfer/
  * {English,
  * {W,
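Because the set of layers is a Cartesian product, bash brace expansion can list the resulting layer names directly. The sketch below is only an illustration under assumptions: it takes the language factor to be {English,Czech} and uses just four layer letters (W, M, A and T), whereas the authoritative factors are the ones listed above. The concatenated form (e.g. ''SEnglishW'') is the one that appears in block names such as ''SEnglishW_to_SEnglishM''.

<code bash>
#!/usr/bin/env bash
# List layer names as the Cartesian product {S,T} x {English,Czech} x {W,M,A,T}.
# ASSUMPTION: "Czech" and the letters A and T are illustrative choices;
# the authoritative factor sets are the ones given in the text.
list_layers() {
    # Brace expansion combines the factors left to right, so the
    # leftmost factor varies slowest: SEnglishW SEnglishM SEnglishA ...
    printf '%s\n' {S,T}{English,Czech}{W,M,A,T}
}
</code>

Running ''list_layers'' prints 2 x 2 x 4 = 16 names, starting with ''SEnglishW'', ''SEnglishM'', ''SEnglishA'' and ''SEnglishT''.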

//
There are also directories for blocks with other purposes; for example, blocks that only print out some information go to ''
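Judging from names such as ''SEnglishW_to_SEnglishM'' in the scenarios of this tutorial, block directories appear to be named after a source layer and a target layer. Assuming that this ''<source>_to_<target>'' pattern holds in general, the following bash sketch splits a block name and a layer name into their parts. The block name ''Foo'' is a hypothetical placeholder.

<code bash>
#!/usr/bin/env bash
# ASSUMPTION: block directories follow <source layer>_to_<target layer>
# and layer names are <S|T><language><layer letter>.

# Split "SEnglishW_to_SEnglishM::Foo" into source layer, target layer and block.
split_block_name() {
    local full=$1
    local dir=${full%%::*}     # text before the first "::"
    local block=${full#*::}    # text after the first "::"
    local src=${dir%%_to_*}    # text before "_to_"
    local tgt=${dir#*_to_}     # text after "_to_"
    printf 'source=%s target=%s block=%s\n' "$src" "$tgt" "$block"
}

# Split "SEnglishW" into side (S/T), language and layer letter.
split_layer() {
    local name=$1
    local side=${name:0:1}              # first character
    local letter=${name: -1}            # last character
    local lang=${name:1:${#name}-2}     # everything in between
    printf 'side=%s language=%s layer=%s\n' "$side" "$lang" "$letter"
}
</code>

For example, ''split_block_name SEnglishW_to_SEnglishM::Foo'' prints ''source=SEnglishW target=SEnglishM block=Foo''.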
===== First application =====

Once you have TectoMT installed on your machine, you can find this tutorial in ''
Most applications are defined in Makefiles, which describe the sequence of blocks to be applied to our data. In our particular ''

  * Each bundle contains tree-shaped sentence representations on various linguistic layers. In our example ''
  * Trees are formed by nodes and edges. Attributes can be attached only to nodes, so an edge's attributes must be stored as attributes of its lower node, and a tree's attributes must be stored as attributes of its root node.
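The storage rule above (attributes only on nodes, edge labels on the lower node, tree-level attributes on the root) can be illustrated with a toy dependency tree held in bash associative arrays. This is a hypothetical encoding chosen for the sketch, not the way TectoMT stores trees; the attribute names ''form'', ''afun'' and ''sent_id'' and the sentence are illustrative.

<code bash>
#!/usr/bin/env bash
# Toy dependency tree (hypothetical encoding; needs bash >= 4).
# Every attribute is indexed by a node id: there is no place for edge
# or tree attributes, so they must live on nodes.
declare -A parent form afun sent_id

# Node 0 is the technical root; the tree-level attribute is stored on it.
sent_id[0]="s1"

# "John sleeps .": the label of each edge is stored on its lower node.
parent[1]=2; form[1]="John";   afun[1]="Sb"
parent[2]=0; form[2]="sleeps"; afun[2]="Pred"
parent[3]=0; form[3]=".";      afun[3]="AuxK"

# Print every edge as "upper -> lower [edge label read from the lower node]".
print_edges() {
    local n p upper
    for n in $(printf '%s\n' "${!parent[@]}" | sort -n); do
        p=${parent[$n]}
        if [ "$p" -eq 0 ]; then
            upper="ROOT(${sent_id[0]})"
        else
            upper=${form[$p]}
        fi
        printf '%s -> %s [%s]\n' "$upper" "${form[$n]}" "${afun[$n]}"
    done
}
</code>

Calling ''print_edges'' prints ''sleeps -> John [Sb]'' first: the subject label comes from node 1, the lower end of the edge.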
<code bash>
analyze:

SEnglishW_to_SEnglishM::
SEnglishW_to_SEnglishM::
</code>

<code bash>
analyze:

SEnglishW_to_SEnglishM::
SEnglishW_to_SEnglishM::
</code>

we can examine our ''

You can view the trees in ''
<code bash>
tmttred sample.tmt
</code>

  * node - ''

You can get TectoMT to execute your block code automatically on each document or bundle by defining the main block entry point:
  * ''
  * ''

Each block must have exactly one entry point.
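The "exactly one entry point" rule can be pictured with a small shell analogy: a driver checks which handler a block defines and then calls it either once for the whole document or once per bundle. The handler names ''process_document'' and ''process_bundle'' are hypothetical stand-ins for the two entry points listed above; real TectoMT blocks are Perl modules, and this sketch does not reproduce TectoMT's actual driver.

<code bash>
#!/usr/bin/env bash
# Shell analogy of entry-point dispatch (hypothetical names, not TectoMT code).
# A "block" is whatever handler function is currently defined; the driver
# checks that exactly one of the two entry points exists and calls it.
run_block() {
    local has_doc=0 has_bundle=0 bundle
    declare -F process_document >/dev/null && has_doc=1
    declare -F process_bundle >/dev/null && has_bundle=1
    if (( has_doc + has_bundle != 1 )); then
        echo "error: a block must define exactly one entry point" >&2
        return 1
    fi
    if (( has_doc )); then
        process_document "$@"          # once, with all bundles of the document
    else
        for bundle in "$@"; do
            process_bundle "$bundle"   # once per bundle (one sentence each)
        done
    fi
}
</code>

For example, after defining ''process_bundle() { echo "bundle: $1"; }'', the call ''run_block s1 s2'' prints one line per bundle; defining both handlers makes ''run_block'' fail.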

We'll now examine an example of a new block in file ''

This block illustrates
  * ''
  * ''

Our tutorial block ''
<code bash>
print_info:
	brunblocks -S -o Tutorial::
</code>

We can observe our new block's behaviour:

<code bash>
make print_info
</code>

Try to change

===== Advanced block: finite clauses =====

==== Motivation ====

It is assumed that finite clauses can be translated independently,

==== Task ====

A block which, given an analytical tree (''

==== Instructions ====

There is a block template with hints in ''
<code bash>
finite_clauses:
	brunblocks -S -o \
	Tutorial::
	Tutorial::
</code>

You are going to need these methods:

  * ''
  * ''
  * ''
  * ''

//Note//: ''

//Advanced version//: The output of our block might still be incorrect in special cases, because we do not handle coordination or subordinating conjunctions.

===== Your turn: more tasks =====

==== SVO typology ====

**Motivation**:

==== Prepositions ====

**Motivation**:

TODO: picture

**Task**: Rehang all prepositions as indicated in the picture. You may assume that each preposition has at most one child.

**Instructions**:

You are going to need these new methods:
  * ''
  * ''
  * ''

//
  * On the analytical layer, you can use this test to recognize prepositions:
  * You can use the block template in ''

//Advanced version//: What happens in the case of multiword prepositions?
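The picture for this task is still missing, so the following bash sketch rests on an assumption: that the rehanging promotes the preposition's only child into the preposition's position and hangs the preposition below that child. The tree is a toy parent array, and the ''is_prep'' marker stands in for the preposition test mentioned above; none of these names are TectoMT methods.

<code bash>
#!/usr/bin/env bash
# Rehang prepositions below their single child on a toy tree (needs bash >= 4).
# ASSUMPTION: target structure as described in the text; encoding is hypothetical.
declare -A parent form is_prep

# "sleeps in bed": in(2) hangs below sleeps(1), bed(3) below in(2); 0 is the root.
parent=( [1]=0 [2]=1 [3]=2 )
form=( [1]=sleeps [2]=in [3]=bed )
is_prep=( [2]=1 )

rehang_prepositions() {
    local p c
    for p in "${!is_prep[@]}"; do
        # Find the preposition's child (at most one, as the task allows).
        for c in "${!parent[@]}"; do
            if [ "${parent[$c]}" = "$p" ]; then
                parent[$c]=${parent[$p]}   # the child takes the preposition's place
                parent[$p]=$c              # the preposition now hangs below the child
                break
            fi
        done
    done
}
</code>

After ''rehang_prepositions'', ''bed'' hangs directly below ''sleeps'' and ''in'' hangs below ''bed''. Multiword prepositions (a preposition whose child is another preposition) would make the result depend on processing order, which is exactly the question raised in the advanced version.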

===== Further information =====

  * [[http://
  * Questions? Ask ''
  * Solutions to the tasks in this tutorial are in ''
  * [[http://