Differences
This shows you the differences between two versions of the page.
courses:rg:extracting-parallel-sentences-from-comparable-corpora [2011/05/22 18:35] ivanova |
courses:rg:extracting-parallel-sentences-from-comparable-corpora [2011/05/22 18:39] ivanova |
||
---|---|---|---|
Line 16: | Line 16: | ||
===== Category 1: Features derived from word alignment ===== | ===== Category 1: Features derived from word alignment ===== | ||
- | - log probability of the alignment | + | - log probability of the alignment; |
- | - | + | - number of aligned/ |
+ | - longest aligned/ | ||
+ | - sentence length; | ||
+ | - the difference in relative document position of the two sentences. | ||
+ | The last two features are independent of word alignment. All of these features are defined on sentence pairs and are included in both the binary classification model and the ranking model.
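As an illustration of the last feature in the list, the difference in relative document position can be sketched as follows. This is a minimal sketch under assumptions not stated in the notes: the relative position is taken to be the sentence index divided by the number of sentences in its document, and the function name and parameters are hypothetical.

```python
def relative_position_difference(src_index, src_len, tgt_index, tgt_len):
    """Absolute difference in relative document position of two sentences.

    Assumption (not from the notes): relative position is the sentence
    index divided by the number of sentences in the document.
    """
    if src_len <= 0 or tgt_len <= 0:
        raise ValueError("documents must contain at least one sentence")
    return abs(src_index / src_len - tgt_index / tgt_len)


# Sentence 5 of 10 and sentence 10 of 20 sit at the same relative position.
same_place = relative_position_difference(5, 10, 10, 20)
# Sentence 0 of 10 and sentence 9 of 10 are far apart.
far_apart = relative_position_difference(0, 10, 9, 10)
```

A small value suggests the two sentences occur at a similar point in their respective articles; the feature needs no word alignment at all.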
+ | |||
+ | ===== Category 2: Distortion features ===== | ||
+ | One set of features bins the distance between the previous and the current aligned sentence. Another set of features looks at the absolute difference between the expected position (one after the previous aligned sentence) and the actual position.
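The two kinds of distortion features described above could be computed roughly like this. It is only a sketch: the bin edges, the feature names, and the use of plain integer sentence indices are assumptions made for illustration and are not taken from the notes.

```python
def distortion_features(prev_target, curr_target, bin_edges=(0, 1, 2, 5)):
    """Distortion features for two consecutive aligned sentences.

    prev_target, curr_target: indices of the target sentences aligned to
    the previous and the current source sentence.
    bin_edges: hypothetical upper bounds of the distance bins.
    """
    # Feature set 1: binned distance between previous and current alignment.
    distance = abs(curr_target - prev_target)
    bin_label = f">{bin_edges[-1]}"
    for edge in bin_edges:
        if distance <= edge:
            bin_label = f"<={edge}"
            break

    # Feature set 2: deviation from the expected position, which is the
    # sentence right after the previously aligned one.
    expected = prev_target + 1
    deviation = abs(curr_target - expected)

    return {"distance_bin": bin_label, "position_deviation": deviation}


monotone = distortion_features(3, 4)   # next sentence, as expected
long_jump = distortion_features(3, 10)  # large forward jump
```

With these assumed bins, a monotone step falls in the `<=1` bin with zero deviation, while a long jump lands in the overflow bin and gets a large deviation.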
+ | |||
+ | ===== Category 3: Features derived from Wikipedia markup ===== | ||