Differences
This shows you the differences between two versions of the page.
courses:rg:extracting-parallel-sentences-from-comparable-corpora [2011/05/22 19:11] ivanova |
courses:rg:extracting-parallel-sentences-from-comparable-corpora [2011/05/22 19:13] ivanova |
||
---|---|---|---|
Line 26: | Line 26: | ||
One set of features bins distances between previous and current aligned sentences. Another set of features looks at the absolute difference between the expected position (one after the previous aligned sentence) and the actual position. | One set of features bins distances between previous and current aligned sentences. Another set of features looks at the absolute difference between the expected position (one after the previous aligned sentence) and the actual position. | ||
- | === Category 3: features derived from Wikipedia markup === | + | ==== Category 3: features derived from Wikipedia markup ==== |
- number of matching links in the sentence pairs; | - number of matching links in the sentence pairs; | ||
- image feature (if two sentences are captions of the same image); | - image feature (if two sentences are captions of the same image); | ||
Line 32: | Line 32: | ||
- bias feature (if the alignment is non-null). | - bias feature (if the alignment is non-null). | ||
- | === Category 4: word-level induced lexicon features === | + | ==== Category 4: word-level induced lexicon features ==== |
- Translation probability; | - Translation probability; | ||
- position difference; | - position difference; | ||
Line 41: | Line 41: | ||
Using this model, the authors generate a new translation table which is used to define another HMM word-alignment | Using this model, the authors generate a new translation table which is used to define another HMM word-alignment | ||
- | == Evaluation == | + | ==== Evaluation ==== |
__Data for evaluation: | __Data for evaluation: | ||
20 Wikipedia article pairs for Spanish-English, | 20 Wikipedia article pairs for Spanish-English, |