
Institute of Formal and Applied Linguistics Wiki



courses:rg:extracting-parallel-sentences-from-comparable-corpora [2011/05/22 19:11]
ivanova
courses:rg:extracting-parallel-sentences-from-comparable-corpora [2011/05/22 19:13]
ivanova
One set of features bins the distance between the previously aligned sentence and the currently aligned sentence. Another set looks at the absolute difference between the expected position (one after the previously aligned sentence) and the actual position.
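A minimal Python sketch of these two distortion feature sets, assuming sentence positions are integer indices into the target article. The bin edges are made up for illustration (the page does not say which bins were used), and `bin_label` and `distortion_features` are hypothetical helper names. Each feature is a binary indicator named after the bin it falls into.

```python
from __future__ import annotations


def bin_label(value: int, edges: list[int]) -> str:
    """Map an integer onto a coarse bin label, given sorted upper bin edges."""
    for edge in edges:
        if value <= edge:
            return f"<={edge}"
    return f">{edges[-1]}"


def distortion_features(prev_pos: int, cur_pos: int) -> dict[str, float]:
    """Binary distortion features for aligning to target position cur_pos,
    given the target position prev_pos of the previous aligned sentence."""
    # Hypothetical bin edges; not taken from the paper.
    jump_edges = [-2, -1, 0, 1, 2, 4]
    dev_edges = [0, 1, 2, 4]

    # Feature set 1: distance between previous and current aligned sentences.
    jump = cur_pos - prev_pos
    # Feature set 2: absolute difference between the expected position
    # (one after the previous aligned sentence) and the actual position.
    expected = prev_pos + 1
    deviation = abs(cur_pos - expected)

    return {
        f"jump_bin={bin_label(jump, jump_edges)}": 1.0,
        f"dev_bin={bin_label(deviation, dev_edges)}": 1.0,
    }
```

For example, aligning to position 4 right after an alignment at position 3 fires `jump_bin=<=1` and `dev_bin=<=0`, while a jump from 3 to 10 lands in the open-ended `>4` bins of both sets.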
  
==== Category 3: features derived from Wikipedia markup ====
  - number of matching links in the sentence pair;
  - image feature (if the two sentences are captions of the same image);
  - bias feature (if the alignment is non-null).
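The markup features listed above can be sketched in Python as follows. The `WikiSentence` record, the assumption that link targets have already been mapped into a shared ID space (for instance through interlanguage links), and the choice to fire no features at all for a null alignment are illustrative assumptions, not details taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WikiSentence:
    """Hypothetical sentence record that keeps the markup it came from."""
    text: str
    # Link targets, assumed to be normalised to IDs shared across languages.
    links: set[str] = field(default_factory=set)
    # Identifier of the image this sentence is a caption of, if any.
    image: Optional[str] = None


def markup_features(src: Optional[WikiSentence],
                    tgt: Optional[WikiSentence]) -> dict[str, float]:
    """Markup-derived features for one candidate sentence alignment.
    A None on either side stands for a null alignment."""
    if src is None or tgt is None:
        return {}  # in this sketch, nothing fires for a null alignment
    return {
        # Number of link targets shared by the two sentences.
        "matching_links": float(len(src.links & tgt.links)),
        # 1.0 only if both sentences caption the same image.
        "same_image": 1.0 if src.image is not None and src.image == tgt.image else 0.0,
        # Bias feature: fires whenever the alignment is non-null.
        "bias": 1.0,
    }
```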
  
==== Category 4: word-level induced lexicon features ====
  - translation probability;
  - position difference;
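A rough Python sketch of the two word-level features listed above, for a candidate link between source word `i` and target word `j`. The seed lexicon is a hypothetical dictionary of `(source_word, target_word)` probabilities, and the position difference is assumed here to be the gap between the two words' relative positions in their sentences; the page does not define either feature precisely.

```python
from __future__ import annotations


def word_pair_features(src_words: list[str], tgt_words: list[str],
                       i: int, j: int,
                       lexicon: dict[tuple[str, str], float]) -> dict[str, float]:
    """Features for linking source word i to target word j (hypothetical helper)."""
    s, t = src_words[i], tgt_words[j]
    # Translation probability from the assumed seed lexicon; 0.0 if the pair is unseen.
    prob = lexicon.get((s, t), 0.0)
    # Position difference as the gap between relative positions in [0, 1]
    # (an assumption; max(..., 1) avoids division by zero for one-word sentences).
    rel_i = i / max(len(src_words) - 1, 1)
    rel_j = j / max(len(tgt_words) - 1, 1)
    return {"trans_prob": prob, "pos_diff": abs(rel_i - rel_j)}
```

With `src_words = ["the", "red", "house"]`, `tgt_words = ["la", "casa", "roja"]` and a lexicon containing `("house", "casa")`, linking `house` to `casa` gets that lexicon probability and a position difference of 0.5.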
Using this model, the authors generate a new translation table, which is used to define another HMM word-alignment model for use in the sentence extraction model.
  
==== Evaluation ====
__Data for evaluation:__
20 Wikipedia article pairs for Spanish-English, Bulgarian-English and German-English. Positive examples in the datasets are sentence pairs that are direct translations, as well as pairs that are mostly parallel with only a few missing words.
