Differences
This shows you the differences between two versions of the page.
courses:rg:extracting-parallel-sentences-from-comparable-corpora [2011/05/22 18:33] ivanova created |
courses:rg:extracting-parallel-sentences-from-comparable-corpora [2011/05/22 18:35] ivanova |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Introduction ====== | ====== Introduction ====== | ||
- | |||
The article is about parallel sentence extraction from Wikipedia. This resource can be viewed as a comparable corpus in which the document alignment is already provided by the interwiki links. | The article is about parallel sentence extraction from Wikipedia. This resource can be viewed as a comparable corpus in which the document alignment is already provided by the interwiki links. | ||
====== Training models ====== | ====== Training models ====== | ||
- | |||
The authors train three models: | The authors train three models: | ||
* binary classifier model; | * binary classifier model; | ||
* ranking model; | * ranking model; | ||
- | * Conditional Random Field (CRF) model | + | * conditional random field (CRF) model. |
When the binary classifier is used, there is a substantial class imbalance: O(n) positive examples and O(n²) negative examples. | When the binary classifier is used, there is a substantial class imbalance: O(n) positive examples and O(n²) negative examples. | ||
Line 18: | Line 16: | ||
===== Category 1: Features derived from word alignment ===== | ===== Category 1: Features derived from word alignment ===== | ||
- | + | - log probability of the alignment |
+ | - | ||
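
The class imbalance noted above for the binary classifier can be illustrated with a small sketch. It assumes a toy document pair with n sentences on each side and a one-to-one gold alignment; the function name `candidate_pairs` and the data are hypothetical and are not taken from the article.

```python
from itertools import product


def candidate_pairs(source_sentences, target_sentences, gold_alignment):
    """Label every source/target sentence pair as parallel (1) or not (0)."""
    indexed = product(enumerate(source_sentences), enumerate(target_sentences))
    return [
        ((s, t), 1 if (i, j) in gold_alignment else 0)
        for (i, s), (j, t) in indexed
    ]


# Toy document pair with n = 4 sentences per side (made-up data).
n = 4
source = [f"src_{i}" for i in range(n)]
target = [f"tgt_{j}" for j in range(n)]
# Assumption: each source sentence has exactly one parallel target sentence.
gold = {(i, i) for i in range(n)}

pairs = candidate_pairs(source, target, gold)
positives = sum(label for _, label in pairs)
negatives = len(pairs) - positives
print(positives, negatives)  # 4 12
```

With n sentences per side there are n² candidate pairs but at most n parallel ones, so the share of positive examples shrinks as documents get longer.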