Differences
This shows you the differences between two versions of the page.
user:zeman:joshua [2010/03/08 15:46] zeman Long sentences are problem. |
user:zeman:joshua [2010/03/08 15:47] zeman |
||
---|---|---|---|
Line 307: | Line 307: | ||
===== Troubleshooter ===== | ===== Troubleshooter ===== | ||
+ | |||
Line 313: | Line 314: | ||
If you encounter this exception during corpus binarization or (in older releases of Joshua) during grammar extraction, check whether your alignment file matches your source and target corpus. Did you accidentally switch the translation direction? | If you encounter this exception during corpus binarization or (in older releases of Joshua) during grammar extraction, check whether your alignment file matches your source and target corpus. Did you accidentally switch the translation direction? | ||
- | Another source of this error could be sentences with 100 or more words. This is not a strict limit: I was often able to extract grammars from corpora that had not been checked for such sentences, but according to Lance Schwartz, long sentences can cause problems. And after all, they are suspicious anyway, and their contribution to the learnt model is doubtful. | + | Another source of this error could be sentences with 100 or more words. This is not a strict limit: I was often able to extract grammars from corpora that had not been checked for such sentences, but according to Lance Schwartz, long sentences can cause problems. |
==== ZMERT: corrupted temp file ==== | ==== ZMERT: corrupted temp file ==== |
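
The two checks suggested in the Troubleshooter text of the diff above (does the alignment file match the corpus, and are there sentences with 100 or more words) can be sketched in Python. This is only a minimal illustration, not part of Joshua: the function names are hypothetical, and the alignment format is assumed to be one line per sentence pair with whitespace-separated, zero-based `source-target` index pairs such as `0-0 1-2`. The sketch relates to the exception discussed above, not to the ZMERT section.

```python
def find_long_sentences(lines, max_words=100):
    """Return 1-based line numbers of sentences with max_words or more tokens."""
    return [i for i, line in enumerate(lines, start=1)
            if len(line.split()) >= max_words]


def find_alignment_mismatches(src_lines, tgt_lines, align_lines):
    """Return 1-based line numbers where an alignment point lies outside the sentence pair.

    Assumes zero-based "s-t" pairs (an assumption about the file format).
    Note that zip() stops at the shortest input, so differing line counts
    should be checked separately.
    """
    bad = []
    for i, (src, tgt, aln) in enumerate(zip(src_lines, tgt_lines, align_lines), start=1):
        n_src, n_tgt = len(src.split()), len(tgt.split())
        for point in aln.split():
            s, t = point.split("-")
            if int(s) >= n_src or int(t) >= n_tgt:
                bad.append(i)
                break  # one bad point is enough to flag this line
    return bad
```

To run it on real data, read each file into a list of lines (for example with `open(path, encoding="utf-8").read().splitlines()`) and pass the lists in. If most lines are flagged by `find_alignment_mismatches`, a swapped translation direction is a likely cause, since source indices are then being checked against target-side lengths.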