Training Phrase Translation Models with Leaving-One-Out
paper by Joern Wuebker, Arne Mauser and Hermann Ney
presented by Bushra Jawaid
report by Rudolf Rosa
Presentation
The paper was well presented. Bushra talked about the paper in great detail, even including some information from related papers. However, this led to a time shortage towards the end of the presentation.
Bushra managed to understand and explain the work of the authors, pointing out unclear passages, which were then addressed in the discussion.
Discussion
Many questions were raised during and after the presentation of the paper, most often by Martin Popel. Some of them were answered by Bushra, some by Martin, some by other participants, and several were left unanswered.
The discussion also included simple comments, such as pointing out inaccuracies in the paper.
Why does the paper talk about “forced alignment”?
The paper describes three ways of using the alignment (best alignment, n-best alignments, all alignments), but Formula (3) only applies to the first way.
The authors claim to have avoided using “heuristics” for phrase alignment. However, they do use heuristics, both in the first part of their method and later in the interpolation.
From the paper, it is not clear why in Figure 2 the whole sentence is not extracted.
The whole sentence is by definition always consistent with the word alignment (see the sketch below).
There may be a hard limit for maximum phrase length, but this is not mentioned in the paper.
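To illustrate why the whole sentence pair always passes the extraction criterion, here is a minimal sketch of the standard consistency check used in phrase extraction; the function name and the toy alignment are our own, and the paper's actual implementation may of course differ.

def is_consistent(alignment, src_span, tgt_span):
    """Standard phrase-pair consistency check: no alignment link may
    connect a word inside the spans to a word outside them."""
    s_from, s_to = src_span   # inclusive source-side interval
    t_from, t_to = tgt_span   # inclusive target-side interval
    inside = False
    for s, t in alignment:
        s_in = s_from <= s <= s_to
        t_in = t_from <= t <= t_to
        if s_in != t_in:      # a link crosses the span boundary
            return False
        if s_in and t_in:
            inside = True
    return inside             # require at least one link inside

# Toy alignment with 0-based indices (our own example, not from the paper):
alignment = {(0, 0), (1, 2), (2, 1)}
src_len, tgt_len = 3, 3
# The whole sentence pair is trivially consistent, since no link can point
# outside it:
assert is_consistent(alignment, (0, src_len - 1), (0, tgt_len - 1))
# A smaller span can be ruled out by a crossing link:
assert not is_consistent(alignment, (0, 0), (0, 1))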
We discussed whether even singleton phrases should be extracted or whether it would be better to skip them.
We found that the meaning of “cross-validation” is unclear from the paper.
It seems to us that the authors simply used a misleading term here, as they use the procedure not for validation but for training (probably they just perform “leaving 10,000 out” instead of “leaving 1 out”).
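If this reading is correct, the procedure could be organised roughly as follows; this is only a sketch of our interpretation, and the batch size of 10,000 as well as all function names are assumptions, not taken from the paper.

def leave_n_out_batches(corpus, n=10000):
    """Yield (held_out, rest) pairs so that each batch of n sentence
    pairs is processed by a model estimated on the remaining data only."""
    for start in range(0, len(corpus), n):
        held_out = corpus[start:start + n]
        rest = corpus[:start] + corpus[start + n:]
        yield held_out, rest

# Hypothetical usage (train_phrase_model and force_align_and_count are
# placeholders, not functions described in the paper):
# for held_out, rest in leave_n_out_batches(corpus):
#     model = train_phrase_model(rest)
#     counts = force_align_and_count(model, held_out)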
We discussed whether in Table 4, N stands for the number of different alignments for a sentence pair, but we remained rather unsure about that.