
Institute of Formal and Applied Linguistics Wiki



Soft Syntactic Constraints for Hierarchical Phrase-based Translation Using Latent Syntactic Distributions

Zhongqiang Huang, Martin Čmejrek and Bowen Zhou
Conference on Empirical Methods in NLP, 2010

Presented by Jindřich Helcl
Report by Petr Jankovský


The authors introduce hierarchical phrase-based models and discuss their advantages and disadvantages. They then propose augmenting hierarchical phrase-based translation systems with novel syntactic features.

Section 2

We explained what a synchronous context-free grammar (SCFG) is and showed on examples how it works. The authors define the terms phrase, phrase pair and tight phrase pair. Using the example figures, we looked at word-aligned sentence pairs and at tight phrase pairs marked in a matrix representation. The question came up whether an alignment can contain a cycle. We therefore discussed where the alignment comes from and found the answer on page 139: the SCFG rules of hierarchical phrase-based models are extracted automatically from corpora of word-aligned parallel sentence pairs (Brown et al., 1993; Och and Ney, 2000). The paper says nothing more about it, so we agreed that there may be some heuristics that do not allow cycles. Finally, we went through the extraction of SCFG rules and illustrated it on examples.
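The matrix view of phrase pairs discussed above can be made concrete with a small sketch. It assumes the usual definitions: a phrase pair is a pair of source and target spans such that no alignment link crosses the span boundary and at least one link lies inside, and a tight phrase pair additionally has aligned boundary words on both sides. The function names and the toy alignment are illustrative only and are not taken from the paper.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]            # (source word index, target word index)
Span = Tuple[int, int, int, int]  # (s0, s1, t0, t1), all bounds inclusive


def is_phrase_pair(align: Set[Link], s0: int, s1: int, t0: int, t1: int) -> bool:
    """True if no link leaves the box and at least one link lies inside it."""
    has_link = False
    for s, t in align:
        s_in = s0 <= s <= s1
        t_in = t0 <= t <= t1
        if s_in != t_in:  # link connects a word inside with a word outside
            return False
        has_link = has_link or s_in
    return has_link


def is_tight(align: Set[Link], s0: int, s1: int, t0: int, t1: int) -> bool:
    """A phrase pair whose boundary words are all aligned (assumed definition)."""
    src_aligned = {s for s, _ in align}
    tgt_aligned = {t for _, t in align}
    return (is_phrase_pair(align, s0, s1, t0, t1)
            and {s0, s1} <= src_aligned
            and {t0, t1} <= tgt_aligned)


def tight_phrase_pairs(align: Set[Link], n_src: int, n_tgt: int) -> List[Span]:
    """Enumerate all tight phrase pairs by brute force over all span pairs."""
    return [(s0, s1, t0, t1)
            for s0 in range(n_src) for s1 in range(s0, n_src)
            for t0 in range(n_tgt) for t1 in range(t0, n_tgt)
            if is_tight(align, s0, s1, t0, t1)]


# Toy alignment: source word 0 -> target 0, and source words 1 and 2 swapped.
toy_alignment = {(0, 0), (1, 2), (2, 1)}
pairs = tight_phrase_pairs(toy_alignment, 3, 3)
```

The brute-force enumeration is only meant for reading along with the matrix pictures; a real extractor would limit span lengths and avoid checking every box.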

Section 3

In Section 3, the authors present their idea of attaching a distribution over the possible tag sequences to the rules.
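To illustrate how such a distribution could act as a soft constraint, here is a minimal sketch. It assumes that the match between a rule and a source span is scored as a dot product between two distributions over tags; the tag names, the counts and the function names are hypothetical, and the exact features in the paper may differ.

```python
from typing import Dict


def normalize(counts: Dict[str, float]) -> Dict[str, float]:
    """Turn raw tag counts into a probability distribution."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize empty counts")
    return {tag: c / total for tag, c in counts.items()}


def soft_match(rule_dist: Dict[str, float], span_dist: Dict[str, float]) -> float:
    """Dot product of two tag distributions (assumed similarity measure)."""
    return sum(p * span_dist.get(tag, 0.0) for tag, p in rule_dist.items())


# Hypothetical counts of the tags observed for a nonterminal of one rule.
rule_dist = normalize({"NP": 3, "VP": 1})
# Hypothetical analysis of the source span the rule is applied to.
span_dist = normalize({"NP": 1})
score = soft_match(rule_dist, span_dist)
```

Unlike a hard constraint, a rule whose distribution disagrees with the span is not forbidden; it only receives a lower score.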

We regarded the remaining parts as rather technical, partly for lack of time…


The authors report a relatively small but statistically significant improvement of 0.6 BLEU on the test set of the English-to-German task, and a smaller improvement of 0.41 BLEU on the test set of the English-to-Chinese task (here we do not know whether the improvement is statistically significant)… A possible explanation for the different results is that the sentences in the English-to-Chinese task are much shorter (6 words per sentence on average) than those in the English-to-German task (15 words per sentence).

One of the authors, Martin Čmejrek, is seen at UFAL from time to time, so it is possible to ask him further questions.
