===== Unsupervised Construction of Large Paraphrase Corpora =====

Bill Dolan, Chris Quirk, and Chris Brockett: [[http://research.microsoft.com/pubs/68974/para_coling2004.pdf|Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources]]. In: Proceedings of the 20th International Conference on Computational Linguistics (COLING '04), 2004.

===== Questions =====

  - First, check the formula for AER (alignment error rate) presented in the paper. What do you think about it? \\ Second, check Och & Ney's original AER formula on page 4 of http://www.aclweb.org/anthology/P00-1056.pdf (the metrics are reproduced after this list for reference).
  - Align the following two sentences manually: \\ There was no chance it would endanger our planet, astronomers said \\ NASA emphasized that there was never danger of a collision \\ sure-links = \\ possible-links =
  - Suppose another annotator produced this alignment: \\ sure-links = {There-there, was-was, planet-collision} \\ possible-links = {no-never, our-a} \\ Take this as the gold standard and your sure-links from the previous question as the set A, and compute Precision, Recall, and AER according to Och & Ney's formula (a computation sketch follows this list).
  - Sum up (in 3-5 sentences) what you want to remember from reading this paper.
  - Formulate at least one question that you would like to ask the authors of the paper.
  - Bonus question: check the validity of the first-sentence heuristic in another language. Find articles about the same event or topic in three different non-English newspapers. Are the first two sentences paraphrases of each other?
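
For reference (questions 1 and 3), the alignment metrics as defined in the Och & Ney paper linked above, where S is the set of sure links, P the set of possible links (with S ⊆ P), and A the alignment being evaluated:

<code latex>
% Och & Ney's alignment metrics (S = sure links, P = possible links, S \subseteq P,
% A = the alignment under evaluation):
\mathrm{precision} = \frac{|A \cap P|}{|A|}, \qquad
\mathrm{recall} = \frac{|A \cap S|}{|S|}, \qquad
\mathrm{AER}(S, P; A) = 1 - \frac{|A \cap S| + |A \cap P|}{|A| + |S|}
</code>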
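
A minimal computation sketch for question 3, assuming links are written as simple ''word-word'' strings; the helper names and the placeholder set ''A'' are illustrative only and should be replaced with your own sure links from question 2:

<code python>
# Och & Ney's alignment metrics over sets of links.
# Links are plain strings such as "There-there"; any hashable representation
# works, as long as A, S and P use the same one.

def precision(a: set, p: set) -> float:
    """Precision = |A ∩ P| / |A|."""
    return len(a & p) / len(a)

def recall(a: set, s: set) -> float:
    """Recall = |A ∩ S| / |S|."""
    return len(a & s) / len(s)

def aer(a: set, s: set, p: set) -> float:
    """AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|)."""
    return 1 - (len(a & s) + len(a & p)) / (len(a) + len(s))

# Gold standard from question 3; P is taken as S plus the possible links,
# following Och & Ney's convention that S is a subset of P.
S = {"There-there", "was-was", "planet-collision"}
P = S | {"no-never", "our-a"}

# Placeholder -- replace with your own sure links from question 2.
A = {"There-there", "was-was"}

print(f"precision={precision(A, P):.3f} recall={recall(A, S):.3f} AER={aer(A, S, P):.3f}")
</code>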