Unsupervised Construction of Large Paraphrase Corpora

Bill Dolan, Chris Quirk, and Chris Brockett: Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources. In: Proceedings of the 20th international conference on Computational Linguistics (COLING '04), 2004.


  1. First, check the formula for alignment error rate (AER) presented in the paper. What do you think about it?
    Second, check Och & Ney's original AER formula on page 4 of http://www.aclweb.org/anthology/P00-1056.pdf.
  2. Manually align the following two sentences:
    There was no chance it would endanger our planet, astronomers said
    NASA emphasized that there was never danger of a collision
    sure-links =
    possible-links =
  3. Suppose another annotator produced this alignment:
    sure-links = {There-there, was-was, planet-collision}
    possible-links = {no-never, our-a}
    Take this alignment as the gold standard and your sure-links from the previous question as the set A, then compute Precision, Recall and AER (according to Och & Ney's formula).
  4. Sum up (in 3-5 sentences) what you want to remember from reading this paper.
  5. Formulate at least 1 question that you would like to ask the authors of the paper.
  6. Bonus question: Check the validity of the first-sentence heuristic in another language. Find an article about the same event or topic in each of three different non-English newspapers. Are the first two sentences of these articles paraphrases of each other?
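
The computation asked for in question 3 can be sketched as follows. This is a minimal sketch of Och & Ney's definitions, assuming that their possible set P contains the sure set S, so the gold-standard possible-links listed in question 3 are merged with the sure-links before scoring. The function name `alignment_scores` and the set `A` below are hypothetical and serve only as an illustration; `A` is not the expected answer to question 2.

```python
def alignment_scores(A, sure, possible_only):
    """Precision, recall and AER following Och & Ney's definitions.

    A             -- links proposed by the annotator or system being scored
    sure          -- gold-standard sure links (S)
    possible_only -- gold-standard links marked only as possible
    """
    if not A or not sure:
        raise ValueError("A and the sure set must both be non-empty")
    # Assumption: in Och & Ney's formulation, S is a subset of P.
    P = sure | possible_only
    precision = len(A & P) / len(A)
    recall = len(A & sure) / len(sure)
    aer = 1 - (len(A & sure) + len(A & P)) / (len(A) + len(sure))
    return precision, recall, aer


# Gold standard taken from question 3.
S = {"There-there", "was-was", "planet-collision"}
P_ONLY = {"no-never", "our-a"}

# Hypothetical alignment, made up for illustration only.
A = {"There-there", "was-was", "no-never", "chance-danger"}

precision, recall, aer = alignment_scores(A, S, P_ONLY)
print(f"Precision = {precision:.3f}, Recall = {recall:.3f}, AER = {aer:.3f}")
```

With this made-up A, three of the four proposed links are in P and two are in S, so Precision = 3/4, Recall = 2/3 and AER = 1 - (2 + 3) / (4 + 3) = 2/7.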

