Semantic textual similarity using maximal weighted bipartite graph matching
1) What are the drawbacks of WordNet-based word similarity?
2) Suppose we randomly changed the word order of the input sentences.
Which systems' output similarity scores (baseline, systems 1, 2, 3) would be affected?
3) What are the baseline similarity scores of the following two sentence pairs?
a)
A man is riding a bicycle.
A man is riding a bike.
b)
John loves Mary
Mary loves John
4) Check the “Gold Standard” guidelines at http://www.cs.york.ac.uk/semeval-2012/task6/data/uploads/datasets/train-readme.txt and assign gold standard scores to the two sentence pairs above.
Compute the Pearson correlation between the baseline and gold standard scores.
5) Imagine you are a reviewer of this paper and write a review (just the main points/objections, skip the intro/abstract).
Hints:
http://en.wikipedia.org/wiki/Cosine_similarity
http://en.wikipedia.org/wiki/Pearson_correlation_coefficient
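
The two hints can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation: it assumes the baseline is a bag-of-words cosine similarity over lowercased word tokens with punctuation dropped, which the exercise does not state explicitly. The function names tokens, cosine_similarity and pearson are hypothetical helpers introduced for illustration.

```python
import math
import re
from collections import Counter


def tokens(sentence):
    # Assumption: lowercase the text and keep only runs of word characters,
    # so punctuation such as a trailing period is ignored.
    return re.findall(r"\w+", sentence.lower())


def cosine_similarity(s1, s2):
    # Represent each sentence as a vector of token counts.
    v1, v2 = Counter(tokens(s1)), Counter(tokens(s2))
    # Dot product over the tokens the two sentences share.
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # convention chosen here for empty sentences
    return dot / (norm1 * norm2)


def pearson(xs, ys):
    # Pearson correlation: covariance divided by the product of the
    # standard deviations (the 1/n factors cancel).
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (dev_x * dev_y)
```

Under these assumptions, cosine_similarity can be applied to each sentence pair in question 3, and pearson to the resulting list of baseline scores together with the gold standard scores assigned in question 4. A different tokenization (for example, keeping punctuation or case) may change the cosine values.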