A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
written by Stephen Tratz and Eduard Hovy (Information Sciences Institute, University of Southern California)
presented by Martin Popel
reported by Michal Novák
Introduction
The paper describes a high-quality conversion of the Penn Treebank to dependency trees. The authors introduce an improved labeled dependency scheme based on the Stanford dependencies. In addition, they extend the non-directional easy-first algorithm of Goldberg and Elhadad to support non-projective trees by adding “move” actions inspired by Nivre's swap-based reordering for shift-reduce parsing. Their parser can also produce shallow semantic annotations for prepositions, possessives and noun compounds.
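To make the parsing algorithm more concrete, below is a minimal Python sketch of a non-directional easy-first loop extended with a swap-style “move” action. This is not the authors' implementation: the function name, the action names and the random stand-in scorer are all illustrative assumptions; the paper uses a trained scoring model and a richer action set.

```python
import random


def parse_easy_first(words, score):
    """Toy non-directional easy-first loop with a swap-style reordering action.

    `score(action, left, right)` returns a number (higher = better) for applying
    `action` to the adjacent pending tokens with original indices left, right.
    Returns a dict mapping each token index to its head index (-1 for the root).
    """
    heads = {}
    pending = list(range(len(words)))  # original token indices, in surface order

    while len(pending) > 1:
        candidates = []
        for i in range(len(pending) - 1):
            l, r = pending[i], pending[i + 1]
            # ATTACH_LEFT: the left token becomes a dependent of its right neighbour.
            candidates.append(("attach_left", i, score("attach_left", l, r)))
            # ATTACH_RIGHT: the right token becomes a dependent of its left neighbour.
            candidates.append(("attach_right", i, score("attach_right", l, r)))
            # SWAP ("move"): reorder the pair; allowed only while the two tokens are
            # still in their original surface order (a Nivre-style constraint that
            # bounds the number of swaps). This is what makes non-projective arcs
            # reachable by later adjacent attachments.
            if l < r:
                candidates.append(("swap", i, score("swap", l, r)))

        action, i, _ = max(candidates, key=lambda c: c[2])
        l, r = pending[i], pending[i + 1]
        if action == "attach_left":
            heads[l] = r
            del pending[i]
        elif action == "attach_right":
            heads[r] = l
            del pending[i + 1]
        else:  # swap
            pending[i], pending[i + 1] = r, l

    heads[pending[0]] = -1  # the last remaining token becomes the root
    return heads


if __name__ == "__main__":
    words = ["A", "hearing", "is", "scheduled", "on", "the", "issue", "today"]
    rng = random.Random(0)
    # Random scores stand in for the trained scoring model of the paper.
    print(parse_easy_first(words, lambda action, left, right: rng.random()))
```

The legality check on the swap action (only tokens still in their original order may be swapped) bounds the number of reorderings, so the loop always terminates, while still letting attachments cross what would otherwise be projectivity boundaries.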
Notes
Dependency conversion structure
- in general, there are (at least) 3 possible types of dependency labels (a side-by-side example follows the list):
  - unlabeled - is it really a set of labels?
  - coarse labels of the CoNLL shared tasks
    - 10-20 labels
    - for example, NMOD always appears under a noun, so assigning it is an easy task, but the result is not very useful
  - their scheme, based on the Stanford dependency labels
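For a concrete side-by-side comparison, the snippet below prints the same arcs at the three levels of granularity. The example phrase and all labels are chosen for illustration and are not taken from the paper's conversion output.

```python
# Illustrative only: the same four arcs for "a hearing on the issue" under
# (1) no labels, (2) CoNLL-style coarse labels, (3) Stanford-style labels.
ARCS = [
    # dependent, head,      coarse,  Stanford-style
    ("a",        "hearing", "NMOD",  "det"),
    ("on",       "hearing", "NMOD",  "prep"),
    ("the",      "issue",   "NMOD",  "det"),
    ("issue",    "on",      "PMOD",  "pobj"),
]

for dep, head, coarse, stanford in ARCS:
    print(f"{dep + ' -> ' + head:20s} coarse={coarse:5s} stanford={stanford}")
```

Note how three of the four arcs collapse into NMOD in the coarse scheme, which is exactly the point of the note above: coarse labels are easy to predict but carry little information.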