A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
written by Stephen Tratz and Eduard Hovy (Information Sciences Institute, University of Southern California)
presented by Martin Popel
reported by Michal Novák
Introduction
The paper describes a high-quality conversion of the Penn Treebank to dependency trees. The authors introduce an improved labeled dependency scheme based on the Stanford dependencies. In addition, they extend the non-directional easy-first algorithm of Goldberg and Elhadad to support non-projective trees by adding “move” actions inspired by Nivre's swap-based reordering for shift-reduce parsing. Their parser can also produce shallow semantic annotations for prepositions, possessives and noun compounds.
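To make the parsing algorithm more concrete, below is a minimal Python sketch of a non-directional easy-first loop extended with a swap-style “move” action. This is not the authors' implementation: the function name, the action names and the random stand-in scorer are all illustrative assumptions; the paper uses a trained scoring model and a richer action set.

```python
import random


def parse_easy_first(words, score):
    """Toy non-directional easy-first loop with a swap-style reordering action.

    `score(action, left, right)` returns a number (higher = better) for applying
    `action` to the adjacent pending tokens with original indices left, right.
    Returns a dict mapping each token index to its head index (-1 for the root).
    """
    heads = {}
    pending = list(range(len(words)))  # original token indices, in surface order

    while len(pending) > 1:
        candidates = []
        for i in range(len(pending) - 1):
            l, r = pending[i], pending[i + 1]
            # ATTACH_LEFT: the left token becomes a dependent of its right neighbour.
            candidates.append(("attach_left", i, score("attach_left", l, r)))
            # ATTACH_RIGHT: the right token becomes a dependent of its left neighbour.
            candidates.append(("attach_right", i, score("attach_right", l, r)))
            # SWAP ("move"): reorder the pair; allowed only while the two tokens are
            # still in their original surface order (a Nivre-style constraint that
            # bounds the number of swaps). This is what makes non-projective arcs
            # reachable by later adjacent attachments.
            if l < r:
                candidates.append(("swap", i, score("swap", l, r)))

        action, i, _ = max(candidates, key=lambda c: c[2])
        l, r = pending[i], pending[i + 1]
        if action == "attach_left":
            heads[l] = r
            del pending[i]
        elif action == "attach_right":
            heads[r] = l
            del pending[i + 1]
        else:  # swap
            pending[i], pending[i + 1] = r, l

    heads[pending[0]] = -1  # the last remaining token becomes the root
    return heads


if __name__ == "__main__":
    words = ["A", "hearing", "is", "scheduled", "on", "the", "issue", "today"]
    rng = random.Random(0)
    # Random scores stand in for the trained scoring model of the paper.
    print(parse_easy_first(words, lambda action, left, right: rng.random()))
```

The legality check on the swap action (only tokens still in their original order may be swapped) bounds the number of reorderings, so the loop always terminates, while still letting attachments cross what would otherwise be projectivity boundaries.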
Notes
Dependency conversion structure
- in general, there are (at least) 3 possible types of dependency labels (a side-by-side example follows the list):
  - unlabeled - is it really a set of labels?
  - coarse labels of the CoNLL shared tasks
    - 10-20 labels
    - for example, NMOD always appears under a noun, so assigning it is an easy task, but the result is not very useful
  - their scheme, based on the Stanford dependency labels
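For a concrete side-by-side comparison, the snippet below prints the same arcs at the three levels of granularity. The example phrase and all labels are chosen for illustration and are not taken from the paper's conversion output.

```python
# Illustrative only: the same four arcs for "a hearing on the issue" under
# (1) no labels, (2) CoNLL-style coarse labels, (3) Stanford-style labels.
ARCS = [
    # dependent, head,      coarse,  Stanford-style
    ("a",        "hearing", "NMOD",  "det"),
    ("on",       "hearing", "NMOD",  "prep"),
    ("the",      "issue",   "NMOD",  "det"),
    ("issue",    "on",      "PMOD",  "pobj"),
]

for dep, head, coarse, stanford in ARCS:
    print(f"{dep + ' -> ' + head:20s} coarse={coarse:5s} stanford={stanford}")
```

Note how three of the four arcs collapse into NMOD in the coarse scheme, which is exactly the point of the note above: coarse labels are easy to predict but carry little information.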