
Institute of Formal and Applied Linguistics Wiki






A Fast, Accurate, Non-Projective, Semantically-Enriched Parser

written by Stephen Tratz and Eduard Hovy (Information Sciences Institute, University of Southern California)

presented by Martin Popel

reported by Michal Novák

Introduction

The paper describes a high-quality conversion of the Penn Treebank to dependency trees. The authors introduce an improved labeled dependency scheme based on the Stanford dependencies. In addition, they extend the non-directional easy-first algorithm of Goldberg and Elhadad to support non-projective trees by adding “move” actions inspired by Nivre's swap-based reordering for shift-reduce parsing. Their parser can also produce shallow semantic annotations for prepositions, possessives and noun compounds.
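A minimal sketch of the non-directional easy-first strategy (Goldberg and Elhadad). A shift-reduce parser reads the sentence left to right. Easy-first instead looks at every adjacent pair of tokens that still lack a head and performs the single most confident attachment anywhere in the sentence. The scorer `toy_score` and its hand-written preferences are hypothetical stand-ins for the trained feature-based model. The sketch builds only unlabeled projective trees and leaves out the paper's move actions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical scorer signature: score(parent, child, words) -> float.
# A real parser would use a trained model over rich features instead.
Scorer = Callable[[int, int, List[str]], float]


def toy_score(parent: int, child: int, words: List[str]) -> float:
    # Hand-written preferences standing in for a learned model (illustrative only).
    prefs = {("cat", "the"): 2.0, ("sat", "cat"): 1.0}
    return prefs.get((words[parent], words[child]), 0.0)


def easy_first_parse(words: List[str], score: Scorer) -> Dict[int, int]:
    """Return a map child index -> parent index; the root gets parent -1.

    Naive easy-first loop: each of the n-1 steps rescans all adjacent pairs
    of pending tokens, so it makes O(n^2) score calls in total.
    """
    if not words:
        return {}
    pending = list(range(len(words)))  # tokens that do not have a head yet
    heads: Dict[int, int] = {}
    while len(pending) > 1:
        best: Optional[Tuple[float, int, bool]] = None  # (score, position, attach_left)
        for i in range(len(pending) - 1):
            left, right = pending[i], pending[i + 1]
            # ATTACH_LEFT: the left token becomes a dependent of the right one.
            s = score(right, left, words)
            if best is None or s > best[0]:
                best = (s, i, True)
            # ATTACH_RIGHT: the right token becomes a dependent of the left one.
            s = score(left, right, words)
            if best is None or s > best[0]:
                best = (s, i, False)
        _, i, attach_left = best
        if attach_left:
            parent, child = pending[i + 1], pending[i]
        else:
            parent, child = pending[i], pending[i + 1]
        heads[child] = parent
        # The attached token leaves the pending list; its neighbours become adjacent.
        pending.pop(i if attach_left else i + 1)
    heads[pending[0]] = -1  # the last remaining pending token is the root
    return heads


print(easy_first_parse(["the", "cat", "sat"], toy_score))
```

On "the cat sat" the sketch first attaches "the" to "cat" (score 2.0), which is the easiest decision anywhere in the sentence. It then attaches "cat" to "sat" and prints `{0: 1, 1: 2, 2: -1}`.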

Notes

Dependency conversion structure

Conversion process

Parser

MST parser: <latex>\mathop{O}(n^2)</latex>
MaltParser: <latex>\mathop{O}(n)</latex>, but slower in practice
this parser (projective): <latex>\mathop{O}(n\log n)</latex>; <latex>\mathop{O}(n^2)</latex> with a naive implementation
this parser (non-projective): <latex>\mathop{O}(n^2\log n)</latex>; <latex>\mathop{O}(n^3)</latex> with a naive implementation
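The following sketch illustrates where the gap between the naive <latex>\mathop{O}(n^2)</latex> and the <latex>\mathop{O}(n\log n)</latex> projective figures can come from. It is not the authors' implementation. It assumes, as a simplification, that an action's score depends only on the two tokens it connects. Under that assumption, an attachment creates exactly one new adjacent pair, so only two new actions need scoring per step. All candidate actions sit in a priority queue (Python's `heapq`), and entries whose tokens are no longer adjacent are discarded lazily. If features look at a bounded window around the pair, a constant number of nearby actions would have to be rescored per step instead, which keeps the same bound. The names `easy_first_parse_heap` and `toy_score` are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer signature: score(parent, child, words) -> float.
Scorer = Callable[[int, int, List[str]], float]


def toy_score(parent: int, child: int, words: List[str]) -> float:
    # Hand-written preferences standing in for a learned model (illustrative only).
    prefs = {("cat", "the"): 2.0, ("sat", "cat"): 1.0}
    return prefs.get((words[parent], words[child]), 0.0)


def easy_first_parse_heap(words: List[str], score: Scorer) -> Dict[int, int]:
    """Easy-first parsing with a priority queue; returns child -> parent (root: -1).

    Assumes score(parent, child, words) depends only on the two tokens, so a
    cached score never goes stale; an entry is stale only when its two tokens
    are no longer adjacent pending tokens. O(n) pushes/pops, each O(log n).
    """
    n = len(words)
    if n == 0:
        return {}
    # Doubly linked list over pending tokens: O(1) removal and adjacency tests.
    prev = [i - 1 for i in range(n)]
    nxt = [i + 1 if i + 1 < n else -1 for i in range(n)]
    attached = [False] * n
    heap: List[Tuple[float, int, int, int]] = []

    def push_pair(left: int, right: int) -> None:
        # heapq is a min-heap, so scores are negated to pop the best action first.
        # Last field: 0 = left token attaches to right, 1 = right attaches to left.
        heapq.heappush(heap, (-score(right, left, words), left, right, 0))
        heapq.heappush(heap, (-score(left, right, words), left, right, 1))

    for i in range(n - 1):
        push_pair(i, i + 1)

    heads: Dict[int, int] = {}
    for _ in range(n - 1):  # each step attaches exactly one token
        while True:
            _, left, right, direction = heapq.heappop(heap)
            # Lazy deletion: skip actions whose tokens are no longer adjacent and pending.
            if not attached[left] and not attached[right] and nxt[left] == right:
                break
        child, parent = (left, right) if direction == 0 else (right, left)
        heads[child] = parent
        attached[child] = True
        # Unlink the child; its former neighbours become adjacent.
        p, q = prev[child], nxt[child]
        if p != -1:
            nxt[p] = q
        if q != -1:
            prev[q] = p
        if p != -1 and q != -1:
            push_pair(p, q)  # the only newly adjacent pair
    root = next(i for i in range(n) if not attached[i])
    heads[root] = -1
    return heads


print(easy_first_parse_heap(["the", "cat", "sat"], toy_score))
```

This prints `{0: 1, 1: 2, 2: -1}`. With equal scores, the tuple ordering in the heap prefers the leftmost pair and then the left-attaching action.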

Features


Evaluation

Shallow semantic annotation

Conclusion

