Institute of Formal and Applied Linguistics Wiki

Massimo Poesio, University of Essex
Empirical methods in the study of anaphora: lessons learned, remaining problems

In the last ten years we have witnessed the creation of anaphorically annotated corpora [1] of substantial size (between 500,000 and 1 million tokens) for many languages, including Arabic, Catalan, Chinese, Czech, Dutch, English, German, Italian, Japanese, and Spanish. These resources have enabled a flourishing of evaluation initiatives devoted to the cross-lingual computational study of anaphora, such as SEMEVAL-2010, the CONLL 2011 shared task, and now the CONLL 2012 shared task (Arabic, Chinese and English). The results obtained in such campaigns indicate, however, that there is still some way to go before this task is understood as well as other aspects of natural language interpretation, such as semantic role labelling. In this talk I will discuss the lessons learned from our experience with the annotation of the GNOME and ARRAU corpora of English, the LiveMemories corpus of Italian, and the ongoing annotation using the Phrase Detectives game [2], as well as the issues that still remain to be tackled.

[1] I will use the term ‘anaphora’ to refer to the linguistic task as defined, say, in Discourse Representation Theory, in contrast with the ‘coreference’ task in the sense of ACE and MUC.
[2] www.phrasedetectives.org
