Kathleen McKeown, Columbia University
Penn Discourse Treebank Relations and their Potential for Language Generation

(presentation)

In the early 1980s, language generation researchers explored the use of rhetorical relations, first in the form of schemata, or common patterns of rhetorical structure (McKeown 1985), and later in the form of rhetorical structure theory (RST) (Mann 1984). Researchers in language generation showed how discourse structure could be used to plan the content of a text (McKeown 1985, Moore and Paris 1993, Hovy 1988). In most cases, structure was linked in some way to content, whether directly or through planning how to satisfy speaker intentions, and this link was critical to the success of using discourse structure for content planning. Later work (Barzilay 2010, Barzilay and Lapata 2005) took a data-driven approach to this problem, developing techniques to learn common discourse structures for specific domains and using these learned structures to control content selection and organization.

In this panel discussion, I will address questions about how the Penn Discourse Treebank (PDTB) could be used for generation or summarization.

The use of PDTB relations for determining content in text summarization has recently been addressed by Louis et al. (Louis et al. 2010). While they found that discourse structure was a strong indicator of salience for text summaries, they also found that lexical overlap performed equally well and was easier to compute. This topic merits further exploration. Could further research on PDTB relations improve their performance enough to surpass lexical indicators? Lexical indicators have been used for years in summarization, and it would be more satisfying if other factors could be shown to play an important role. Could PDTB relations be used more effectively with abstractive methods than with extractive methods?

In language generation, discourse structure relations often play a prescriptive role in determining what to say next. If content has already been selected, that content, in conjunction with discourse structure, can be used to constrain what gets said next. PDTB relations have been determined empirically through analysis of text, and there has been an effort to limit the range of relations. One natural question is whether PDTB relations should serve the same role as RST relations in generating text, or whether there is a difference in how they could be applied. Could the specific annotation of senses associated with relations be used to help determine content? The PDTB also differs from earlier work on RST in that it is tied more closely to the syntactic structure of the text. Could the close coupling of discourse structure, syntactic structure and sense annotation offer an advantage over previous methods? One possibility would be to explore the role it could play in sentence planning, the problem of determining how to combine simple propositions to generate more complex sentences.
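As a purely illustrative sketch of the sentence planning idea, the following Python fragment combines two simple propositions into one sentence by choosing a connective from a sense label. The sense labels are modeled on the PDTB sense hierarchy, but the table `CONNECTIVES`, the function `combine`, the helper `capitalize_first`, and the choice of a single connective per sense are all assumptions made for this example; none of it is taken from the PDTB tools or from any system discussed above.

```python
# Hypothetical table: one explicit connective per sense label.
# Real text realizes each sense with many connectives; this is a toy choice.
CONNECTIVES = {
    "Contingency.Cause": " because ",
    "Comparison.Contrast": ", but ",
    "Temporal.Asynchronous": " after ",
    "Expansion.Conjunction": " and ",
}


def capitalize_first(text: str) -> str:
    """Upper-case only the first character, leaving the rest unchanged."""
    return text[:1].upper() + text[1:]


def combine(arg1: str, arg2: str, sense: str) -> str:
    """Join two clause-sized propositions using a connective chosen by sense.

    If the sense has no entry in the (assumed) table, fall back to
    emitting the two propositions as separate sentences.
    """
    if sense not in CONNECTIVES:
        return f"{capitalize_first(arg1)}. {capitalize_first(arg2)}."
    return f"{capitalize_first(arg1)}{CONNECTIVES[sense]}{arg2}."


if __name__ == "__main__":
    print(combine("the market fell",
                  "investors feared higher rates",
                  "Contingency.Cause"))
    print(combine("sales rose", "profits declined", "Comparison.Contrast"))
```

A real sentence planner would also have to decide clause order, pronominalization and whether the relation should be left implicit; the sketch only shows where a sense annotation could enter that decision.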

Regina Barzilay. 2010. Probabilistic Approaches for Modeling Text Structure and Their Application to Text-to-Text Generation. In Emiel Krahmer and Mariet Theune, editors, Empirical Methods in Natural Language Generation: Data-oriented Methods and Empirical Evaluation, Springer, 2010.

Regina Barzilay and Mirella Lapata. 2005. Collective Content Selection for Concept-To-Text Generation. In Proceedings of EMNLP, 2005.

Eduard Hovy. 1988. Planning coherent multisentential text. In Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics, 1988, pages 163-169.

Annie Louis and Ani Nenkova. 2012. A coherence model based on syntactic patterns. In Proceedings of EMNLP-CoNLL, 2012.

Bill Mann. 1984. Discourse structures for text generation. In Proceedings of the 10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics, Stroudsburg, PA, 1984, pages 367-375.

Kathleen R. McKeown. 1985. Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text. Cambridge University Press, Cambridge, England, 1985.

Johanna Moore and Cecile Paris. 1993. Planning text for advisory dialogues: capturing intentional and rhetorical information. Computational Linguistics, Volume 19, Issue 4, December 1993, pages 651-694.