Differences

This shows you the differences between two versions of the page.

--- diskurz_doporucena_literatura [2012/03/12 08:33]
hladka
+++ diskurz_doporucena_literatura [2012/06/11 15:23]
hladka
@@ Line 4: / Line 4: @@
 **Abstract**
 This article describes an implemented system which uses centering theory for planning of coherent texts and choice of referring expressions. We argue that text and sentence planning need to be driven in part by the goal of maintaining referential continuity and thereby facilitating pronoun resolution: Obtaining a favorable ordering of clauses, and of arguments within clauses, is likely to increase opportunities for nonambiguous pronoun use. Centering theory provides the basis for such an integrated approach. Generating coherent texts according to centering theory is treated as a constraint satisfaction problem. The well-known Rule 2 of centering theory is reformulated in terms of a set of constraints—cohesion, salience, cheapness, and continuity—and we show sample outputs obtained under a particular weighting of these constraints. This framework facilitates detailed research into evaluation metrics and will therefore provide a productive research tool in addition to the immediate practical benefit of improving the fluency and readability of generated texts. The technique is generally applicable to natural language generation systems, which perform hierarchical text structuring based on a theory of coherence relations with certain additional assumptions.
+The article deals with generating (coherent) text: it tries to find and describe rules by which coherent text can be generated automatically. It takes a rule-based approach (not a statistical one).
+The theoretical foundation is centering theory, which is associated mainly with the names Grosz, Joshi, and Weinstein, and also, for example, McCoy, Strube, Henschel, Cheng, and Poesio.
+On //centering theory// (CT)
+It concerns determining the relations between anaphorically linked elements; compare the sentences:
+(1) Petra má ráda Pavlu. Často ji navštěvuje. Ráda se dívá na filmy, a proto obě často chodí do kina. (Czech: "Petra likes Pavla. [She] often visits her. [She] likes watching films, and so the two of them often go to the cinema.")
+Grammatical gender is no longer enough to correctly resolve the pronoun ji ("her") in the second sentence (i.e., who visits whom). Even so, we know that ji most probably refers to Pavla and that the subject of the predicate navštěvuje ("visits") is more likely Petra. CT tries to characterize exactly this kind of discourse coherence. One of the notions it uses is SALIENCE (= the degree of activation of a given element in the text). Petra is the more activated element (has a higher degree of salience) than Pavla, because she is the subject of the first sentence.
+(2) S Radkem není něco v pořádku. Chová se divně. Včera volal ve tři ráno Kubovi. (#Radek ho chtěl…) Chtěl ho za každou cenu vidět. (Czech: "Something is wrong with Radek. [He] is behaving strangely. Yesterday [he] called Kuba at three in the morning. (#Radek wanted … him) [He] wanted to see him at all costs.")
+In example (2), by contrast, we would rather use an unexpressed subject in the last sentence than repeat the word Radek. This is because the element Radek is still highly activated, so the reader has no difficulty linking it to the unexpressed subject of the last sentence.
+Every text consists of a sequence of sentences U1, U2, U3 … Un. Each discourse referent in them has a certain degree of salience, and the degree of activation of a single discourse referent changes over the course of the text.
+For each sentence unit Ui there is a list of discourse referents with "forward" scope, the so-called Forward-looking Center, written CF (Ui, D). Every element of this CF must be linguistically realized (this applies to English).
+The list is ordered. The order is tied to the syntactic role of the discourse referents: subject > direct object > indirect object > other.
+The highest-ranked element of CF is the so-called Preferred Center, written CP (Ui, D); by the scale above, it is therefore the subject.
+For sentences Ui of a discourse D that do not begin the text, there is a so-called Backward-looking Center, written CB (Ui, D). It is the highest-ranked element of CF (Ui-1, D) that also occurs in CF (Ui, D).
    *Florian Wolf, Edward Gibson. 2005. Representing Discourse Coherence: A Corpus-Based Study. //Computational Linguistics//, Vol. 31, No. 2, pp. 249--287. [[http://acl.ldc.upenn.edu/J/J05/J05-2005.pdf|pdf]]
@@ Line 13: / Line 40: @@
    *Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind Joshi, Bonnie Webber. 2008. The Penn Discourse TreeBank 2.0. In //Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008)// [[http://www.seas.upenn.edu/~pdtb/papers/pdtb-lrec08.pdf]]
 **Abstract**
-We present the second version of the Penn Discourse Treebank, PDTB-2.0, describing its lexically-grounded annotations of discourse
+We present the second version of the Penn Discourse Treebank, PDTB-2.0, describing its lexically-grounded annotations of discourse relations and their two abstract object arguments over the 1 million word Wall Street Journal corpus. We describe all aspects of the annotation, including (a) the argument structure of discourse relations, (b) the sense annotation of the relations, and (c) the attribution of discourse relations and each of their arguments. We list the differences between PDTB-1.0 and PDTB-2.0. We present representative statistics for several aspects of the annotation in the corpus.
-relations and their two abstract object arguments over the 1 million word Wall Street Journal corpus. We describe all aspects of the
-annotation, including (a) the argument structure of discourse relations, (b) the sense annotation of the relations, and (c) the attribution
-of discourse relations and each of their arguments. We list the differences between PDTB-1.0 and PDTB-2.0. We present representative
-statistics for several aspects of the annotation in the corpus.
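
The centering-theory definitions added in the first hunk (the role-based ordering of CF, the preferred center CP, and the backward-looking center CB) can be illustrated with a minimal Python sketch. It is not taken from the article or from any existing library: the function names, the role labels, and the hand-made annotation of example (1) are all assumptions made for illustration, and the pronoun and unexpressed subject are resolved by hand following the reading given in the text.

```python
from typing import Dict, List, Optional

# Ranking stated in the text: subject > direct object > indirect object > other.
# The string labels themselves are hypothetical.
ROLE_RANK = {"subject": 0, "direct_object": 1, "indirect_object": 2, "other": 3}


def forward_centers(utterance: Dict[str, str]) -> List[str]:
    """Return CF(Ui): the referents of one sentence, highest-ranked first."""
    # sorted() is stable, so referents with the same role keep their input order.
    return sorted(utterance, key=lambda ref: ROLE_RANK[utterance[ref]])


def preferred_center(utterance: Dict[str, str]) -> Optional[str]:
    """Return CP(Ui): the highest-ranked element of CF(Ui), if there is one."""
    cf = forward_centers(utterance)
    return cf[0] if cf else None


def backward_center(previous: Dict[str, str],
                    current: Dict[str, str]) -> Optional[str]:
    """Return CB(Ui): the highest-ranked element of CF(Ui-1) also in CF(Ui)."""
    for ref in forward_centers(previous):
        if ref in current:
            return ref
    return None


# Hand annotation of example (1), assumed for illustration:
# U1 "Petra má ráda Pavlu."  U2 "Často ji navštěvuje."
u1 = {"Petra": "subject", "Pavla": "direct_object"}
u2 = {"Petra": "subject", "Pavla": "direct_object"}

print(forward_centers(u1))
print(preferred_center(u1))
print(backward_center(u1, u2))
```

The sketch only covers the definitions quoted above. It does not resolve pronouns, and it does not model how salience changes over the course of a text.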


Institute of Formal and Applied Linguistics Wiki
