Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

nianwen_abstract [2012/11/21 11:09]
ufal removed
— (current)
Line 1: Line 1:
-**Nianwen (Bert) Xue, Assistant Professor, Brandeis University, USA** 
-//Explicit and implicit discourse relations from a cross-lingual perspective – from experience in working on Chinese discourse annotation//  
- 
-**Abstract**  
- 
-In the field of computational linguistics or natural language 
-processing, progress in discourse analysis has been relatively slow, 
-as compared with syntactic parsing or semantic analysis (e.g., word 
-sense disambiguation, semantic role labeling). In this age when 
-statistical, data-driven approaches dominate the field, having a 
-common linguistic resource that is widely accepted by the community is 
-key to advancing the state of the art in this area. To create 
-consistently annotated data for discourse analysis is particularly 
-challenging because one has to deal with larger linguistic structures 
-and there are few linguistic rules to follow. The key to successful 
-discourse annotation is to identify a well-grounded linguistic theory 
-that can be easily operationalized. In the Penn Discourse Treebank 
-(Prasad et al., 2008; Webber and Joshi, 1998) the field may have found 
-such a theory. In the PDTB conception, discourse relations revolve 
-around discourse connectives, where each discourse connective is a 
-predicate that takes two arguments. In this way, discourse annotations 
-are anchored by discourse connectives and are thus lexicalized. In our 
-view, lexicalization has been crucial to the success of the PDTB as an 
-annotation project, a large-scale effort characterized by high 
-inter-annotator agreement, a standard metric for annotation 
-consistency. Lexicalization grounds highly abstract discourse relations 
-in specific lexical items. In doing so, it localizes the 
-ambiguity in discourse relations to discourse connectives, where a 
-lexical item can have either a discourse connective use or a 
-non-discourse connective use (e.g., "when"), and one discourse 
-connective can be ambiguous between different discourse relations 
-(e.g., "since"). As a result, it reduces the cognitive load of the 
-annotation task because each annotator can focus on only one discourse 
-connective at a time instead of scores of discourse relations. This in 
-turn enlarges the annotator pool, since more annotators can 
-perform the task without extensive training. The long 
-list of annotators who worked on the PDTB annotation attests to this 
-observation. A larger annotator pool and a shorter learning curve 
-translate into the scalability of such an approach. 
- 
-If lexicalization is so important to discourse annotation, what about 
-discourse relations that are not anchored by an explicit discourse 
-connective? The PDTB addresses this by assuming there is an //implicit// 
-discourse connective that connects its two arguments, which 
-are typically (parts of) adjacent sentences. This is operationalized 
-by treating punctuation marks (e.g., periods) that serve as 
-boundaries between two adjacent sentences as anchors of implicit discourse 
-relations. The specific discourse relation is determined by testing 
-which discourse connective can be plausibly inserted between these two 
-adjacent sentences. In doing so, the PDTB assumes that (1) the range 
-of possible discourse relations anchored by implicit discourse 
-connectives is basically the same as that anchored by explicit 
-discourse connectives, and (2) discourse relations anchored by implicit 
-discourse connectives are mostly local. The first assumption is 
-largely borne out in the PDTB. Either a discourse connective can be 
-inserted between two adjacent sentences, or they are related by the 
-fact that they talk about the same entities, or there is no relation 
-between them. The last possibility has a direct bearing on the second 
-assumption: if there is no relation between two adjacent sentences, does 
-that mean that these sentences have no discourse relations at all with 
-the rest of the text, or that they are related to other discourse 
-segments that are non-local? It is reasonable to assume that all 
-discourse segments are related in a coherent piece of text, and a large 
-number of such "no-relations" would call for a significant expansion 
-of the PDTB approach. 
- 
-While it might not be too much to expect that the same high-level 
-discourse relations hold across languages, it is almost certainly too 
-much to expect that discourse relations are lexicalized in the same 
-way across languages. The question is whether a lexicalized approach 
-to discourse analysis can still be maintained in languages where 
-discourse relations are lexicalized in ways that are significantly 
-different from English. Our experience in a pilot PDTB-style Chinese 
-discourse annotation project shows that the lexicalized approach can 
-be effectively adopted, although significant adaptations have to be 
-made. Chinese has the same types of discourse connectives (subordinating 
-conjunctions, coordinating conjunctions, and discourse adverbials) as 
-English, but they occur much less frequently because they can often be 
-dropped. The ratio of implicit to explicit connectives is about 
-80/20 (Zhou and Xue, 2012) rather than the roughly 50/50 split 
-reported for the PDTB (Prasad et al., 2008). However, by identifying 
-punctuation marks as boundaries of discourse segments and testing whether 
-lexicalized discourse relations hold between adjacent comma-separated 
-discourse segments, we are able to show that Chinese discourse 
-annotation can be performed with very good consistency. More evidence 
-has to be gathered from the experience of other languages to test the 
-feasibility of lexicalized approaches to discourse annotation in a 
-multi-lingual setting, and such evidence will come soon now that such 
-an approach has been adopted in a number of discourse annotation 
-projects for a variety of different languages. 
- 
- 
- 
-Bonnie Webber and Aravind Joshi. 1998. Anchoring a Lexicalized 
-Tree-Adjoining Grammar for Discourse. In Proceedings of the ACL/COLING 
-Workshop on Discourse Relations and Discourse Markers. Montreal, 
-Canada. August 1998. 
-  
-Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio 
-Robaldo, Aravind Joshi, and Bonnie Webber. 2008. The Penn Discourse 
-Treebank 2.0. In Proceedings of the 6th International Conference on 
-Language Resources and Evaluation (LREC 2008). Marrakech, Morocco. 
-June 2008. 
- 
-Yuping Zhou and Nianwen Xue. 2012. PDTB-style discourse annotation of 
-Chinese text. In Proceedings of ACL-2012. Jeju Island, Korea. 
- 
  
