Differences
This shows you the differences between two versions of the page.
| Both sides previous revision Previous revision | |||
|
nianwen_abstract [2012/11/21 11:09] ufal odstraněno |
— (current) | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | **Nianwen (Bert) Xue, Assistant Professor, Brandeis University, USA** | ||
| - | //Explicit and implicit discourse relations from a cross-lingual perspective – from experience in working on Chinese discourse annotation// | ||
| - | |||
| - | **Abstract** | ||
| - | |||
| - | In the field of computational linguistics or natural language | ||
| - | processing, progress in discourse analysis has been relatively slow, | ||
| - | as compared with syntactic parsing or semantic analysis (e.g., word | ||
| - | sense disambiguation, | ||
| - | statistical, | ||
| - | common linguistic resource that is widely accepted by the community is | ||
| - | key to advancing the state of the art in this area. To create | ||
| - | consistently annotated data for discourse analysis is particularly | ||
| - | challenging because one has to deal with larger linguistic structures | ||
| - | and there are few linguistic rules to follow. The key to successful | ||
| - | discourse annotation is to identify a well-grounded linguistic theory | ||
| - | that can be easily operationalized. In the Penn Discourse Treebank | ||
| - | (Prasad et al. 2008, Webber and Joshi 1998) the field may have found | ||
| - | such a theory. In the PDTB conception, discourse relations revolve | ||
| - | around discourse connectives, | ||
| - | predicate that takes two arguments. In this way, discourse annotations | ||
| - | are anchored by discourse connectives and are thus lexicalized. In our | ||
| - | view, lexicalization has been crucial to the success of the PDTB as an | ||
| - | annotation project, a large-scale effort characterized by high | ||
| - | inter-annotator agreement, a standard metric for annotation | ||
| - | consistency. Lexicalization grounds highly abstract discourse relations | ||
| - | in a specific lexical item. In doing so, it localizes the | ||
| - | ambiguity in discourse relations to discourse connectives, | ||
| - | lexical item can have either a discourse connective use or a | ||
| - | non-discourse connective use (e.g., ``when" | ||
| - | connective can be ambiguous between different discourse relations | ||
| - | (e.g., ``since" | ||
| - | annotation task because each annotator can focus on only one discourse | ||
| - | connective at a time instead of scores of discourse relations. This in | ||
| - | turn enlarges the annotator pool, and more annotators will be able to | ||
| - | perform the task without extensive training. The long | ||
| - | list of annotators who worked on the PDTB annotation attests to this | ||
| - | observation. A larger annotator pool and a shorter learning curve | ||
| - | translate to the scalability of such an approach. | ||
| - | |||
| - | If lexicalization is so important to discourse annotation, what about | ||
| - | discourse relations that are not anchored by an explicit discourse | ||
| - | connective? The PDTB addresses this by assuming there is an //implicit// | ||
| - | discourse connective that connects its two arguments, which | ||
| - | are typically (parts of) adjacent sentences. This is operationalized | ||
| - | by identifying punctuation marks (e.g., periods) that serve as | ||
| - | boundaries of two adjacent sentences as anchors of implicit discourse | ||
| - | relations. | ||
| - | which discourse connective can be plausibly inserted between these two | ||
| - | adjacent sentences. In doing so, the PDTB assumes that (1) the range | ||
| - | of possible discourse relations anchored by implicit discourse | ||
| - | connectives is basically the same as for those anchored by explicit | ||
| - | discourse connectives, and (2) discourse relations anchored by implicit | ||
| - | discourse connectives are mostly local. The first assumption is | ||
| - | largely borne out in the PDTB. Either a discourse connective can be | ||
| - | inserted between two adjacent sentences, | ||
| - | fact that they talk about the same entities, or there is no relation | ||
| - | between them. The last possibility has a direct bearing on the second | ||
| - | question: if there is no relation between two adjacent sentences, does | ||
| - | that mean that these sentences have no discourse relations at all with | ||
| - | the rest of the text, or that they are related to other discourse | ||
| - | segments that are non-local? | ||
| - | discourse segments are related in a coherent piece of text, and a large | ||
| - | number of such ``no-relations" | ||
| - | to the PDTB approach. | ||
| - | |||
| - | While it might not be too much to expect that the same high-level | ||
| - | discourse relations hold across languages, it is almost certainly too | ||
| - | much to expect that discourse relations are lexicalized in the same | ||
| - | way across languages. The question is whether a lexicalized approach | ||
| - | to discourse analysis can still be maintained in languages where | ||
| - | discourse relations are lexicalized in ways that are significantly | ||
| - | different from English. Our experience in a pilot PDTB-style Chinese | ||
| - | discourse annotation project shows that the lexicalized approach can | ||
| - | be effectively adopted, although significant adaptations have to be | ||
| - | made. Chinese has the same types of discourse connectives (subordinate | ||
| - | conjunctions, | ||
| - | English, but they occur much less frequently because they can often be | ||
| - | | ||
| - | 80/20 (Zhou and Xue, 2012) rather than the roughly 50/50 split | ||
| - | reported for PDTB (Prasad et al. 2008). However, by identifying | ||
| - | punctuation marks as boundaries of discourse segments and testing whether | ||
| - | lexicalized discourse relations hold between adjacent comma-separated | ||
| - | discourse segments, we are able to show that Chinese discourse | ||
| - | annotation can be performed with very good consistency. More evidence | ||
| - | has to be gathered from the experience of other languages to test the | ||
| - | feasibility of lexicalized approaches to discourse annotation in a | ||
| - | multi-lingual setting, and such evidence will come soon now that such | ||
| - | an approach has been adopted in a number of discourse annotation | ||
| - | projects for a variety of different languages. | ||
| - | |||
| - | |||
| - | |||
| - | Bonnie Webber and Aravind Joshi. 1998. Anchoring a Lexicalized | ||
| - | Tree-Adjoining Grammar for Discourse. In Proceedings of ACL/COLING | ||
| - | Workshop on Discourse Relations and Discourse Markers, Montreal, | ||
| - | Canada. August 1998. | ||
| - | |||
| - | Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio | ||
| - | Robaldo, Aravind Joshi, and Bonnie Webber. 2008. The Penn Discourse | ||
| - | Treebank 2.0. | ||
| - | In Proceedings of the 6th International Conference on Language | ||
| - | Resources and Evaluation (LREC 2008). Marrakech, Morocco. June 2008. | ||
| - | |||
| - | Yuping Zhou and Nianwen Xue. 2012. PDTB-style discourse annotation of | ||
| - | Chinese text. In Proceedings of ACL-2012. Jeju Island, Korea. | ||
| - | |||
