Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

cost-training-school-2017:synopsis_dz [2016/10/03 12:27]
ufal
cost-training-school-2017:synopsis_dz [2016/10/03 12:28] (current)
ufal
Line 3:
  
 Corpora annotated at the discourse level can support theoretical advances and ultimately serve as input to various language technology tasks.
 The Penn Discourse TreeBank (PDTB) is a richly annotated resource for discourse relations in English (Prasad et al., 2014). Its annotation scheme has already been used reliably to annotate discourse in various languages, such as Hindi (Oza et al., 2009), Chinese (Zhou & Xue, 2015), and Turkish (Zeyrek et al., 2013). This talk will introduce a new multilingual discourse annotation effort in the PDTB style, an initiative undertaken by a group of scholars within Textlink. Unlike these monolingual corpora, the TED Multilingual Discourse Bank (TED-MDB) involves the parallel annotation of a subset of TED talks in six languages: English, Turkish, European Portuguese, Polish, German, and Russian. It annotates both explicit and implicit discourse relations at the inter-sentential level and focuses on explicit relations at the intra-sentential level. I will describe our ongoing work on the corpus and discuss the benefits and challenges involved in creating it.
 === References ===
 Oza, U. et al. (2009). The Hindi Discourse Relation Bank. Proceedings of the Third Linguistic Annotation Workshop (pp. 158-161). Association for Computational Linguistics.