Differences
This shows you the differences between two versions of the page.
cost-training-school-2017:synopsis_dz [2016/10/03 12:27] ufal |
cost-training-school-2017:synopsis_dz [2016/10/03 12:28] (current) ufal |
---|---|---|---|
Line 3: | Line 3: | ||
Corpora annotated at the discourse level can help theoretical advancements and are ultimately inputs to various language technology tasks. | Corpora annotated at the discourse level can help theoretical advancements and are ultimately inputs to various language technology tasks. | ||
- | The Penn Discourse TreeBank (PDTB) is a richly annotated resource for discourse relations in English (Prasad et al., 2014). It has already been used reliably for annotating discourse in various languages such as Hindi (Oza et al., 2009), Chinese (Zhou & Xue, 2015), and Turkish (Zeyrek et al., 2013). This talk will introduce a new multilingual discourse annotation effort, an initiative undertaken by a group of scholars within Textlink, annotating discourse in the PDTB style. Unlike the monolingual corpora, the TED Multilingual Discourse Bank (TED-MDB) involves the parallel annotation of a subset of TED talks in six languages: English, Turkish, European Portuguese, Polish, German and Russian. It annotates both explicit and implicit discourse relations at the inter-sentential level and, at the intra-sentential level, focuses on explicit relations. I will describe our ongoing work on the corpus and discuss the benefits and challenges involved in creating it. | + | The Penn Discourse TreeBank (PDTB) is a richly annotated resource for discourse relations in English (Prasad et al., 2014). It has already |
+ | been used reliably for annotating discourse in various languages such as Hindi (Oza et al., 2009), Chinese (Zhou & Xue, 2015), | ||
+ | and Turkish (Zeyrek et al., 2013). This talk will introduce a new multilingual discourse annotation effort, an initiative undertaken
+ | by a group of scholars within Textlink, annotating discourse in the PDTB style. Unlike the monolingual corpora, the TED Multilingual
+ | Discourse Bank (TED-MDB) involves the parallel annotation of a subset of TED talks in six languages: English, Turkish,
+ | European Portuguese, Polish, German and Russian. It annotates both explicit and implicit discourse relations at the inter-sentential level
+ | and, at the intra-sentential level, focuses on explicit relations. I will describe our ongoing work on the corpus and discuss the benefits
+ | and challenges involved in creating it. | ||
=== References === | === References === | ||
Oza, U. et al. (2009). The Hindi Discourse Relation Bank. Proceedings of the Third Linguistic Annotation Workshop (pp. 158-161). Association for Computational Linguistics. | Oza, U. et al. (2009). The Hindi Discourse Relation Bank. Proceedings of the Third Linguistic Annotation Workshop (pp. 158-161). Association for Computational Linguistics. |