Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

cost-training-school-2017:synopsis_sc [2017/01/12 10:04]
ufal created
cost-training-school-2017:synopsis_sc [2017/01/27 15:29] (current)
ufal
Line 1:
-**** Statistics in linguistics - basics and case examples ****
+==== Statistics in linguistics - basics and case examples ====
 //(Silvie Cinková)//
  
 The tutorial seeks to provide students with a basic understanding of data analysis applied to a particular linguistic data set and to a set of working hypotheses concerning the association between genre and discourse structure, using a few common statistical methods.
  
-The dataset contains annotations of discourse connectors extracted from the Prague Dependency Treebank 3.0. The individual occurrences of discourse connectors are annotated with two different label sets (“discourse type” and “discourse class”). In addition, the data contains sentence ID and information about the genre and size of the document for each occurrence. This data set will be used to exemplify how to:
+The dataset contains annotations of discourse connectives extracted from the Prague Dependency Treebank 3.0. The individual occurrences of discourse connectives are annotated with two different label sets (“discourse type” and “discourse class”). In addition, the data contains sentence ID and information about the genre and size of the document for each occurrence. This data set will be used to exemplify how to:
  
 1. describe and summarize the data set, as well as prepare it for further statistical analysis (key words: "tidy data" and "data wrangling")
Line 13:
 This tutorial is meant to serve as a starter for individual studies of quantitative methods in linguistics aided by R. No effort is being made to explain the mathematical background of the statistical concepts and methods used.
  
 +=== References: ===
 +Poláková Lucie, Jínová Pavlína, Mírovský Jiří: [[http://www.lrec-conf.org/proceedings/lrec2014/pdf/195_Paper.pdf|Genres in the Prague Discourse Treebank.]] In: //Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014)//, European Language Resources Association, Reykjavík, Iceland, ISBN 978-2-9517408-8-4, pp. 1320-1326, 2014
  