
Institute of Formal and Applied Linguistics Wiki



Differences

user:zeman:treebanks:grc [2011/12/06 15:00]
zeman Inside, sample and parsing.
user:zeman:treebanks:grc [2011/12/06 15:02]
zeman
Line 42:
 The native file format of the treebank is based on XML. Greek letters are romanized using [[http://www.tlg.uci.edu/encoding/quickbeta.pdf|Beta Code]], a romanization scheme used widely not only in the Perseus project. It can be mapped 1-1 on the original Greek letters in UTF-8; however, embedded non-Greek words (such as the lemmas “comma” and “other”) cannot be identified automatically (and we do not want to decode them).
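The Beta Code mapping mentioned above can be sketched roughly as follows. This is a minimal illustration, not the conversion used for the treebank: the function name ''beta_to_greek'' is hypothetical, only letters and the common diacritics are covered, capitals marked with ''*'' are not handled, and embedded non-Greek words such as “comma” would be converted blindly, so the caller would have to exclude them by hand.

```python
import unicodedata

# Beta Code letter -> Greek lowercase letter ("s" is handled separately
# in the function because of the word-final sigma).
BETA_LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}

# Beta Code diacritic -> Unicode combining character.
BETA_DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def beta_to_greek(text):
    """Convert lowercase Beta Code to precomposed Greek (NFC).

    Hypothetical helper: capitals ("*") are not supported and
    unknown characters are passed through unchanged.
    """
    out = []
    for i, ch in enumerate(text):
        low = ch.lower()
        if low == "s":
            # Use final sigma unless another letter follows.
            nxt = text[i + 1].lower() if i + 1 < len(text) else ""
            out.append("σ" if nxt in BETA_LETTERS else "ς")
        elif low in BETA_LETTERS:
            out.append(BETA_LETTERS[low])
        elif ch in BETA_DIACRITICS:
            out.append(BETA_DIACRITICS[ch])
        else:
            out.append(ch)
    # Compose base letters and combining marks into single code points.
    return unicodedata.normalize("NFC", "".join(out))


print(beta_to_greek("lo/gos"))
print(beta_to_greek("a)/nqrwpos"))
```

Emitting combining characters and then normalizing to NFC avoids a large table of precomposed letters; the Unicode library does the composition.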
  
-Morphological annotation consists of lemma and nine-character positional morphosyntactic tags. Disambiguation has been done manually (gold standard).
+Morphological annotation consists of lemma and nine-character positional morphosyntactic tag. Disambiguation has been done manually (gold standard).
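A positional tag of this kind can be unpacked one character per category. The sketch below assumes the position order commonly documented for the Perseus treebanks (part of speech, person, number, tense, mood, voice, gender, case, degree); the page itself only states that the tag has nine positions, so the field names, the sample tag and the helper ''parse_tag'' are illustrative assumptions.

```python
# Assumed order of the nine tag positions (not confirmed by this page).
TAG_FIELDS = (
    "pos", "person", "number", "tense", "mood",
    "voice", "gender", "case", "degree",
)


def parse_tag(tag):
    """Split a nine-character positional tag into named fields.

    A hyphen marks a category that does not apply and is returned as None.
    """
    if len(tag) != len(TAG_FIELDS):
        raise ValueError(f"expected {len(TAG_FIELDS)} characters, got {len(tag)}")
    return {name: (None if ch == "-" else ch) for name, ch in zip(TAG_FIELDS, tag)}


# Illustrative tag: noun, singular, masculine, nominative.
print(parse_tag("n-s---mn-"))
```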
  
 The syntactic annotation style is very similar to that of the Prague Dependency Treebank. The syntactic tags (analytical functions) are almost identical, too. However, in AGDT some combined values are permitted that are not valid in PDT, e.g. ''ATR_AP_ExD0_APOS''.
