
Institute of Formal and Applied Linguistics Wiki



user:zeman:treebanks:hr (revision of 2014/07/17 21:16 by zeman; previous revision 2014/07/17 20:59 by zeman, "Size and Inside.")
  
All sentences in the improved pre-release version are manually annotated at the morphological and syntactic levels. The officially available Version 1 is a mixture of manual and automatic annotation; see the section on sizes above.

The treebank is distributed in the [[:format-conll|CoNLL 2006]] file format. Multext-East morphosyntactic tags appear in both the CPOS and POS columns, while the FEAT column is empty.
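
For readers processing the files programmatically, here is a minimal Python sketch of reading one token line. It assumes the standard ten tab-separated CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and that the empty FEAT column is written with the usual "_" placeholder. The names `Token` and `parse_token` and the tag `Npnsn` in the sample line are hypothetical illustrations, not taken from the treebank.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One CoNLL 2006 (CoNLL-X) token line, columns in file order."""
    id: int
    form: str
    lemma: str
    cpostag: str  # Multext-East tag in this treebank
    postag: str   # the same Multext-East tag again
    feats: str    # empty, assumed to be the "_" placeholder
    head: int
    deprel: str
    phead: str
    pdeprel: str


def parse_token(line: str) -> Token:
    """Split one non-empty CoNLL-X line into its ten tab-separated fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, got {len(cols)}")
    return Token(int(cols[0]), cols[1], cols[2], cols[3], cols[4],
                 cols[5], int(cols[6]), cols[7], cols[8], cols[9])


# Hypothetical sample line; the tag "Npnsn" is for illustration only.
tok = parse_token("1\tKosovo\tKosovo\tNpnsn\tNpnsn\t_\t2\tSb\t_\t_")
print(tok.form, tok.cpostag, tok.feats, tok.deprel)  # Kosovo Npnsn _ Sb
```

Because CPOS and POS carry the same tag, either column can be used when filtering by part of speech.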

In Version 1, if any token of a sentence has an empty ("_") value in the DEPREL column, the sentence has not been syntactically annotated, even though there //are// numbers in the HEAD column. These are fake head links; typically they refer to the same node.
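
The Version 1 rule above can be checked mechanically. The following is a minimal Python sketch, assuming the file content is already loaded into a string, sentences are separated by blank lines, and DEPREL is the eighth tab-separated column as in standard CoNLL-X. The helper names and the two toy sentences (with "_" in the columns that do not matter here) are hypothetical.

```python
def read_sentences(text: str) -> list[list[list[str]]]:
    """Split CoNLL-X text into sentences (blank-line separated); each token is a list of columns."""
    sentences, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line.split("\t"))
        elif current:
            sentences.append(current)
            current = []
    if current:  # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences


def is_syntactically_annotated(sentence: list[list[str]]) -> bool:
    """Version 1 rule: a single "_" in DEPREL (column 8) means the HEAD links are fake."""
    return all(cols[7] != "_" for cols in sentence)


# Toy input: the first sentence is annotated, the second has fake HEAD links.
sample = "\n".join([
    "1\tanalizira\t_\t_\t_\t_\t0\tPred\t_\t_",
    "2\t.\t_\t_\t_\t_\t1\tPunc\t_\t_",
    "",
    "1\tBarem\t_\t_\t_\t_\t2\t_\t_\t_",
    "2\t.\t_\t_\t_\t_\t2\t_\t_\t_",
])
result = [is_syntactically_annotated(s) for s in read_sentences(sample)]
print(result)  # [True, False]
```

Sentences for which the check returns False should be treated as having morphological annotation only.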

All sentences in the improved pre-release version contain dependency information; however, in a few places, errors introduced by the annotation software result in a cyclic graph rather than a tree.
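
Such cycles can be found by following each token's HEAD chain toward the artificial root (HEAD 0): a chain that revisits a token before reaching the root has entered a cycle. This is a minimal Python sketch, assuming the HEAD column of one sentence has already been read into a list of integers that are all either 0 or valid token IDs; the function name `find_cycle` is hypothetical.

```python
def find_cycle(heads: list[int]) -> list[int]:
    """Return the token IDs forming one cycle in the HEAD links, or [] if there is none.

    heads[i] is the HEAD of token i + 1; 0 denotes the artificial root.
    Assumes every HEAD is 0 or a valid token ID.
    """
    for start in range(1, len(heads) + 1):
        path: list[int] = []           # tokens visited on the way up from `start`
        position: dict[int, int] = {}  # token ID -> index in `path`
        node = start
        while node != 0 and node not in position:
            position[node] = len(path)
            path.append(node)
            node = heads[node - 1]
        if node != 0:  # a token was revisited before reaching the root
            return path[position[node]:]
    return []


print(find_cycle([0, 1, 1]))  # [] (a proper tree)
print(find_cycle([0, 3, 2]))  # [2, 3] (tokens 2 and 3 point at each other)
```

A token that merely leads into a cycle without being part of it is not reported; only the cycle itself is returned.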

The syntactic tags (DEPREL) are simplistic, though somewhat inspired by the Prague Dependency Treebank. There are only 15 of them:

^ Tag ^ Percent ^ Example ^ Description ^
| Adv |  5% | Kosovu | adverbial modifier |
| Ap |  3% | Esat | appositional modifier, incl. first name attached to last name |
| Atr |  26% | privatizacije | attribute modifying a noun phrase |
| Atv |  2% | iskoristiti | ? |
| Aux |  7% | se | ? |
| Co |  3% | a | conjunction as coordination head (Prague-style coordinations) |
| Elp |  0.6% | Proces | ellipsis |
| Obj |  7% | privatizacije | object of a verb |
| Oth |  2% | Barem | other |
| Pnom |  2% | složen | nominal predicate attached to copula |
| Pred |  10% | analizira | predicate (verbal) |
| Prep |  10% | na | preposition |
| Punc |  13% | . | punctuation |
| Sb |  7% | Kosovo | subject |
| Sub |  4% | da | subordinating conjunction |

(The sum of the percentages exceeds 100% because of rounding.)
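
The percentages in the table are relative frequencies of the DEPREL values. Below is a minimal Python sketch of how such figures could be computed, assuming the DEPREL column has been collected into a list of strings and that tokens with the empty value "_" (unannotated sentences in Version 1) are excluded; whether the table above was computed this way is not stated. The function name `deprel_percentages` and the toy input are hypothetical. The example also shows how rounding each share to a whole percent can push the total above 100%.

```python
from collections import Counter


def deprel_percentages(deprels: list[str]) -> dict[str, float]:
    """Share of each DEPREL tag in percent, skipping "_" (assumed: unannotated tokens)."""
    counts = Counter(d for d in deprels if d != "_")
    total = sum(counts.values())
    return {tag: 100.0 * c / total for tag, c in counts.most_common()}


# Toy input: six tags, one occurrence each, plus one unannotated token.
deprels = ["Pred", "Sb", "Obj", "Atr", "Prep", "Punc", "_"]
pct = deprel_percentages(deprels)
rounded = {tag: round(p) for tag, p in pct.items()}  # each share 16.67% -> 17%
print(sum(rounded.values()))  # 102
```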
  
 ==== XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ==== ==== XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ====
