Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:hr [2014/07/17 21:16]
zeman
user:zeman:treebanks:hr [2014/07/17 21:27]
zeman Finalizing the page.
Line 38: Line 38:
  
 The improved pre-release version contains 83640 tokens in 3736 sentences, yielding 22.39 tokens per sentence on average.
 +
 +There is no official training-test division of the original data. For HamleDT, we have split the data 90:10, i.e. the first 3362 sentences (75236 tokens) for training and the remaining 374 sentences (8404 tokens) for testing.
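
The arithmetic of this split can be reproduced with a short Python sketch. The helper name `split_train_test` and the rounding rule (integer division, so the training part is rounded down) are assumptions made for illustration; the page itself only states the resulting counts.

```python
def split_train_test(sentences, train_parts=9, total_parts=10):
    """Split a list of sentences into a leading training part and a trailing test part.

    Assumption: the boundary is len(sentences) * train_parts // total_parts,
    i.e. the training share is rounded down to a whole sentence.
    """
    n_train = len(sentences) * train_parts // total_parts
    return sentences[:n_train], sentences[n_train:]


if __name__ == "__main__":
    # Stand-in data: only the number of sentences matters here.
    sentences = list(range(3736))
    train, test = split_train_test(sentences)
    print(len(train), len(test))
```

With 3736 sentences this boundary gives 3362 training and 374 test sentences, which agrees with the counts stated above.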
  
 ==== Inside ==== ==== Inside ====
Line 70: Line 72:
 (The sum of the percentages exceeds 100% because of rounding.)
  
-==== XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX ==== 
 ==== Sample ==== ==== Sample ====
  
-The first three sentences of the CoNLL 2006 training data: +The first three sentences of the improved pre-release version:
  
-| 1 | Глава Nc | _ | 0 | ROOT ROOT +| 1 | Proces | proces | Ncmsn | Ncmsn | <nowiki>_</nowiki> | 0 | Elp | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 2 | трета Mo gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 1 | mod mod | +| 2 | privatizacije | privatizacija | Ncfsg | Ncfsg | <nowiki>_</nowiki> | 1 | Obj | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 3 | na | na | Sl | Sl | <nowiki>_</nowiki> | 1 | Prep | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 4 | Kosovu | Kosovo | Npnsl | Npnsl | <nowiki>_</nowiki> | 3 | Adv | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 5 | pod | pod | Si | Si | <nowiki>_</nowiki> | 0 | Prep | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 6 | povećalom | povećalo | Ncnsi | Ncnsi | <nowiki>_</nowiki> | 5 | Elp | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 | |||||||||| | ||||||||||
-| 1 | НАРОДНО | _ | An gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i mod mod +| 1 | Kosovo Kosovo | Npnsn | Npnsn | <nowiki>_</nowiki> Sb | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
-СЪБРАНИЕ | _ | Nc gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i ROOT | 0 | ROOT | +| 2 | ozbiljno | ozbiljno | Rgp | Rgp | <nowiki>_</nowiki> | 3 | Adv | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 3 | analizira | analizirati | Vmr3s | Vmr3s | <nowiki>_</nowiki> | 0 | Pred | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +proces proces | Ncmsan | Ncmsan | <nowiki>_</nowiki> Obj | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
 +| 5 | privatizacije | privatizacija | Ncfsg | Ncfsg | <nowiki>_</nowiki> | 4 | Atr | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 6 | u | u | Sl | Sl | <nowiki>_</nowiki> | 3 | Prep | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 7 | svjetlu | svjetlo | Ncnsl | Ncnsl | <nowiki>_</nowiki> | 6 | Obj | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 8 | učestalih | učestao | Agpfpg | Agpfpg | <nowiki>_</nowiki> | 9 | Atr | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 9 | pritužbi | pritužba | Ncfpg | Ncfpg | <nowiki>_</nowiki> | 7 | Atr | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 10 | <nowiki>.</nowiki> | <nowiki>.</nowiki> | Z | Z | <nowiki>_</nowiki> | 0 | Punc | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 | |||||||||| | ||||||||||
-| 1 | Народното An gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d 2 | mod | 2 | mod | +| 1 | Barem | barem | Rgp | Rgp | <nowiki>_</nowiki> | 2 | Oth | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-събрание | _ | Nc gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 3 | subj | 3 | subj +| 2 | na na Sl Sl <nowiki>_</nowiki> Prep | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 3 | осъществява Vpi trans=t<nowiki>|</nowiki>mood=i<nowiki>|</nowiki>tense=r<nowiki>|</nowiki>pers=3<nowiki>|</nowiki>num=s | 0 | ROOT | 0 | ROOT +| 3 | papiru | papir | Ncmsl | Ncmsl | <nowiki>_</nowiki> | 2 | Obj | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 4 | законодателната | _ | A | Af | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d mod 5 | mod | +| 4 | <nowiki>,</nowiki> <nowiki>,</nowiki>| <nowiki>_</nowiki> | 2 | Punc | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 5 | власт | _ | N | Nc | _ | 3 | obj | 3 | obj | +izgleda izgledati Vmr3s Vmr3s | <nowiki>_</nowiki>Pred | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 6 | и | _ | C | Cp | _ | 3 | conj | 3 | conj | +kao kao Cs Cs <nowiki>_</nowiki> Oth | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 7 | упражнява | _ | V | Vpi | trans=t<nowiki>|</nowiki>mood=i<nowiki>|</nowiki>tense=r<nowiki>|</nowiki>pers=3<nowiki>|</nowiki>num=s | 3 | conjarg | 3 | conjarg +odlična odličan Agpfsn Agpfsn | <nowiki>_</nowiki>Atr | <nowiki>_</nowiki> <nowiki>_</nowiki>
-парламентарен Am gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 9 | mod | 9 | mod | +ideja ideja Ncfsn Ncfsn | <nowiki>_</nowiki>Adv | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 9 | контрол | _ | N Nc gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 7 | obj | 7 | obj +| <nowiki>.</nowiki> <nowiki>.</nowiki>| <nowiki>_</nowiki> | 0 | Punc | <nowiki>_</nowiki> <nowiki>_</nowiki> |
-10 Punct Punct | _ | punct 3 | punct | +
- +
-The first three sentences of the CoNLL 2006 test data: +
- +
-| 1 | Единственото | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 2 | mod | 2 | mod +
-решение Nc gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i ROOT | 0 | ROOT | +
-| |||||||||| +
-| 1 | Ерик | _ | N | Np | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT +
-Франк Np gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 1 | mod | 1 | mod | +
-| 3 | Ръсел | _ | H Hm gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 2 | mod | 2 | mod +
-| |||||||||| +
-| 1 | Пълен | _ | A | Am | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i 2 | mod | 2 | mod | +
-| 2 | мрак | _ | N Nc gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | +
-| 3 | и | _ | C | Cp | _ | 2 | conj | 2 | conj | +
-| 4 | пълна | _ | A | Af gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 5 | mod | 5 | mod | +
-| 5 | самота | _ | N | Nc | _ | 2 | conjarg | 2 | conjarg | +
-| 6 | . | _ | Punct | Punct | _ | 2 | punct | 2 | punct |+
  
 ==== Parsing ==== ==== Parsing ====
  
-Nonprojectivities in BTB are rare. Only 747 of the 196,151 tokens in the CoNLL 2006 version are attached nonprojectively (0.38%). +Nonprojectivities in SETimes.HR are rare. Only 461 of the 83640 tokens in the pre-release version are attached nonprojectively (0.55%).
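
For readers who want to recompute such a figure, here is a minimal Python sketch of one common way to count nonprojectively attached tokens. It assumes the usual definition (a token is attached nonprojectively if some token lying between it and its head is not dominated by that head) and treats the artificial root as position 0. The function name `count_nonprojective` and the input format (a list of head indices per sentence) are hypothetical, and this is not necessarily the procedure that produced the counts above.

```python
def count_nonprojective(heads):
    """Count nonprojectively attached tokens in one sentence.

    heads[i] is the 1-based index of the head of token i+1; 0 is the artificial root.
    Assumption: the root sits at position 0, to the left of the first token.
    """
    n = len(heads)

    def is_dominated_by(node, ancestor):
        # Follow head links upward from node; the step limit guards against cycles.
        steps = 0
        while node != 0 and steps <= n:
            node = heads[node - 1]
            if node == ancestor:
                return True
            steps += 1
        return False

    count = 0
    for dep in range(1, n + 1):
        head = heads[dep - 1]
        left, right = sorted((dep, head))
        # The attachment is nonprojective if any token strictly between
        # the dependent and its head is not dominated by the head.
        if any(not is_dominated_by(k, head) for k in range(left + 1, right)):
            count += 1
    return count


if __name__ == "__main__":
    # Heads of the first sample sentence above (Proces privatizacije na Kosovu pod povećalom).
    print(count_nonprojective([0, 1, 1, 3, 0, 5]))
    # A small constructed tree in which token 2 is attached across token 3.
    print(count_nonprojective([3, 4, 0, 3]))
    # The share is then the count divided by the number of tokens.
    print(round(100 * 461 / 83640, 2))
```

Summing this count over all sentences and dividing by the total number of tokens gives the percentage; for 461 of 83640 tokens that is 0.55%.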
- +
-The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Bulgarian: +
- +
-^ Parser (Authors) ^ LAS ^ UAS ^ +
-| MST (McDonald et al.) | 87.57 | 92.04 | +
-| Malt (Nivre et al.) | 87.41 | 91.72 | +
-| Nara (Yuchang Cheng) | 86.34 | 91.30 |+
  
 +//Are there any published parsing results on this corpus?//
