
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


user:zeman:treebanks:grc [2011/12/06 15:02]
zeman
user:zeman:treebanks:grc [2011/12/06 16:04]
zeman
Line 36: Line 36:
 ==== Size ====
  
-AGDT contains 309,092 tokens in 21165 sentences, yielding 14.60 tokens per sentence on average. No official training-test data split is defined. For our HamleDT experiments, we took the smallest file called ''1999.01.0015.xml'' (5949 tokens / 529 sentences; Aeschylus: //Suppliants//) for testing and the rest (303,143 tokens / 20636 sentences) for training.
+AGDT contains 308,882 tokens in 21160 non-empty sentences, yielding 14.60 tokens per sentence on average. No official training-test data split is defined. For our HamleDT experiments, we took the smallest file called ''1999.01.0015.xml'' (5925 tokens / 528 sentences; Aeschylus: //Suppliants//) for testing and the rest (302,957 tokens / 20632 sentences) for training.
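
The revised counts can be cross-checked with simple arithmetic. The Python sketch below only restates the figures from the new revision of the paragraph; the variable names are illustrative and are not part of AGDT or HamleDT.

```python
# Figures copied from the new revision of the Size paragraph.
total_tokens, total_sentences = 308_882, 21_160
test_tokens, test_sentences = 5_925, 528          # 1999.01.0015.xml
train_tokens, train_sentences = 302_957, 20_632   # all remaining files

# Average sentence length as reported in the text.
avg_tokens_per_sentence = total_tokens / total_sentences

# The test and training parts should add up to the whole corpus.
split_is_consistent = (
    test_tokens + train_tokens == total_tokens
    and test_sentences + train_sentences == total_sentences
)

print(f"{avg_tokens_per_sentence:.2f} tokens per sentence")
print("split consistent:", split_is_consistent)
```

Both the test/training sums and the rounded average agree with the numbers stated in the new revision.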
  
 ==== Inside ====
Line 111: Line 111:
  </sentence></code>  </sentence></code>
  
-The same sentence converted to the CoNLL format, with Greek letters decoded:
+The first sentence of the corpus converted to the CoNLL format, with Greek letters decoded (note that this is not the same sentence as above because the conversion script reorders sentences according to their sentence id):
  
 | 1 | ἄσημα | ἄσημος | a | a | pos=a<nowiki>|</nowiki>per=-<nowiki>|</nowiki>num=p<nowiki>|</nowiki>ten=-<nowiki>|</nowiki>mod=-<nowiki>|</nowiki>voi=-<nowiki>|</nowiki>gen=n<nowiki>|</nowiki>cas=a<nowiki>|</nowiki>deg=- | 6 | OBJ | _ | _ |
Line 131: Line 131:
 ==== Parsing ====
  
-AGDT is an extremely nonprojective treebank, exceeding the nonprojectivity level found in other treebanks by an order of magnitude. 60469 out of the total 309,092 tokens are attached nonprojectively (19.56%).
+AGDT is an extremely nonprojective treebank, exceeding the nonprojectivity level found in other treebanks by an order of magnitude. 60469 out of the total 308,882 tokens are attached nonprojectively (19.58%).
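
To illustrate what "attached nonprojectively" can mean, here is a minimal Python sketch. It assumes one common definition: a token is attached nonprojectively if some token lying between it and its head is not a descendant of that head. The page does not say which definition or tool produced the count, so the functions ''is_descendant'' and ''count_nonprojective'' are hypothetical helpers, not the script used for HamleDT. The input is assumed to be a well-formed tree given as CoNLL-style head indices.

```python
def is_descendant(heads, node, ancestor):
    """Return True if `ancestor` lies on the path from `node` to the root.

    heads[i] is the head of token i (tokens are 1-based); index 0 is the
    artificial root and heads[0] is an unused placeholder.
    Assumes the heads form a tree (no cycles).
    """
    while node != 0:
        node = heads[node]
        if node == ancestor:
            return True
    return False


def count_nonprojective(heads):
    """Count tokens whose attachment to their head is nonprojective."""
    count = 0
    for dep in range(1, len(heads)):
        head = heads[dep]
        lo, hi = sorted((dep, head))
        # Nonprojective if any token strictly between is not under the head.
        if any(not is_descendant(heads, k, head) for k in range(lo + 1, hi)):
            count += 1
    return count


# Toy tree: token 1 depends on token 3, but token 2 (the root word) sits
# between them and is not a descendant of token 3.
toy_heads = [0, 3, 0, 2, 2]
print(count_nonprojective(toy_heads))

# Share reported in the new revision of the paragraph.
share = 100 * 60469 / 308882
print(f"{share:.2f}%")
```

The last two lines simply recompute the percentage from the two counts given in the new revision of the text.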
  
 I am not aware of any published evaluation of Ancient Greek parsing accuracy.
  
