Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:grc [2011/12/06 15:02]
zeman
user:zeman:treebanks:grc [2011/12/06 16:04]
zeman
Line 36: Line 36:
 ==== Size ====
  
-AGDT contains 309,092 tokens in 21165 sentences, yielding 14.60 tokens per sentence on average. No official training-test data split is defined. For our HamleDT experiments, we took the smallest file called ''1999.01.0015.xml'' (5949 tokens / 529 sentences; Aeschylus: //Suppliants//) for testing and the rest (303,143 tokens / 20636 sentences) for training.
+AGDT contains 308,882 tokens in 21160 non-empty sentences, yielding 14.60 tokens per sentence on average. No official training-test data split is defined. For our HamleDT experiments, we took the smallest file called ''1999.01.0015.xml'' (5925 tokens / 528 sentences; Aeschylus: //Suppliants//) for testing and the rest (302,957 tokens / 20632 sentences) for training.
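
The revised figures can be cross-checked with a few lines of arithmetic. The following is a minimal sketch and not part of the wiki page or of the HamleDT tooling; the function names `split_is_consistent` and `tokens_per_sentence` are made up for illustration, and the numbers are copied from the new revision of the paragraph.

```python
def split_is_consistent(total, test, train):
    """Return True if the test and training parts add up to the whole."""
    return test + train == total


def tokens_per_sentence(tokens, sentences):
    """Average sentence length, rounded to two decimals as on the page."""
    return round(tokens / sentences, 2)


# Figures from the new revision (2011/12/06 16:04).
TOTAL_TOKENS, TOTAL_SENTENCES = 308_882, 21_160
TEST_TOKENS, TEST_SENTENCES = 5_925, 528
TRAIN_TOKENS, TRAIN_SENTENCES = 302_957, 20_632

if __name__ == "__main__":
    print(split_is_consistent(TOTAL_TOKENS, TEST_TOKENS, TRAIN_TOKENS))
    print(split_is_consistent(TOTAL_SENTENCES, TEST_SENTENCES, TRAIN_SENTENCES))
    print(f"{tokens_per_sentence(TOTAL_TOKENS, TOTAL_SENTENCES):.2f}")
```

The old totals (309,092 tokens in 21165 sentences) also round to 14.60 tokens per sentence, which is why that figure is the same on both sides of the diff.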
  
 ==== Inside ====
Line 111: Line 111:
  </sentence></code>
  
-The same sentence converted to the CoNLL format, with Greek letters decoded:
+The first sentence of the corpus converted to the CoNLL format, with Greek letters decoded (note that this is not the same sentence as above because the conversion script reorders sentences according to their sentence id):
  
 | 1 | ἄσημα | ἄσημος | a | a | pos=a<nowiki>|</nowiki>per=-<nowiki>|</nowiki>num=p<nowiki>|</nowiki>ten=-<nowiki>|</nowiki>mod=-<nowiki>|</nowiki>voi=-<nowiki>|</nowiki>gen=n<nowiki>|</nowiki>cas=a<nowiki>|</nowiki>deg=- | 6 | OBJ | _ | _ |
Line 131: Line 131:
 ==== Parsing ====
  
-AGDT is an extremely nonprojective treebank, exceeding the nonprojectivity level found in other treebanks by an order of magnitude. 60469 out of the total 309,092 tokens are attached nonprojectively (19.56%).
+AGDT is an extremely nonprojective treebank, exceeding the nonprojectivity level found in other treebanks by an order of magnitude. 60469 out of the total 308,882 tokens are attached nonprojectively (19.58%).
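
The change in the percentage follows from the smaller token total alone, since the nonprojective count of 60469 is the same in both revisions. A minimal sketch of that calculation, with `nonprojective_share` as a hypothetical helper name that does not come from the page:

```python
def nonprojective_share(nonprojective_tokens, total_tokens):
    """Percentage of tokens attached nonprojectively, to two decimals."""
    return round(100.0 * nonprojective_tokens / total_tokens, 2)


NONPROJECTIVE = 60_469  # identical in both revisions

if __name__ == "__main__":
    print(f"{nonprojective_share(NONPROJECTIVE, 309_092):.2f}")  # old token total
    print(f"{nonprojective_share(NONPROJECTIVE, 308_882):.2f}")  # new token total
```

With these inputs the old total gives 19.56 and the new total gives 19.58, matching the two sides of the diff.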
  
 I am not aware of any published evaluation of Ancient Greek parsing accuracy.
  
