The Latin Dependency Treebank (LDT) is freely downloadable under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 license.
LDT was created by volunteer students and researchers from around the world. It is part of the Perseus Digital Library, a project on classical languages hosted at Tufts University, Medford, MA 02155, USA.
The treebank includes the following texts: Caesar: Bello Gallico Book 2 selections (50 BC); Cicero: In Catilinam 1.1-2.11 (63 BC); Jerome: Vulgate: Apocalypse (AD 400); Ovid: Metamorphoses: Book I (AD 8); Petronius: Satyricon 26-78 (Cena Trimalchionis) (AD 60); Propertius: Elegies: Book I (25 BC); Sallust: Catilina (63 BC); Vergil: Aeneid (Book 6 selections) (19 BC).
LDT contains 53143 tokens in 3473 non-empty sentences, an average of 15.30 tokens per sentence. No official training/test split is defined. For our HamleDT experiments, we used the medium-sized file 1999.02.0029.xml (4789 tokens / 316 sentences; Ovid: Metamorphoses) for testing and the rest (48354 tokens / 3157 sentences) for training.
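The split and the counts above can be recomputed with a short script. The following is a minimal sketch, assuming all LDT XML files sit together in one directory and that `sentence` and `word` elements carry no namespace prefix, as in the native-format sample in this section. The helper names `count_file` and `split_stats` are hypothetical. Only non-empty sentences are counted, to match the statistics given here.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# File held out for testing in the HamleDT experiments (Ovid, Metamorphoses).
TEST_FILE = "1999.02.0029.xml"


def count_file(path: Path) -> tuple[int, int]:
    """Return (tokens, non-empty sentences) for one LDT XML file."""
    root = ET.parse(path).getroot()
    tokens = sentences = 0
    for sentence in root.iter("sentence"):
        n = len(sentence.findall("word"))
        if n:  # empty sentences are not counted
            tokens += n
            sentences += 1
    return tokens, sentences


def split_stats(directory: str) -> dict[str, tuple[int, int]]:
    """Sum (tokens, sentences) separately for the test file and all other files."""
    totals = {"train": (0, 0), "test": (0, 0)}
    for path in sorted(Path(directory).glob("*.xml")):
        part = "test" if path.name == TEST_FILE else "train"
        tokens, sentences = count_file(path)
        old_tokens, old_sentences = totals[part]
        totals[part] = (old_tokens + tokens, old_sentences + sentences)
    return totals
```

Called on a directory holding the complete treebank, `split_stats` returns one `(tokens, sentences)` pair per portion; dividing the summed tokens by the summed sentences gives the per-sentence average.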
The native file format of the treebank is based on XML.
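Morphological annotation consists of a lemma and a nine-character positional morphosyntactic tag. The nine positions encode, in order, part of speech, person, number, tense, mood, voice, gender, case and degree; a hyphen marks a position that is not applicable or not specified. Disambiguation has been done manually (gold standard).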
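Because the tag is purely positional, unpacking it into named features takes only a few lines. This is a minimal sketch: the short feature names are the ones used in the FEATS column of the CoNLL examples in this section, while the helper names `decode_postag` and `feats_column` are hypothetical.

```python
# Short feature names in tag order, as used in the FEATS column of the CoNLL
# examples: part of speech, person, number, tense, mood, voice, gender, case, degree.
POSITIONS = ("pos", "per", "num", "ten", "mod", "voi", "gen", "cas", "deg")


def decode_postag(postag: str) -> dict[str, str]:
    """Split a nine-character positional tag into a {feature: value} dict."""
    if len(postag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {postag!r}")
    return dict(zip(POSITIONS, postag))


def feats_column(postag: str) -> str:
    """Render the tag as a pipe-separated FEATS string (pos=...|per=...|...)."""
    return "|".join(f"{name}={value}" for name, value in decode_postag(postag).items())


print(decode_postag("v3sisa---"))  # 'esset': 3rd person singular imperfect subjunctive active
print(feats_column("n-s---mn-"))   # 'Caesar' → pos=n|per=-|num=s|ten=-|mod=-|voi=-|gen=m|cas=n|deg=-
```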
The syntactic annotation style is very similar to that of the Prague Dependency Treebank. The syntactic tags (analytical functions) are almost identical, too.
The first sentence of the corpus in its native XML format:
<?xml version="1.0"?>
<treebank version="1.5" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:treebank="http://nlp.perseus.tufts.edu/syntax/treebank/1.5" xsi:schemaLocation="http://nlp.perseus.tufts.edu/syntax/treebank/1.5 treebank-1.5.xsd">
  <sentence id="1" document_id="Perseus:text:1999.02.0002" subdoc="Book=2:chapter=1" span="Cum0:dare0">
    <word id="1" form="Cum" lemma="cum1" postag="c--------" head="20" relation="AuxC" />
    <word id="2" form="esset" lemma="sum1" postag="v3sisa---" head="1" relation="ADV" />
    <word id="3" form="Caesar" lemma="Caesar1" postag="n-s---mn-" head="2" relation="SBJ" />
    <word id="4" form="in" lemma="in1" postag="r--------" head="2" relation="AuxP" />
    <word id="5" form="citeriore" lemma="citer1" postag="a-s---fbc" head="6" relation="ATR" />
    <word id="6" form="Gallia" lemma="Gallia1" postag="n-s---fb-" head="4" relation="ADV" />
    <word id="7" form="in" lemma="in1" postag="r--------" head="2" relation="AuxP" />
    <word id="8" form="hibernis" lemma="hibernus1" postag="n-p---nb-" head="7" relation="ADV" />
    <word id="9" form="," lemma="comma1" postag="u--------" head="13" relation="AuxX" />
    <word id="10" form="ita" lemma="ita1" postag="d--------" head="2" relation="AuxY" />
    <word id="11" form="uti" lemma="uti1" postag="c--------" head="10" relation="AuxC" />
    <word id="12" form="supra" lemma="supra1" postag="d--------" head="13" relation="ADV" />
    <word id="13" form="demonstravimus" lemma="demonstro1" postag="v1pria---" head="11" relation="ADV" />
    <word id="14" form="," lemma="comma1" postag="u--------" head="13" relation="AuxX" />
    <word id="15" form="crebri" lemma="creber1" postag="a-p---mn-" head="18" relation="ATR" />
    <word id="16" form="ad" lemma="ad1" postag="r--------" head="19" relation="AuxP" />
    <word id="17" form="eum" lemma="is1" postag="p-s---ma-" head="16" relation="OBJ" />
    <word id="18" form="rumores" lemma="rumor1" postag="n-p---mn-" head="19" relation="SBJ" />
    <word id="19" form="adferebantur" lemma="affero1" postag="v3piip---" head="20" relation="PRED_CO" />
    <word id="20" form="que" lemma="que1" postag="c--------" head="0" relation="COORD" />
    <word id="21" form="litteris" lemma="littera1" postag="n-p---fb-" head="25" relation="ADV" />
    <word id="22" form="item" lemma="item1" postag="d--------" head="21" relation="AuxZ" />
    <word id="23" form="Labieni" lemma="Labienus1" postag="n-s---mg-" head="21" relation="ATR" />
    <word id="24" form="certior" lemma="certus1" postag="a-s---mnc" head="25" relation="PNOM" />
    <word id="25" form="fiebat" lemma="fio1" postag="v3s-ia---" head="20" relation="PRED_CO" />
    <word id="26" form="omnes" lemma="omnis1" postag="a-p---ma-" head="27" relation="ATR" />
    <word id="27" form="Belgas" lemma="Belgae1" postag="n-p---ma-" head="40" relation="SBJ" />
    <word id="28" form="," lemma="comma1" postag="u--------" head="34" relation="AuxX" />
    <word id="29" form="quam" lemma="qui1" postag="p-s---fa-" head="31" relation="SBJ" />
    <word id="30" form="tertiam" lemma="tertius1" postag="a-s---fa-" head="33" relation="ATR" />
    <word id="31" form="esse" lemma="sum1" postag="v--pna---" head="34" relation="OBJ" />
    <word id="32" form="Galliae" lemma="Gallia1" postag="n-s---fg-" head="33" relation="ATR" />
    <word id="33" form="partem" lemma="pars1" postag="n-s---fa-" head="31" relation="PNOM" />
    <word id="34" form="dixeramus" lemma="dico2" postag="v1plia---" head="27" relation="ATR" />
    <word id="35" form="," lemma="comma1" postag="u--------" head="34" relation="AuxX" />
    <word id="36" form="contra" lemma="contra1" postag="r--------" head="39" relation="AuxP" />
    <word id="37" form="populum" lemma="populus1" postag="n-s---ma-" head="36" relation="ADV" />
    <word id="38" form="Romanum" lemma="Romanus1" postag="a-s---ma-" head="37" relation="ATR" />
    <word id="39" form="coniurare" lemma="conjuro1" postag="v--pna---" head="40" relation="OBJ_CO" />
    <word id="40" form="que" lemma="que1" postag="c--------" head="24" relation="COORD" />
    <word id="41" form="obsides" lemma="obses1" postag="n-p---ma-" head="44" relation="OBJ" />
    <word id="42" form="inter" lemma="inter1" postag="r--------" head="44" relation="AuxP" />
    <word id="43" form="se" lemma="sui1" postag="p-p---ma-" head="42" relation="OBJ" />
    <word id="44" form="dare" lemma="do1" postag="v--pna---" head="40" relation="OBJ_CO" />
  </sentence>
The first sentence of the corpus converted to the CoNLL format (columns: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL):
1 | Cum | cum1 | c | c | pos=c|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 20 | AuxC | _ | _ |
2 | esset | sum1 | v | v | pos=v|per=3|num=s|ten=i|mod=s|voi=a|gen=-|cas=-|deg=- | 1 | ADV | _ | _ |
3 | Caesar | Caesar1 | n | n | pos=n|per=-|num=s|ten=-|mod=-|voi=-|gen=m|cas=n|deg=- | 2 | SBJ | _ | _ |
4 | in | in1 | r | r | pos=r|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 2 | AuxP | _ | _ |
5 | citeriore | citer1 | a | a | pos=a|per=-|num=s|ten=-|mod=-|voi=-|gen=f|cas=b|deg=c | 6 | ATR | _ | _ |
6 | Gallia | Gallia1 | n | n | pos=n|per=-|num=s|ten=-|mod=-|voi=-|gen=f|cas=b|deg=- | 4 | ADV | _ | _ |
7 | in | in1 | r | r | pos=r|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 2 | AuxP | _ | _ |
8 | hibernis | hibernus1 | n | n | pos=n|per=-|num=p|ten=-|mod=-|voi=-|gen=n|cas=b|deg=- | 7 | ADV | _ | _ |
9 | , | comma1 | u | u | pos=u|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 13 | AuxX | _ | _ |
10 | ita | ita1 | d | d | pos=d|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 2 | AuxY | _ | _ |
11 | uti | uti1 | c | c | pos=c|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 10 | AuxC | _ | _ |
12 | supra | supra1 | d | d | pos=d|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 13 | ADV | _ | _ |
13 | demonstravimus | demonstro1 | v | v | pos=v|per=1|num=p|ten=r|mod=i|voi=a|gen=-|cas=-|deg=- | 11 | ADV | _ | _ |
14 | , | comma1 | u | u | pos=u|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 13 | AuxX | _ | _ |
15 | crebri | creber1 | a | a | pos=a|per=-|num=p|ten=-|mod=-|voi=-|gen=m|cas=n|deg=- | 18 | ATR | _ | _ |
16 | ad | ad1 | r | r | pos=r|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 19 | AuxP | _ | _ |
17 | eum | is1 | p | p | pos=p|per=-|num=s|ten=-|mod=-|voi=-|gen=m|cas=a|deg=- | 16 | OBJ | _ | _ |
18 | rumores | rumor1 | n | n | pos=n|per=-|num=p|ten=-|mod=-|voi=-|gen=m|cas=n|deg=- | 19 | SBJ | _ | _ |
19 | adferebantur | affero1 | v | v | pos=v|per=3|num=p|ten=i|mod=i|voi=p|gen=-|cas=-|deg=- | 20 | PRED_CO | _ | _ |
20 | que | que1 | c | c | pos=c|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 0 | COORD | _ | _ |
21 | litteris | littera1 | n | n | pos=n|per=-|num=p|ten=-|mod=-|voi=-|gen=f|cas=b|deg=- | 25 | ADV | _ | _ |
22 | item | item1 | d | d | pos=d|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 21 | AuxZ | _ | _ |
23 | Labieni | Labienus1 | n | n | pos=n|per=-|num=s|ten=-|mod=-|voi=-|gen=m|cas=g|deg=- | 21 | ATR | _ | _ |
24 | certior | certus1 | a | a | pos=a|per=-|num=s|ten=-|mod=-|voi=-|gen=m|cas=n|deg=c | 25 | PNOM | _ | _ |
25 | fiebat | fio1 | v | v | pos=v|per=3|num=s|ten=-|mod=i|voi=a|gen=-|cas=-|deg=- | 20 | PRED_CO | _ | _ |
26 | omnes | omnis1 | a | a | pos=a|per=-|num=p|ten=-|mod=-|voi=-|gen=m|cas=a|deg=- | 27 | ATR | _ | _ |
27 | Belgas | Belgae1 | n | n | pos=n|per=-|num=p|ten=-|mod=-|voi=-|gen=m|cas=a|deg=- | 40 | SBJ | _ | _ |
28 | , | comma1 | u | u | pos=u|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 34 | AuxX | _ | _ |
29 | quam | qui1 | p | p | pos=p|per=-|num=s|ten=-|mod=-|voi=-|gen=f|cas=a|deg=- | 31 | SBJ | _ | _ |
30 | tertiam | tertius1 | a | a | pos=a|per=-|num=s|ten=-|mod=-|voi=-|gen=f|cas=a|deg=- | 33 | ATR | _ | _ |
31 | esse | sum1 | v | v | pos=v|per=-|num=-|ten=p|mod=n|voi=a|gen=-|cas=-|deg=- | 34 | OBJ | _ | _ |
32 | Galliae | Gallia1 | n | n | pos=n|per=-|num=s|ten=-|mod=-|voi=-|gen=f|cas=g|deg=- | 33 | ATR | _ | _ |
33 | partem | pars1 | n | n | pos=n|per=-|num=s|ten=-|mod=-|voi=-|gen=f|cas=a|deg=- | 31 | PNOM | _ | _ |
34 | dixeramus | dico2 | v | v | pos=v|per=1|num=p|ten=l|mod=i|voi=a|gen=-|cas=-|deg=- | 27 | ATR | _ | _ |
35 | , | comma1 | u | u | pos=u|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 34 | AuxX | _ | _ |
36 | contra | contra1 | r | r | pos=r|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 39 | AuxP | _ | _ |
37 | populum | populus1 | n | n | pos=n|per=-|num=s|ten=-|mod=-|voi=-|gen=m|cas=a|deg=- | 36 | ADV | _ | _ |
38 | Romanum | Romanus1 | a | a | pos=a|per=-|num=s|ten=-|mod=-|voi=-|gen=m|cas=a|deg=- | 37 | ATR | _ | _ |
39 | coniurare | conjuro1 | v | v | pos=v|per=-|num=-|ten=p|mod=n|voi=a|gen=-|cas=-|deg=- | 40 | OBJ_CO | _ | _ |
40 | que | que1 | c | c | pos=c|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 24 | COORD | _ | _ |
41 | obsides | obses1 | n | n | pos=n|per=-|num=p|ten=-|mod=-|voi=-|gen=m|cas=a|deg=- | 44 | OBJ | _ | _ |
42 | inter | inter1 | r | r | pos=r|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 44 | AuxP | _ | _ |
43 | se | sui1 | p | p | pos=p|per=-|num=p|ten=-|mod=-|voi=-|gen=m|cas=a|deg=- | 42 | OBJ | _ | _ |
44 | dare | do1 | v | v | pos=v|per=-|num=-|ten=p|mod=n|voi=a|gen=-|cas=-|deg=- | 40 | OBJ_CO | _ | _ |
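The mapping from the native XML to the CoNLL rows above can be approximated in a few lines. This is a hedged sketch, not the converter actually used for HamleDT: it assumes word `id`s are already numbered 1..n within each sentence (as in the sample), copies the first tag character into both CPOSTAG and POSTAG as the example rows do, and leaves PHEAD and PDEPREL as `_`. It emits tab-separated columns, the standard CoNLL delimiter; the rows above are displayed with ` | ` separators. The names `sentence_to_conll` and `treebank_to_conll` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Feature names for the nine tag positions, as in the FEATS column above.
POSITIONS = ("pos", "per", "num", "ten", "mod", "voi", "gen", "cas", "deg")


def _attr(word: ET.Element, name: str) -> str:
    """Return an attribute value, or the CoNLL placeholder "_" if missing/empty."""
    return word.get(name) or "_"


def sentence_to_conll(sentence: ET.Element) -> list[str]:
    """Convert one <sentence> element into tab-separated CoNLL rows."""
    rows = []
    for word in sentence.findall("word"):
        postag = word.get("postag") or "-" * len(POSITIONS)
        feats = "|".join(f"{n}={v}" for n, v in zip(POSITIONS, postag))
        cpos = postag[0]  # the example rows repeat the first tag character
        rows.append("\t".join([
            _attr(word, "id"), _attr(word, "form"), _attr(word, "lemma"),
            cpos, cpos, feats,                          # CPOSTAG, POSTAG, FEATS
            _attr(word, "head"), _attr(word, "relation"),
            "_", "_",                                   # PHEAD, PDEPREL unused
        ]))
    return rows


def treebank_to_conll(path: str) -> str:
    """Convert a whole LDT XML file; each sentence ends with a blank line."""
    root = ET.parse(path).getroot()
    blocks = [rows for rows in map(sentence_to_conll, root.iter("sentence")) if rows]
    return "".join("\n".join(rows) + "\n\n" for rows in blocks)
```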
The first sentence of the HamleDT test data in the CoNLL format:
1 | In | in1 | r | r | pos=r|per=-|num=-|ten=-|mod=-|voi=-|gen=-|cas=-|deg=- | 5 | AuxP | _ | _ |
2 | nova | novus1 | a | a | pos=a|per=-|num=p|ten=-|mod=-|voi=-|gen=n|cas=a|deg=- | 8 | ATR | _ | _ |
3 | fert | fero1 | v | v | pos=v|per=3|num=s|ten=p|mod=i|voi=a|gen=-|cas=-|deg=- | 0 | PRED | _ | _ |
4 | animus | animus1 | n | n | pos=n|per=-|num=s|ten=-|mod=-|voi=-|gen=m|cas=n|deg=- | 3 | SBJ | _ | _ |
5 | mutatas | muto1 | t | t | pos=t|per=-|num=p|ten=r|mod=p|voi=p|gen=f|cas=a|deg=- | 7 | ATR | _ | _ |
6 | dicere | dico2 | v | v | pos=v|per=-|num=-|ten=p|mod=n|voi=a|gen=-|cas=-|deg=- | 3 | OBJ | _ | _ |
7 | formas | forma1 | n | n | pos=n|per=-|num=p|ten=-|mod=-|voi=-|gen=f|cas=a|deg=- | 6 | OBJ | _ | _ |
8 | corpora | corpus1 | n | n | pos=n|per=-|num=p|ten=-|mod=-|voi=-|gen=n|cas=a|deg=- | 1 | OBJ | _ | _ |
LDT is strongly nonprojective: 4042 of its 53143 tokens (7.61%) are attached nonprojectively.
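A figure like this can be checked with the standard definition of projectivity: an arc from head h to dependent d is projective if h dominates every token lying strictly between h and d, with an artificial root at position 0 that dominates everything. The sketch below uses that definition; HamleDT's own tooling may follow a slightly different convention, so it is not guaranteed to reproduce 4042 exactly. The function names are hypothetical.

```python
def nonprojective_tokens(heads: list[int]) -> list[int]:
    """Return the 1-based ids of tokens attached by a nonprojective arc.

    heads[i] is the head of token i + 1, with 0 for the artificial root.
    An arc h -> d is projective iff h dominates every token strictly
    between h and d (the root dominates everything).
    """
    def dominates(h: int, t: int) -> bool:
        # Climb from t towards the root; assumes heads form a tree (no cycles).
        while t != 0:
            t = heads[t - 1]
            if t == h:
                return True
        return False

    result = []
    for d, h in enumerate(heads, start=1):
        lo, hi = min(h, d), max(h, d)
        if not all(dominates(h, t) for t in range(lo + 1, hi)):
            result.append(d)
    return result


def nonprojectivity_rate(sentences: list[list[int]]) -> float:
    """Share of tokens, over all sentences, attached nonprojectively."""
    total = sum(len(heads) for heads in sentences)
    nonproj = sum(len(nonprojective_tokens(heads)) for heads in sentences)
    return nonproj / total if total else 0.0


# Heads of the Ovid test sentence shown above (In nova fert animus ...).
print(nonprojective_tokens([5, 8, 0, 3, 7, 3, 6, 1]))  # → [1, 2, 5, 8]
```

On that test sentence the sketch flags In, nova, mutatas and corpora. For the whole treebank, the quoted figure corresponds to 4042 / 53143 ≈ 0.0761.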
I am not aware of any published evaluation of Latin parsing accuracy.