Differences
This shows you the differences between two versions of the page.
user:zeman:treebanks:eu [2011/11/28 23:37] zeman vytvořeno |
user:zeman:treebanks:eu [2011/12/13 10:11] (current) zeman Typo. |
Line 5: | Line 5: | ||
==== Versions ==== | ==== Versions ==== | ||
- | * CoNLL 2007 | + | * CoNLL 2007 (BDT-I) |
- | * Extended version | + | * BDT-II |
==== Obtaining and License ==== | ==== Obtaining and License ==== | ||
There does not seem to be any regular distribution channel for the Basque Dependency Treebank. The CoNLL 2007 version had a restricted license for the duration of the shared task only. Republication of the CoNLL version through the LDC is planned but has not happened yet. In the meantime, one can ask Koldo Gojenola (koldo (dot) gojenola (at) ehu (dot) es) about the availability of the corpus. | There does not seem to be any regular distribution channel for the Basque Dependency Treebank. The CoNLL 2007 version had a restricted license for the duration of the shared task only. Republication of the CoNLL version through the LDC is planned but has not happened yet. In the meantime, one can ask Koldo Gojenola (koldo (dot) gojenola (at) ehu (dot) es) about the availability of the corpus. | ||
+ | |||
+ | Informally agreed upon terms: | ||
+ | * no redistribution | ||
+ | * cite the principal publication (see below) in publications | ||
BDT was created by members of the [[http:// | BDT was created by members of the [[http:// | ||
Line 23: | Line 27: | ||
* Itziar Aduriz, María Jesús Aranzabe, José María Arriola, Aitziber Atutxa, Arantza Díaz de Ilarraza, Aitzpea Garmendia, Maite Oronoz: [[http:// | * Itziar Aduriz, María Jesús Aranzabe, José María Arriola, Aitziber Atutxa, Arantza Díaz de Ilarraza, Aitzpea Garmendia, Maite Oronoz: [[http:// | ||
* Documentation | * Documentation | ||
- | * Description of tags and feature values is provided in the '' | + | * Description of tags and feature values is hard to find; the '' |
+ | * María Jesús Aranzabe, José Mari Arriola, Aitziber Atutxa, Irene Balza, Larraitz Uria: [[http:// | ||
+ | * [[http:// | ||
+ | * José Ignacio Hualde, Jon Ortiz de Urbina: [[http:// | ||
==== Domain ==== | ==== Domain ==== | ||
- | Mixed (“GDT consists of randomly selected textual fragments | + | Newswire + unknown |
==== Size ==== | ==== Size ==== | ||
- | The CoNLL 2007 version contains 70223 tokens | + | The CoNLL 2007 dataset was officially split into training and test parts. The data split of BDT-II was provided by Koldo Gojenola and should correspond to the data split used in the parsing experiments published by the IXA Group. |
+ | |||
+ | ^ Version ^ Train Sentences ^ Train Tokens ^ D-test Sentences ^ D-test Tokens ^ E-test Sentences ^ E-test Tokens ^ Total Sentences ^ Total Tokens ^ Sentence Length ^ | ||
+ | | CoNLL 2007 | 3190 | 50526 | 334 | 5390 | | ||
+ | | BDT-II | 9094 | 124,684 | 1010 | 12625 | 1122 | 14295 | 11226 | 151,604 | 13.50 | | ||
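The Total and Sentence Length columns of the BDT-II row follow from the three split sizes. A quick check in Python, using only the numbers from the table (the dictionary and function names are our own, not part of any BDT tooling):

```python
# BDT-II split sizes copied from the table above: (sentences, tokens).
BDT2_SPLITS = {
    "train": (9094, 124684),
    "d-test": (1010, 12625),
    "e-test": (1122, 14295),
}


def split_totals(splits):
    """Return total sentences, total tokens and mean sentence length (tokens per sentence)."""
    sentences = sum(s for s, _ in splits.values())
    tokens = sum(t for _, t in splits.values())
    return sentences, tokens, tokens / sentences


sentences, tokens, mean_len = split_totals(BDT2_SPLITS)
print(sentences, tokens, f"{mean_len:.2f}")  # 11226 151604 13.50
```

The CoNLL 2007 row is truncated above, so it is left out of the check.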
==== Inside ==== | ==== Inside ==== | ||
- | The syntactic annotation style and the tagset for dependency relations | + | Both versions (CoNLL 2007 and BDT-II) are in the CoNLL 2006/2007 format. |
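The CoNLL 2006/2007 format has one token per line with ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and a blank line after each sentence. A minimal reader sketch in Python; the names `Token`, `read_conll` and `read_conll_string` are our own:

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class Token(NamedTuple):
    """One row of a CoNLL 2006/2007 file (ten tab-separated columns)."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: int
    deprel: str
    phead: str
    pdeprel: str


def read_conll(stream: TextIO) -> Iterator[List[Token]]:
    """Yield one list of Tokens per sentence; sentences are separated by blank lines."""
    sentence: List[Token] = []
    for line in stream:
        line = line.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence (if any).
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[3], cols[4],
                              cols[5], int(cols[6]), cols[7], cols[8], cols[9]))
    if sentence:
        # The file may not end with a blank line.
        yield sentence


def read_conll_string(text: str) -> List[List[Token]]:
    """Convenience wrapper for in-memory data."""
    return list(read_conll(io.StringIO(text)))
```

For a file on disk, pass an open handle, e.g. `read_conll(open(path, encoding="utf-8"))`; the file name and encoding are assumptions, not documented properties of the distribution.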
+ | |||
+ | Part of speech tag description | ||
+ | |||
+ | * IZE = noun | ||
+ | * ARR = common | ||
+ | * IZB = proper name | ||
+ | * LIB = place name | ||
+ | * ZKI = number | ||
+ | * ADJ = adjective | ||
+ | * ARR = common | ||
+ | * GAL = question | ||
+ | * ADI = verb | ||
+ | * SIN = simple | ||
+ | * ADK = composed | ||
+ | * ADP = periphrastic | ||
+ | * FAK = factitive | ||
+ | * ADB = adverb | ||
+ | * ARR = common | ||
+ | * GAL = question | ||
+ | * DET = determiner | ||
+ | * ERKARR = demonstrative common | ||
+ | * ERKIND = demonstrative emphatic | ||
+ | * NOLARR = indefinite common | ||
+ | * NOLGAL = indefinite question | ||
+ | * ZNB = number | ||
+ | * DZH = definite | ||
+ | * BAN = distributive | ||
+ | * ORD = ordinal | ||
+ | * DZG = indefinite | ||
+ | * ORO = general | ||
+ | * IOR = pronoun | ||
+ | * PERARR = personal common | ||
+ | * PERIND = personal emphatic | ||
+ | * IZGMGB = indefinite | ||
+ | * IZGGAL = question | ||
+ | * BIH = ??? | ||
+ | * ELK = ??? | ||
+ | * LOT = link | ||
+ | * LOK = connector | ||
+ | * JNT = conjunction | ||
+ | * PRT = particle | ||
+ | * ITJ = interjection | ||
+ | * BST = other | ||
+ | * ADL = auxiliary verb | ||
+ | * ADT = synthetic verb | ||
+ | * SIG = acronym | ||
+ | * SNB = symbol | ||
+ | * LAB = abbreviation | ||
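The samples below suggest that the two versions encode the fine tag differently: the CoNLL 2007 data repeats the main category in POSTAG (`IZE` / `IZE_ARR`), while BDT-II gives only the subtag (`IZE` / `ARR`). A small lookup sketch that accepts both; it covers only a subset of the list, and the grouping of subtags under main tags is our reading of the list and the samples, since the list's indentation did not survive:

```python
# Subset of the descriptions listed above; extend as needed.
MAIN_TAGS = {
    "IZE": "noun", "ADJ": "adjective", "ADI": "verb", "ADB": "adverb",
    "DET": "determiner", "IOR": "pronoun", "LOT": "link",
    "ADL": "auxiliary verb", "ADT": "synthetic verb",
}
# Assumed grouping of subtags under their main tag.
SUB_TAGS = {
    "IZE": {"ARR": "common", "IZB": "proper name", "LIB": "place name", "ZKI": "number"},
    "ADI": {"SIN": "simple", "ADK": "composed", "ADP": "periphrastic", "FAK": "factitive"},
    "DET": {"ERKARR": "demonstrative common", "DZH": "definite", "DZG": "indefinite"},
    "LOT": {"JNT": "conjunction", "LOK": "connector"},
}


def describe_tag(cpostag: str, postag: str) -> str:
    """Describe a tag given CoNLL 2007 style ('IZE', 'IZE_ARR') or BDT-II style ('IZE', 'ARR')."""
    prefix = cpostag + "_"
    # Strip the repeated main category used in the CoNLL 2007 data.
    sub = postag[len(prefix):] if postag.startswith(prefix) else postag
    main = MAIN_TAGS.get(cpostag, cpostag)
    if not sub or sub == cpostag:
        # e.g. ADL/ADL: the fine tag adds nothing to the coarse one.
        return main
    # Unknown codes are shown as-is rather than guessed.
    return f"{main}, {SUB_TAGS.get(cpostag, {}).get(sub, sub)}"
```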
+ | |||
+ | Main features: | ||
+ | |||
+ | * KAS = case. Various descriptions of Basque grammar list different numbers of cases and it is not easy to match all of the BDT case tags with them. Some but not all of them are described | ||
+ | * KAS:ABL (984) = ablativo = ablative | ||
+ | * KAS:ABS (22805) = absolutivo = absolutive | ||
+ | * KAS:ABU (32) = adlativo terminal (" | ||
+ | * KAS:ABZ (27) = adlativo direccional (" | ||
+ | * KAS:ALA (1093) = adlativo = allative | ||
+ | * KAS:BNK (13) =? special case of the locative genitive (" | ||
+ | * KAS:DAT (1451) = dativo = dative | ||
+ | * KAS:DES (181) = destinativo = benefactive (" | ||
+ | * KAS:DESK (223) =? descriptive locative genitive (" | ||
+ | * KAS:EM (705) = multiword token with postposition (e.g. " | ||
+ | * KAS:ERG (6059) = ergativo = ergative | ||
+ | * KAS:GEL (6259) = genitivo locativo = locative genitive | ||
+ | * KAS:GEN (4307) = genitivo de posesión = possessive genitive | ||
+ | * KAS:INE (7690) = inesivo = inessive | ||
+ | * KAS:INS (1370) = instrumental | ||
+ | * KAS:MOT (165) = motivativo = causative | ||
+ | * KAS:PAR (930) = partitivo = partitive | ||
+ | * KAS:PRO (89) = prolativo = essive | ||
+ | * KAS:SOZ (928) = asociativo = comitative | ||
+ | * ASP = aspect | ||
+ | * ERL = relation (relative clause, complement clause, indirect question, ...)
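In the CoNLL 2006/2007 format the FEATS column is either `_` or a `|`-separated list of features; in BDT they have the form ATTRIBUTE:VALUE (e.g. `KAS:ERG`). A sketch that splits FEATS and maps case values to the English names above. The table is a subset, the function names are hypothetical, and values are kept in lists because we have not checked whether an attribute can repeat within one token:

```python
from typing import Dict, List

# Case tags and English names from the KAS list above (subset).
CASE_NAMES = {
    "ABL": "ablative", "ABS": "absolutive", "ALA": "allative", "DAT": "dative",
    "DES": "benefactive", "ERG": "ergative", "GEL": "locative genitive",
    "GEN": "possessive genitive", "INE": "inessive", "INS": "instrumental",
    "MOT": "causative", "PAR": "partitive", "PRO": "essive", "SOZ": "comitative",
}


def parse_feats(feats: str) -> Dict[str, List[str]]:
    """Split a FEATS value like 'KAS:ERG|PLU:+' into {'KAS': ['ERG'], 'PLU': ['+']}.

    '_' means the token has no features.
    """
    result: Dict[str, List[str]] = {}
    if feats == "_":
        return result
    for pair in feats.split("|"):
        attr, _, value = pair.partition(":")
        result.setdefault(attr, []).append(value)
    return result


def case_names(feats: str) -> List[str]:
    """English names of the KAS values of one token; unknown codes are returned unchanged."""
    return [CASE_NAMES.get(v, v) for v in parse_feats(feats).get("KAS", [])]
```

The input `KAS:ERG|PLU:+` is an illustrative combination, not a row copied from the corpus.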
+ | |||
+ | List of all 286 features found in the corpus with frequencies: | ||
+ | * ADM: | ||
+ | * ADM: | ||
+ | * ADM: | ||
+ | * ASP: | ||
+ | * ASP: | ||
+ | * ASP: | ||
+ | * ASP: | ||
+ | * BIZ: | ||
+ | * BIZ: | ||
+ | * ENT:??? | ||
+ | * ENT: | ||
+ | * ENT: | ||
+ | * ENT: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * HIT:NO 50 | ||
+ | * HIT:TO 38 | ||
+ | * IZAUR: | ||
+ | * IZAUR: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KLM:AM 80 | ||
+ | * KLM:HAS 2 | ||
+ | * MAI: | ||
+ | * MAI: | ||
+ | * MAI: | ||
+ | * MAI: | ||
+ | * MDN: | ||
+ | * MDN: | ||
+ | * MDN:A4 1 | ||
+ | * MDN: | ||
+ | * MDN: | ||
+ | * MDN: | ||
+ | * MDN:B3 11 | ||
+ | * MDN:B4 59 | ||
+ | * MDN:B5A 1 | ||
+ | * MDN: | ||
+ | * MDN:B6 1 | ||
+ | * MDN:B7 79 | ||
+ | * MDN:B8 38 | ||
+ | * MDN:C 52 | ||
+ | * MOD: | ||
+ | * MOD: | ||
+ | * MTKAT: | ||
+ | * MTKAT: | ||
+ | * MTKAT: | ||
+ | * MUG: | ||
+ | * MUG: | ||
+ | * MW:B 3615 | ||
+ | * NEUR: | ||
+ | * NMG: | ||
+ | * NMG: | ||
+ | * NMG: | ||
+ | * NOR: | ||
+ | * NOR: | ||
+ | * NOR:HI 20 | ||
+ | * NOR: | ||
+ | * NOR: | ||
+ | * NOR:ZU 93 | ||
+ | * NOR: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NUM: | ||
+ | * NUM: | ||
+ | * NUM: | ||
+ | * PER: | ||
+ | * PER: | ||
+ | * PER:HI 14 | ||
+ | * PER: | ||
+ | * PER: | ||
+ | * PER:ZU 60 | ||
+ | * PER: | ||
+ | * PLU:+ 149 | ||
+ | * PLU: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * ZENB: | ||
+ | * _ 36940 | ||
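A frequency list like the one above can be produced by counting every ATTRIBUTE:VALUE string in the FEATS column, with a bare `_` counted as its own entry. A sketch, assuming the file is in the ten-column CoNLL format; this is not necessarily how the list above was generated:

```python
from collections import Counter
from typing import Iterable, Iterator


def feats_column(lines: Iterable[str]) -> Iterator[str]:
    """Yield the FEATS value (6th column) of every token line in a CoNLL 2006/2007 file."""
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) == 10:  # blank sentence separators are skipped
            yield cols[5]


def feature_frequencies(feats_values: Iterable[str]) -> Counter:
    """Count ATTR:VALUE features; a bare '_' (token without features) is counted as '_'."""
    counts: Counter = Counter()
    for feats in feats_values:
        if feats == "_":
            counts["_"] += 1
        else:
            counts.update(feats.split("|"))
    return counts


def print_frequencies(counts: Counter) -> None:
    # Plain sorted() puts '_' after the upper-case feature names, as in the list above.
    for feat in sorted(counts):
        print(feat, counts[feat])
```

Usage: `with open(path, encoding="utf-8") as f: print_frequencies(feature_frequencies(feats_column(f)))`, where `path` is whichever data file you have.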
+ | |||
+ | The syntactic guidelines (structure and labels) are described in Spanish in this [[http://ixa.si.ehu.es/Ixa/Argitalpenak/Barne_txostenak/1068549887/publikoak/guia.pdf|technical report]]. See Appendix 3 for some lists of tags. | ||
+ | |||
+ | Multi-word expressions have been collapsed into one token, using underscore as the joining character (e.g. Espainia_Poliziak, | ||
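Because multi-word expressions are joined with underscores (the samples below show the same convention in the LEMMA column, e.g. `ezin_izan`), they can be detected and split by the FORM string alone. A sketch with hypothetical helper names; it assumes an underscore never occurs inside a genuine single word:

```python
from typing import List, Sequence


def is_mwe(form: str) -> bool:
    """True if the token form joins several words with '_' (a lone '_' is not an MWE)."""
    return "_" in form.strip("_")


def split_mwe(form: str) -> List[str]:
    """Split a collapsed multi-word token into its parts, dropping empty pieces."""
    return [part for part in form.split("_") if part]


def mwe_rate(forms: Sequence[str]) -> float:
    """Share of tokens in a sentence (or corpus) that are collapsed multi-word expressions."""
    return sum(is_mwe(f) for f in forms) / len(forms) if forms else 0.0
```

For example, `split_mwe("Estatu_Batuetako_DEAko")` gives `["Estatu", "Batuetako", "DEAko"]`. Splitting a form does not tell you how to distribute HEAD and DEPREL over the parts; that would need a separate, assumed rule.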
==== Sample ==== | ==== Sample ==== | ||
Line 41: | Line 417: | ||
The first sentence of the CoNLL 2007 training data: | The first sentence of the CoNLL 2007 training data: | ||
- | | 1 | " | + | | 1 | espainiako_poliziak |
- | | 2 | Τα | ο | At | AtDf | Ne< | + | | 2 | hiru | hiru | DET | DET_DZH |
- | | 3 | αντισώματα | + | | 3 | gazte | gazte | IZE | IZE_ARR |
- | | 4 | IgG | IgG | Rg | RgFwOr | _ | 3 | Atr | _ | _ | | + | | 4 | atxilotu |
- | | 5 | είναι | είμαι | Vb | VbMn | Id< | + | | 5 | ditu | *edun | ADL | ADL | A1< |
- | | 6 | σαν | + | | 6 | atarrabian |
- | | 7 | μακροπρόθεσμη | μακροπρόθεσμος | Aj | Aj | Ba< | + | | 7 | , | , | PUNC | PUNC_KOMA |
- | | 8 | μνήμη | + | | 8 | eta | eta | LOT | LOT_JNT |
- | | 9 | , | , | PUNCT | PUNCT | _ | 10 | AuxX | _ | _ | | + | | 9 | madrilera |
- | | 10 | ενώ | ενώ | Cj | CjCo | _ | 26 | Coord | _ | _ | | + | | 10 | eraman |
- | | 11 | το | ο | At | AtDf | Ne< | + | | 11 | ditu | *edun | ADL | ADL | A1< |
- | | 12 | IgA | IgA | Rg | RgFwOr | _ | 15 | Sb | _ | _ | | + | | 12 | . | . | PUNC | PUNC_PUNC |
- | | 13 | πιστεύεται | πιστεύεται | Vb | VbMn | Id< | + | |
- | | 14 | ότι | + | |
- | | 15 | είναι | + | |
- | | 16 | ένας | + | |
- | | 17 | συγκεκριμένος | συγκεκριμένος | Aj | Aj | Ba< | + | |
- | | 18 | δείκτης | δείκτης | No | NoCm | Ma< | + | |
- | | 19 | για | για | AsPp | AsPpSp | _ | 18 | AuxP | _ | _ | | + | |
- | | 20 | πρόσφατες | πρόσφατος | Aj | Aj | Ba< | + | |
- | | 21 | ή | ή | Cj | CjCo | _ | 23 | Coord | _ | _ | | + | |
- | | 22 | χρόνιες | χρόνιος | Aj | Aj | Ba< | + | |
- | | 23 | λοιμώξεις | λοίμωξη | No | NoCm | Fe< | + | |
- | | 24 | " | " | PUNCT | PUNCT | _ | 10 | AuxG | _ | _ | | + | |
- | | 25 | , | , | PUNCT | PUNCT | _ | 10 | AuxX | _ | _ | | + | |
- | | 26 | εξηγεί | εξηγώ | Vb | VbMn | Id< | + | |
- | | 27 | η | ο | At | AtDf | Fe< | + | |
- | | 28 | Δρ | Δρ | Rg | RgFwTr | _ | 26 | Sb | _ | _ | | + | |
- | | 29 | Αρκάρι | Αρκάρι | No | NoCm | Ne< | + | |
- | | 30 | . | . | PUNCT | PUNCT | _ | 0 | AuxK | _ | _ | | + | |
The first sentence of the CoNLL 2007 test data: | The first sentence of the CoNLL 2007 test data: | ||
- | | 1 | Η | ο | At | AtDf | Fe< | + | | 1 | epaileek |
- | | 2 | Σίφνος | + | | 2 | diote | esan | ADT | ADT | PNT< |
- | | 3 | φημίζεται | + | | 3 | eaeko | EAE | IZE | IZE_LIB | SIG< |
- | | 4 | και | + | | 4 | parlamentarioek | parlamentario | ADJ | ADJ_ARR | IZAUR-< |
- | | 5 | για | + | | 5 | eaetik_kanpo | EAE | SIG | SIG- | DEK< |
- | | 6 | τα | ο | At | AtDf | Ne< | + | | 6 | eginiko | egin | ADI | ADI_SIN | PART< |
- | | 7 | καταγάλανα | + | | 7 | delituak | delitu | IZE | IZE_ARR | BIZ-< |
- | | 8 | νερά | + | | 8 | ikertzea | ikertu | ADI | ADI_SIN | ADIZE< |
- | | 9 | των | + | | 9 | eta | eta | LOT | LOT_JNT | - | |
- | | 10 | πανέμορφων | + | | 10 | epaitzea | epaitu | ADI | ADI_SIN | ADIZE< |
- | | 11 | ακτών | + | | 11 | auzitegi_gorenari | auzitegi_gora | ADJ | ADJ_IZO | DEK< |
- | | 12 | της | + | | 12 | dagokiola | egon | ADT | ADT | PNT< |
- | | 13 | . | . | PUNCT | PUNCT | _ | 0 | AuxK | _ | _ | | + | | 13 | , | , | PUNC | PUNC_KOMA |
+ | | 14 | baina | baina | LOT | LOT_JNT | AURK | | ||
+ | | 15 | atzerrian | atzerri | IZE | IZE_ARR | INE< | ||
+ | | 16 | izaniko | izan | ADI | ADI_SIN | PART< | ||
+ | | 17 | kontaktu | kontaktu | IZE | IZE_ARR | ||
+ | | 18 | horiek | ||
+ | | 19 | ezin_direla | ezin_izan | ADI | ADI_ADK | PNT< | ||
+ | | 20 | delitutzat | delitu | IZE | IZE_ARR | BIZ-< | ||
+ | | 21 | hartu | hartu | ADI | ADI_SIN | PART | | ||
+ | | 22 | . | . | PUNC | PUNC_PUNC | _ | | ||
+ | |||
+ | The first sentence of the BDT-II training data: | ||
+ | |||
+ | | 1 | Estatu_Batuetako_DEAko | Estatu_Batuak_DEA | IZE | LIB | PLU: | ||
+ | | 2 | buru | buru | IZE | ARR | _ | 4 | ncsubj | ||
+ | | 3 | ohiak | ohi | ADJ | ARR | IZAUR:-< | ||
+ | | 4 | aztertuko | aztertu | ADI | SIN | ADM:PART< | ||
+ | | 5 | du | *edun | ADL | ADL | MDN:A1< | ||
+ | | 6 | RUCen | RUC | IZE | IZB | MTKAT:SIG< | ||
+ | | 7 | erreforma | erreforma | IZE | ARR | KAS: | ||
+ | | 8 | . | . | PUNT_MARKA | ||
+ | |||
+ | The first sentence of the BDT-II development data: | ||
+ | |||
+ | | 1 | Irakaskuntzan | ||
+ | | 2 | jardun | jardun | ADI | SIN | ADM: | ||
+ | | 3 | zuen | *edun | ADL | ADL | MDN: | ||
+ | | 4 | Miel | Miel | IZE | IZB | PLU:-< | ||
+ | | 5 | Anjel_Elustondok | Anjel_Elustondo | IZE | IZB | PLU:-< | ||
+ | | 6 | 1980 | 1980 | IZE | ZKI | _ | 7 | ncmod | _ | _ | | ||
+ | | 7 | urtetik | ||
+ | | 8 | 1992ra | 1992 | IZE | ZKI | KAS: | ||
+ | | 9 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 8 | PUNC | _ | _ | | ||
+ | | 10 | hauetatik | hauek | DET | ERKARR | ||
+ | | 11 | hamar | hamar | DET | DZH | NMG:P | 12 | detmod | _ | _ | | ||
+ | | 12 | urtez | urte | IZE | ARR | BIZ: | ||
+ | | 13 | Azpeitiko | Azpeitia | IZE | LIB | PLU: | ||
+ | | 14 | ikastolan | ikastola | IZE | ARR | BIZ: | ||
+ | | 15 | irakasle | irakasle | IZE | ARR | KAS: | ||
+ | | 16 | eta | eta | LOT | JNT | ERL:EMEN | 8 | aponcmod | _ | _ | | ||
+ | | 17 | beste | beste | DET | DZG | _ | 18 | detmod | _ | _ | | ||
+ | | 18 | biak | bi | IZE | ZKI | KAS: | ||
+ | | 19 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 18 | PUNC | _ | _ | | ||
+ | | 20 | Arabako | Araba | IZE | LIB | PLU: | ||
+ | | 21 | ikastolen | ikastola | IZE | ARR | BIZ: | ||
+ | | 22 | elkartean | elkarte | IZE | ARR | BIZ: | ||
+ | | 23 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 22 | PUNC | _ | _ | | ||
+ | |||
+ | The first sentence of the BDT-II test data: | ||
+ | |||
+ | | 1 | Hegoaldean | hegoalde | IZE | ARR | KAS: | ||
+ | | 2 | iduri_zait | iduri_izan | ADI | ADK | ASP: | ||
+ | | 3 | euskararen | euskara | IZE | ARR | BIZ: | ||
+ | | 4 | mundu | mundu | IZE | ARR | BIZ:- | 7 | ncsubj | _ | _ | | ||
+ | | 5 | hau | hau | DET | ERKARR | KAS: | ||
+ | | 6 | adi-adi | adi-adi | ADB | ARR | _ | 7 | ncmod | _ | _ | | ||
+ | | 7 | dagola | egon | ADT | ADT | ASP: | ||
+ | | 8 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 7 | PUNC | _ | _ | | ||
+ | | 9 | Euskaltzaindiak | ||
+ | | 10 | zer | zer | DET | NOLGAL | ||
+ | | 11 | erranen | erran | ADI | SIN | ADM:PART< | ||
+ | | 12 | duen | *edun | ADL | ADL | ERL: | ||
+ | | 13 | zain | zain | ADB | ARR | _ | 7 | cmod | _ | _ | | ||
+ | | 14 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 13 | PUNC | _ | _ | | ||
+ | | 15 | haren | hura | DET | ERKARR | ||
+ | | 16 | arauen | ||
+ | | 17 | berehala | berehala | ADB | ARR | _ | 18 | ncmod | _ | _ | | ||
+ | | 18 | betetzeko | bete | ADI | SIN | ADM:ADIZE< | ||
+ | | 19 | . | . | PUNT_MARKA | ||
==== Parsing ==== | ==== Parsing ==== | ||
- | Nonprojectivities in GDT are not frequent. Only 823 of the 70223 tokens | + | BDT is a mildly nonprojective treebank. 1925 of the 151, |
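Nonprojective dependencies can be found from the HEAD column alone. A sketch using the standard definition (an arc from head h to dependent d is projective iff h dominates every token strictly between them). The function name is ours, the cost is roughly cubic in sentence length, and we have not verified that the counts quoted above were computed this way:

```python
from typing import List


def nonprojective_tokens(heads: List[int]) -> List[int]:
    """Return 1-based ids of tokens whose incoming arc is nonprojective.

    heads[i] is the HEAD of token i+1 (0 = artificial root), as in the CoNLL HEAD column.
    """
    n = len(heads)

    def dominates(h: int, k: int) -> bool:
        # Walk from k up toward the root; the step limit guards against malformed (cyclic) input.
        steps = 0
        while k != 0 and steps <= n:
            k = heads[k - 1]
            steps += 1
            if k == h:
                return True
        return False

    result = []
    for d in range(1, n + 1):
        h = heads[d - 1]
        lo, hi = min(h, d), max(h, d)
        # The arc h -> d is nonprojective if some token strictly between them escapes h.
        if any(not dominates(h, k) for k in range(lo + 1, hi)):
            result.append(d)
    return result
```

For example, with heads `[4, 0, 2, 2]` the arc from token 4 to token 1 spans token 2, which 4 does not dominate, so token 1 is reported.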
- | The results of the CoNLL 2007 shared task are [[http:// | + | The results of the CoNLL 2007 shared task are [[http:// |
^ Parser (Authors) ^ LAS ^ UAS ^ | ^ Parser (Authors) ^ LAS ^ UAS ^ | ||
- | | Nakagawa | 76.31 | 84.08 | | + | | Malt (Nilsson et al.) | 76.94 | 82.84 | |
- | | Keith Hall et al. | 74.21 | 82.04 | | + | | Titov et al. | 75.49 | 81.93 | |
- | | Carreras | 73.56 | 81.37 | | + | | Sagae | 74.64 | 81.19 | |
- | | Malt (Nilsson et al.) | 74.65 | 81.22 | | + | | Carreras |
- | | Titov et al. | 73.52 | 81.20 | | + | | Nakagawa |
- | | Chen | 74.42 | 81.16 | | + | | Malt (J. Hall et al.) | 74.99 | 80.61 | |
- | | Duan | 74.29 | 80.77 | | + | | Johansson et al. | 75.08 | 80.43 | |
- | | Attardi et al. | 73.92 | 80.75 | | + | |
- | | Malt (J. Hall et al.) | 74.21 | 80.66 | | + | |
The two Malt parser results of 2007 (single malt and blended) are described in [[http:// | The two Malt parser results of 2007 (single malt and blended) are described in [[http:// | ||
+ | Parsing results on BDT-II have been published in Kepa Bengoetxea, Koldo Gojenola: [[http:// |
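LAS (labeled attachment score) is the percentage of tokens with both the correct head and the correct dependency label; UAS (unlabeled attachment score) counts the head only. A minimal sketch over aligned (HEAD, DEPREL) pairs; it scores every token, so it does not reproduce any punctuation handling of the official CoNLL evaluation script, and the function name is hypothetical:

```python
from typing import List, Tuple


def attachment_scores(gold: List[Tuple[int, str]],
                      pred: List[Tuple[int, str]]) -> Tuple[float, float]:
    """Return (LAS, UAS) in percent for aligned lists of (head, deprel) pairs."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and predicted tokens must be non-empty and aligned")
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    las_hits = sum(g == p for g, p in zip(gold, pred))        # head and label
    n = len(gold)
    return 100.0 * las_hits / n, 100.0 * uas_hits / n
```

With a toy gold sentence `[(2, "ncsubj"), (0, "ncmod"), (2, "detmod"), (2, "PUNC")]` and prediction `[(2, "ncsubj"), (0, "ncmod"), (2, "ncmod"), (3, "PUNC")]`, three heads and two head-label pairs are correct, giving LAS 50.0 and UAS 75.0. The labels are only illustrative.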