Differences
This shows you the differences between two versions of the page.
user:zeman:treebanks:eu [2011/11/29 10:42] zeman (edit summary: Sample.)
user:zeman:treebanks:eu [2011/12/13 10:11] (current) zeman (edit summary: Typo.)
Line 5 (previous) / Line 5 (current):
  ==== Versions ====
- * CoNLL 2007
+ * CoNLL 2007 (BDT-I)
  * BDT-II (obtained by e-mail in 2011)
Line 29 (previous) / Line 29 (current):
  * The description of tags and feature values is hard to find; the ''
  * María Jesús Aranzabe, José Mari Arriola, Aitziber Atutxa, Irene Balza, Larraitz Uria: [[http://
+ * [[http://
+ * José Ignacio Hualde, Jon Ortiz de Urbina: [[http://
  ==== Domain ====
Line 96 (previous) / Line 98 (current):
  Main features:
- * KAS = case (ERG = ergative,
+ * KAS = case. Various descriptions of Basque grammar list different numbers of cases, and it is not easy to match all of the BDT case tags to them. Some of the tags, but not all, are described in Annex 3 of the technical report mentioned above. The following list gives all case tags occurring in BDT, with their frequencies in brackets.
+   * KAS:
+   * KAS:ABS (22805) = absolutivo = absolutive
+   * KAS:ABU (32) = adlativo terminal ("
+   * KAS:ABZ (27) = adlativo direccional ("
+   * KAS:ALA (1093) = adlativo = allative
+   * KAS:BNK (13) =? special case of the locative genitive ("
+   * KAS:DAT (1451) = dativo = dative
+   * KAS:DES (181) = destinativo = benefactive ("
+   * KAS:DESK (223) =? descriptive locative genitive ("
+   * KAS:EM (705) = multiword token with postposition (e.g. "
+   * KAS:ERG (6059) = ergativo = ergative
+   * KAS:GEL (6259) = genitivo locativo = locative genitive
+   * KAS:GEN (4307) = genitivo de posesión = possessive genitive
+   * KAS:INE (7690) = inesivo = inessive
+   * KAS:INS (1370) = instrumental
+   * KAS:MOT (165) = motivativo = causative
+   * KAS:PAR (930) = partitivo = partitive
+   * KAS:PRO (89) = prolativo = essive
+   * KAS:SOZ (928) = asociativo = comitative
  * ASP = aspect
  * ERL = relation (relative clause, complement clause, indirect question...)
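The features in the data are attribute:value pairs such as KAS:ERG. The Python sketch below shows one way to read them. It is an illustration and not part of any BDT tool. It assumes the usual CoNLL conventions: `|` separates features, `:` separates attribute from value, and a bare `_` marks a token without features. The glossary contains only the case tags that have an English name in the list above. The feature string in the usage note is made up.

```python
from collections import defaultdict

# English glosses copied from the case list above. Tags whose description is
# missing, truncated or marked uncertain there (e.g. BNK, DESK, EM) are left out.
CASE_GLOSSES = {
    "ABS": "absolutive",
    "ALA": "allative",
    "DAT": "dative",
    "DES": "benefactive",
    "ERG": "ergative",
    "GEL": "locative genitive",
    "GEN": "possessive genitive",
    "INE": "inessive",
    "INS": "instrumental",
    "MOT": "causative",
    "PAR": "partitive",
    "PRO": "essive",
    "SOZ": "comitative",
}


def parse_feats(feats: str) -> dict[str, list[str]]:
    """Split a FEATS value like "KAS:ERG|NUM:S" into {attribute: [values]}.

    Assumption: "|" separates features, ":" separates attribute and value,
    and "_" means the token has no features at all.
    """
    if feats == "_":
        return {}
    result = defaultdict(list)
    for item in feats.split("|"):
        # partition() splits at the first ":" only, so a value may contain ":".
        attr, _sep, value = item.partition(":")
        result[attr].append(value)
    return dict(result)


def case_gloss(feats: str) -> list[str]:
    """English names of the KAS (case) values; unknown tags are returned as-is."""
    return [CASE_GLOSSES.get(tag, tag) for tag in parse_feats(feats).get("KAS", [])]
```

For the invented string `KAS:ERG|NUM:S|MUG:M`, `parse_feats` returns `{'KAS': ['ERG'], 'NUM': ['S'], 'MUG': ['M']}` and `case_gloss` returns `['ergative']`. Unglossed tags such as BNK pass through unchanged, so no information is lost.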
+ | |||
+ | List of all 286 features found in the corpus with frequencies: | ||
+ | * ADM: | ||
+ | * ADM: | ||
+ | * ADM: | ||
+ | * ASP: | ||
+ | * ASP: | ||
+ | * ASP: | ||
+ | * ASP: | ||
+ | * BIZ:+ 2303 | ||
+ | * BIZ:- 22116 | ||
+ | * ENT:??? 35 | ||
+ | * ENT: | ||
+ | * ENT: | ||
+ | * ENT: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL: | ||
+ | * ERL:MOS 146 | ||
+ | * ERL: | ||
+ | * ERL:ZHG 232 | ||
+ | * HIT:NO 50 | ||
+ | * HIT:TO 38 | ||
+ | * IZAUR: | ||
+ | * IZAUR: | ||
+ | * KAS:ABL 984 | ||
+ | * KAS: | ||
+ | * KAS:ABU 32 | ||
+ | * KAS:ABZ 27 | ||
+ | * KAS: | ||
+ | * KAS:BNK 13 | ||
+ | * KAS: | ||
+ | * KAS:DES 181 | ||
+ | * KAS: | ||
+ | * KAS:EM 707 | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS: | ||
+ | * KAS:MOT 165 | ||
+ | * KAS:PAR 930 | ||
+ | * KAS:PRO 89 | ||
+ | * KAS:SOZ 928 | ||
+ | * KLM:AM 80 | ||
+ | * KLM:HAS 2 | ||
+ | * MAI:GEHI 38 | ||
+ | * MAI:IND 36 | ||
+ | * MAI: | ||
+ | * MAI:SUP 406 | ||
+ | * MDN: | ||
+ | * MDN:A3 107 | ||
+ | * MDN:A4 1 | ||
+ | * MDN:A5 282 | ||
+ | * MDN:B1 6666 | ||
+ | * MDN:B2 185 | ||
+ | * MDN:B3 11 | ||
+ | * MDN:B4 59 | ||
+ | * MDN:B5A 1 | ||
+ | * MDN:B5B 27 | ||
+ | * MDN:B6 1 | ||
+ | * MDN:B7 79 | ||
+ | * MDN:B8 38 | ||
+ | * MDN:C 52 | ||
+ | * MOD: | ||
+ | * MOD:ZIU 126 | ||
+ | * MTKAT: | ||
+ | * MTKAT: | ||
+ | * MTKAT: | ||
+ | * MUG:M 42116 | ||
+ | * MUG:MG 8449 | ||
+ | * MW:B 3615 | ||
+ | * NEUR:- 193 | ||
+ | * NMG:MG 1055 | ||
+ | * NMG:P 2690 | ||
+ | * NMG:S 2156 | ||
+ | * NOR:GU 223 | ||
+ | * NOR: | ||
+ | * NOR:HI 20 | ||
+ | * NOR: | ||
+ | * NOR:NI 337 | ||
+ | * NOR:ZU 93 | ||
+ | * NOR:ZUEK 12 | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORI: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NORK:HIK 6 | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NORK: | ||
+ | * NUM:P 9347 | ||
+ | * NUM:PH 172 | ||
+ | * NUM:S 32570 | ||
+ | * PER:GU 242 | ||
+ | * PER: | ||
+ | * PER:HI 14 | ||
+ | * PER:HURA 1 | ||
+ | * PER:NI 290 | ||
+ | * PER:ZU 60 | ||
+ | * PER:ZUEK 29 | ||
+ | * PLU:+ 149 | ||
+ | * PLU:- 10257 | ||
+ | * POS:+ 2353 | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS:POSat 6 | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * POS: | ||
+ | * ZENB:- 192 | ||
+ | * _ 36940 | ||
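A frequency list like the one above can be produced by tallying the FEATS column. The sketch below is a guess at how such counts could be obtained. It is not necessarily the script that was used for this page. It assumes the CoNLL-X / CoNLL 2007 layout: 10 tab-separated columns, FEATS in the 6th column, and blank lines between sentences. Like the list, it counts a bare `_` (no features) as an item of its own.

```python
from collections import Counter
from collections.abc import Iterable


def count_features(lines: Iterable[str]) -> Counter:
    """Count every "|"-separated item of the FEATS column over all tokens.

    Assumption: CoNLL-X / CoNLL 2007 layout (10 tab-separated columns,
    FEATS is the 6th), blank lines between sentences. A bare "_" is
    counted as a separate item, as in the frequency list.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # blank line = sentence boundary
        columns = line.split("\t")
        counts.update(columns[5].split("|"))
    return counts
```

Typical use is `with open("train.conll", encoding="utf-8") as f: counts = count_features(f)`, where the file name is hypothetical. After that, `sorted(counts.items())` gives an alphabetical list like the one above, and `len(counts)` gives the number of distinct items.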
  The syntactic guidelines (structure and labels) are described in Spanish in this [[http://
+
+ Multi-word expressions have been collapsed into one token, using the underscore as the joining character (e.g. Espainia_Poliziak,
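If the original word boundaries are needed, the joined tokens can be split again. The sketch below rests on two assumptions. First, every underscore inside a word form is a joiner, so a form that legitimately contains an underscore would be split wrongly. Second, a bare `_` is the CoNLL placeholder for an empty field.

```python
def split_mwe(form: str) -> list[str]:
    """Split an underscore-joined multiword token into its words.

    Assumption: every "_" inside a form is a joiner. The bare "_" used by
    CoNLL for empty fields is returned unchanged; empty pieces (e.g. from
    a doubled "__") are dropped.
    """
    if form == "_":
        return [form]
    return [part for part in form.split("_") if part]


def is_mwe(form: str) -> bool:
    """True if the form consists of more than one joined word."""
    return len(split_mwe(form)) > 1
```

For example, `split_mwe("Espainia_Poliziak")` gives `['Espainia', 'Poliziak']`. Splitting yields only the word forms. Lemmas, features and dependency attachments for the individual words are not in the data and would have to come from elsewhere.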
  ==== Sample ====
Line 205 (previous) / Line 516 (current):
  ==== Parsing ====
- Nonprojectivities in GDT are not frequent. Only 823 of the 70223 tokens
+ BDT is a mildly nonprojective treebank. 1925 of the 151,
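A dependency is nonprojective if some token lying between the dependent and its head is not dominated by that head. The sketch below counts such tokens in one sentence. It assumes HEAD values as in the CoNLL HEAD column (1-based token IDs, 0 for the artificial root) and a well-formed tree without cycles. It may not be the exact procedure used to obtain the figure above.

```python
def dominates(heads: list[int], ancestor: int, node: int) -> bool:
    """True if `ancestor` lies on the path from `node` up to the root.

    heads[i] is the head of token i + 1; 0 is the artificial root, which
    dominates every token. Assumes the heads form a tree (no cycles).
    """
    while node != 0:
        node = heads[node - 1]
        if node == ancestor:
            return True
    return False


def nonprojective_tokens(heads: list[int]) -> list[int]:
    """IDs of tokens whose incoming dependency is nonprojective.

    The arc head -> dep is nonprojective if some token strictly between
    the two is not dominated by the head.
    """
    result = []
    for dep, head in enumerate(heads, start=1):
        lo, hi = sorted((head, dep))
        if any(not dominates(heads, head, k) for k in range(lo + 1, hi)):
            result.append(dep)
    return result
```

In the toy sentence with heads `[0, 1, 1, 2]`, the arc from token 2 to token 4 crosses the arc from token 1 to token 3, so only token 4 is reported. Summing over all sentences and dividing by the number of tokens gives the share of nonprojectively attached tokens.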
- The results of the CoNLL 2007 shared task are [[http://
+ The results of the CoNLL 2007 shared task are [[http://
  ^ Parser (Authors) ^ LAS ^ UAS ^
- | Nakagawa | 76.31 | 84.08 |
- | Keith Hall et al. | 74.21 | 82.04 |
- | Carreras | 73.56 | 81.37 |
- | Malt (Nilsson et al.) | 74.65 | 81.22 |
- | Titov et al. | 73.52 | 81.20 |
- | Chen | 74.42 | 81.16 |
- | Duan | 74.29 | 80.77 |
- | Attardi et al. | 73.92 | 80.75 |
- | Malt (J. Hall et al.) | 74.21 | 80.66 |
+ | Malt (Nilsson et al.) | 76.94 | 82.84 |
+ | Titov et al. | 75.49 | 81.93 |
+ | Sagae | 74.64 | 81.19 |
+ | Carreras
+ | Nakagawa
+ | Malt (J. Hall et al.) | 74.99 | 80.61 |
+ | Johansson et al. | 75.08 | 80.43 |
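LAS (labeled attachment score) is the percentage of tokens that have both the correct head and the correct dependency label. UAS (unlabeled attachment score) requires only the correct head. The sketch below computes both from aligned gold and predicted (head, label) pairs. Treat it only as an illustration of the definitions. The official CoNLL 2007 evaluation script may handle some tokens, such as punctuation, differently, so scores from this sketch are not directly comparable to the table.

```python
def attachment_scores(gold: list[tuple[int, str]],
                      pred: list[tuple[int, str]]) -> tuple[float, float]:
    """Return (LAS, UAS) in percent for token-aligned (head, label) pairs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted tokens must be aligned")
    if not gold:
        raise ValueError("nothing to score")
    heads_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    labeled_ok = sum(g == p for g, p in zip(gold, pred))      # head and label
    n = len(gold)
    return 100.0 * labeled_ok / n, 100.0 * heads_ok / n
```

Suppose a parser gets three of four heads right but one of those three has the wrong label (the labels here are invented). It then scores LAS 50.0 and UAS 75.0. LAS can never exceed UAS, which is also true of every row in the table.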
  The two Malt parser results of 2007 (single malt and blended) are described in [[http://
+ Parsing results on BDT-II have been published in Kepa Bengoetxea, Koldo Gojenola: [[http://