Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:eu [2011/11/29 18:50]
zeman Basque cases.
user:zeman:treebanks:eu [2011/12/13 10:11] (current)
zeman Typo.
Line 30: Line 30:
     * María Jesús Aranzabe, José Mari Arriola, Aitziber Atutxa, Irene Balza, Larraitz Uria: [[http://ixa.si.ehu.es/Ixa/Argitalpenak/Barne_txostenak/1068549887/publikoak/guia.pdf|Guía para la anotación sintáctica manual de Eus3LB (corpus del euskera anotado a nivel sintáctico, semántico y pragmático)]]. UPV/EHU/LSI/TR 13-2003, Donostia, Spain, 2003.     * María Jesús Aranzabe, José Mari Arriola, Aitziber Atutxa, Irene Balza, Larraitz Uria: [[http://ixa.si.ehu.es/Ixa/Argitalpenak/Barne_txostenak/1068549887/publikoak/guia.pdf|Guía para la anotación sintáctica manual de Eus3LB (corpus del euskera anotado a nivel sintáctico, semántico y pragmático)]]. UPV/EHU/LSI/TR 13-2003, Donostia, Spain, 2003.
     * [[http://www.google.cz/url?sa=t&rct=j&q=adlativo%20direccional%20norantz&source=web&cd=1&ved=0CB0QFjAA&url=http%3A%2F%2Flenguaesp.usal.es%2Fhtml%2Fes%2Fdbfs%2Fdownload.html%3FfileId%3D1118%26_key_%3D248d9f4b64589181dfabafad22b8e483&ei=Qg3VTpKCFpDNswaarJyNDg&usg=AFQjCNEA86oRVR_7sNixk1EKvDFCoSrSsg&sig2=yTsTylb19CsOqsdu-wOtwA&cad=rja|Here]] at the University of Salamanca is a Microsoft Word document in Spanish describing the Basque morphology. It does not mention the treebank but it could help understand some of the tags.     * [[http://www.google.cz/url?sa=t&rct=j&q=adlativo%20direccional%20norantz&source=web&cd=1&ved=0CB0QFjAA&url=http%3A%2F%2Flenguaesp.usal.es%2Fhtml%2Fes%2Fdbfs%2Fdownload.html%3FfileId%3D1118%26_key_%3D248d9f4b64589181dfabafad22b8e483&ei=Qg3VTpKCFpDNswaarJyNDg&usg=AFQjCNEA86oRVR_7sNixk1EKvDFCoSrSsg&sig2=yTsTylb19CsOqsdu-wOtwA&cad=rja|Here]] at the University of Salamanca is a Microsoft Word document in Spanish describing the Basque morphology. It does not mention the treebank but it could help understand some of the tags.
 +    * José Ignacio Hualde, Jon Ortiz de Urbina: [[http://books.google.cz/books?id=Kss999lxKm0C&printsec=frontcover&dq=grammar+of+basque&cd=1&redir_esc=y#v=onepage&q&f=false|A Grammar of Basque]]. Mouton de Gruyter, Berlin, 2003. ISBN 3-11-017683-1.
  
 ==== Domain ==== ==== Domain ====
Line 100: Line 101:
     * KAS:ABL (984) = ablativo = ablative     * KAS:ABL (984) = ablativo = ablative
     * KAS:ABS (22805) = absolutivo = absolutive     * KAS:ABS (22805) = absolutivo = absolutive
-    * KAS:ABU (32) = adlativo terminal ("-raino") = "until" = terminative
+    * KAS:ABU (32) = adlativo terminal ("-raino") = "until, as far as" = terminative
-    * KAS:ABZ (27) = adlativo direccional ("-rantz") = "since" = egressive
+    * KAS:ABZ (27) = adlativo direccional ("-rantz") = "towards" ~ lative?
     * KAS:ALA (1093) = adlativo = allative     * KAS:ALA (1093) = adlativo = allative
     * KAS:BNK (13) =? special case of the locative genitive ("-ko", "-eko")     * KAS:BNK (13) =? special case of the locative genitive ("-ko", "-eko")
Line 119: Line 120:
   * ASP = aspect   * ASP = aspect
   * ERL = relation (relative sentence, completive sentence, indirect question...)   * ERL = relation (relative sentence, completive sentence, indirect question...)
 +
 +List of all 286 features found in the corpus with frequencies:
 +  * ADM:ADIZE 3612
 +  * ADM:ADOIN 2919
 +  * ADM:PART 14711
 +  * ASP:BURU 7491
 +  * ASP:EZBU 2421
 +  * ASP:GERO 2166
 +  * ASP:PNT 6631
 +  * BIZ:+ 2303
 +  * BIZ:- 22116
 +  * ENT:??? 35
 +  * ENT:Erakundea 3499
 +  * ENT:Pertsona 4401
 +  * ENT:Tokia 3949
 +  * ERL:AURK 1264
 +  * ERL:BALD 332
 +  * ERL:DENB 390
 +  * ERL:EMEN 5969
 +  * ERL:ERLT 1531
 +  * ERL:ESPL 129
 +  * ERL:HAUT 408
 +  * ERL:HELB 925
 +  * ERL:KAUS 864
 +  * ERL:KONPL 2614
 +  * ERL:KONT 215
 +  * ERL:MOD 1152
 +  * ERL:MOD/DENB 244
 +  * ERL:MOS 146
 +  * ERL:ONDO 160
 +  * ERL:ZHG 232
 +  * HIT:NO 50
 +  * HIT:TO 38
 +  * IZAUR:+ 1499
 +  * IZAUR:- 5930
 +  * KAS:ABL 984
 +  * KAS:ABS 22807
 +  * KAS:ABU 32
 +  * KAS:ABZ 27
 +  * KAS:ALA 1094
 +  * KAS:BNK 13
 +  * KAS:DAT 1451
 +  * KAS:DES 181
 +  * KAS:DESK 223
 +  * KAS:EM 707
 +  * KAS:ERG 6059
 +  * KAS:GEL 6266
 +  * KAS:GEN 4307
 +  * KAS:INE 7693
 +  * KAS:INS 1370
 +  * KAS:MOT 165
 +  * KAS:PAR 930
 +  * KAS:PRO 89
 +  * KAS:SOZ 928
 +  * KLM:AM 80
 +  * KLM:HAS 2
 +  * MAI:GEHI 38
 +  * MAI:IND 36
 +  * MAI:KONP 244
 +  * MAI:SUP 406
 +  * MDN:A1 11766
 +  * MDN:A3 107
 +  * MDN:A4 1
 +  * MDN:A5 282
 +  * MDN:B1 6666
 +  * MDN:B2 185
 +  * MDN:B3 11
 +  * MDN:B4 59
 +  * MDN:B5A 1
 +  * MDN:B5B 27
 +  * MDN:B6 1
 +  * MDN:B7 79
 +  * MDN:B8 38
 +  * MDN:C 52
 +  * MOD:EGI 2244
 +  * MOD:ZIU 126
 +  * MTKAT:LAB 16
 +  * MTKAT:SIG 696
 +  * MTKAT:SNB 22
 +  * MUG:M 42116
 +  * MUG:MG 8449
 +  * MW:B 3615
 +  * NEUR:- 193
 +  * NMG:MG 1055
 +  * NMG:P 2690
 +  * NMG:S 2156
 +  * NOR:GU 223
 +  * NOR:HAIEK 4248
 +  * NOR:HI 20
 +  * NOR:HURA 14342
 +  * NOR:NI 337
 +  * NOR:ZU 93
 +  * NOR:ZUEK 12
 +  * NORI:GURI 124
 +  * NORI:HAIEI 306
 +  * NORI:HARI 1085
 +  * NORI:HIRI-NO 2
 +  * NORI:HIRI-TO 5
 +  * NORI:NIRI 152
 +  * NORI:ZUEI 12
 +  * NORI:ZURI 39
 +  * NORK:GUK 721
 +  * NORK:HAIEK-K 2618
 +  * NORK:HARK 5981
 +  * NORK:HIK 6
 +  * NORK:HIK-NO 10
 +  * NORK:HIK-TO 8
 +  * NORK:NIK 662
 +  * NORK:ZUEK-K 46
 +  * NORK:ZUK 208
 +  * NUM:P 9347
 +  * NUM:PH 172
 +  * NUM:S 32570
 +  * PER:GU 242
 +  * PER:HAIEK 93
 +  * PER:HI 14
 +  * PER:HURA 1
 +  * PER:NI 290
 +  * PER:ZU 60
 +  * PER:ZUEK 29
 +  * PLU:+ 149
 +  * PLU:- 10257
 +  * POS:+ 2353
 +  * POS:POSAldeko 2
 +  * POS:POSAurkako 1
 +  * POS:POSGabeko 1
 +  * POS:POSInguruko 1
 +  * POS:POSKontrako 2
 +  * POS:POSaintzinean 1
 +  * POS:POSaitzina 2
 +  * POS:POSaitzinean 5
 +  * POS:POSaitzineko 2
 +  * POS:POSaitzinetik 3
 +  * POS:POSalboan 2
 +  * POS:POSaldamenetik 1
 +  * POS:POSalde 38
 +  * POS:POSaldean 11
 +  * POS:POSaldeaz 1
 +  * POS:POSaldeko 37
 +  * POS:POSaldera 20
 +  * POS:POSalderat 1
 +  * POS:POSaldetik 25
 +  * POS:POSantzean 1
 +  * POS:POSantzeko 9
 +  * POS:POSantzekoa 2
 +  * POS:POSantzera 3
 +  * POS:POSarabera 135
 +  * POS:POSaraberako 1
 +  * POS:POSarte 82
 +  * POS:POSartean 158
 +  * POS:POSarteetik 1
 +  * POS:POSarteko 108
 +  * POS:POSartekoak 1
 +  * POS:POSat 6
 +  * POS:POSatzean 15
 +  * POS:POSatzeko 6
 +  * POS:POSatzera 1
 +  * POS:POSatzetik 12
 +  * POS:POSaurka 103
 +  * POS:POSaurkaa 1
 +  * POS:POSaurkako 48
 +  * POS:POSaurrean 74
 +  * POS:POSaurreko 10
 +  * POS:POSaurrera 36
 +  * POS:POSaurrerako 2
 +  * POS:POSaurretik 26
 +  * POS:POSazpian 9
 +  * POS:POSazpitik 6
 +  * POS:POSbaitan 12
 +  * POS:POSbarik 2
 +  * POS:POSbarna 1
 +  * POS:POSbarnean 11
 +  * POS:POSbarneko 2
 +  * POS:POSbarnera 1
 +  * POS:POSbarrena 4
 +  * POS:POSbarrenean 1
 +  * POS:POSbarru 7
 +  * POS:POSbarruan 37
 +  * POS:POSbarruetatik 1
 +  * POS:POSbarruko 3
 +  * POS:POSbarrura 1
 +  * POS:POSbarrutik 2
 +  * POS:POSbatera 42
 +  * POS:POSbatera— 1
 +  * POS:POSbegira 31
 +  * POS:POSbehera 11
 +  * POS:POSbestaldean 1
 +  * POS:POSbezala 75
 +  * POS:POSbezalako 15
 +  * POS:POSbezalakoa 1
 +  * POS:POSbezalakoen 1
 +  * POS:POSbidez 45
 +  * POS:POSbila 20
 +  * POS:POSbitarte 2
 +  * POS:POSbitartean 18
 +  * POS:POSbitarteko 5
 +  * POS:POSbitarterako 1
 +  * POS:POSbitartez 13
 +  * POS:POSburuan 7
 +  * POS:POSburuz 47
 +  * POS:POSburuzko 36
 +  * POS:POSeran 1
 +  * POS:POSerdian 11
 +  * POS:POSerdiko 1
 +  * POS:POSerdira 3
 +  * POS:POSerditan 1
 +  * POS:POSeske 2
 +  * POS:POSesker 30
 +  * POS:POSesku 12
 +  * POS:POSeskuetan 5
 +  * POS:POSeskuko 1
 +  * POS:POSeskutik 6
 +  * POS:POSezean 4
 +  * POS:POSgabe 74
 +  * POS:POSgabeko 17
 +  * POS:POSgain 36
 +  * POS:POSgaindi 1
 +  * POS:POSgaindiko 1
 +  * POS:POSgainean 33
 +  * POS:POSgaineko 12
 +  * POS:POSgainera 9
 +  * POS:POSgainerat 1
 +  * POS:POSgainetik 16
 +  * POS:POSgero 1
 +  * POS:POSgeroztik 18
 +  * POS:POSgertu 4
 +  * POS:POSgibeleko 1
 +  * POS:POSgibeletik 2
 +  * POS:POSgisa 34
 +  * POS:POSgisako 1
 +  * POS:POSgisan 2
 +  * POS:POSgisara 1
 +  * POS:POSgoiko 1
 +  * POS:POSgoitik 1
 +  * POS:POSgora 30
 +  * POS:POSgorago 1
 +  * POS:POSgorako 7
 +  * POS:POSgorakoen 1
 +  * POS:POShurbil 8
 +  * POS:POShurrean 1
 +  * POS:POSinguru 16
 +  * POS:POSingurua 1
 +  * POS:POSinguruan 77
 +  * POS:POSinguruetako 1
 +  * POS:POSinguruetan 2
 +  * POS:POSinguruetara 1
 +  * POS:POSinguruko 27
 +  * POS:POSingurura 5
 +  * POS:POSingururako 1
 +  * POS:POSirian 1
 +  * POS:POSkanpo 28
 +  * POS:POSkanpoko 12
 +  * POS:POSkanpora 4
 +  * POS:POSkontra 72
 +  * POS:POSkontrako 38
 +  * POS:POSlanda 7
 +  * POS:POSlandara 2
 +  * POS:POSlegez 1
 +  * POS:POSlekuan 4
 +  * POS:POSlepora 1
 +  * POS:POSmendean 1
 +  * POS:POSmenpe 8
 +  * POS:POSmenpera 1
 +  * POS:POSmoduan 1
 +  * POS:POSmodura 1
 +  * POS:POSondoan 19
 +  * POS:POSondoko 1
 +  * POS:POSondora 1
 +  * POS:POSondoren 32
 +  * POS:POSondorengo 2
 +  * POS:POSondotik 14
 +  * POS:POSordez 9
 +  * POS:POSostean 17
 +  * POS:POSosteko 1
 +  * POS:POSpare 1
 +  * POS:POSparean 5
 +  * POS:POSpareko 2
 +  * POS:POSpartean 3
 +  * POS:POSpartez 1
 +  * POS:POSpean 1
 +  * POS:POStruke 9
 +  * POS:POSurrun 3
 +  * POS:POSurruti 3
 +  * POS:POSzai 2
 +  * POS:POSzain 12
 +  * POS:POSzehar 42
 +  * ZENB:- 192
 +  * _ 36940
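
The frequency list above is easier to inspect when it is grouped by feature category (the part before the colon). The following is a minimal sketch, assuming each entry has the shape ''CATEGORY:VALUE count'' as in the list; the function names ''parse_feature_line'' and ''totals_by_category'' are hypothetical and not part of any treebank tooling.

```python
from collections import defaultdict


def parse_feature_line(line):
    """Split an entry such as '  * KAS:ABS 22807' into (category, value, count).

    Hypothetical helper: it tolerates a leading diff marker '+' and the
    wiki bullet '*'. The empty feature '_' yields category '_' and value ''.
    """
    text = line.strip().lstrip("+").strip()
    if text.startswith("*"):
        text = text[1:].strip()
    # The count is the last whitespace-separated field; the feature may
    # itself contain '+', '-', '/' or '?' (e.g. 'BIZ:+', 'ERL:MOD/DENB').
    feature, count = text.rsplit(None, 1)
    category, _, value = feature.partition(":")
    return category, value, int(count)


def totals_by_category(lines):
    """Sum the frequencies of all values that share a category."""
    totals = defaultdict(int)
    for line in lines:
        if line.strip():
            category, _, count = parse_feature_line(line)
            totals[category] += count
    return dict(totals)


# A few entries copied from the list above.
sample = [
    " +  * HIT:NO 50",
    " +  * HIT:TO 38",
    " +  * KAS:ABU 32",
    " +  * KAS:ABZ 27",
    " +  * ERL:MOD/DENB 244",
]
print(totals_by_category(sample))
```

Running the same function over the full list would give one total per category (KAS, POS, ERL, and so on), which makes it easier to spot categories dominated by rare values such as the many POS:POS... postposition entries.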
  
 The syntactic guidelines (structure and labels) are described in Spanish in this [[http://ixa.si.ehu.es/Ixa/Argitalpenak/Barne_txostenak/1068549887/publikoak/guia.pdf|technical report]]. See Appendix 3 for some lists of tags. The syntactic guidelines (structure and labels) are described in Spanish in this [[http://ixa.si.ehu.es/Ixa/Argitalpenak/Barne_txostenak/1068549887/publikoak/guia.pdf|technical report]]. See Appendix 3 for some lists of tags.
Line 229: Line 518:
 BDT is a mildly nonprojective treebank. 1925 of the 151,604 tokens of combined BDT-II training and test sets are attached nonprojectively (1.27%). BDT is a mildly nonprojective treebank. 1925 of the 151,604 tokens of combined BDT-II training and test sets are attached nonprojectively (1.27%).
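
As an illustration of the statistic above, the sketch below counts nonprojectively attached tokens in a single sentence and reproduces the reported percentage. It assumes the usual definition (a token is attached nonprojectively if some token lying between it and its head is not dominated by that head) and a head list in CoNLL style, where 0 is the artificial root. The function name ''nonprojective_tokens'' is hypothetical, and this is not the script that produced the figures on this page.

```python
def nonprojective_tokens(heads):
    """Return the 1-based indices of tokens attached nonprojectively.

    heads[i - 1] is the head of token i; 0 denotes the artificial root.
    Assumes the heads form a tree (no cycles).
    """
    def dominated_by(node, ancestor):
        # Walk from node up to the root, looking for ancestor on the way.
        while node != 0:
            if node == ancestor:
                return True
            node = heads[node - 1]
        return ancestor == 0

    result = []
    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        lo, hi = sorted((dep, head))
        # Every token strictly between dep and head must descend from head.
        if any(not dominated_by(k, head) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


# Toy sentence: token 2 depends on token 4, but token 3 in between
# hangs from the root, so the arc 4 -> 2 is nonprojective.
print(nonprojective_tokens([3, 4, 0, 3]))

# The share reported for the combined BDT-II training and test sets.
share = 100 * 1925 / 151604
print(round(share, 2))
```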
  
-The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Greek:
+The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Basque:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^ ^ Parser (Authors) ^ LAS ^ UAS ^
