user:zeman:treebanks:hu [2011/12/13 10:09] zeman created |
user:zeman:treebanks:hu [2011/12/13 22:52] (current) zeman Personal names. |
===== Hungarian (hu) ===== | ===== Hungarian (hu) ===== |
| |
Basque Dependency Treebank (BDT) | [[http://www.inf.u-szeged.hu/projectdirs/hlt/hu/Treebank/treebank2.html|Szeged Treebank]] (SzTB) |
| |
==== Versions ==== | ==== Versions ==== |
| |
* CoNLL 2007 (BDT-I) | * Szeged Treebank 1.0 (shallow parse) |
* BDT-II (obtained per e-mail in 2011) | * Szeged Treebank 2.0 (full parse) |
| * CoNLL 2007 (based on SzTB 2.0) |
| |
==== Obtaining and License ==== | ==== Obtaining and License ==== |
| |
There does not seem to be any regular distribution channel for the Basque Dependency Treebank. The CoNLL 2007 version had a restricted license for the duration of the shared task only. Republication of the CoNLL version in LDC is planned but it has not happened yet. In the meantime, one can ask Koldo Gojenola (koldo (dot) gojenola (at) ehu (dot) es) about availability of the corpus. | The Szeged Treebank is available for research free of charge, provided the user signs the license agreement first. [[http://www.inf.u-szeged.hu/projectdirs/hlt/index_en.html|The website]] uses JavaScript to manage content, which makes it difficult to directly link to relevant sections. Click on “downloads” //(letöltések)// to get the list of downloadable corpora and links to their descriptions (e.g. [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|Szeged Treebank 2.0]]). To obtain the treebank, one is supposed to complete the license form, print it, sign it and fax it to +36-62-546397 or mail it to Vincze Veronika, Árpád tér 2, H-6720 Szeged. You will be given a user ID and password needed to download the data. There are links to Microsoft Word documents with the license agreement but they do not work for me. Ask Veronika Vincze how to proceed (vinczev (at) inf (dot) u-szeged (dot) hu). |
| |
Informally agreed upon terms: | Republication of the CoNLL 2007 version in the LDC is planned but it has not happened yet. |
| |
| The CoNLL 2007 license in short: |
| * non-profit education and research purposes |
* no redistribution | * no redistribution |
| * no modification |
* cite the principal publication (see below) in publications | * cite the principal publication (see below) in publications |
| |
BDT was created by members of the [[http://ixa.si.ehu.es/|IXA Group]] (IXA taldea), University of the Basque Country (Euskal Herriko Unibertsitatea), 649 Posta kutxa, E-20080 Donostia, Spain. | SzTB was created by members of the [[http://www.inf.u-szeged.hu/projectdirs/hlt/|Human Language Technology Group]] (Nyelvtechnológiai Csoport), Department of Informatics (Informatikai Tanszékcsoport), University of Szeged (Szegedi Tudományegyetem), Árpád tér 2, H-6720 Szeged, Hungary. Conversion from constituents to dependencies for the CoNLL 2007 shared task was done by Zoltán Alexin. |
| |
==== References ==== | ==== References ==== |
| |
* Website | * Website |
* //no website dedicated to the treebank// | * http://www.inf.u-szeged.hu/projectdirs/hlt/index_en.html |
| * http://www.inf.u-szeged.hu/projectdirs/hlt/en/downloads.html |
| * http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html |
| * http://www.inf.u-szeged.hu/projectdirs/hlt/hu/Treebank/treebank2.html (on-line browsing using a Java applet) |
* Data | * Data |
* //no separate citation// | * //no separate citation// |
* Principal publications | * Principal publications |
* Itziar Aduriz, María Jesús Aranzabe, José María Arriola, Aitziber Atutxa, Arantza Díaz de Ilarraza, Aitzpea Garmendia, Maite Oronoz: [[http://w3.msi.vxu.se/~rics/TLT2003/doc/aduriz_et_al.pdf|Construction of a Basque Dependency Treebank]] In: Proceedings of The Second Workshop on Treebanks and Linguistic Theories (TLT 2003), pp. 149-160, Växjö, Sweden, 2003. | * Dóra Csendes, János Csirik, Tibor Gyimóthy, András Kocsor: [[http://www.springerlink.com/content/978-3-540-28789-6/#section=565106&page=1&locus=44|The Szeged Treebank]] In: Václav Matoušek, Pavel Mautner, Tomáš Pavelka (eds.): //Text, Speech and Dialogue. 8th International Conference, TSD 2005, Karlovy Vary, Czech Republic, September 12-15, 2005. Proceedings.// Lecture Notes in Computer Science, vol. 3658/2005, pp. 123-131, Springer-Verlag, Berlin / Heidelberg, Germany, 2005. ISSN 0302-9743, ISBN 978-3-540-28789-6. |
* Documentation | * Documentation |
* Description of tags and feature values is hard to find; the ''doc/README'' file in the CoNLL 2007 data distribution is not very informative. See below for information obtained per e-mail communication. | * The ''doc/README'' file in the CoNLL 2007 data distribution contains a quick guide to part of speech tags. There are also several PDF documents with detailed documentation of the annotation. |
* María Jesús Aranzabe, José Mari Arriola, Aitziber Atutxa, Irene Balza, Larraitz Uria: [[http://ixa.si.ehu.es/Ixa/Argitalpenak/Barne_txostenak/1068549887/publikoak/guia.pdf|Guía para la anotación sintáctica manual de Eus3LB (corpus del euskera anotado a nivel sintáctico, semántico y pragmático)]]. UPV/EHU/LSI/TR 13-2003, Donostia, Spain, 2003. | * A lot of useful information on SzTB 2.0 (original, not CoNLL version), including morphosyntax, can be found at the abovementioned [[http://www.inf.u-szeged.hu/projectdirs/hlt/en/Szeged%20Treebank%202.0_en.html|website]]. |
* [[http://www.google.cz/url?sa=t&rct=j&q=adlativo%20direccional%20norantz&source=web&cd=1&ved=0CB0QFjAA&url=http%3A%2F%2Flenguaesp.usal.es%2Fhtml%2Fes%2Fdbfs%2Fdownload.html%3FfileId%3D1118%26_key_%3D248d9f4b64589181dfabafad22b8e483&ei=Qg3VTpKCFpDNswaarJyNDg&usg=AFQjCNEA86oRVR_7sNixk1EKvDFCoSrSsg&sig2=yTsTylb19CsOqsdu-wOtwA&cad=rja|Here]] at the University of Salamanca is a Microsoft Word document in Spanish describing the Basque morphology. It does not mention the treebank but it could help understand some of the tags. | |
* José Ignacio Hualde, Jon Ortiz de Urbina: [[http://books.google.cz/books?id=Kss999lxKm0C&printsec=frontcover&dq=grammar+of+basque&cd=1&redir_esc=y#v=onepage&q&f=false|A Grammar of Basque]]. Mouton de Gruyter, Berlin, 2003. ISBN 3-11-017683-1. | |
| |
==== Domain ==== | ==== Domain ==== |
| |
Newswire + unknown (“25000 word forms from EPEC (Aduriz et al., 2003) and 25000 word forms coming from newspapers that can be considered equivalent to the other corpora in the project [3LB, i.e. Catalan and Spanish]”; “EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing”). | Mixed: |
| * Fiction |
| * Short essays by 14 to 16 year-old students |
| * Newspapers (Népszabadság, Népszava, Magyar Hírlap, HVG) |
| * Texts related to computer science |
| * Legal texts |
| * Economic and financial short news |
| |
==== Size ==== | ==== Size ==== |
| |
The CoNLL 2007 dataset was officially split into training and test part. The data split of BDT-II was provided by Koldo Gojenola and should correspond to data split used in parsing experiments published by the IXA Group. | According to their website, SzTB 2.0 contains 1.2 million words plus 250 thousand punctuation tokens in 82000 sentences. Only a fragment was converted to dependencies in the CoNLL 2007 version: 139,143 tokens in 6424 sentences, yielding 21.66 tokens per sentence on average (131,799 tokens / 6034 sentences training, 7344 tokens / 390 sentences test). |
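The CoNLL 2007 figures quoted for Hungarian can be cross-checked with a few lines of Python. This is only a sanity check on the numbers given in the text; the variable names are chosen here for illustration.

```python
# Training and test sizes of the Hungarian CoNLL 2007 data, as quoted in the text.
train_tokens, train_sentences = 131_799, 6034
test_tokens, test_sentences = 7344, 390

# Totals and average sentence length (tokens per sentence).
total_tokens = train_tokens + test_tokens
total_sentences = train_sentences + test_sentences
avg_sentence_length = round(total_tokens / total_sentences, 2)

print(total_tokens, total_sentences, avg_sentence_length)
```

The totals agree with the 139,143 tokens, 6424 sentences and 21.66 tokens per sentence stated in the text.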
| |
^ Version ^ Train Sentences ^ Train Tokens ^ D-test Sentences ^ D-test Tokens ^ E-test Sentences ^ E-test Tokens ^ Total Sentences ^ Total Tokens ^ Sentence Length ^ | |
| CoNLL 2007 | 3190 | 50526 | 334 | 5390 | | | 3524 | 55916 | 15.87 | | |
| BDT-II | 9094 | 124,684 | 1010 | 12625 | 1122 | 14295 | 11226 | 151,604 | 13.50 | | |
| |
==== Inside ==== | ==== Inside ==== |
| |
Both versions (CoNLL 2007 and BDT-II) are in the CoNLL 2006/2007 format. | The original Szeged Treebank is a phrase-based treebank and it is distributed in XML-based, TEI-compliant format. The CoNLL 2007 version is dependency-based (i.e. the head of each phrase was identified), distributed in the CoNLL 2006/2007 format. |
| |
Part of speech tag description (obtained per e-mail from Koldo Gojenola, thanks!): | Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. The tagset used in SzTB seems to be the same as or similar to [[http://nl.ijs.si/ME/V4/msd/html/msd-hu.html|Multext-East]]. In the CoNLL version, tags were decomposed into the CPOS column, the POS column and a list of feature-value pairs in the FEAT column. |
| |
* IZE = noun | Personal names have been collapsed into one token, using underscore as the joining character (e.g. Torgyán_József). |
* ARR = common | |
* IZB = proper name | |
* LIB = place name | |
* ZKI = number | |
* ADJ = adjective | |
* ARR = common | |
* GAL = question | |
* ADI = verb | |
* SIN = simple | |
* ADK = composed | |
* ADP = periphrastic | |
* FAK = factitive | |
* ADB = adverb | |
* ARR = common | |
* GAL = question | |
* DET = determiner | |
* ERKARR = demonstrative common | |
* ERKIND = demonstrative emphatic | |
* NOLARR = indefinite common | |
* NOLGAL = indefinite question | |
* ZNB = number | |
* DZH = definite | |
* BAN = distributive | |
* ORD = ordinal | |
* DZG = indefinite | |
* ORO = general | |
* IOR = pronoun | |
* PERARR = personal common | |
* PERIND = personal emphatic | |
* IZGMGB = indefinite | |
* IZGGAL = question | |
* BIH = ??? | |
* ELK = ??? | |
* LOT = link | |
* LOK = connector | |
* JNT = conjunction | |
* PRT = particle | |
* ITJ = interjection | |
* BST = other | |
* ADL = auxiliary verb | |
* ADT = synthetic verb | |
* SIG = acronym | |
* SNB = symbol | |
* LAB = abbreviation | |
| |
Main features: | |
| |
* KAS = case. Various descriptions of Basque grammar list different numbers of cases and it is not easy to match all of the BDT case tags with them. Some but not all of them are described in the Annex 3 of the technical report mentioned above. The following list gives all case tags occurring in BDT with their frequencies in brackets. | |
* KAS:ABL (984) = ablativo = ablative | |
* KAS:ABS (22805) = absolutivo = absolutive | |
* KAS:ABU (32) = adlativo terminal ("-raino") = "until, as far as" = terminative | |
* KAS:ABZ (27) = adlativo direccional ("-rantz") = "towards" ~ lative? | |
* KAS:ALA (1093) = adlativo = allative | |
* KAS:BNK (13) =? special case of the locative genitive ("-ko", "-eko") | |
* KAS:DAT (1451) = dativo = dative | |
* KAS:DES (181) = destinativo = benefactive ("-entzat") | |
* KAS:DESK (223) =? descriptive locative genitive ("-ko", "-eko"), also frequently used for counted noun after numeral | |
* KAS:EM (705) = multiword token with postposition (e.g. "_gabe", "_arabera", "_batera", "_bezala"...) | |
* KAS:ERG (6059) = ergativo = ergative | |
* KAS:GEL (6259) = genitivo locativo = locative genitive | |
* KAS:GEN (4307) = genitivo de posesión = possessive genitive | |
* KAS:INE (7690) = inesivo = inessive | |
* KAS:INS (1370) = instrumental | |
* KAS:MOT (165) = motivativo = causative | |
* KAS:PAR (930) = partitivo = partitive | |
* KAS:PRO (89) = prolativo = essive | |
* KAS:SOZ (928) = asociativo = comitative | |
* ASP = aspect | |
* ERL = relation (relative sentence, completive sentence, indirect question...) | |
| |
List of all 286 features found in the corpus with frequencies: | |
* ADM:ADIZE 3612 | |
* ADM:ADOIN 2919 | |
* ADM:PART 14711 | |
* ASP:BURU 7491 | |
* ASP:EZBU 2421 | |
* ASP:GERO 2166 | |
* ASP:PNT 6631 | |
* BIZ:+ 2303 | |
* BIZ:- 22116 | |
* ENT:??? 35 | |
* ENT:Erakundea 3499 | |
* ENT:Pertsona 4401 | |
* ENT:Tokia 3949 | |
* ERL:AURK 1264 | |
* ERL:BALD 332 | |
* ERL:DENB 390 | |
* ERL:EMEN 5969 | |
* ERL:ERLT 1531 | |
* ERL:ESPL 129 | |
* ERL:HAUT 408 | |
* ERL:HELB 925 | |
* ERL:KAUS 864 | |
* ERL:KONPL 2614 | |
* ERL:KONT 215 | |
* ERL:MOD 1152 | |
* ERL:MOD/DENB 244 | |
* ERL:MOS 146 | |
* ERL:ONDO 160 | |
* ERL:ZHG 232 | |
* HIT:NO 50 | |
* HIT:TO 38 | |
* IZAUR:+ 1499 | |
* IZAUR:- 5930 | |
* KAS:ABL 984 | |
* KAS:ABS 22807 | |
* KAS:ABU 32 | |
* KAS:ABZ 27 | |
* KAS:ALA 1094 | |
* KAS:BNK 13 | |
* KAS:DAT 1451 | |
* KAS:DES 181 | |
* KAS:DESK 223 | |
* KAS:EM 707 | |
* KAS:ERG 6059 | |
* KAS:GEL 6266 | |
* KAS:GEN 4307 | |
* KAS:INE 7693 | |
* KAS:INS 1370 | |
* KAS:MOT 165 | |
* KAS:PAR 930 | |
* KAS:PRO 89 | |
* KAS:SOZ 928 | |
* KLM:AM 80 | |
* KLM:HAS 2 | |
* MAI:GEHI 38 | |
* MAI:IND 36 | |
* MAI:KONP 244 | |
* MAI:SUP 406 | |
* MDN:A1 11766 | |
* MDN:A3 107 | |
* MDN:A4 1 | |
* MDN:A5 282 | |
* MDN:B1 6666 | |
* MDN:B2 185 | |
* MDN:B3 11 | |
* MDN:B4 59 | |
* MDN:B5A 1 | |
* MDN:B5B 27 | |
* MDN:B6 1 | |
* MDN:B7 79 | |
* MDN:B8 38 | |
* MDN:C 52 | |
* MOD:EGI 2244 | |
* MOD:ZIU 126 | |
* MTKAT:LAB 16 | |
* MTKAT:SIG 696 | |
* MTKAT:SNB 22 | |
* MUG:M 42116 | |
* MUG:MG 8449 | |
* MW:B 3615 | |
* NEUR:- 193 | |
* NMG:MG 1055 | |
* NMG:P 2690 | |
* NMG:S 2156 | |
* NOR:GU 223 | |
* NOR:HAIEK 4248 | |
* NOR:HI 20 | |
* NOR:HURA 14342 | |
* NOR:NI 337 | |
* NOR:ZU 93 | |
* NOR:ZUEK 12 | |
* NORI:GURI 124 | |
* NORI:HAIEI 306 | |
* NORI:HARI 1085 | |
* NORI:HIRI-NO 2 | |
* NORI:HIRI-TO 5 | |
* NORI:NIRI 152 | |
* NORI:ZUEI 12 | |
* NORI:ZURI 39 | |
* NORK:GUK 721 | |
* NORK:HAIEK-K 2618 | |
* NORK:HARK 5981 | |
* NORK:HIK 6 | |
* NORK:HIK-NO 10 | |
* NORK:HIK-TO 8 | |
* NORK:NIK 662 | |
* NORK:ZUEK-K 46 | |
* NORK:ZUK 208 | |
* NUM:P 9347 | |
* NUM:PH 172 | |
* NUM:S 32570 | |
* PER:GU 242 | |
* PER:HAIEK 93 | |
* PER:HI 14 | |
* PER:HURA 1 | |
* PER:NI 290 | |
* PER:ZU 60 | |
* PER:ZUEK 29 | |
* PLU:+ 149 | |
* PLU:- 10257 | |
* POS:+ 2353 | |
* POS:POSAldeko 2 | |
* POS:POSAurkako 1 | |
* POS:POSGabeko 1 | |
* POS:POSInguruko 1 | |
* POS:POSKontrako 2 | |
* POS:POSaintzinean 1 | |
* POS:POSaitzina 2 | |
* POS:POSaitzinean 5 | |
* POS:POSaitzineko 2 | |
* POS:POSaitzinetik 3 | |
* POS:POSalboan 2 | |
* POS:POSaldamenetik 1 | |
* POS:POSalde 38 | |
* POS:POSaldean 11 | |
* POS:POSaldeaz 1 | |
* POS:POSaldeko 37 | |
* POS:POSaldera 20 | |
* POS:POSalderat 1 | |
* POS:POSaldetik 25 | |
* POS:POSantzean 1 | |
* POS:POSantzeko 9 | |
* POS:POSantzekoa 2 | |
* POS:POSantzera 3 | |
* POS:POSarabera 135 | |
* POS:POSaraberako 1 | |
* POS:POSarte 82 | |
* POS:POSartean 158 | |
* POS:POSarteetik 1 | |
* POS:POSarteko 108 | |
* POS:POSartekoak 1 | |
* POS:POSat 6 | |
* POS:POSatzean 15 | |
* POS:POSatzeko 6 | |
* POS:POSatzera 1 | |
* POS:POSatzetik 12 | |
* POS:POSaurka 103 | |
* POS:POSaurkaa 1 | |
* POS:POSaurkako 48 | |
* POS:POSaurrean 74 | |
* POS:POSaurreko 10 | |
* POS:POSaurrera 36 | |
* POS:POSaurrerako 2 | |
* POS:POSaurretik 26 | |
* POS:POSazpian 9 | |
* POS:POSazpitik 6 | |
* POS:POSbaitan 12 | |
* POS:POSbarik 2 | |
* POS:POSbarna 1 | |
* POS:POSbarnean 11 | |
* POS:POSbarneko 2 | |
* POS:POSbarnera 1 | |
* POS:POSbarrena 4 | |
* POS:POSbarrenean 1 | |
* POS:POSbarru 7 | |
* POS:POSbarruan 37 | |
* POS:POSbarruetatik 1 | |
* POS:POSbarruko 3 | |
* POS:POSbarrura 1 | |
* POS:POSbarrutik 2 | |
* POS:POSbatera 42 | |
* POS:POSbatera 1 | |
* POS:POSbegira 31 | |
* POS:POSbehera 11 | |
* POS:POSbestaldean 1 | |
* POS:POSbezala 75 | |
* POS:POSbezalako 15 | |
* POS:POSbezalakoa 1 | |
* POS:POSbezalakoen 1 | |
* POS:POSbidez 45 | |
* POS:POSbila 20 | |
* POS:POSbitarte 2 | |
* POS:POSbitartean 18 | |
* POS:POSbitarteko 5 | |
* POS:POSbitarterako 1 | |
* POS:POSbitartez 13 | |
* POS:POSburuan 7 | |
* POS:POSburuz 47 | |
* POS:POSburuzko 36 | |
* POS:POSeran 1 | |
* POS:POSerdian 11 | |
* POS:POSerdiko 1 | |
* POS:POSerdira 3 | |
* POS:POSerditan 1 | |
* POS:POSeske 2 | |
* POS:POSesker 30 | |
* POS:POSesku 12 | |
* POS:POSeskuetan 5 | |
* POS:POSeskuko 1 | |
* POS:POSeskutik 6 | |
* POS:POSezean 4 | |
* POS:POSgabe 74 | |
* POS:POSgabeko 17 | |
* POS:POSgain 36 | |
* POS:POSgaindi 1 | |
* POS:POSgaindiko 1 | |
* POS:POSgainean 33 | |
* POS:POSgaineko 12 | |
* POS:POSgainera 9 | |
* POS:POSgainerat 1 | |
* POS:POSgainetik 16 | |
* POS:POSgero 1 | |
* POS:POSgeroztik 18 | |
* POS:POSgertu 4 | |
* POS:POSgibeleko 1 | |
* POS:POSgibeletik 2 | |
* POS:POSgisa 34 | |
* POS:POSgisako 1 | |
* POS:POSgisan 2 | |
* POS:POSgisara 1 | |
* POS:POSgoiko 1 | |
* POS:POSgoitik 1 | |
* POS:POSgora 30 | |
* POS:POSgorago 1 | |
* POS:POSgorako 7 | |
* POS:POSgorakoen 1 | |
* POS:POShurbil 8 | |
* POS:POShurrean 1 | |
* POS:POSinguru 16 | |
* POS:POSingurua 1 | |
* POS:POSinguruan 77 | |
* POS:POSinguruetako 1 | |
* POS:POSinguruetan 2 | |
* POS:POSinguruetara 1 | |
* POS:POSinguruko 27 | |
* POS:POSingurura 5 | |
* POS:POSingururako 1 | |
* POS:POSirian 1 | |
* POS:POSkanpo 28 | |
* POS:POSkanpoko 12 | |
* POS:POSkanpora 4 | |
* POS:POSkontra 72 | |
* POS:POSkontrako 38 | |
* POS:POSlanda 7 | |
* POS:POSlandara 2 | |
* POS:POSlegez 1 | |
* POS:POSlekuan 4 | |
* POS:POSlepora 1 | |
* POS:POSmendean 1 | |
* POS:POSmenpe 8 | |
* POS:POSmenpera 1 | |
* POS:POSmoduan 1 | |
* POS:POSmodura 1 | |
* POS:POSondoan 19 | |
* POS:POSondoko 1 | |
* POS:POSondora 1 | |
* POS:POSondoren 32 | |
* POS:POSondorengo 2 | |
* POS:POSondotik 14 | |
* POS:POSordez 9 | |
* POS:POSostean 17 | |
* POS:POSosteko 1 | |
* POS:POSpare 1 | |
* POS:POSparean 5 | |
* POS:POSpareko 2 | |
* POS:POSpartean 3 | |
* POS:POSpartez 1 | |
* POS:POSpean 1 | |
* POS:POStruke 9 | |
* POS:POSurrun 3 | |
* POS:POSurruti 3 | |
* POS:POSzai 2 | |
* POS:POSzain 12 | |
* POS:POSzehar 42 | |
* ZENB:- 192 | |
* _ 36940 | |
| |
The syntactic guidelines (structure and labels) are described in Spanish in this [[http://ixa.si.ehu.es/Ixa/Argitalpenak/Barne_txostenak/1068549887/publikoak/guia.pdf|technical report]]. See Appendix 3 for some lists of tags. | |
| |
Multi-word expressions have been collapsed into one token, using underscore as the joining character (e.g. Espainia_Poliziak, iduri_zait). | |
| |
==== Sample ==== | ==== Sample ==== |
The first sentence of the CoNLL 2007 training data: | The first sentence of the CoNLL 2007 training data: |
| |
| 1 | espainiako_poliziak | Espainia_Poliziak | IZE | IZE_LIB | PLU-<nowiki>|</nowiki>ENTI_LOC | 4 | ncsubj | _ | _ | | | 1 | Az | az | T | Tf | <nowiki>def=yes</nowiki> | 4 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 2 | hiru | hiru | DET | DET_DZH | NMGP | 3 | detmod | _ | _ | | | 2 | elmúlt | elmúlt | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 4 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 3 | gazte | gazte | IZE | IZE_ARR | ABS<nowiki>|</nowiki>MG | 4 | ncobj | _ | _ | | | 3 | nyolc | nyolc | M | Mc | <nowiki>n=singular|case=nominative</nowiki> | 4 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 4 | atxilotu | atxilotu | ADI | ADI_SIN | PART<nowiki>|</nowiki>BURU | 8 | lot | _ | _ | | | 4 | hónapban | hónap | N | Nc | <nowiki>n=singular|case=inessive|proper=no</nowiki> | 16 | INE | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 5 | ditu | *edun | ADL | ADL | A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>NK_HARK | 4 | auxmod | _ | _ | | | 5 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 6 | atarrabian | Atarrabia | IZE | IZE_LIB | PLU-<nowiki>|</nowiki>INE<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC | 4 | ncmod | _ | _ | | | 6 | amelyből | amely | P | Pr | <nowiki>p=3rd|n=singular|case=elative</nowiki> | 11 | ELA | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 7 | , | , | PUNC | PUNC_KOMA | _ | 6 | PUNC | _ | _ | | | 7 | összesen | összesen | R | Rx | <nowiki>_</nowiki> | 8 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 8 | eta | eta | LOT | LOT_JNT | - | 0 | ROOT | _ | _ | | | 8 | hatot | hat | M | Mc | <nowiki>n=singular|case=accusative</nowiki> | 11 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 9 | madrilera | Madril | IZE | IZE_LIB | PLU-<nowiki>|</nowiki>ALA<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC | 10 | ncmod | _ | _ | | | 9 | kényszerűségből | kényszerűség | N | Nc | <nowiki>n=singular|case=elative|proper=no</nowiki> | 11 | ELA | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 10 | eraman | eraman | ADI | ADI_SIN | PART<nowiki>|</nowiki>BURU | 8 | lot | _ | _ | | | 10 | szabadságon | szabadság | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 11 | SUP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 11 | ditu | *edun | ADL | ADL | A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>NK_HARK | 10 | auxmod | _ | _ | | | 11 | töltött | tölt | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki> | 16 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 12 | . | . | PUNC | PUNC_PUNC | _ | 11 | PUNC | _ | _ | | | 12 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 14 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 13 | parlamenti | parlamenti | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 14 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 14 | ellenzék | ellenzék | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 11 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 15 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 16 | megváltozott | megváltozik | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 17 | itthon | itthon | R | Rx | <nowiki>_</nowiki> | 16 | LOCY | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 18 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 19 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 19 | hatalommegosztás | hatalommegosztás | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 22 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 20 | <nowiki>1990-ben</nowiki> | 1990 | M | Mc | <nowiki>n=singular|case=inessive</nowiki> | 21 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 21 | kialakított | kialakított | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 22 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 22 | rendszere | rendszer | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 16 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 23 | <nowiki>:</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 24 | az | az | T | Tf | <nowiki>def=yes</nowiki> | 26 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 25 | e | e | P | Pd | <nowiki>p=3rd|n=singular|case=nominative</nowiki> | 26 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 26 | héten | hét | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 27 | audienciát | audiencia | N | Nc | <nowiki>n=singular|case=accusative|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 28 | tartó | tartó | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 29 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 29 | kormányfő | kormányfő | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 31 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 30 | gyakorlatilag | gyakorlati | A | Af | <nowiki>deg=positive|n=singular|case=essive</nowiki> | 31 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 31 | kivonta | kivon | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=yes</nowiki> | 16 | CP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 32 | magát | maga | P | Px | <nowiki>p=3rd|n=singular|case=accusative</nowiki> | 31 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 33 | az | az | T | Tf | <nowiki>def=yes</nowiki> | 34 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 34 | Országgyűlés | Országgyűlés | N | Np | <nowiki>n=singular|case=nominative|proper=yes</nowiki> | 35 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 35 | ellenőrzése | ellenőrzés | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 36 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 36 | alól | alól | S | St | <nowiki>_</nowiki> | 31 | PP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 37 | <nowiki>.</nowiki> | <nowiki>_</nowiki> | SPUNCT | SPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
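A row of the Hungarian sample above can be split into its columns, including the feature-value pairs, roughly as follows. This is a minimal sketch, assuming the ten tab-separated columns of the CoNLL 2006/2007 format and features written as ''name=value'' pairs joined by a vertical bar, as in the Hungarian rows; the function names ''parse_feats'' and ''parse_conll_line'' are hypothetical and not part of any distributed tool.

```python
# Column names of the CoNLL 2006/2007 format, in file order.
FIELDS = ["id", "form", "lemma", "cpostag", "postag", "feats",
          "head", "deprel", "phead", "pdeprel"]

def parse_feats(feats):
    """Turn 'n=singular|case=inessive' into a dict; '_' means no features.

    Assumes every feature is a name=value pair (true for the Hungarian rows).
    """
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))

def parse_conll_line(line):
    """Split one tab-separated token line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    token = dict(zip(FIELDS, values))
    token["id"] = int(token["id"])
    token["head"] = int(token["head"])
    token["feats"] = parse_feats(token["feats"])
    return token

# Token 4 of the first training sentence shown above.
line = "4\thónapban\thónap\tN\tNc\tn=singular|case=inessive|proper=no\t16\tINE\t_\t_"
token = parse_conll_line(line)
print(token["lemma"], token["postag"], token["feats"]["case"], token["head"])
```

Rows whose features are not written as ''name=value'' pairs would need a different ''parse_feats''.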
| |
The first sentence of the CoNLL 2007 test data: | The first sentence of the CoNLL 2007 test data: |
| |
| 1 | epaileek | epaile | IZE | IZE_ARR | BIZ+<nowiki>|</nowiki>ERG<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM | | | 1 | A | a | T | Tf | <nowiki>def=yes</nowiki> | 2 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 2 | diote | esan | ADT | ADT | PNT<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HURA<nowiki>|</nowiki>NK_HAIEK-K | | | 2 | bankokkal | bank | N | Nc | <nowiki>n=plural|case=instrumental|proper=no</nowiki> | 4 | INS | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 3 | eaeko | EAE | IZE | IZE_LIB | SIG<nowiki>|</nowiki>GEL<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC | | | 3 | kell | kell | V | Vm | <nowiki>mood=indicative|t=present|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 4 | parlamentarioek | parlamentario | ADJ | ADJ_ARR | IZAUR-<nowiki>|</nowiki>ERG<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM | | | 4 | egyezkedniük | egyezkedik | V | Vm | <nowiki>mood=infinitive|t=present|p=3rd|n=plural</nowiki> | 3 | INF | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 5 | eaetik_kanpo | EAE | SIG | SIG- | DEK<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>DEK<nowiki>|</nowiki>ABL_kanpo_ABS<nowiki>|</nowiki>ENTI_LOC<nowiki>|</nowiki>POS | | | 5 | azoknak | az | P | Pd | <nowiki>p=3rd|n=plural|case=dative</nowiki> | 8 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 6 | eginiko | egin | ADI | ADI_SIN | PART<nowiki>|</nowiki>GEL | | | 6 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 8 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 7 | delituak | delitu | IZE | IZE_ARR | BIZ-<nowiki>|</nowiki>ABS<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM | | | 7 | mezőgazdasági | mezőgazdasági | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 8 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 8 | ikertzea | ikertu | ADI | ADI_SIN | ADIZE<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>ABS | | | 8 | termelőknek | termelő | N | Nc | <nowiki>n=plural|case=dative|proper=no</nowiki> | 4 | DAT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 9 | eta | eta | LOT | LOT_JNT | - | | | 9 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 3 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 10 | epaitzea | epaitu | ADI | ADI_SIN | ADIZE<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>ABS | | | 10 | akik | aki | P | Pr | <nowiki>p=3rd|n=plural|case=nominative</nowiki> | 21 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 11 | auzitegi_gorenari | auzitegi_gora | ADJ | ADJ_IZO | DEK<nowiki>|</nowiki>GEN<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>DEK<nowiki>|</nowiki>DAT<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC | | | 11 | egy | egy | T | Ti | <nowiki>def=no</nowiki> | 19 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 12 | dagokiola | egon | ADT | ADT | PNT<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HURA<nowiki>|</nowiki>NI_HARI | | | 12 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 19 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 13 | , | , | PUNC | PUNC_KOMA | _ | | | 13 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 15 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 14 | baina | baina | LOT | LOT_JNT | AURK | | | 14 | múlt | múlt | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 15 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 15 | atzerrian | atzerri | IZE | IZE_ARR | INE<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM | | | 15 | héten | hét | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 16 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 16 | izaniko | izan | ADI | ADI_SIN | PART<nowiki>|</nowiki>GEL | | | 16 | megjelent | megjelent | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 19 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 17 | kontaktu | kontaktu | IZE | IZE_ARR | _ | | | 17 | földművelésügyi | földművelésügyi | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 18 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 18 | horiek | horiek | DET | DET_ERKARR | ABS<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM | | | 18 | minisztériumi | minisztériumi | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 19 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 19 | ezin_direla | ezin_izan | ADI | ADI_ADK | PNT<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>MWCorrect | | | 19 | rendelet | rendelet | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 20 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 20 | delitutzat | delitu | IZE | IZE_ARR | BIZ-<nowiki>|</nowiki>PRO<nowiki>|</nowiki>MG | | | 20 | alapján | alap | N | Nc | <nowiki>n=singular|case=superessive|proper=no|pperson=3rd|pnumber=singular</nowiki> | 21 | SUP | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 21 | hartu | hartu | ADI | ADI_SIN | PART | | | 21 | kérik | kér | V | Vm | <nowiki>mood=indicative|t=present|p=3rd|n=plural|def=yes</nowiki> | 5 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 22 | . | . | PUNC | PUNC_PUNC | _ | | | 22 | ősszel | ősszel | R | Rx | <nowiki>_</nowiki> | 23 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 23 | lejáró | lejáró | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 27 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
The first sentence of the BDT-II training data: | | 24 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 27 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| | 25 | éven | év | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 26 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 1 | Estatu_Batuetako_DEAko | Estatu_Batuak_DEA | IZE | LIB | PLU:+<nowiki>|</nowiki>IZAUR:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>MW:B<nowiki>|</nowiki>ENT:Erakundea | 2 | ncmod | _ | _ | | | 26 | belüli | belüli | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 27 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 2 | buru | buru | IZE | ARR | _ | 4 | ncsubj | _ | _ | | | 27 | hiteleik | hitel | N | Nc | <nowiki>n=plural|case=nominative|proper=no|pperson=3rd|pnumber=plural</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 3 | ohiak | ohi | ADJ | ARR | IZAUR:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ | | | 28 | átütemezését | átütemezés | N | Nc | <nowiki>n=singular|case=accusative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 21 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 4 | aztertuko | aztertu | ADI | SIN | ADM:PART<nowiki>|</nowiki>ASP:GERO | 0 | ROOT | _ | _ | | | 29 | <nowiki>.</nowiki> | <nowiki>_</nowiki> | SPUNCT | SPUNCT | <nowiki>_</nowiki> | 3 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | |
| 5 | du | *edun | ADL | ADL | MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 4 | auxmod | _ | _ | | |
| 6 | RUCen | RUC | IZE | IZB | MTKAT:SIG<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Erakundea | 7 | ncmod | _ | _ | | |
| 7 | erreforma | erreforma | IZE | ARR | KAS:ABS<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | ncobj | _ | _ | | |
| 8 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 7 | PUNC | _ | _ | | |
| |
The first sentence of the BDT-II development data: | |
| |
| 1 | Irakaskuntzan | irakaskuntza | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ | | |
| 2 | jardun | jardun | ADI | SIN | ADM:PART<nowiki>|</nowiki>ASP:BURU | 0 | ROOT | _ | _ | | |
| 3 | zuen | *edun | ADL | ADL | MDN:B1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 2 | auxmod | _ | _ | | |
| 4 | Miel | Miel | IZE | IZB | PLU:-<nowiki>|</nowiki>ENT:Pertsona | 5 | entios | _ | _ | | |
| 5 | Anjel_Elustondok | Anjel_Elustondo | IZE | IZB | PLU:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Pertsona | 2 | ncsubj | _ | _ | | |
| 6 | 1980 | 1980 | IZE | ZKI | _ | 7 | ncmod | _ | _ | | |
| 7 | urtetik | urte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:ABL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ | | |
| 8 | 1992ra | 1992 | IZE | ZKI | KAS:ALA<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ | | |
| 9 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 8 | PUNC | _ | _ | | |
| 10 | hauetatik | hauek | DET | ERKARR | KAS:ABL<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ | | |
| 11 | hamar | hamar | DET | DZH | NMG:P | 12 | detmod | _ | _ | | |
| 12 | urtez | urte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INS<nowiki>|</nowiki>MUG:MG | 16 | lot | _ | _ | | |
| 13 | Azpeitiko | Azpeitia | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 14 | ncmod | _ | _ | | |
| 14 | ikastolan | ikastola | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ | | |
| 15 | irakasle | irakasle | IZE | ARR | KAS:ABS<nowiki>|</nowiki>MUG:MG | 16 | ncpred | _ | _ | | |
| 16 | eta | eta | LOT | JNT | ERL:EMEN | 8 | aponcmod | _ | _ | | |
| 17 | beste | beste | DET | DZG | _ | 18 | detmod | _ | _ | | |
| 18 | biak | bi | IZE | ZKI | KAS:ABS<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 16 | lot | _ | _ | | |
| 19 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 18 | PUNC | _ | _ | | |
| 20 | Arabako | Araba | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 21 | ncmod | _ | _ | | |
| 21 | ikastolen | ikastola | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 22 | ncmod | _ | _ | | |
| 22 | elkartean | elkarte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ | | |
| 23 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 22 | PUNC | _ | _ | | |
| |
The first sentence of the BDT-II test data: | |
| |
| 1 | Hegoaldean | hegoalde | IZE | ARR | KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ | | |
| 2 | iduri_zait | iduri_izan | ADI | ADK | ASP:PNT<nowiki>|</nowiki>MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORI:NIRI<nowiki>|</nowiki>MW:B | 0 | ROOT | _ | _ | | |
| 3 | euskararen | euskara | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | ncmod | _ | _ | | |
| 4 | mundu | mundu | IZE | ARR | BIZ:- | 7 | ncsubj | _ | _ | | |
| 5 | hau | hau | DET | ERKARR | KAS:ABS<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | detmod | _ | _ | | |
| 6 | adi-adi | adi-adi | ADB | ARR | _ | 7 | ncmod | _ | _ | | |
| 7 | dagola | egon | ADT | ADT | ASP:PNT<nowiki>|</nowiki>ERL:KONPL<nowiki>|</nowiki>MDN:A3<nowiki>|</nowiki>NOR:HURA | 2 | ccomp_obj | _ | _ | | |
| 8 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 7 | PUNC | _ | _ | | |
| 9 | Euskaltzaindiak | Euskaltzaindia | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 11 | ncsubj | _ | _ | | |
| 10 | zer | zer | DET | NOLGAL | NMG:MG<nowiki>|</nowiki>KAS:ABS<nowiki>|</nowiki>MUG:MG | 11 | ncobj | _ | _ | | |
| 11 | erranen | erran | ADI | SIN | ADM:PART<nowiki>|</nowiki>ASP:GERO | 13 | menos | _ | _ | | |
| 12 | duen | *edun | ADL | ADL | ERL:ZHG<nowiki>|</nowiki>MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 11 | auxmod | _ | _ | | |
| 13 | zain | zain | ADB | ARR | _ | 7 | cmod | _ | _ | | |
| 14 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 13 | PUNC | _ | _ | | |
| 15 | haren | hura | DET | ERKARR | KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ | | |
| 16 | arauen | arau | IZE | ARR | KAS:ABS<nowiki>|</nowiki>MUG:MG | 18 | ncmod | _ | _ | | |
| 17 | berehala | berehala | ADB | ARR | _ | 18 | ncmod | _ | _ | | |
| 18 | betetzeko | bete | ADI | SIN | ADM:ADIZE<nowiki>|</nowiki>ERL:HELB<nowiki>|</nowiki>KAS:ABS<nowiki>|</nowiki>MUG:MG | 7 | xmod | _ | _ | | |
| 19 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 18 | PUNC | _ | _ | | |
| |
==== Parsing ==== | ==== Parsing ==== |
| |
BDT is a mildly nonprojective treebank. 1,925 of the 151,604 tokens of the combined BDT-II training and test sets are attached nonprojectively (1.27%). | SzTB is a mildly nonprojective treebank. 4,032 of the 139,143 tokens of the CoNLL 2007 version are attached nonprojectively (2.9%). |
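The page does not say how nonprojective attachments were counted. The following is a minimal sketch under one common definition, which is an assumption here: a token is attached nonprojectively if some token lying strictly between it and its head is not dominated by that head. The function names are hypothetical, and the input is simply the HEAD column of a CoNLL sentence.

```python
def is_dominated_by(node, ancestor, heads):
    """Return True if `ancestor` lies on the head chain of `node`.

    heads[i - 1] is the head of token i (1-based); 0 is the artificial root,
    which dominates every token.
    """
    if ancestor == 0:
        return True
    steps = 0
    # The step limit guards against cycles in malformed input.
    while node != 0 and steps <= len(heads):
        if node == ancestor:
            return True
        node = heads[node - 1]
        steps += 1
    return False


def count_nonprojective(heads):
    """Count tokens whose attachment to their head is nonprojective."""
    count = 0
    for dep, head in enumerate(heads, start=1):
        lo, hi = min(dep, head), max(dep, head)
        # Every token strictly between dependent and head must hang under head.
        if any(not is_dominated_by(k, head, heads) for k in range(lo + 1, hi)):
            count += 1
    return count


def nonprojective_percentage(nonprojective, total):
    """Share of nonprojectively attached tokens, in percent."""
    return 100.0 * nonprojective / total


# HEAD column of the first BDT-II training sentence shown above (projective).
print(count_nonprojective([2, 4, 2, 0, 4, 7, 4, 7]))
# Token counts quoted in the paragraph above.
print(round(nonprojective_percentage(4032, 139143), 1))
print(round(nonprojective_percentage(1925, 151604), 2))
```

Summing `count_nonprojective` over all sentences of a file and dividing by the token count gives a figure comparable to the percentages above, provided the treebank statistics used the same definition.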
| |
The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Greek: | The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Hungarian: |
| |
^ Parser (Authors) ^ LAS ^ UAS ^ | ^ Parser (Authors) ^ LAS ^ UAS ^ |
| Malt (Nilsson et al.) | 76.94 | 82.84 | | | Malt (Nilsson et al.) | 80.27 | 83.55 | |
| Titov et al. | 75.49 | 81.93 | | | Sagae | 79.53 | 83.51 | |
| Sagae | 74.64 | 81.19 | | | Nakagawa | 76.74 | 82.49 | |
| Carreras | 75.75 | 81.11 | | | Titov et al. | 77.94 | 82.18 | |
| Nakagawa | 72.56 | 81.04 | | |
| Malt (J. Hall et al.) | 74.99 | 80.61 | | |
| Johansson et al. | 75.08 | 80.43 | | |
| |
The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]]. | The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]]. |
| |
Parsing results on BDT-II have been published in Kepa Bengoetxea, Koldo Gojenola: [[http://aclweb.org/anthology-new/W/W10/W10-1404.pdf|Application of Different Techniques to Dependency Parsing of Basque]]. In: Proceedings of the First Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010), NAACL Workshop, Los Angeles, California, USA, 2010. They report only Labeled Attachment Score (LAS) and their best system achieved LAS = 78.98%. | |
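For readers unfamiliar with the two scores in the table, here is a minimal sketch of how LAS and UAS are usually computed. It is not the official CoNLL evaluation script. The function name and the predicted analysis are hypothetical; the gold HEAD and DEPREL values are taken from the first BDT-II training sentence shown above. Punctuation tokens are scored like any other token, in line with the CoNLL 2007 evaluation procedure mentioned above.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) in percent.

    gold and predicted are lists of (head, deprel) pairs, one per token.
    UAS counts tokens with the correct head; LAS additionally requires
    the correct dependency label.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    total = len(gold)
    if total == 0:
        return 0.0, 0.0
    uas_hits = sum(1 for (gh, _), (ph, _) in zip(gold, predicted) if gh == ph)
    las_hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    return 100.0 * las_hits / total, 100.0 * uas_hits / total


# Gold analysis: HEAD and DEPREL columns of the first BDT-II training sentence.
gold = [(2, "ncmod"), (4, "ncsubj"), (2, "ncmod"), (0, "ROOT"),
        (4, "auxmod"), (7, "ncmod"), (4, "ncobj"), (7, "PUNC")]

# Hypothetical parser output: token 2 gets a wrong label, token 8 a wrong head.
predicted = [(2, "ncmod"), (4, "ncobj"), (2, "ncmod"), (0, "ROOT"),
             (4, "auxmod"), (7, "ncmod"), (4, "ncobj"), (4, "PUNC")]

las, uas = attachment_scores(gold, predicted)
print(las, uas)  # 75.0 87.5
```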