user:zeman:treebanks:hu [ufal wiki]

Szeged Treebank 1.0 (shallow parse)
Szeged Treebank 2.0 (full parse)
CoNLL 2007 (based on SzTB 2.0)

The Szeged Treebank is available for research free of charge, provided the user signs the license agreement first. The website uses JavaScript to manage content, which makes it difficult to directly link to relevant sections. Click on “downloads” (letöltések) to get the list of downloadable corpora and links to their descriptions (e.g. Szeged Treebank 2.0). To obtain the treebank, one is supposed to complete the license form, print it, sign it and fax it to +36-62-546397 or mail it to Vincze Veronika, Árpád tér 2, H-6720 Szeged. You will be given a user ID and password needed to download the data. There are links to Microsoft Word documents with the license agreement but they do not work for me. Ask Veronika Vincze how to proceed (vinczev (at) inf (dot) u-szeged (dot) hu).

Republication of the CoNLL 2007 version in the LDC is planned but it has not happened yet.

The CoNLL 2007 license in short:

non-profit education and research purposes
no redistribution
no modification
cite the principal publication (see below) in publications

SzTB was created by members of the Human Language Technology Group (Nyelvtechnológiai Csoport), Department of Informatics (Informatikai Tanszékcsoport), University of Szeged (Szegedi Tudományegyetem), Árpád tér 2, H-6720 Szeged, Hungary. Conversion from constituents to dependencies for the CoNLL 2007 shared task was done by Zoltán Alexin.

Website
Data
- no separate citation
Principal publications
- Dóra Csendes, János Csirik, Tibor Gyimóthy, András Kocsor: The Szeged Treebank. In: Václav Matoušek, Pavel Mautner, Tomáš Pavelka (eds.): Text, Speech and Dialogue. 8th International Conference, TSD 2005, Karlovy Vary, Czech Republic, September 12-15, 2005. Proceedings. Lecture Notes in Computer Science, vol. 3658/2005, pp. 123-131, Springer-Verlag, Berlin / Heidelberg, Germany, 2005. ISSN 0302-9743, ISBN 978-3-540-28789-6.
Documentation
- The doc/README file in the CoNLL 2007 data distribution contains a quick guide to part of speech tags. There are also several PDF documents with detailed documentation of the annotation.
- Much useful information on SzTB 2.0 (the original, not the CoNLL version), including morphosyntax, can be found at the above-mentioned website.

Newswire + unknown (“25000 word forms from EPEC (Aduriz et al., 2003) and 25000 word forms coming from newspapers that can be considered equivalent to the other corpora in the project [3LB, i.e. Catalan and Spanish]”; “EPEC, a corpus of written Basque tagged at morphological and syntactic levels for the automatic processing”).

The CoNLL 2007 dataset was officially split into a training part and a test part. The data split of BDT-II was provided by Koldo Gojenola and should correspond to the data split used in the parsing experiments published by the IXA Group.

Version	Train Sentences	Train Tokens	D-test Sentences	D-test Tokens	E-test Sentences	E-test Tokens	Total Sentences	Total Tokens	Sentence Length
CoNLL 2007	3190	50526	334	5390			3524	55916	15.87
BDT-II	9094	124684	1010	12625	1122	14295	11226	151604	13.50
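The totals and the average sentence length in the table can be re-derived from the per-split counts. The following is a minimal sketch that only uses the figures printed above; the dictionary layout and the function name `totals` are illustrative choices, not part of any distribution.

```python
# (sentences, tokens) per split, copied from the size table above.
SIZES = {
    "CoNLL 2007": {"train": (3190, 50526), "test": (334, 5390)},
    "BDT-II": {
        "train": (9094, 124684),
        "d-test": (1010, 12625),
        "e-test": (1122, 14295),
    },
}


def totals(splits):
    """Return (total sentences, total tokens, average tokens per sentence)."""
    sentences = sum(s for s, _ in splits.values())
    tokens = sum(t for _, t in splits.values())
    return sentences, tokens, round(tokens / sentences, 2)


for version, splits in SIZES.items():
    print(version, *totals(splits))
```

The "Sentence Length" column is simply total tokens divided by total sentences, rounded to two decimals.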

Both versions (CoNLL 2007 and BDT-II) are in the CoNLL 2006/2007 format.
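As a rough illustration of what the CoNLL 2006/2007 format implies for processing, the sketch below reads tab-separated lines into sentences. It assumes the usual ten columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and blank lines between sentences; the names `Token` and `read_conll` are hypothetical. Lines with fewer columns, such as blind test data, are padded with `_`.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: str
    deprel: str
    phead: str
    pdeprel: str


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of tokens per sentence; blank lines separate sentences."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        cols += ["_"] * (10 - len(cols))  # pad missing columns (e.g. no HEAD)
        sentence.append(Token(int(cols[0]), *cols[1:10]))
    if sentence:
        yield sentence


# Excerpt: two tokens of a BDT-II sentence, then one six-column test token.
SAMPLE = (
    "4\taztertuko\taztertu\tADI\tSIN\tADM:PART|ASP:GERO\t0\tROOT\t_\t_\n"
    "5\tdu\t*edun\tADL\tADL\tMDN:A1|NOR:HURA|NORK:HARK\t4\tauxmod\t_\t_\n"
    "\n"
    "1\tepaileek\tepaile\tIZE\tIZE_ARR\tBIZ+|ERG|NUMP|MUGM\n"
)
sentences = list(read_conll(SAMPLE.splitlines()))
```

HEAD is kept as a string here so that the padded `_` of unannotated test data does not need special handling.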

Part of speech tag description (obtained per e-mail from Koldo Gojenola, thanks!):

IZE = noun
- ARR = common
- IZB = proper name
- LIB = place name
- ZKI = number
ADJ = adjective
- ARR = common
- GAL = question
ADI = verb
- SIN = simple
- ADK = composed
- ADP = periphrastic
- FAK = factitive
ADB = adverb
- ARR = common
- GAL = question
DET = determiner
- ERKARR = demonstrative common
- ERKIND = demonstrative emphatic
- NOLARR = indefinite common
- NOLGAL = indefinite question
- ZNB = number
- DZH = definite
- BAN = distributive
- ORD = ordinal
- DZG = indefinite
- ORO = general
IOR = pronoun
- PERARR = personal common
- PERIND = personal emphatic
- IZGMGB = indefinite
- IZGGAL = question
- BIH = ???
- ELK = ???
LOT = link
- LOK = connector
- JNT = conjunction
PRT = particle
ITJ = interjection
BST = other
ADL = auxiliary verb
ADT = synthetic verb
SIG = acronym
SNB = symbol
LAB = abbreviation
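The two-level tag description above can be turned into a small lookup. This is only a sketch covering a subset of the list (nouns, adjectives, verbs, adverbs and a few tags without subtypes); the names `TAGS` and `describe` are hypothetical. It assumes that the fine-grained tag is either the bare subtype (as in the BDT-II samples, e.g. `LIB`) or the coarse tag and subtype joined by an underscore (as in the CoNLL 2007 samples, e.g. `IZE_LIB`).

```python
# Subset of the tag description above: coarse tag -> (gloss, subtype glosses).
TAGS = {
    "IZE": ("noun", {"ARR": "common", "IZB": "proper name",
                     "LIB": "place name", "ZKI": "number"}),
    "ADJ": ("adjective", {"ARR": "common", "GAL": "question"}),
    "ADI": ("verb", {"SIN": "simple", "ADK": "composed",
                     "ADP": "periphrastic", "FAK": "factitive"}),
    "ADB": ("adverb", {"ARR": "common", "GAL": "question"}),
    "ADL": ("auxiliary verb", {}),
    "ADT": ("synthetic verb", {}),
}


def describe(cpostag, postag):
    """Return an English gloss for a coarse/fine tag pair, or None if unknown."""
    if cpostag not in TAGS:
        return None
    gloss, subtypes = TAGS[cpostag]
    # Accept both 'LIB' and 'IZE_LIB' as the fine-grained tag.
    prefix = cpostag + "_"
    subtype = postag[len(prefix):] if postag.startswith(prefix) else postag
    if subtype in subtypes:
        return f"{gloss}, {subtypes[subtype]}"
    return gloss


print(describe("IZE", "IZE_LIB"))
print(describe("ADL", "ADL"))
```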

Main features:

KAS = case. Descriptions of Basque grammar differ in the number of cases they list, and it is not easy to match all of the BDT case tags with them. Some but not all of them are described in Annex 3 of the technical report on the syntactic guidelines. The following list gives all case tags occurring in BDT with their frequencies in brackets.
- KAS:ABL (984) = ablativo = ablative
- KAS:ABS (22805) = absolutivo = absolutive
- KAS:ABU (32) = adlativo terminal (“-raino”) = “until, as far as” = terminative
- KAS:ABZ (27) = adlativo direccional (“-rantz”) = “towards” ~ lative?
- KAS:ALA (1093) = adlativo = allative
- KAS:BNK (13) =? special case of the locative genitive (“-ko”, “-eko”)
- KAS:DAT (1451) = dativo = dative
- KAS:DES (181) = destinativo = benefactive (“-entzat”)
- KAS:DESK (223) =? descriptive locative genitive (“-ko”, “-eko”), also frequently used for counted noun after numeral
- KAS:EM (705) = multiword token with postposition (e.g. “_gabe”, “_arabera”, “_batera”, “_bezala”…)
- KAS:ERG (6059) = ergativo = ergative
- KAS:GEL (6259) = genitivo locativo = locative genitive
- KAS:GEN (4307) = genitivo de posesión = possessive genitive
- KAS:INE (7690) = inesivo = inessive
- KAS:INS (1370) = instrumental
- KAS:MOT (165) = motivativo = causative
- KAS:PAR (930) = partitivo = partitive
- KAS:PRO (89) = prolativo = essive
- KAS:SOZ (928) = asociativo = comitative
ASP = aspect
ERL = relation (relative sentence, completive sentence, indirect question…)

List of all 286 features found in the corpus with frequencies:

ADM:ADIZE 3612
ADM:ADOIN 2919
ADM:PART 14711
ASP:BURU 7491
ASP:EZBU 2421
ASP:GERO 2166
ASP:PNT 6631
BIZ:+ 2303
BIZ:- 22116
ENT:??? 35
ENT:Erakundea 3499
ENT:Pertsona 4401
ENT:Tokia 3949
ERL:AURK 1264
ERL:BALD 332
ERL:DENB 390
ERL:EMEN 5969
ERL:ERLT 1531
ERL:ESPL 129
ERL:HAUT 408
ERL:HELB 925
ERL:KAUS 864
ERL:KONPL 2614
ERL:KONT 215
ERL:MOD 1152
ERL:MOD/DENB 244
ERL:MOS 146
ERL:ONDO 160
ERL:ZHG 232
HIT:NO 50
HIT:TO 38
IZAUR:+ 1499
IZAUR:- 5930
KAS:ABL 984
KAS:ABS 22807
KAS:ABU 32
KAS:ABZ 27
KAS:ALA 1094
KAS:BNK 13
KAS:DAT 1451
KAS:DES 181
KAS:DESK 223
KAS:EM 707
KAS:ERG 6059
KAS:GEL 6266
KAS:GEN 4307
KAS:INE 7693
KAS:INS 1370
KAS:MOT 165
KAS:PAR 930
KAS:PRO 89
KAS:SOZ 928
KLM:AM 80
KLM:HAS 2
MAI:GEHI 38
MAI:IND 36
MAI:KONP 244
MAI:SUP 406
MDN:A1 11766
MDN:A3 107
MDN:A4 1
MDN:A5 282
MDN:B1 6666
MDN:B2 185
MDN:B3 11
MDN:B4 59
MDN:B5A 1
MDN:B5B 27
MDN:B6 1
MDN:B7 79
MDN:B8 38
MDN:C 52
MOD:EGI 2244
MOD:ZIU 126
MTKAT:LAB 16
MTKAT:SIG 696
MTKAT:SNB 22
MUG:M 42116
MUG:MG 8449
MW:B 3615
NEUR:- 193
NMG:MG 1055
NMG:P 2690
NMG:S 2156
NOR:GU 223
NOR:HAIEK 4248
NOR:HI 20
NOR:HURA 14342
NOR:NI 337
NOR:ZU 93
NOR:ZUEK 12
NORI:GURI 124
NORI:HAIEI 306
NORI:HARI 1085
NORI:HIRI-NO 2
NORI:HIRI-TO 5
NORI:NIRI 152
NORI:ZUEI 12
NORI:ZURI 39
NORK:GUK 721
NORK:HAIEK-K 2618
NORK:HARK 5981
NORK:HIK 6
NORK:HIK-NO 10
NORK:HIK-TO 8
NORK:NIK 662
NORK:ZUEK-K 46
NORK:ZUK 208
NUM:P 9347
NUM:PH 172
NUM:S 32570
PER:GU 242
PER:HAIEK 93
PER:HI 14
PER:HURA 1
PER:NI 290
PER:ZU 60
PER:ZUEK 29
PLU:+ 149
PLU:- 10257
POS:+ 2353
POS:POSAldeko 2
POS:POSAurkako 1
POS:POSGabeko 1
POS:POSInguruko 1
POS:POSKontrako 2
POS:POSaintzinean 1
POS:POSaitzina 2
POS:POSaitzinean 5
POS:POSaitzineko 2
POS:POSaitzinetik 3
POS:POSalboan 2
POS:POSaldamenetik 1
POS:POSalde 38
POS:POSaldean 11
POS:POSaldeaz 1
POS:POSaldeko 37
POS:POSaldera 20
POS:POSalderat 1
POS:POSaldetik 25
POS:POSantzean 1
POS:POSantzeko 9
POS:POSantzekoa 2
POS:POSantzera 3
POS:POSarabera 135
POS:POSaraberako 1
POS:POSarte 82
POS:POSartean 158
POS:POSarteetik 1
POS:POSarteko 108
POS:POSartekoak 1
POS:POSat 6
POS:POSatzean 15
POS:POSatzeko 6
POS:POSatzera 1
POS:POSatzetik 12
POS:POSaurka 103
POS:POSaurkaa 1
POS:POSaurkako 48
POS:POSaurrean 74
POS:POSaurreko 10
POS:POSaurrera 36
POS:POSaurrerako 2
POS:POSaurretik 26
POS:POSazpian 9
POS:POSazpitik 6
POS:POSbaitan 12
POS:POSbarik 2
POS:POSbarna 1
POS:POSbarnean 11
POS:POSbarneko 2
POS:POSbarnera 1
POS:POSbarrena 4
POS:POSbarrenean 1
POS:POSbarru 7
POS:POSbarruan 37
POS:POSbarruetatik 1
POS:POSbarruko 3
POS:POSbarrura 1
POS:POSbarrutik 2
POS:POSbatera 42
POS:POSbatera 1
POS:POSbegira 31
POS:POSbehera 11
POS:POSbestaldean 1
POS:POSbezala 75
POS:POSbezalako 15
POS:POSbezalakoa 1
POS:POSbezalakoen 1
POS:POSbidez 45
POS:POSbila 20
POS:POSbitarte 2
POS:POSbitartean 18
POS:POSbitarteko 5
POS:POSbitarterako 1
POS:POSbitartez 13
POS:POSburuan 7
POS:POSburuz 47
POS:POSburuzko 36
POS:POSeran 1
POS:POSerdian 11
POS:POSerdiko 1
POS:POSerdira 3
POS:POSerditan 1
POS:POSeske 2
POS:POSesker 30
POS:POSesku 12
POS:POSeskuetan 5
POS:POSeskuko 1
POS:POSeskutik 6
POS:POSezean 4
POS:POSgabe 74
POS:POSgabeko 17
POS:POSgain 36
POS:POSgaindi 1
POS:POSgaindiko 1
POS:POSgainean 33
POS:POSgaineko 12
POS:POSgainera 9
POS:POSgainerat 1
POS:POSgainetik 16
POS:POSgero 1
POS:POSgeroztik 18
POS:POSgertu 4
POS:POSgibeleko 1
POS:POSgibeletik 2
POS:POSgisa 34
POS:POSgisako 1
POS:POSgisan 2
POS:POSgisara 1
POS:POSgoiko 1
POS:POSgoitik 1
POS:POSgora 30
POS:POSgorago 1
POS:POSgorako 7
POS:POSgorakoen 1
POS:POShurbil 8
POS:POShurrean 1
POS:POSinguru 16
POS:POSingurua 1
POS:POSinguruan 77
POS:POSinguruetako 1
POS:POSinguruetan 2
POS:POSinguruetara 1
POS:POSinguruko 27
POS:POSingurura 5
POS:POSingururako 1
POS:POSirian 1
POS:POSkanpo 28
POS:POSkanpoko 12
POS:POSkanpora 4
POS:POSkontra 72
POS:POSkontrako 38
POS:POSlanda 7
POS:POSlandara 2
POS:POSlegez 1
POS:POSlekuan 4
POS:POSlepora 1
POS:POSmendean 1
POS:POSmenpe 8
POS:POSmenpera 1
POS:POSmoduan 1
POS:POSmodura 1
POS:POSondoan 19
POS:POSondoko 1
POS:POSondora 1
POS:POSondoren 32
POS:POSondorengo 2
POS:POSondotik 14
POS:POSordez 9
POS:POSostean 17
POS:POSosteko 1
POS:POSpare 1
POS:POSparean 5
POS:POSpareko 2
POS:POSpartean 3
POS:POSpartez 1
POS:POSpean 1
POS:POStruke 9
POS:POSurrun 3
POS:POSurruti 3
POS:POSzai 2
POS:POSzain 12
POS:POSzehar 42
ZENB:- 192
_ 36940
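A frequency list of this kind can be produced by splitting the FEATS column on the vertical bar. The sketch below assumes the BDT-II encoding seen in the samples, where each feature is written as `NAME:VALUE` and an empty feature set is a bare `_`; the function names are hypothetical, and whether the list above was computed exactly this way is an assumption.

```python
from collections import Counter


def parse_feats(feats):
    """Turn a FEATS string such as 'KAS:GEL|NUM:P' into a dict."""
    if feats == "_":
        return {}
    pairs = {}
    for item in feats.split("|"):
        # Split at the first colon only, so values like 'MOD/DENB' stay intact.
        name, _, value = item.partition(":")
        pairs[name] = value
    return pairs


def feature_frequencies(feats_column):
    """Count every NAME:VALUE item; an empty feature set is counted as '_'."""
    counts = Counter()
    for feats in feats_column:
        counts.update(feats.split("|"))
    return counts


column = ["KAS:ABS|NUM:S|MUG:M", "_", "KAS:ABS|MUG:MG"]
for feature, count in sorted(feature_frequencies(column).items()):
    print(feature, count)
```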

The syntactic guidelines (structure and labels) are described in Spanish in this technical report. See Appendix 3 for some lists of tags.

Multi-word expressions have been collapsed into one token, using underscore as the joining character (e.g. Espainia_Poliziak, iduri_zait).
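If the component words of a collapsed token are needed, they can be recovered by splitting on the underscore. A minimal sketch, assuming that an underscore never occurs inside a component word and that a bare `_` is the CoNLL placeholder for an empty field (the function name is hypothetical):

```python
def split_multiword(token):
    """Split a collapsed multi-word token into its component words."""
    if token == "_":  # empty-field placeholder, not a multi-word token
        return [token]
    return [part for part in token.split("_") if part]


print(split_multiword("iduri_zait"))
print(split_multiword("Estatu_Batuetako_DEAko"))
```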

The first sentence of the CoNLL 2007 training data:

1	espainiako_poliziak	Espainia_Poliziak	IZE	IZE_LIB	PLU-|ENTI_LOC	4	ncsubj	_	_
2	hiru	hiru	DET	DET_DZH	NMGP	3	detmod	_	_
3	gazte	gazte	IZE	IZE_ARR	ABS|MG	4	ncobj	_	_
4	atxilotu	atxilotu	ADI	ADI_SIN	PART|BURU	8	lot	_	_
5	ditu	*edun	ADL	ADL	A1|NR_HAIEK|NK_HARK	4	auxmod	_	_
6	atarrabian	Atarrabia	IZE	IZE_LIB	PLU-|INE|NUMS|MUGM|ENTI_LOC	4	ncmod	_	_
7	,	,	PUNC	PUNC_KOMA	_	6	PUNC	_	_
8	eta	eta	LOT	LOT_JNT	-	0	ROOT	_	_
9	madrilera	Madril	IZE	IZE_LIB	PLU-|ALA|NUMS|MUGM|ENTI_LOC	10	ncmod	_	_
10	eraman	eraman	ADI	ADI_SIN	PART|BURU	8	lot	_	_
11	ditu	*edun	ADL	ADL	A1|NR_HAIEK|NK_HARK	10	auxmod	_	_
12	.	.	PUNC	PUNC_PUNC	_	11	PUNC	_	_

The first sentence of the CoNLL 2007 test data:

1	epaileek	epaile	IZE	IZE_ARR	BIZ+|ERG|NUMP|MUGM
2	diote	esan	ADT	ADT	PNT|A1|NR_HURA|NK_HAIEK-K
3	eaeko	EAE	IZE	IZE_LIB	SIG|GEL|NUMS|MUGM|ENTI_LOC
4	parlamentarioek	parlamentario	ADJ	ADJ_ARR	IZAUR-|ERG|NUMP|MUGM
5	eaetik_kanpo	EAE	SIG	SIG-	DEK|NUMS|MUGM|DEK|ABL_kanpo_ABS|ENTI_LOC|POS
6	eginiko	egin	ADI	ADI_SIN	PART|GEL
7	delituak	delitu	IZE	IZE_ARR	BIZ-|ABS|NUMP|MUGM
8	ikertzea	ikertu	ADI	ADI_SIN	ADIZE|KONPL|ABS
9	eta	eta	LOT	LOT_JNT	-
10	epaitzea	epaitu	ADI	ADI_SIN	ADIZE|KONPL|ABS
11	auzitegi_gorenari	auzitegi_gora	ADJ	ADJ_IZO	DEK|GEN|NUMP|MUGM|DEK|DAT|NUMS|MUGM|ENTI_LOC
12	dagokiola	egon	ADT	ADT	PNT|KONPL|A1|NR_HURA|NI_HARI
13	,	,	PUNC	PUNC_KOMA	_
14	baina	baina	LOT	LOT_JNT	AURK
15	atzerrian	atzerri	IZE	IZE_ARR	INE|NUMS|MUGM
16	izaniko	izan	ADI	ADI_SIN	PART|GEL
17	kontaktu	kontaktu	IZE	IZE_ARR	_
18	horiek	horiek	DET	DET_ERKARR	ABS|NUMP|MUGM
19	ezin_direla	ezin_izan	ADI	ADI_ADK	PNT|KONPL|A1|NR_HAIEK|MWCorrect
20	delitutzat	delitu	IZE	IZE_ARR	BIZ-|PRO|MG
21	hartu	hartu	ADI	ADI_SIN	PART
22	.	.	PUNC	PUNC_PUNC	_

The first sentence of the BDT-II training data:

1	Estatu_Batuetako_DEAko	Estatu_Batuak_DEA	IZE	LIB	PLU:+|IZAUR:-|KAS:GEL|NUM:P|MUG:M|MW:B|ENT:Erakundea	2	ncmod	_	_
2	buru	buru	IZE	ARR	_	4	ncsubj	_	_
3	ohiak	ohi	ADJ	ARR	IZAUR:-|KAS:ERG|NUM:S|MUG:M	2	ncmod	_	_
4	aztertuko	aztertu	ADI	SIN	ADM:PART|ASP:GERO	0	ROOT	_	_
5	du	*edun	ADL	ADL	MDN:A1|NOR:HURA|NORK:HARK	4	auxmod	_	_
6	RUCen	RUC	IZE	IZB	MTKAT:SIG|KAS:GEN|NUM:S|MUG:M|ENT:Erakundea	7	ncmod	_	_
7	erreforma	erreforma	IZE	ARR	KAS:ABS|NUM:S|MUG:M	4	ncobj	_	_
8	.	.	PUNT_MARKA	PUNT_PUNT	_	7	PUNC	_	_

The first sentence of the BDT-II development data:

1	Irakaskuntzan	irakaskuntza	IZE	ARR	BIZ:-|KAS:INE|NUM:S|MUG:M	2	ncmod	_	_
2	jardun	jardun	ADI	SIN	ADM:PART|ASP:BURU	0	ROOT	_	_
3	zuen	*edun	ADL	ADL	MDN:B1|NOR:HURA|NORK:HARK	2	auxmod	_	_
4	Miel	Miel	IZE	IZB	PLU:-|ENT:Pertsona	5	entios	_	_
5	Anjel_Elustondok	Anjel_Elustondo	IZE	IZB	PLU:-|KAS:ERG|NUM:S|MUG:M|ENT:Pertsona	2	ncsubj	_	_
6	1980	1980	IZE	ZKI	_	7	ncmod	_	_
7	urtetik	urte	IZE	ARR	BIZ:-|KAS:ABL|NUM:S|MUG:M	2	ncmod	_	_
8	1992ra	1992	IZE	ZKI	KAS:ALA|NUM:S|MUG:M	2	ncmod	_	_
9	,	,	PUNT_MARKA	PUNT_KOMA	_	8	PUNC	_	_
10	hauetatik	hauek	DET	ERKARR	KAS:ABL|NUM:P|MUG:M	16	ncmod	_	_
11	hamar	hamar	DET	DZH	NMG:P	12	detmod	_	_
12	urtez	urte	IZE	ARR	BIZ:-|KAS:INS|MUG:MG	16	lot	_	_
13	Azpeitiko	Azpeitia	IZE	LIB	PLU:-|KAS:GEL|NUM:S|MUG:M|ENT:Tokia	14	ncmod	_	_
14	ikastolan	ikastola	IZE	ARR	BIZ:-|KAS:INE|NUM:S|MUG:M	16	ncmod	_	_
15	irakasle	irakasle	IZE	ARR	KAS:ABS|MUG:MG	16	ncpred	_	_
16	eta	eta	LOT	JNT	ERL:EMEN	8	aponcmod	_	_
17	beste	beste	DET	DZG	_	18	detmod	_	_
18	biak	bi	IZE	ZKI	KAS:ABS|NUM:P|MUG:M	16	lot	_	_
19	,	,	PUNT_MARKA	PUNT_KOMA	_	18	PUNC	_	_
20	Arabako	Araba	IZE	LIB	PLU:-|KAS:GEL|NUM:S|MUG:M|ENT:Tokia	21	ncmod	_	_
21	ikastolen	ikastola	IZE	ARR	BIZ:-|KAS:GEN|NUM:P|MUG:M	22	ncmod	_	_
22	elkartean	elkarte	IZE	ARR	BIZ:-|KAS:INE|NUM:S|MUG:M	16	ncmod	_	_
23	.	.	PUNT_MARKA	PUNT_PUNT	_	22	PUNC	_	_

The first sentence of the BDT-II test data:

1	Hegoaldean	hegoalde	IZE	ARR	KAS:INE|NUM:S|MUG:M	2	ncmod	_	_
2	iduri_zait	iduri_izan	ADI	ADK	ASP:PNT|MDN:A1|NOR:HURA|NORI:NIRI|MW:B	0	ROOT	_	_
3	euskararen	euskara	IZE	ARR	BIZ:-|KAS:GEN|NUM:S|MUG:M	4	ncmod	_	_
4	mundu	mundu	IZE	ARR	BIZ:-	7	ncsubj	_	_
5	hau	hau	DET	ERKARR	KAS:ABS|NUM:S|MUG:M	4	detmod	_	_
6	adi-adi	adi-adi	ADB	ARR	_	7	ncmod	_	_
7	dagola	egon	ADT	ADT	ASP:PNT|ERL:KONPL|MDN:A3|NOR:HURA	2	ccomp_obj	_	_
8	,	,	PUNT_MARKA	PUNT_KOMA	_	7	PUNC	_	_
9	Euskaltzaindiak	Euskaltzaindia	IZE	LIB	PLU:-|KAS:ERG|NUM:S|MUG:M|ENT:Tokia	11	ncsubj	_	_
10	zer	zer	DET	NOLGAL	NMG:MG|KAS:ABS|MUG:MG	11	ncobj	_	_
11	erranen	erran	ADI	SIN	ADM:PART|ASP:GERO	13	menos	_	_
12	duen	*edun	ADL	ADL	ERL:ZHG|MDN:A1|NOR:HURA|NORK:HARK	11	auxmod	_	_
13	zain	zain	ADB	ARR	_	7	cmod	_	_
14	,	,	PUNT_MARKA	PUNT_KOMA	_	13	PUNC	_	_
15	haren	hura	DET	ERKARR	KAS:GEN|NUM:S|MUG:M	16	ncmod	_	_
16	arauen	arau	IZE	ARR	KAS:ABS|MUG:MG	18	ncmod	_	_
17	berehala	berehala	ADB	ARR	_	18	ncmod	_	_
18	betetzeko	bete	ADI	SIN	ADM:ADIZE|ERL:HELB|KAS:ABS|MUG:MG	7	xmod	_	_
19	.	.	PUNT_MARKA	PUNT_PUNT	_	18	PUNC	_	_

BDT is a mildly nonprojective treebank. 1925 of the 151,604 tokens of combined BDT-II training and test sets are attached nonprojectively (1.27%).
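To make the notion of a nonprojectively attached token concrete, here is a minimal sketch of one common definition: a token is attached nonprojectively if some token lying strictly between it and its head is not a descendant of that head. Whether the count above was obtained with exactly this definition is an assumption, and the function names are hypothetical. The input is the HEAD column of one sentence, with 0 denoting the artificial root.

```python
def is_dominated(heads, node, ancestor):
    """True if `ancestor` is reached by following heads upwards from `node`."""
    for _ in range(len(heads) + 1):  # bounded walk, guards against cycles
        if node == ancestor:
            return True
        if node == 0:
            return False
        node = heads[node - 1]
    return False


def nonprojective_tokens(heads):
    """Return the 1-based IDs of tokens whose attachment is nonprojective.

    heads[i - 1] is the head of token i; 0 is the artificial root.
    """
    result = []
    for dep, head in enumerate(heads, start=1):
        lo, hi = sorted((dep, head))
        # Every token strictly between dependent and head must be under head.
        if any(not is_dominated(heads, k, head) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


# HEAD column of the first CoNLL 2007 training sentence shown above.
first_sentence = [4, 3, 4, 8, 4, 4, 6, 0, 10, 8, 10, 11]
print(nonprojective_tokens(first_sentence))

# Hand-made toy tree in which the arc from token 4 to token 2 crosses token 3.
print(nonprojective_tokens([3, 4, 0, 3]))
```

Dividing the number of such tokens by the total number of tokens gives a percentage of the kind quoted above.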

The results of the CoNLL 2007 shared task are available online. They have been published in (Nivre et al., 2007). The evaluation procedure was changed to include punctuation tokens. These are the best results for Basque:

Parser (Authors)	LAS	UAS
Malt (Nilsson et al.)	76.94	82.84
Titov et al.	75.49	81.93
Sagae	74.64	81.19
Carreras	75.75	81.11
Nakagawa	72.56	81.04
Malt (J. Hall et al.)	74.99	80.61
Johansson et al.	75.08	80.43

The two Malt parser results of 2007 (single malt and blended) are described in (Hall et al., 2007) and the details about the parser configuration are described here.

Parsing results on BDT-II have been published in Kepa Bengoetxea, Koldo Gojenola: Application of Different Techniques to Dependency Parsing of Basque. In: Proceedings of the First Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010), NAACL Workshop, Los Angeles, California, USA, 2010. They report only Labeled Attachment Score (LAS) and their best system achieved LAS = 78.98%.
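For reference, the two scores reported in this section can be computed as follows. This is a minimal sketch and not the official evaluation script: UAS is the percentage of tokens with the correct head, LAS the percentage with both the correct head and the correct dependency label. All tokens are counted, punctuation included, in line with the CoNLL 2007 procedure mentioned above. The function name and the input representation are hypothetical.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) in percent.

    gold and predicted are equally long lists of (head, deprel) pairs,
    one pair per token, punctuation tokens included.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty token list")
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_both = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * correct_both / total, 100.0 * correct_heads / total


# Toy example: one wrong label (token 1) and one wrong head (token 4).
gold = [(2, "ncmod"), (0, "ROOT"), (2, "auxmod"), (2, "PUNC")]
pred = [(2, "ncsubj"), (0, "ROOT"), (2, "auxmod"), (3, "PUNC")]
las, uas = attachment_scores(gold, pred)
print(f"LAS = {las:.2f}, UAS = {uas:.2f}")
```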
