Catalan (ca)
There is one treebank, versions of which have been known under different names at different times:
- CESS-Cat
- Cat3LB
- AnCora-CA
Versions
- CoNLL 2007 (CESS-Cat)
- CoNLL 2009 (AnCora-CA)
The dependency treebank Cat3LB was extracted automatically from an earlier constituent-based annotation (see Montserrat Civit, Ma. Antònia Martí, Núria Bufí: Cat3LB and Cast3LB: From Constituents to Dependencies. In: T. Salakoski et al. (eds.): FinTAL 2006, LNAI 4139, pp. 141–152, 2006, Springer, Berlin / Heidelberg).
Obtaining and License
The AnCora-CA corpus should be freely downloadable from its website. However, the download does not work for users who are not registered and signed in. The website offers to create a new account, but registration is not automatic; one has to wait for approval.
Republication of the two CoNLL versions in LDC is planned but has not happened yet.
The CoNLL 2007 license in short:
- research and demonstrative usage
- no redistribution
- cite in publications
- The original CoNLL 2007 license required a reference to the CESS-ECE project, not a publication: M. Antònia Martí Antonín, Mariona Taulé Delor, Lluís Màrquez, Manuel Bertran (2007) CESS-ECE: A Multilingual and Multilevel Annotated Corpus.
- Later there was the LREC paper, which is now the required reference for the AnCora corpus.
AnCora-CA was created by members of the Centre de Llenguatge i Computació (CLiC), Universitat de Barcelona, Gran Via de les Corts Catalanes 585, E-08007 Barcelona, Spain.
References
- Website
- Data
- no separate citation
- Principal publications
- Mariona Taulé, M. Antònia Martí, Marta Recasens: AnCora: Multilevel Annotated Corpora for Catalan and Spanish. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 2008. ISBN 2-9517408-4-0
- Documentation
- Maria Antònia Martí, Mariona Taulé, Manu Bertran, Lluís Màrquez: AnCora: Multilingual and Multilevel Annotated Corpora. Draft Technical report, online.
Domain
Mostly newswire (EFE news, ACN Catalan news, Catalan version of El Periódico, 2000).
Size
The CoNLL 2007 version contains 435,860 tokens in 15,125 sentences, yielding 28.82 tokens per sentence on average (CoNLL 2007 data split: 430,844 tokens / 14,958 sentences training, 5,016 tokens / 167 sentences test).
The CoNLL 2009 version contains 496,672 tokens in 16,786 sentences, yielding 29.59 tokens per sentence on average (CoNLL 2009 data split: 390,302 tokens / 13,200 sentences training, 53,015 tokens / 1,724 sentences development, 53,355 tokens / 1,862 sentences test).
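The totals and averages quoted above can be re-derived from the per-split counts. The following is only a small sanity-check sketch; the dictionary layout and the function name `totals` are illustrative choices, not part of any treebank tooling.

```python
# Token and sentence counts per split, copied from the figures quoted above.
SPLITS = {
    "CoNLL 2007": {"training": (430844, 14958), "test": (5016, 167)},
    "CoNLL 2009": {
        "training": (390302, 13200),
        "development": (53015, 1724),
        "test": (53355, 1862),
    },
}


def totals(parts):
    """Return (tokens, sentences, average tokens per sentence) for one version."""
    tokens = sum(t for t, _ in parts.values())
    sentences = sum(s for _, s in parts.values())
    return tokens, sentences, round(tokens / sentences, 2)


for version, parts in SPLITS.items():
    print(version, totals(parts))
```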
Inside
The original morphosyntactic tags (EAGLES?) have been converted to fit into the three columns (CPOS, POS and FEAT) of the CoNLL 2006/7 format and into the two columns (POS and FEAT) of the CoNLL 2009 format, respectively. Note that the missing CPOS column is not the only difference between the two conversion schemes: feature names and values in the FEAT column differ, too.
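To make the difference between the two FEAT schemes concrete, here is a minimal sketch that turns a FEAT string into a dictionary. The helper name `parse_feats` is hypothetical; the two example strings are taken from the sample sentences on this page (the auxiliary "ha" in the 2007 and 2009 training data).

```python
def parse_feats(feat):
    """Turn a FEAT string such as 'num=s|gen=c' into a dict; '_' means no features."""
    if feat == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feat.split("|"))


# The auxiliary "ha" as annotated in the two versions.
feats_2007 = parse_feats("num=s|per=3|mod=i|ten=p")
feats_2009 = parse_feats(
    "postype=auxiliary|gen=c|num=s|person=3|mood=indicative|tense=present"
)

print(sorted(feats_2007))  # short feature names and one-letter values
print(sorted(feats_2009))  # longer names, spelled-out values
```

In this example only `num` appears under the same name in both versions, so code written for one scheme should not be expected to work on the other without a mapping.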
The morphosyntactic tags have been disambiguated manually. The CoNLL 2009 version also contains automatically disambiguated tags.
Multi-word expressions have been collapsed into one token, using underscore as the joining character. This includes named entities (e.g. La_Garrotxa, Ajuntament_de_Manresa, dilluns_4_de_juny) and prepositional compounds (pel_que_fa_al, d'_acord_amb, la_seva, a_més_de). Empty (underscore) tokens have been inserted to represent missing subjects (Catalan is a pro-drop language).
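A consumer of the data may want to recover the individual words of a collapsed token or to skip the inserted empty tokens. The sketch below assumes that a word form consisting of a single underscore is an empty token and that any other underscore is a joining character; the function name `expand_form` is illustrative.

```python
def expand_form(form):
    """Split a collapsed multi-word token into its parts.

    Assumption: a form that is exactly "_" is an empty (pro-drop) token and
    yields no words; underscores elsewhere only join the parts of a
    multi-word expression.
    """
    if form == "_":
        return []
    return form.split("_")


for form in ["Ajuntament_de_Manresa", "d'_acord_amb", "la_seva", "_", "vaga"]:
    print(repr(form), "->", expand_form(form))
```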
Sample
The first sentence of the CoNLL 2007 training data:
1 | L' | el | d | da | num=s|gen=c | 2 | ESPEC | _ | _ |
2 | Ajuntament_de_Manresa | Ajuntament_de_Manresa | n | np | _ | 4 | SUJ | _ | _ |
3 | ha | haver | v | va | num=s|per=3|mod=i|ten=p | 4 | AUX | _ | _ |
4 | posat_en_funcionament | posar_en_funcionament | v | vm | num=s|mod=p|gen=m | 0 | S | _ | _ |
5 | tot | tot | d | di | num=s|gen=m | 7 | ESPEC | _ | _ |
6 | un_seguit_de | un_seguit_de | d | di | num=p|gen=c | 5 | DET | _ | _ |
7 | mesures | mesura | n | nc | num=p|gen=f | 4 | CD | _ | _ |
8 | , | , | F | Fc | _ | 10 | PUNC | _ | _ |
9 | la | el | d | da | num=s|gen=f | 10 | ESPEC | _ | _ |
10 | majoria | majoria | n | nc | num=s|gen=f | 7 | _ | _ | _ |
11 | informatives | informatiu | a | aq | num=p|gen=f | 10 | _ | _ | _ |
12 | , | , | F | Fc | _ | 10 | PUNC | _ | _ |
13 | que | que | p | pr | num=n|gen=c | 14 | SUJ | _ | _ |
14 | tenen | tenir | v | vm | num=p|per=3|mod=i|ten=p | 7 | SF | _ | _ |
15 | com_a | com_a | s | sp | for=s | 14 | CPRED | _ | _ |
16 | finalitat | finalitat | n | nc | num=s|gen=f | 15 | SN | _ | _ |
17 | minimitzar | minimitzar | v | vm | mod=n | 14 | CD | _ | _ |
18 | els | el | d | da | num=p|gen=m | 19 | ESPEC | _ | _ |
19 | efectes | efecte | n | nc | num=p|gen=m | 17 | SN | _ | _ |
20 | de | de | s | sp | for=s | 19 | SP | _ | _ |
21 | la | el | d | da | num=s|gen=f | 22 | ESPEC | _ | _ |
22 | vaga | vaga | n | nc | num=s|gen=f | 20 | SN | _ | _ |
23 | . | . | F | Fp | _ | 4 | PUNC | _ | _ |
The first sentence of the CoNLL 2007 test data:
1 | Tot_i_que | tot_i_que | c | cs | _ | 5 | SUBORD | _ | _ |
2 | ahir | ahir | r | rg | _ | 5 | CC | _ | _ |
3 | hi | hi | p | pp | num=n|per=3|gen=c | 5 | MORF | _ | _ |
4 | va | anar | v | va | num=s|per=3|mod=i|ten=p | 5 | AUX | _ | _ |
5 | haver | haver | v | va | mod=n | 15 | AO | _ | _ |
6 | una | un | d | di | num=s|gen=f | 7 | ESPEC | _ | _ |
7 | reunió | reunió | n | nc | num=s|gen=f | 5 | CD | _ | _ |
8 | de | de | s | sp | for=s | 7 | SP | _ | _ |
9 | darrera | darrer | a | ao | num=s|gen=f | 10 | SADJ | _ | _ |
10 | hora | hora | n | nc | num=s|gen=f | 8 | SN | _ | _ |
11 | , | , | F | Fc | _ | 5 | PUNC | _ | _ |
12 | no | no | r | rn | _ | 15 | MOD | _ | _ |
13 | es | es | p | p0 | _ | 15 | PASS | _ | _ |
14 | va | anar | v | va | num=s|per=3|mod=i|ten=p | 15 | AUX | _ | _ |
15 | aconseguir | aconseguir | v | vm | mod=n | 0 | S | _ | _ |
16 | acostar | acostar | v | vm | mod=n | 15 | SUJ | _ | _ |
17 | posicions | posició | n | nc | num=p|gen=f | 16 | SN | _ | _ |
18 | , | , | F | Fc | _ | 23 | PUNC | _ | _ |
19 | de_manera_que | de_manera_que | c | cs | _ | 23 | SUBORD | _ | _ |
20 | els | el | d | da | num=p|gen=m | 21 | ESPEC | _ | _ |
21 | treballadors | treballador | n | nc | num=p|gen=m | 23 | SUJ | _ | _ |
22 | han | haver | v | va | num=p|per=3|mod=i|ten=p | 23 | AUX | _ | _ |
23 | decidit | decidir | v | vm | num=s|mod=p|gen=m | 15 | AO | _ | _ |
24 | anar | anar | v | vm | mod=n | 23 | CD | _ | _ |
25 | a | a | s | sp | for=s | 24 | CREG | _ | _ |
26 | la | el | d | da | num=s|gen=f | 27 | ESPEC | _ | _ |
27 | vaga | vaga | n | nc | num=s|gen=f | 25 | SN | _ | _ |
28 | . | . | F | Fp | _ | 15 | PUNC | _ | _ |
The first sentence of the CoNLL 2009 training data:
1 | El | el | el | d | d | postype=article|gen=m|num=s | postype=article|gen=m|num=s | 2 | 2 | spec | spec | _ | _ | _ | _ | _ | _ |
2 | Tribunal_Suprem | Tribunal_Suprem | Tribunal_Suprem | n | n | postype=proper|gen=c|num=c | postype=proper|gen=c|num=c | 7 | 7 | suj | suj | _ | _ | arg0-agt | _ | _ | _ |
3 | ( | ( | ( | f | f | punct=bracket|punctenclose=open | punct=bracket|punctenclose=open | 4 | 4 | f | f | _ | _ | _ | _ | _ | _ |
4 | TS | TS | TS | n | n | postype=proper|gen=c|num=c | postype=proper|gen=c|num=c | 2 | 2 | sn | sn | _ | _ | _ | _ | _ | _ |
5 | ) | ) | ) | f | f | punct=bracket|punctenclose=close | punct=bracket|punctenclose=close | 4 | 4 | f | f | _ | _ | _ | _ | _ | _ |
6 | ha | haver | haver | v | v | postype=auxiliary|gen=c|num=s|person=3|mood=indicative|tense=present | postype=auxiliary|gen=c|num=s|person=3|mood=indicative|tense=present | 7 | 7 | v | v | _ | _ | _ | _ | _ | _ |
7 | confirmat | confirmar | confirmar | v | v | postype=main|gen=m|num=s|mood=pastparticiple | postype=main|gen=m|num=s|mood=pastparticiple | 0 | 0 | sentence | sentence | Y | confirmar.a32 | _ | _ | _ | _ |
8 | la | el | el | d | d | postype=article|gen=f|num=s | postype=article|gen=f|num=s | 9 | 9 | spec | spec | _ | _ | _ | _ | _ | _ |
9 | condemna | condemna | condemna | n | n | postype=common|gen=f|num=s | postype=common|gen=f|num=s | 7 | 7 | cd | cd | _ | _ | arg1-pat | _ | _ | _ |
10 | a | a | a | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 9 | 9 | sp | sp | _ | _ | _ | _ | _ | _ |
11 | quatre | quatre | quatre | d | d | postype=numeral|gen=c|num=p | postype=numeral|gen=c|num=p | 12 | 12 | spec | spec | _ | _ | _ | _ | _ | _ |
12 | anys | any | any | n | n | postype=common|gen=m|num=p | postype=common|gen=m|num=p | 10 | 10 | sn | sn | _ | _ | _ | _ | _ | _ |
13 | d' | de | de | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 12 | 12 | sp | sp | _ | _ | _ | _ | _ | _ |
14 | inhabilitació | inhabilitació | inhabilitació | n | n | postype=common|gen=f|num=s | postype=common|gen=f|num=s | 13 | 13 | sn | sn | _ | _ | _ | _ | _ | _ |
15 | especial | especial | especial | a | a | postype=qualificative|gen=c|num=s | postype=qualificative|gen=c|num=s | 14 | 14 | s.a | s.a | _ | _ | _ | _ | _ | _ |
16 | i | i | i | c | c | postype=coordinating | postype=coordinating | 12 | 9 | coord | coord | _ | _ | _ | _ | _ | _ |
17 | una | un | un | d | d | postype=indefinite|gen=f|num=s | postype=numeral|gen=f|num=s | 18 | 18 | spec | spec | _ | _ | _ | _ | _ | _ |
18 | multa | multa | multa | n | n | postype=common|gen=f|num=s | postype=common|gen=f|num=s | 12 | 9 | sn | sn | _ | _ | _ | _ | _ | _ |
19 | de | de | de | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 18 | 18 | sp | sp | _ | _ | _ | _ | _ | _ |
20 | 3,6 | 3.6 | 3,6 | z | n | _ | postype=proper|gen=c|num=c | 21 | 21 | spec | spec | _ | _ | _ | _ | _ | _ |
21 | milions | milió | milió | n | n | postype=common|gen=m|num=p | postype=common|gen=m|num=p | 19 | 19 | sn | sn | _ | _ | _ | _ | _ | _ |
22 | de | de | de | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 21 | 21 | sp | sp | _ | _ | _ | _ | _ | _ |
23 | pessetes | pesseta | pesseta | z | n | postype=currency | postype=common|gen=f|num=p | 22 | 22 | sn | sn | _ | _ | _ | _ | _ | _ |
24 | per | per | per | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 9 | 9 | sp | sp | _ | _ | _ | _ | _ | _ |
25 | a | a | a | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 24 | 24 | sp | sp | _ | _ | _ | _ | _ | _ |
26 | quatre | quatre | quatre | d | d | postype=numeral|gen=c|num=p | postype=numeral|gen=c|num=p | 27 | 27 | spec | spec | _ | _ | _ | _ | _ | _ |
27 | veterinaris | veterinari | veterinari | n | n | postype=common|gen=m|num=p | postype=common|gen=m|num=p | 25 | 25 | sn | sn | _ | _ | _ | _ | _ | _ |
28 | gironins | gironí | gironí | a | a | postype=qualificative|gen=m|num=p | postype=qualificative|gen=m|num=p | 27 | 27 | s.a | s.a | _ | _ | _ | _ | _ | _ |
29 | , | , | , | f | f | punct=comma | punct=comma | 30 | 30 | f | f | _ | _ | _ | _ | _ | _ |
30 | per | per | per | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 9 | 7 | sp | cc | _ | _ | _ | _ | _ | _ |
31 | haver | haver | haver | v | n | postype=auxiliary|gen=c|num=c|mood=infinitive | postype=common|gen=m|num=s | 33 | 33 | v | v | _ | _ | _ | _ | _ | _ |
32 | -se | ell | ell | p | p | gen=c|num=c|person=3 | gen=c|num=c|person=3 | 33 | 33 | morfema.pronominal | morfema.pronominal | _ | _ | _ | _ | _ | _ |
33 | beneficiat | beneficiar | beneficiat | v | a | postype=main|gen=m|num=s|mood=pastparticiple | postype=qualificative|gen=m|num=s|posfunction=participle | 42 | 30 | S | S | Y | beneficiar.a2 | _ | _ | _ | _ |
34 | dels | del | dels | s | s | postype=preposition|gen=m|num=p|contracted=yes | postype=preposition|gen=m|num=p|contracted=yes | 33 | 33 | creg | creg | _ | _ | _ | arg1-null | _ | _ |
35 | càrrecs | càrrec | càrrec | n | n | postype=common|gen=m|num=p | postype=common|gen=m|num=p | 34 | 34 | sn | sn | _ | _ | _ | _ | _ | _ |
36 | públics | públic | públic | a | a | postype=qualificative|gen=m|num=p | postype=qualificative|gen=m|num=p | 35 | 35 | s.a | s.a | _ | _ | _ | _ | _ | _ |
37 | que | que | que | p | p | postype=relative|gen=c|num=c | postype=relative|gen=c|num=c | 39 | 39 | cd | cd | _ | _ | _ | _ | arg1-pat | _ |
38 | _ | _ | _ | p | p | _ | _ | 39 | 39 | suj | suj | _ | _ | _ | _ | arg0-agt | _ |
39 | desenvolupaven | desenvolupar | desenvolupar | v | v | postype=main|gen=c|num=p|person=3|mood=indicative|tense=imperfect | postype=main|gen=c|num=p|person=3|mood=indicative|tense=imperfect | 35 | 35 | S | S | Y | desenvolupar.a2 | _ | _ | _ | _ |
40 | i | i | i | c | c | postype=coordinating | postype=coordinating | 42 | 33 | coord | coord | _ | _ | _ | _ | _ | _ |
41 | la_seva | el_seu | el_seu | d | d | postype=possessive|gen=f|num=s|person=3 | postype=possessive|gen=f|num=s|person=3 | 42 | 42 | spec | spec | _ | _ | _ | _ | _ | _ |
42 | relació | relació | relació | n | n | postype=common|gen=f|num=s | postype=common|gen=f|num=s | 30 | 33 | sn | cd | _ | _ | _ | _ | _ | _ |
43 | amb | amb | amb | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 42 | 42 | sp | sp | _ | _ | _ | _ | _ | _ |
44 | les | el | el | d | d | postype=article|gen=f|num=p | postype=article|gen=f|num=p | 45 | 45 | spec | spec | _ | _ | _ | _ | _ | _ |
45 | empreses | empresa | empresa | n | n | postype=common|gen=f|num=p | postype=common|gen=f|num=p | 43 | 43 | sn | sn | _ | _ | _ | _ | _ | _ |
46 | càrniques | càrnic | càrnic | a | a | postype=qualificative|gen=f|num=p | postype=qualificative|gen=f|num=p | 45 | 45 | s.a | s.a | _ | _ | _ | _ | _ | _ |
47 | de | de | de | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 45 | 45 | sp | sp | _ | _ | _ | _ | _ | _ |
48 | la | el | el | d | d | postype=article|gen=f|num=s | postype=article|gen=f|num=s | 49 | 49 | spec | spec | _ | _ | _ | _ | _ | _ |
49 | zona | zona | zona | n | n | postype=common|gen=f|num=s | postype=common|gen=f|num=s | 47 | 47 | sn | sn | _ | _ | _ | _ | _ | _ |
50 | en | en | en | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 42 | 42 | sp | sp | _ | _ | _ | _ | _ | _ |
51 | oferir | oferir | oferir | v | v | postype=main|gen=c|num=c|mood=infinitive | postype=main|gen=c|num=c|mood=infinitive | 50 | 50 | S | S | Y | oferir.a32 | _ | _ | _ | _ |
52 | -los | ell | ell | p | p | postype=personal|gen=c|num=p|person=3 | postype=personal|gen=c|num=p|person=3 | 51 | 51 | ci | ci | _ | _ | _ | _ | _ | arg2-ben |
53 | serveis | servei | servei | n | n | postype=common|gen=m|num=p | postype=common|gen=m|num=p | 51 | 51 | cd | cd | _ | _ | _ | _ | _ | arg1-pat |
54 | particulars | particular | particular | a | a | postype=qualificative|gen=c|num=p | postype=qualificative|gen=c|num=p | 53 | 53 | s.a | s.a | _ | _ | _ | _ | _ | _ |
55 | . | . | . | f | f | punct=period | punct=period | 7 | 7 | f | f | _ | _ | _ | _ | _ | _ |
The first sentence of the CoNLL 2009 development data:
1 | Fundació_Privada_Fira_de_Manresa | Fundació_Privada_Fira_de_Manresa | Fundació_Privada_Fira_de_Manresa | n | n | postype=proper|gen=c|num=c | postype=proper|gen=c|num=c | 3 | 3 | suj | suj | _ | _ | arg0-agt |
2 | ha | haver | haver | v | v | postype=auxiliary|gen=c|num=s|person=3|mood=indicative|tense=present | postype=auxiliary|gen=c|num=s|person=3|mood=indicative|tense=present | 3 | 3 | v | v | _ | _ | _ |
3 | fet | fer | fer | v | v | postype=main|gen=m|num=s|mood=pastparticiple | postype=main|gen=m|num=s|mood=pastparticiple | 0 | 0 | sentence | sentence | Y | fer.a2 | _ |
4 | un | un | un | d | d | postype=numeral|gen=m|num=s | postype=numeral|gen=m|num=s | 5 | 5 | spec | spec | _ | _ | _ |
5 | balanç | balanç | balanç | n | n | postype=common|gen=m|num=s | postype=common|gen=m|num=s | 3 | 3 | cd | cd | _ | _ | arg1-pat |
6 | de | de | de | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 5 | 5 | sp | sp | _ | _ | _ |
7 | l' | el | el | d | d | postype=article|gen=c|num=s | postype=article|gen=c|num=s | 8 | 8 | spec | spec | _ | _ | _ |
8 | activitat | activitat | activitat | n | n | postype=common|gen=f|num=s | postype=common|gen=f|num=s | 6 | 6 | sn | sn | _ | _ | _ |
9 | del | del | del | s | s | postype=preposition|gen=m|num=s|contracted=yes | postype=preposition|gen=m|num=s|contracted=yes | 8 | 8 | sp | sp | _ | _ | _ |
10 | Palau_Firal | Palau_Firal | Palau_Firal | n | n | postype=proper|gen=c|num=c | postype=proper|gen=c|num=c | 9 | 9 | sn | sn | _ | _ | _ |
11 | durant | durant | durant | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 8 | 3 | sp | cc | _ | _ | _ |
12 | els | el | el | d | d | postype=article|gen=m|num=p | postype=article|gen=m|num=p | 15 | 15 | spec | spec | _ | _ | _ |
13 | primers | primer | primer | a | a | postype=ordinal|gen=m|num=p | postype=ordinal|gen=m|num=p | 12 | 12 | a | a | _ | _ | _ |
14 | cinc | cinc | cinc | d | d | postype=numeral|gen=c|num=p | postype=numeral|gen=c|num=p | 12 | 12 | d | d | _ | _ | _ |
15 | mesos | mes | mes | n | n | postype=common|gen=m|num=p | postype=common|gen=m|num=p | 11 | 11 | sn | sn | _ | _ | _ |
16 | de | de | de | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | 15 | 15 | sp | sp | _ | _ | _ |
17 | l' | el | el | d | d | postype=article|gen=c|num=s | postype=article|gen=c|num=s | 18 | 18 | spec | spec | _ | _ | _ |
18 | any | any | any | n | n | postype=common|gen=m|num=s | postype=common|gen=m|num=s | 16 | 16 | sn | sn | _ | _ | _ |
19 | . | . | . | f | f | punct=period | punct=period | 3 | 3 | f | f | _ | _ | _ |
The first sentence of the CoNLL 2009 test data:
1 | El | el | el | d | d | postype=article|gen=m|num=s | postype=article|gen=m|num=s | _ | _ | _ | _ | _ |
2 | darrer | darrer | darrer | a | a | postype=ordinal|gen=m|num=s | postype=ordinal|gen=m|num=s | _ | _ | _ | _ | _ |
3 | número | número | número | n | n | postype=common|gen=m|num=s | postype=common|gen=m|num=s | _ | _ | _ | _ | _ |
4 | de | de | de | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | _ | _ | _ | _ | _ |
5 | l' | el | el | d | d | postype=article|gen=c|num=s | postype=article|gen=c|num=s | _ | _ | _ | _ | _ |
6 | Observatori_del_Mercat_de_Treball_d'_Osona | Observatori_del_Mercat_de_Treball_d'_Osona | Observatori_del_Mercat_de_Treball_d'_Osona | n | n | postype=proper|gen=c|num=c | postype=proper|gen=c|num=c | _ | _ | _ | _ | _ |
7 | inclou | incloure | incloure | v | v | postype=main|gen=c|num=s|person=3|mood=indicative|tense=present | postype=main|gen=c|num=s|person=3|mood=indicative|tense=present | _ | _ | _ | _ | Y |
8 | un | un | un | d | d | postype=numeral|gen=m|num=s | postype=numeral|gen=m|num=s | _ | _ | _ | _ | _ |
9 | informe | informe | informe | n | n | postype=common|gen=m|num=s | postype=common|gen=m|num=s | _ | _ | _ | _ | _ |
10 | especial | especial | especial | a | a | postype=qualificative|gen=c|num=s | postype=qualificative|gen=c|num=s | _ | _ | _ | _ | _ |
11 | sobre | sobre | sobre | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | _ | _ | _ | _ | _ |
12 | la | el | el | d | d | postype=article|gen=f|num=s | postype=article|gen=f|num=s | _ | _ | _ | _ | _ |
13 | contractació | contractació | contractació | n | n | postype=common|gen=f|num=s | postype=common|gen=f|num=s | _ | _ | _ | _ | _ |
14 | a_través_de | a_través_de | a_través_de | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | _ | _ | _ | _ | _ |
15 | les | el | el | d | d | postype=article|gen=f|num=p | postype=article|gen=f|num=p | _ | _ | _ | _ | _ |
16 | empreses | empresa | empresa | n | n | postype=common|gen=f|num=p | postype=common|gen=f|num=p | _ | _ | _ | _ | _ |
17 | de | de | de | s | s | postype=preposition|gen=c|num=c | postype=preposition|gen=c|num=c | _ | _ | _ | _ | _ |
18 | treball | treball | treball | n | n | postype=common|gen=m|num=s | postype=common|gen=m|num=s | _ | _ | _ | _ | _ |
19 | temporal | temporal | temporal | a | a | postype=qualificative|gen=c|num=s | postype=qualificative|gen=c|num=s | _ | _ | _ | _ | _ |
20 | , | , | , | f | f | punct=comma | punct=comma | _ | _ | _ | _ | _ |
21 | les | el | el | d | d | postype=article|gen=f|num=p | postype=article|gen=f|num=p | _ | _ | _ | _ | _ |
22 | ETT | ETT | ETT | n | n | postype=proper|gen=c|num=c | postype=proper|gen=c|num=c | _ | _ | _ | _ | _ |
23 | . | . | . | f | f | punct=period | punct=period | _ | _ | _ | _ | _ |
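The CoNLL 2009 rows above follow the standard column order of that shared task (ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL, FILLPRED, PRED, then one APRED column per predicate in the sentence). As a rough sketch, a row can be mapped to named fields as follows. The function name `parse_row` is hypothetical, the distributed files are assumed to be tab-separated, and the example passes the " | " separator used in the rendering on this page (with the trailing pipe dropped).

```python
CONLL09_COLUMNS = [
    "ID", "FORM", "LEMMA", "PLEMMA", "POS", "PPOS", "FEAT", "PFEAT",
    "HEAD", "PHEAD", "DEPREL", "PDEPREL", "FILLPRED", "PRED",
]


def parse_row(line, sep="\t"):
    """Map one CoNLL 2009 row to a dict; extra columns are collected as APREDS."""
    cells = [cell.strip() for cell in line.strip().split(sep)]
    record = dict(zip(CONLL09_COLUMNS, cells))
    record["APREDS"] = cells[len(CONLL09_COLUMNS):]
    return record


# Token 16 of the first training sentence, as rendered above.
row = ("16 | i | i | i | c | c | postype=coordinating | postype=coordinating"
       " | 12 | 9 | coord | coord | _ | _ | _ | _ | _ | _")
token = parse_row(row, sep=" | ")
print(token["FORM"], token["HEAD"], token["PHEAD"])
```

For this token the HEAD column (12) and the PHEAD column (9) disagree, which shows that the two columns carry different analyses.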
Parsing
Nonprojectivities in AnCora-CA are very rare. Only 487 of the 435,860 tokens in the CoNLL 2007 version are attached nonprojectively (0.11%). In the CoNLL 2009 version, there are no nonprojectivities at all.
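For illustration, the following sketch counts nonprojectively attached tokens under one common definition: a token is attached nonprojectively if some token lying between it and its head is not dominated by that head. Whether this is exactly the definition behind the figures above is an assumption, and the function names are illustrative. The head list is taken from the first sentence of the CoNLL 2007 training data shown in the Sample section.

```python
def dominates(ancestor, node, heads):
    """True if `ancestor` lies on the path from `node` up to the root (0).

    `heads[i - 1]` is the head of token i; the input is assumed to be a
    well-formed tree without cycles.
    """
    while node != 0:
        node = heads[node - 1]
        if node == ancestor:
            return True
    return False


def nonprojective_tokens(heads):
    """Return the IDs of tokens whose attachment to their head is nonprojective."""
    result = []
    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        lo, hi = sorted((dep, head))
        # Every token strictly between dependent and head must be under the head.
        if any(not dominates(head, k, heads) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


# HEAD column of the first CoNLL 2007 training sentence.
first_sentence = [2, 4, 4, 0, 7, 5, 4, 10, 10, 7, 10, 10,
                  14, 7, 14, 15, 14, 19, 17, 19, 22, 20, 4]
print(nonprojective_tokens(first_sentence))

# A toy tree (not from the treebank) in which token 2 crosses token 3.
print(nonprojective_tokens([3, 4, 0, 3]))

# Share of nonprojective tokens quoted above, in percent.
print(round(487 / 435860 * 100, 2))
```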
The results of the CoNLL 2007 shared task are available online. They have been published in (Nivre et al., 2007). The evaluation procedure was changed to include punctuation tokens. These are the best results for Catalan:
Parser (Authors) | LAS | UAS |
---|---|---|
Titov et al. | 87.40 | 93.40 |
Sagae | 88.16 | 93.34 |
Malt (Nilsson et al.) | 88.70 | 93.12 |
Nakagawa | 87.90 | 92.86 |
Carreras | 87.60 | 92.46 |
Malt (Hall et al.) | 87.74 | 92.20 |
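As a reminder of what the two columns measure, the sketch below computes the labeled and unlabeled attachment scores in their usual sense: UAS counts tokens with the correct head, LAS counts tokens with both the correct head and the correct dependency label. It is not the official evaluation script; the function name and the toy data are hypothetical, and all tokens, including punctuation, are counted.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) in percent for two aligned lists of (head, deprel) pairs."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")
    total = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * las_hits / total, 100.0 * uas_hits / total


# Toy example: every head is right, but one label is wrong.
gold = [(2, "ESPEC"), (4, "SUJ"), (4, "AUX"), (0, "S")]
pred = [(2, "ESPEC"), (4, "CD"), (4, "AUX"), (0, "S")]
las, uas = attachment_scores(gold, pred)
print(f"LAS {las:.2f}  UAS {uas:.2f}")
```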
The two Malt parser results of 2007 (single malt and blended) are described in (Hall et al., 2007) and the details about the parser configuration are described here.
The results of the CoNLL 2009 shared task are available online. They have been published in (Hajič et al., 2009). Unlabeled attachment score was not published. These are the best results for Catalan:
Parser (Authors) | LAS |
---|---|
Merlo | 87.86 |
Che | 86.56 |
Bohnet | 86.35 |
Chen | 85.88 |