user:zeman:treebanks:ca [2011/11/20 21:17] (current) zeman created |
| ===== Catalan (ca) ===== |
| |
| There is [[http://clic.ub.edu/corpus/|one treebank]], versions of which have been known under different names at different times: |
| * CESS-Cat |
| * Cat3LB |
| * AnCora-CA |
| |
| ==== Versions ==== |
| |
| * CoNLL 2007 (CESS-Cat) |
| * CoNLL 2009 (AnCora-CA) |
| |
| The dependency treebank Cat3LB was extracted automatically from an earlier constituent-based annotation (see Montserrat Civit, Ma. Antònia Martí, Núria Bufí: [[http://www.springerlink.com/content/978-3-540-37334-6/#section=474512&page=8&locus=86|Cat3LB and Cast3LB: From Constituents to Dependencies]]. In: T. Salakoski et al. (eds.): FinTAL 2006, LNAI 4139, pp. 141–152, 2006, Springer, Berlin / Heidelberg). |
| |
| ==== Obtaining and License ==== |
| |
| The AnCora-CA corpus is meant to be freely downloadable from [[http://clic.ub.edu/corpus/en/ancora-descarregues|its website]]. However, the download does not work for users who are not registered and signed in. The website lets you create a new account, but the process is not automatic: you have to wait for approval. |
| |
| Republication of the two CoNLL versions by the LDC is planned but has not happened yet. |
| |
| The CoNLL 2007 license in short: |
| |
| * research and demonstrative usage |
| * no redistribution |
| * cite in publications |
| * The original CoNLL 2007 license required a reference to the CESS-ECE //project//, not a publication: M. Antònia Martí Antonín, Mariona Taulé Delor, Lluís Màrquez, Manuel Bertran (2007) CESS-ECE: A Multilingual and Multilevel Annotated Corpus. |
| * Later there was [[http://www.lrec-conf.org/proceedings/lrec2008/summaries/35.html|the LREC paper]], which is now the required reference for the AnCora corpus. |
| |
| AnCora-CA was created by members of the [[http://clic.ub.edu/|Centre de Llenguatge i Computació (CLiC)]], Universitat de Barcelona, Gran Via de les Corts Catalanes 585, E-08007 Barcelona, Spain. |
| |
| ==== References ==== |
| |
| * Website |
| * http://clic.ub.edu/corpus/ |
| * Data |
| * //no separate citation// |
| * Principal publications |
| * Mariona Taulé, M. Antònia Martí, Marta Recasens: [[http://www.lrec-conf.org/proceedings/lrec2008/summaries/35.html|AnCora: Multilevel Annotated Corpora for Catalan and Spanish]]. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 2008. ISBN 2-9517408-4-0 |
| * Documentation |
| * Maria Antònia Martí, Mariona Taulé, Manu Bertran, Lluís Màrquez: [[http://clic.ub.edu/corpus/webfm_send/13|AnCora: Multilingual and Multilevel Annotated Corpora]]. Draft Technical report, online. |
| * [[http://clic.ub.edu/corpus/webfm_send/18|Morphology]] |
| * [[http://clic.ub.edu/corpus/webfm_send/17|Syntactic guidelines]] |
| |
| ==== Domain ==== |
| |
| Mostly newswire (EFE news, ACN Catalan news, Catalan version of El Periódico, 2000). |
| |
| ==== Size ==== |
| |
| The CoNLL 2007 version contains 435,860 tokens in 15,125 sentences, yielding 28.82 tokens per sentence on average (CoNLL 2007 data split: 430,844 tokens / 14,958 sentences training, 5,016 tokens / 167 sentences test). |
| |
| The CoNLL 2009 version contains 496,672 tokens in 16,786 sentences, yielding 29.59 tokens per sentence on average (CoNLL 2009 data split: 390,302 tokens / 13,200 sentences training, 53,015 tokens / 1,724 sentences development, 53,355 tokens / 1,862 sentences test). |
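
The totals and averages above can be rechecked from the per-split counts. The following is only a small arithmetic sketch; the dictionary and function names are made up for illustration, and the numbers are copied from the two paragraphs above.

```python
# Token and sentence counts per split, copied from the text above.
splits_2007 = {"train": (430844, 14958), "test": (5016, 167)}
splits_2009 = {"train": (390302, 13200), "dev": (53015, 1724), "test": (53355, 1862)}

def totals(splits):
    """Sum tokens and sentences over all splits; return (tokens, sentences, average)."""
    tokens = sum(t for t, _ in splits.values())
    sentences = sum(s for _, s in splits.values())
    return tokens, sentences, round(tokens / sentences, 2)

print("CoNLL 2007:", totals(splits_2007))
print("CoNLL 2009:", totals(splits_2009))
```

The per-split counts add up to the quoted totals for both versions.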
| |
| ==== Inside ==== |
| |
| The original morphosyntactic tags (EAGLES?) have been converted to fit into the three columns (CPOS, POS and FEAT) of the CoNLL 2006/7 format, or into the two columns (POS and FEAT) of the CoNLL 2009 format. Note that the missing CPOS column is not the only difference between the two conversion schemes: the [[http://clic.ub.edu/corpus/webfm_send/18|feature names and values]] in the FEAT column differ, too. |
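
To make the difference between the two FEAT schemes concrete, here is a minimal sketch that turns a FEAT string into a dictionary. It assumes the raw data files use a plain vertical bar between features (the nowiki markup around the bars on this page is only wiki escaping). The function name parse_feat is hypothetical, and the example values are taken from the samples further down this page.

```python
def parse_feat(feat):
    """Turn a FEAT string such as 'num=s|gen=f' into a dict.

    A bare underscore means that the token carries no features.
    """
    if feat == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feat.split("|"))

# The feminine singular article "la" in the two conversion schemes.
feat_2007 = parse_feat("num=s|gen=f")
feat_2009 = parse_feat("postype=article|gen=f|num=s")

print(feat_2007)
print(feat_2009)
```

In this example the 2009 scheme carries an extra postype feature, while in the 2007 sample the corresponding distinction sits in the POS column (da). Other feature names differ as well, for instance per/mod/ten versus person/mood/tense.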
| |
| The morphosyntactic tags have been disambiguated manually. The CoNLL 2009 version also contains automatically disambiguated tags. |
| |
| Multi-word expressions have been collapsed into one token, using underscore as the joining character. This includes named entities (e.g. La_Garrotxa, Ajuntament_de_Manresa, dilluns_4_de_juny) and prepositional compounds (pel_que_fa_al, d'_acord_amb, la_seva, a_més_de). Empty (underscore) tokens have been inserted to represent missing subjects (Catalan is a pro-drop language). |
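
A rough way to undo the collapsing is to split the word form on the underscore. The sketch below assumes that a form consisting of a single underscore is always one of the inserted empty tokens, which may not hold if the source text itself contains underscores. The function name split_mwe is made up for this illustration.

```python
def split_mwe(form):
    """Split a collapsed multi-word token into its parts.

    A form that is just '_' is treated as an inserted empty token
    (for example a dropped subject) and yields no parts.
    """
    if form == "_":
        return []
    return form.split("_")

for form in ["Ajuntament_de_Manresa", "d'_acord_amb", "vaga", "_"]:
    print(repr(form), "->", split_mwe(form))
```

Note that the parts inherit no annotation of their own: lemma, tags and dependency relation exist only for the collapsed token.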
| |
| ==== Sample ==== |
| |
| The first sentence of the CoNLL 2007 training data: |
| |
| | 1 | L' | el | d | da | num=s<nowiki>|</nowiki>gen=c | 2 | ESPEC | _ | _ | |
| | 2 | Ajuntament_de_Manresa | Ajuntament_de_Manresa | n | np | _ | 4 | SUJ | _ | _ | |
| | 3 | ha | haver | v | va | num=s<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p | 4 | AUX | _ | _ | |
| | 4 | posat_en_funcionament | posar_en_funcionament | v | vm | num=s<nowiki>|</nowiki>mod=p<nowiki>|</nowiki>gen=m | 0 | S | _ | _ | |
| | 5 | tot | tot | d | di | num=s<nowiki>|</nowiki>gen=m | 7 | ESPEC | _ | _ | |
| | 6 | un_seguit_de | un_seguit_de | d | di | num=p<nowiki>|</nowiki>gen=c | 5 | DET | _ | _ | |
| | 7 | mesures | mesura | n | nc | num=p<nowiki>|</nowiki>gen=f | 4 | CD | _ | _ | |
| | 8 | , | , | F | Fc | _ | 10 | PUNC | _ | _ | |
| | 9 | la | el | d | da | num=s<nowiki>|</nowiki>gen=f | 10 | ESPEC | _ | _ | |
| | 10 | majoria | majoria | n | nc | num=s<nowiki>|</nowiki>gen=f | 7 | _ | _ | _ | |
| | 11 | informatives | informatiu | a | aq | num=p<nowiki>|</nowiki>gen=f | 10 | _ | _ | _ | |
| | 12 | , | , | F | Fc | _ | 10 | PUNC | _ | _ | |
| | 13 | que | que | p | pr | num=n<nowiki>|</nowiki>gen=c | 14 | SUJ | _ | _ | |
| | 14 | tenen | tenir | v | vm | num=p<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p | 7 | SF | _ | _ | |
| | 15 | com_a | com_a | s | sp | for=s | 14 | CPRED | _ | _ | |
| | 16 | finalitat | finalitat | n | nc | num=s<nowiki>|</nowiki>gen=f | 15 | SN | _ | _ | |
| | 17 | minimitzar | minimitzar | v | vm | mod=n | 14 | CD | _ | _ | |
| | 18 | els | el | d | da | num=p<nowiki>|</nowiki>gen=m | 19 | ESPEC | _ | _ | |
| | 19 | efectes | efecte | n | nc | num=p<nowiki>|</nowiki>gen=m | 17 | SN | _ | _ | |
| | 20 | de | de | s | sp | for=s | 19 | SP | _ | _ | |
| | 21 | la | el | d | da | num=s<nowiki>|</nowiki>gen=f | 22 | ESPEC | _ | _ | |
| | 22 | vaga | vaga | n | nc | num=s<nowiki>|</nowiki>gen=f | 20 | SN | _ | _ | |
| | 23 | . | . | F | Fp | _ | 4 | PUNC | _ | _ | |
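
The ten cells of each row above correspond to the ten tab-separated columns of the CoNLL 2006/7 format (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). A minimal reading sketch follows; the names Token2007 and read_sentence are hypothetical, and the two input lines are retyped from the table with tabs between the columns.

```python
from collections import namedtuple

# Field names follow the CoNLL 2006/7 column order.
Token2007 = namedtuple(
    "Token2007", "id form lemma cpos pos feat head deprel phead pdeprel"
)

def read_sentence(lines):
    """Parse the tab-separated lines of one sentence into Token2007 tuples."""
    tokens = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 10:
            raise ValueError("expected 10 columns, got %d" % len(cols))
        # ID and HEAD are converted to integers; the rest stay strings.
        tokens.append(Token2007(int(cols[0]), *cols[1:6], int(cols[6]), *cols[7:]))
    return tokens

sample = [
    "1\tL'\tel\td\tda\tnum=s|gen=c\t2\tESPEC\t_\t_",
    "2\tAjuntament_de_Manresa\tAjuntament_de_Manresa\tn\tnp\t_\t4\tSUJ\t_\t_",
]
sentence = read_sentence(sample)
for tok in sentence:
    print(tok.id, tok.form, tok.pos, tok.head, tok.deprel)
```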
| |
| The first sentence of the CoNLL 2007 test data: |
| |
| | 1 | Tot_i_que | tot_i_que | c | cs | _ | 5 | SUBORD | _ | _ | |
| | 2 | ahir | ahir | r | rg | _ | 5 | CC | _ | _ | |
| | 3 | hi | hi | p | pp | num=n<nowiki>|</nowiki>per=3<nowiki>|</nowiki>gen=c | 5 | MORF | _ | _ | |
| | 4 | va | anar | v | va | num=s<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p | 5 | AUX | _ | _ | |
| | 5 | haver | haver | v | va | mod=n | 15 | AO | _ | _ | |
| | 6 | una | un | d | di | num=s<nowiki>|</nowiki>gen=f | 7 | ESPEC | _ | _ | |
| | 7 | reunió | reunió | n | nc | num=s<nowiki>|</nowiki>gen=f | 5 | CD | _ | _ | |
| | 8 | de | de | s | sp | for=s | 7 | SP | _ | _ | |
| | 9 | darrera | darrer | a | ao | num=s<nowiki>|</nowiki>gen=f | 10 | SADJ | _ | _ | |
| | 10 | hora | hora | n | nc | num=s<nowiki>|</nowiki>gen=f | 8 | SN | _ | _ | |
| | 11 | , | , | F | Fc | _ | 5 | PUNC | _ | _ | |
| | 12 | no | no | r | rn | _ | 15 | MOD | _ | _ | |
| | 13 | es | es | p | p0 | _ | 15 | PASS | _ | _ | |
| | 14 | va | anar | v | va | num=s<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p | 15 | AUX | _ | _ | |
| | 15 | aconseguir | aconseguir | v | vm | mod=n | 0 | S | _ | _ | |
| | 16 | acostar | acostar | v | vm | mod=n | 15 | SUJ | _ | _ | |
| | 17 | posicions | posició | n | nc | num=p<nowiki>|</nowiki>gen=f | 16 | SN | _ | _ | |
| | 18 | , | , | F | Fc | _ | 23 | PUNC | _ | _ | |
| | 19 | de_manera_que | de_manera_que | c | cs | _ | 23 | SUBORD | _ | _ | |
| | 20 | els | el | d | da | num=p<nowiki>|</nowiki>gen=m | 21 | ESPEC | _ | _ | |
| | 21 | treballadors | treballador | n | nc | num=p<nowiki>|</nowiki>gen=m | 23 | SUJ | _ | _ | |
| | 22 | han | haver | v | va | num=p<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p | 23 | AUX | _ | _ | |
| | 23 | decidit | decidir | v | vm | num=s<nowiki>|</nowiki>mod=p<nowiki>|</nowiki>gen=m | 15 | AO | _ | _ | |
| | 24 | anar | anar | v | vm | mod=n | 23 | CD | _ | _ | |
| | 25 | a | a | s | sp | for=s | 24 | CREG | _ | _ | |
| | 26 | la | el | d | da | num=s<nowiki>|</nowiki>gen=f | 27 | ESPEC | _ | _ | |
| | 27 | vaga | vaga | n | nc | num=s<nowiki>|</nowiki>gen=f | 25 | SN | _ | _ | |
| | 28 | . | . | F | Fp | _ | 15 | PUNC | _ | _ | |
| |
| The first sentence of the CoNLL 2009 training data: |
| |
| | 1 | El | el | el | d | d | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 2 | 2 | spec | spec | _ | _ | _ | _ | _ | _ | |
| | 2 | Tribunal_Suprem | Tribunal_Suprem | Tribunal_Suprem | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 7 | 7 | suj | suj | _ | _ | arg0-agt | _ | _ | _ | |
| | 3 | ( | ( | ( | f | f | punct=bracket<nowiki>|</nowiki>punctenclose=open | punct=bracket<nowiki>|</nowiki>punctenclose=open | 4 | 4 | f | f | _ | _ | _ | _ | _ | _ | |
| | 4 | TS | TS | TS | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 2 | 2 | sn | sn | _ | _ | _ | _ | _ | _ | |
| | 5 | ) | ) | ) | f | f | punct=bracket<nowiki>|</nowiki>punctenclose=close | punct=bracket<nowiki>|</nowiki>punctenclose=close | 4 | 4 | f | f | _ | _ | _ | _ | _ | _ | |
| | 6 | ha | haver | haver | v | v | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | 7 | 7 | v | v | _ | _ | _ | _ | _ | _ | |
| | 7 | confirmat | confirmar | confirmar | v | v | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | 0 | 0 | sentence | sentence | Y | confirmar.a32 | _ | _ | _ | _ | |
| | 8 | la | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 9 | 9 | spec | spec | _ | _ | _ | _ | _ | _ | |
| | 9 | condemna | condemna | condemna | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 7 | 7 | cd | cd | _ | _ | arg1-pat | _ | _ | _ | |
| | 10 | a | a | a | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 9 | 9 | sp | sp | _ | _ | _ | _ | _ | _ | |
| | 11 | quatre | quatre | quatre | d | d | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | 12 | 12 | spec | spec | _ | _ | _ | _ | _ | _ | |
| | 12 | anys | any | any | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 10 | 10 | sn | sn | _ | _ | _ | _ | _ | _ | |
| | 13 | d' | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 12 | 12 | sp | sp | _ | _ | _ | _ | _ | _ | |
| | 14 | inhabilitació | inhabilitació | inhabilitació | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 13 | 13 | sn | sn | _ | _ | _ | _ | _ | _ | |
| | 15 | especial | especial | especial | a | a | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | 14 | 14 | s.a | s.a | _ | _ | _ | _ | _ | _ | |
| | 16 | i | i | i | c | c | postype=coordinating | postype=coordinating | 12 | 9 | coord | coord | _ | _ | _ | _ | _ | _ | |
| | 17 | una | un | un | d | d | postype=indefinite<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=numeral<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 18 | 18 | spec | spec | _ | _ | _ | _ | _ | _ | |
| | 18 | multa | multa | multa | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 12 | 9 | sn | sn | _ | _ | _ | _ | _ | _ | |
| | 19 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 18 | 18 | sp | sp | _ | _ | _ | _ | _ | _ | |
| | 20 | 3,6 | 3.6 | 3,6 | z | n | _ | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 21 | 21 | spec | spec | _ | _ | _ | _ | _ | _ | |
| | 21 | milions | milió | milió | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 19 | 19 | sn | sn | _ | _ | _ | _ | _ | _ | |
| | 22 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 21 | 21 | sp | sp | _ | _ | _ | _ | _ | _ | |
| | 23 | pessetes | pesseta | pesseta | z | n | postype=currency | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | 22 | 22 | sn | sn | _ | _ | _ | _ | _ | _ | |
| | 24 | per | per | per | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 9 | 9 | sp | sp | _ | _ | _ | _ | _ | _ | |
| | 25 | a | a | a | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 24 | 24 | sp | sp | _ | _ | _ | _ | _ | _ | |
| | 26 | quatre | quatre | quatre | d | d | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | 27 | 27 | spec | spec | _ | _ | _ | _ | _ | _ | |
| | 27 | veterinaris | veterinari | veterinari | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 25 | 25 | sn | sn | _ | _ | _ | _ | _ | _ | |
| | 28 | gironins | gironí | gironí | a | a | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 27 | 27 | s.a | s.a | _ | _ | _ | _ | _ | _ | |
| | 29 | , | , | , | f | f | punct=comma | punct=comma | 30 | 30 | f | f | _ | _ | _ | _ | _ | _ | |
| | 30 | per | per | per | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 9 | 7 | sp | cc | _ | _ | _ | _ | _ | _ | |
| | 31 | haver | haver | haver | v | n | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>mood=infinitive | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 33 | 33 | v | v | _ | _ | _ | _ | _ | _ | |
| | 32 | -se | ell | ell | p | p | gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>person=3 | gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>person=3 | 33 | 33 | morfema.pronominal | morfema.pronominal | _ | _ | _ | _ | _ | _ | |
| | 33 | beneficiat | beneficiar | beneficiat | v | a | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>posfunction=participle | 42 | 30 | S | S | Y | beneficiar.a2 | _ | _ | _ | _ | |
| | 34 | dels | del | dels | s | s | postype=preposition<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p<nowiki>|</nowiki>contracted=yes | postype=preposition<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p<nowiki>|</nowiki>contracted=yes | 33 | 33 | creg | creg | _ | _ | _ | arg1-null | _ | _ | |
| | 35 | càrrecs | càrrec | càrrec | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 34 | 34 | sn | sn | _ | _ | _ | _ | _ | _ | |
| | 36 | públics | públic | públic | a | a | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 35 | 35 | s.a | s.a | _ | _ | _ | _ | _ | _ | |
| | 37 | que | que | que | p | p | postype=relative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=relative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 39 | 39 | cd | cd | _ | _ | _ | _ | arg1-pat | _ | |
| | 38 | _ | _ | _ | p | p | _ | _ | 39 | 39 | suj | suj | _ | _ | _ | _ | arg0-agt | _ | |
| | 39 | desenvolupaven | desenvolupar | desenvolupar | v | v | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=imperfect | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=imperfect | 35 | 35 | S | S | Y | desenvolupar.a2 | _ | _ | _ | _ | |
| | 40 | i | i | i | c | c | postype=coordinating | postype=coordinating | 42 | 33 | coord | coord | _ | _ | _ | _ | _ | _ | |
| | 41 | la_seva | el_seu | el_seu | d | d | postype=possessive<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3 | postype=possessive<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3 | 42 | 42 | spec | spec | _ | _ | _ | _ | _ | _ | |
| | 42 | relació | relació | relació | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 30 | 33 | sn | cd | _ | _ | _ | _ | _ | _ | |
| | 43 | amb | amb | amb | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 42 | 42 | sp | sp | _ | _ | _ | _ | _ | _ | |
| | 44 | les | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | 45 | 45 | spec | spec | _ | _ | _ | _ | _ | _ | |
| | 45 | empreses | empresa | empresa | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | 43 | 43 | sn | sn | _ | _ | _ | _ | _ | _ | |
| | 46 | càrniques | càrnic | càrnic | a | a | postype=qualificative<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=qualificative<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | 45 | 45 | s.a | s.a | _ | _ | _ | _ | _ | _ | |
| | 47 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 45 | 45 | sp | sp | _ | _ | _ | _ | _ | _ | |
| | 48 | la | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 49 | 49 | spec | spec | _ | _ | _ | _ | _ | _ | |
| | 49 | zona | zona | zona | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 47 | 47 | sn | sn | _ | _ | _ | _ | _ | _ | |
| | 50 | en | en | en | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 42 | 42 | sp | sp | _ | _ | _ | _ | _ | _ | |
| | 51 | oferir | oferir | oferir | v | v | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>mood=infinitive | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>mood=infinitive | 50 | 50 | S | S | Y | oferir.a32 | _ | _ | _ | _ | |
| | 52 | -los | ell | ell | p | p | postype=personal<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p<nowiki>|</nowiki>person=3 | postype=personal<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p<nowiki>|</nowiki>person=3 | 51 | 51 | ci | ci | _ | _ | _ | _ | _ | arg2-ben | |
| | 53 | serveis | servei | servei | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 51 | 51 | cd | cd | _ | _ | _ | _ | _ | arg1-pat | |
| | 54 | particulars | particular | particular | a | a | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | 53 | 53 | s.a | s.a | _ | _ | _ | _ | _ | _ | |
| | 55 | . | . | . | f | f | punct=period | punct=period | 7 | 7 | f | f | _ | _ | _ | _ | _ | _ | |
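
In the CoNLL 2009 format, the first fourteen columns are fixed (ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL, FILLPRED, PRED) and are followed by one argument column per predicate in the sentence. The sentence above has four predicates and therefore four such columns. The sketch below splits a row accordingly; parse_row and the FIXED list are names chosen for this illustration, and the two rows are retyped from the table with tabs as separators. The test-data sample further down this page shows fewer columns per row, so this sketch would reject those rows as they stand.

```python
FIXED = ["id", "form", "lemma", "plemma", "pos", "ppos", "feat", "pfeat",
         "head", "phead", "deprel", "pdeprel", "fillpred", "pred"]

def parse_row(line):
    """Split one CoNLL 2009 row into the fixed fields plus a list of APRED columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(FIXED):
        raise ValueError("expected at least %d columns" % len(FIXED))
    row = dict(zip(FIXED, cols))
    row["apreds"] = cols[len(FIXED):]   # one entry per predicate of the sentence
    return row

row9 = parse_row(
    "9\tcondemna\tcondemna\tcondemna\tn\tn\t"
    "postype=common|gen=f|num=s\tpostype=common|gen=f|num=s\t"
    "7\t7\tcd\tcd\t_\t_\targ1-pat\t_\t_\t_"
)
row16 = parse_row(
    "16\ti\ti\ti\tc\tc\tpostype=coordinating\tpostype=coordinating\t"
    "12\t9\tcoord\tcoord\t_\t_\t_\t_\t_\t_"
)

print(row9["form"], row9["apreds"])
# The two head columns (HEAD and PHEAD) can disagree, as they do for token 16.
print(row16["form"], row16["head"], row16["phead"])
```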
| |
| The first sentence of the CoNLL 2009 development data: |
| |
| | 1 | Fundació_Privada_Fira_de_Manresa | Fundació_Privada_Fira_de_Manresa | Fundació_Privada_Fira_de_Manresa | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 3 | 3 | suj | suj | _ | _ | arg0-agt | |
| | 2 | ha | haver | haver | v | v | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | 3 | 3 | v | v | _ | _ | _ | |
| | 3 | fet | fer | fer | v | v | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | 0 | 0 | sentence | sentence | Y | fer.a2 | _ | |
| | 4 | un | un | un | d | d | postype=numeral<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=numeral<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 5 | 5 | spec | spec | _ | _ | _ | |
| | 5 | balanç | balanç | balanç | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 3 | 3 | cd | cd | _ | _ | arg1-pat | |
| | 6 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 5 | 5 | sp | sp | _ | _ | _ | |
| | 7 | l' | el | el | d | d | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | 8 | 8 | spec | spec | _ | _ | _ | |
| | 8 | activitat | activitat | activitat | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 6 | 6 | sn | sn | _ | _ | _ | |
| | 9 | del | del | del | s | s | postype=preposition<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>contracted=yes | postype=preposition<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>contracted=yes | 8 | 8 | sp | sp | _ | _ | _ | |
| | 10 | Palau_Firal | Palau_Firal | Palau_Firal | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 9 | 9 | sn | sn | _ | _ | _ | |
| | 11 | durant | durant | durant | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 8 | 3 | sp | cc | _ | _ | _ | |
| | 12 | els | el | el | d | d | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 15 | 15 | spec | spec | _ | _ | _ | |
| | 13 | primers | primer | primer | a | a | postype=ordinal<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=ordinal<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 12 | 12 | a | a | _ | _ | _ | |
| | 14 | cinc | cinc | cinc | d | d | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | 12 | 12 | d | d | _ | _ | _ | |
| | 15 | mesos | mes | mes | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 11 | 11 | sn | sn | _ | _ | _ | |
| | 16 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 15 | 15 | sp | sp | _ | _ | _ | |
| | 17 | l' | el | el | d | d | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | 18 | 18 | spec | spec | _ | _ | _ | |
| | 18 | any | any | any | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 16 | 16 | sn | sn | _ | _ | _ | |
| | 19 | . | . | . | f | f | punct=period | punct=period | 3 | 3 | f | f | _ | _ | _ | |
| |
| The first sentence of the CoNLL 2009 test data: |
| |
| | 1 | El | el | el | d | d | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | |
| | 2 | darrer | darrer | darrer | a | a | postype=ordinal<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=ordinal<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | |
| | 3 | número | número | número | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | |
| | 4 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ | |
| | 5 | l' | el | el | d | d | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | |
| | 6 | Observatori_del_Mercat_de_Treball_d'_Osona | Observatori_del_Mercat_de_Treball_d'_Osona | Observatori_del_Mercat_de_Treball_d'_Osona | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ | |
| | 7 | inclou | incloure | incloure | v | v | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | _ | _ | _ | _ | Y | |
| | 8 | un | un | un | d | d | postype=numeral<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=numeral<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | |
| | 9 | informe | informe | informe | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | |
| | 10 | especial | especial | especial | a | a | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | |
| | 11 | sobre | sobre | sobre | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ | |
| | 12 | la | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | |
| | 13 | contractació | contractació | contractació | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | |
| | 14 | a_través_de | a_través_de | a_través_de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ | |
| | 15 | les | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | _ | _ | _ | _ | _ | |
| | 16 | empreses | empresa | empresa | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | _ | _ | _ | _ | _ | |
| | 17 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ | |
| | 18 | treball | treball | treball | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | |
| | 19 | temporal | temporal | temporal | a | a | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | |
| | 20 | , | , | , | f | f | punct=comma | punct=comma | _ | _ | _ | _ | _ | |
| | 21 | les | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | _ | _ | _ | _ | _ | |
| | 22 | ETT | ETT | ETT | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ | |
| | 23 | . | . | . | f | f | punct=period | punct=period | _ | _ | _ | _ | _ | |
| |
| ==== Parsing ==== |
| |
| Nonprojectivities in AnCora-CA are very rare. Only 487 of the 435,860 tokens in the CoNLL 2007 version are attached nonprojectively (0.11%). In the CoNLL 2009 version, there are no nonprojectivities at all. |
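
The page does not state which exact definition was used for the count above. A common one is that a token is attached nonprojectively if some token lying strictly between it and its head is not dominated by that head. The sketch below implements this definition under the assumption that the head indices form a proper tree without cycles; the function name nonprojective_tokens is hypothetical. The first head list is taken from the first sentence of the CoNLL 2007 training sample above, the second is an invented four-token example.

```python
def nonprojective_tokens(heads):
    """Return the ids of nonprojectively attached tokens.

    heads[i - 1] is the head of token i (ids are 1-based), 0 is the artificial root.
    Assumes the heads form a tree (no cycles).
    """
    def dominated_by(node, ancestor):
        # Climb towards the root until we meet the ancestor or run out of tree.
        while node != ancestor and node != 0:
            node = heads[node - 1]
        return node == ancestor

    result = []
    for dep, head in enumerate(heads, start=1):
        lo, hi = min(dep, head), max(dep, head)
        if any(not dominated_by(k, head) for k in range(lo + 1, hi)):
            result.append(dep)
    return result

# HEAD column of the first CoNLL 2007 training sentence shown above.
train_heads = [2, 4, 4, 0, 7, 5, 4, 10, 10, 7, 10, 10,
               14, 7, 14, 15, 14, 19, 17, 19, 22, 20, 4]
# Invented example: token 2 hangs on token 4 across the root token 3.
toy_heads = [3, 4, 0, 3]

print(nonprojective_tokens(train_heads))
print(nonprojective_tokens(toy_heads))
print(round(100 * 487 / 435860, 2))   # share of nonprojective tokens in percent
```

Under this definition, the sample sentence contains no nonprojective attachment, while token 2 of the invented example is nonprojective.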
| |
| The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Catalan: |
| |
| ^ Parser (Authors) ^ LAS ^ UAS ^ |
| | Titov et al. | 87.40 | 93.40 | |
| | Sagae | 88.16 | 93.34 | |
| | Malt (Nilsson et al.) | 88.70 | 93.12 | |
| | Nakagawa | 87.90 | 92.86 | |
| | Carreras | 87.60 | 92.46 | |
| | Malt (Hall et al.) | 87.74 | 92.20 | |
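
For readers unfamiliar with the two metrics: UAS is the percentage of tokens that received the correct head, and LAS is the percentage that received both the correct head and the correct dependency label. The following sketch computes both over whatever tokens it is given, punctuation included. It is not the official evaluation script; the function name attachment_scores and the predicted analysis are invented for illustration. The gold pairs come from the first four tokens of the CoNLL 2007 training sample.

```python
def attachment_scores(gold, pred):
    """Return (LAS, UAS) in percent.

    gold and pred are equal-length lists of (head, deprel) pairs, one per token.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred))   # head matches
    las = sum(g == p for g, p in zip(gold, pred))         # head and label match
    return 100.0 * las / n, 100.0 * uas / n

gold = [(2, "ESPEC"), (4, "SUJ"), (4, "AUX"), (0, "S")]
# Hypothetical parser output: all heads correct, one wrong label.
pred = [(2, "ESPEC"), (4, "CD"), (4, "AUX"), (0, "S")]

las, uas = attachment_scores(gold, pred)
print("LAS = %.2f, UAS = %.2f" % (las, uas))
```

Since every token counted for LAS also counts for UAS, LAS can never exceed UAS, which matches the table above.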
| |
| The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]]. |
| |
| The results of the CoNLL 2009 shared task are [[http://ufal.mff.cuni.cz/conll2009-st/results/results.php|available online]]. They have been published in [[http://aclweb.org/anthology/W/W09/W09-1201.pdf|(Hajič et al., 2009)]]. Unlabeled attachment score was not published. These are the best results for Catalan: |
| |
| ^ Parser (Authors) ^ LAS ^ |
| | Merlo | 87.86 | |
| | Che | 86.56 | |
| | Bohnet | 86.35 | |
| | Chen | 85.88 | |
| |