Institute of Formal and Applied Linguistics Wiki


user:zeman:treebanks:ca [2011/11/20 21:17] (current)
zeman created
 +===== Catalan (ca) =====
 +
 +There is [[http://clic.ub.edu/corpus/|one treebank]], versions of which have been known under different names at different times:
 +  * CESS-Cat
 +  * Cat3LB
 +  * AnCora-CA
 +
 +==== Versions ====
 +
 +  * CoNLL 2007 (CESS-Cat)
 +  * CoNLL 2009 (AnCora-CA)
 +
 +The dependency treebank Cat3LB was extracted automatically from an earlier constituent-based annotation (see Montserrat Civit, Ma. Antònia Martí, Núria Bufí: [[http://www.springerlink.com/content/978-3-540-37334-6/#section=474512&page=8&locus=86|Cat3LB and Cast3LB: From Constituents to Dependencies]]. In: T. Salakoski et al. (eds.): FinTAL 2006, LNAI 4139, pp. 141–152, 2006, Springer, Berlin / Heidelberg).
 +
 +==== Obtaining and License ====
 +
 +The AnCora-CA corpus ought to be freely downloadable from [[http://clic.ub.edu/corpus/en/ancora-descarregues|its website]]. However, the download works only for registered users who are signed in. The website lets you create a new account, but registration is not automatic: you have to wait for approval.
 +
 +Republication of the two CoNLL versions in LDC is planned but has not happened yet.
 +
 +The CoNLL 2007 license in short:
 +
 +  * research and demonstrative usage
 +  * no redistribution
 +  * cite in publications
 +    * The original CoNLL 2007 license required a reference to the CESS-ECE //project//, not a publication: M. Antònia Martí Antonín, Mariona Taulé Delor, Lluís Màrquez, Manuel Bertran (2007) CESS-ECE: A Multilingual and Multilevel Annotated Corpus.
 +    * Later there was [[http://www.lrec-conf.org/proceedings/lrec2008/summaries/35.html|the LREC paper]], which is now the required reference for the AnCora corpus.
 +
 +AnCora-CA was created by members of the [[http://clic.ub.edu/|Centre de Llenguatge i Computació (CLiC)]], Universitat de Barcelona, Gran Via de les Corts Catalanes 585, E-08007 Barcelona, Spain.
 +
 +==== References ====
 +
 +  * Website
 +    * http://clic.ub.edu/corpus/
 +  * Data
 +    * //no separate citation//
 +  * Principal publications
 +    * Mariona Taulé, M. Antònia Martí, Marta Recasens: [[http://www.lrec-conf.org/proceedings/lrec2008/summaries/35.html|AnCora: Multilevel Annotated Corpora for Catalan and Spanish]]. In: Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco, 2008. ISBN 2-9517408-4-0
 +  * Documentation
 +    * Maria Antònia Martí, Mariona Taulé, Manu Bertran, Lluís Màrquez: [[http://clic.ub.edu/corpus/webfm_send/13|AnCora: Multilingual and Multilevel Annotated Corpora]]. Draft Technical report, online.
 +    * [[http://clic.ub.edu/corpus/webfm_send/18|Morphology]]
 +    * [[http://clic.ub.edu/corpus/webfm_send/17|Syntactic guidelines]]
 +
 +==== Domain ====
 +
 +Mostly newswire (EFE news, ACN Catalan news, Catalan version of El Periódico, 2000).
 +
 +==== Size ====
 +
 +The CoNLL 2007 version contains 435,860 tokens in 15,125 sentences, yielding 28.82 tokens per sentence on average (CoNLL 2007 data split: 430,844 tokens / 14,958 sentences training, 5,016 tokens / 167 sentences test).
 +
 +The CoNLL 2009 version contains 496,672 tokens in 16,786 sentences, yielding 29.59 tokens per sentence on average (CoNLL 2009 data split: 390,302 tokens / 13,200 sentences training, 53,015 tokens / 1,724 sentences development, 53,355 tokens / 1,862 sentences test).
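
The figures above can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the numbers are copied from the two paragraphs above, and the names ''SPLITS'', ''totals'' and ''tokens_per_sentence'' are made up for illustration.

```python
# Token and sentence counts copied from the Size paragraphs above.
SPLITS = {
    "CoNLL 2007": {"training": (430_844, 14_958), "test": (5_016, 167)},
    "CoNLL 2009": {
        "training": (390_302, 13_200),
        "development": (53_015, 1_724),
        "test": (53_355, 1_862),
    },
}


def totals(version):
    """Sum tokens and sentences over all parts of one version."""
    parts = list(SPLITS[version].values())
    tokens = sum(t for t, _ in parts)
    sentences = sum(s for _, s in parts)
    return tokens, sentences


def tokens_per_sentence(version):
    """Average sentence length, rounded to two decimals as on this page."""
    tokens, sentences = totals(version)
    return round(tokens / sentences, 2)


for version in SPLITS:
    print(version, totals(version), tokens_per_sentence(version))
```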
 +
 +==== Inside ====
 +
 +The original morphosyntactic tags (EAGLES?) have been converted to fit into the three columns (CPOS, POS and FEAT) of the CoNLL 2006/7 format or, respectively, the two columns (POS and FEAT) of the CoNLL 2009 format. Note that the missing CPOS column is not the only difference between the two conversion schemes. [[http://clic.ub.edu/corpus/webfm_send/18|Feature names and values]] in the FEAT column are different, too.
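
To see the difference in feature names, the FEAT strings can be unpacked into dictionaries. The sketch below is illustrative rather than an official reader: it assumes that features are separated by a vertical bar, that each feature has the form name=value, and that a lone underscore means "no features", which is how the samples on this page look. The helper name ''parse_feats'' is hypothetical. The two strings are the FEAT values of the auxiliary "ha" in the CoNLL 2007 and CoNLL 2009 samples.

```python
def parse_feats(feats):
    """Turn a FEAT value such as 'num=s|gen=c' into a dict; '_' means no features."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


# FEAT of the auxiliary "ha" in the CoNLL 2007 sample.
conll2007 = parse_feats("num=s|per=3|mod=i|ten=p")
# FEAT of the auxiliary "ha" in the CoNLL 2009 sample.
conll2009 = parse_feats(
    "postype=auxiliary|gen=c|num=s|person=3|mood=indicative|tense=present"
)

print(sorted(conll2007))  # short names: mod, num, per, ten
print(sorted(conll2009))  # long names: gen, mood, num, person, postype, tense
```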
 +
 +The morphosyntactic tags have been disambiguated manually. The CoNLL 2009 version also contains automatically disambiguated tags.
 +
 +Multi-word expressions have been collapsed into one token, using underscore as the joining character. This includes named entities (e.g. La_Garrotxa, Ajuntament_de_Manresa, dilluns_4_de_juny) and prepositional compounds (pel_que_fa_al, d'_acord_amb, la_seva, a_més_de). Empty (underscore) tokens have been inserted to represent missing subjects (Catalan is a pro-drop language).
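
If the individual words are needed, the collapsed tokens can be split again. The following is a minimal sketch that assumes the underscore occurs only as the joining character and that an empty token is a lone underscore in the FORM column. The function names are invented for this example.

```python
def is_empty_token(form):
    """An inserted empty token (e.g. a dropped subject) has '_' as its form."""
    return form == "_"


def split_mwe(form):
    """Split a collapsed multi-word token into its parts.

    Empty tokens yield an empty list; ordinary tokens yield a one-item list.
    """
    if is_empty_token(form):
        return []
    return form.split("_")


for form in ["Ajuntament_de_Manresa", "d'_acord_amb", "vaga", "_"]:
    print(repr(form), split_mwe(form))
```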
 +
 +==== Sample ====
 +
 +The first sentence of the CoNLL 2007 training data:
 +
 +| 1 | L' | el | d | da | num=s<nowiki>|</nowiki>gen=c | 2 | ESPEC | _ | _ |
 +| 2 | Ajuntament_de_Manresa | Ajuntament_de_Manresa | n | np | _ | 4 | SUJ | _ | _ |
 +| 3 | ha | haver | v | va | num=s<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p | 4 | AUX | _ | _ |
 +| 4 | posat_en_funcionament | posar_en_funcionament | v | vm | num=s<nowiki>|</nowiki>mod=p<nowiki>|</nowiki>gen=m | 0 | S | _ | _ |
 +| 5 | tot | tot | d | di | num=s<nowiki>|</nowiki>gen=m | 7 | ESPEC | _ | _ |
 +| 6 | un_seguit_de | un_seguit_de | d | di | num=p<nowiki>|</nowiki>gen=c | 5 | DET | _ | _ |
 +| 7 | mesures | mesura | n | nc | num=p<nowiki>|</nowiki>gen=f | 4 | CD | _ | _ |
 +| 8 | , | , | F | Fc | _ | 10 | PUNC | _ | _ |
 +| 9 | la | el | d | da | num=s<nowiki>|</nowiki>gen=f | 10 | ESPEC | _ | _ |
 +| 10 | majoria | majoria | n | nc | num=s<nowiki>|</nowiki>gen=f | 7 | _ | _ | _ |
 +| 11 | informatives | informatiu | a | aq | num=p<nowiki>|</nowiki>gen=f | 10 | _ | _ | _ |
 +| 12 | , | , | F | Fc | _ | 10 | PUNC | _ | _ |
 +| 13 | que | que | p | pr | num=n<nowiki>|</nowiki>gen=c | 14 | SUJ | _ | _ |
 +| 14 | tenen | tenir | v | vm | num=p<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p | 7 | SF | _ | _ |
 +| 15 | com_a | com_a | s | sp | for=s | 14 | CPRED | _ | _ |
 +| 16 | finalitat | finalitat | n | nc | num=s<nowiki>|</nowiki>gen=f | 15 | SN | _ | _ |
 +| 17 | minimitzar | minimitzar | v | vm | mod=n | 14 | CD | _ | _ |
 +| 18 | els | el | d | da | num=p<nowiki>|</nowiki>gen=m | 19 | ESPEC | _ | _ |
 +| 19 | efectes | efecte | n | nc | num=p<nowiki>|</nowiki>gen=m | 17 | SN | _ | _ |
 +| 20 | de | de | s | sp | for=s | 19 | SP | _ | _ |
 +| 21 | la | el | d | da | num=s<nowiki>|</nowiki>gen=f | 22 | ESPEC | _ | _ |
 +| 22 | vaga | vaga | n | nc | num=s<nowiki>|</nowiki>gen=f | 20 | SN | _ | _ |
 +| 23 | . | . | F | Fp | _ | 4 | PUNC | _ | _ |
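
The table above shows the ten columns of the CoNLL 2006/7 format. In the distributed files the columns are separated by tabs and sentences by empty lines. A reader for one sentence might look like the sketch below. The field names follow the usual CoNLL-X column names, the ''Token'' tuple and ''read_sentence'' are hypothetical helpers, and the sample consists of only the first four rows of the table.

```python
from collections import namedtuple

# The ten columns of the CoNLL 2006/7 format, in order.
Token = namedtuple(
    "Token", "id form lemma cpostag postag feats head deprel phead pdeprel"
)


def read_sentence(lines):
    """Parse the tab-separated lines of one sentence into Token tuples."""
    tokens = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 10:
            raise ValueError(f"expected 10 columns, got {len(fields)}")
        token = Token(*fields)
        # ID and HEAD are the only numeric columns used here.
        tokens.append(token._replace(id=int(token.id), head=int(token.head)))
    return tokens


# First four rows of the training sentence shown above.
sample = [
    "1\tL'\tel\td\tda\tnum=s|gen=c\t2\tESPEC\t_\t_",
    "2\tAjuntament_de_Manresa\tAjuntament_de_Manresa\tn\tnp\t_\t4\tSUJ\t_\t_",
    "3\tha\thaver\tv\tva\tnum=s|per=3|mod=i|ten=p\t4\tAUX\t_\t_",
    "4\tposat_en_funcionament\tposar_en_funcionament\tv\tvm\tnum=s|mod=p|gen=m\t0\tS\t_\t_",
]
sentence = read_sentence(sample)
root = next(t for t in sentence if t.head == 0)
print(root.form, root.deprel)
```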
 +
 +The first sentence of the CoNLL 2007 test data:
 +
 +| 1 | Tot_i_que | tot_i_que | c | cs | _ | 5 | SUBORD | _ | _ |
 +| 2 | ahir | ahir | r | rg | _ | 5 | CC | _ | _ |
 +| 3 | hi | hi | p | pp | num=n<nowiki>|</nowiki>per=3<nowiki>|</nowiki>gen=c | 5 | MORF | _ | _ |
 +| 4 | va | anar | v | va | num=s<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p | 5 | AUX | _ | _ |
 +| 5 | haver | haver | v | va | mod=n | 15 | AO | _ | _ |
 +| 6 | una | un | d | di | num=s<nowiki>|</nowiki>gen=f | 7 | ESPEC | _ | _ |
 +| 7 | reunió | reunió | n | nc | num=s<nowiki>|</nowiki>gen=f | 5 | CD | _ | _ |
 +| 8 | de | de | s | sp | for=s | 7 | SP | _ | _ |
 +| 9 | darrera | darrer | a | ao | num=s<nowiki>|</nowiki>gen=f | 10 | SADJ | _ | _ |
 +| 10 | hora | hora | n | nc | num=s<nowiki>|</nowiki>gen=f | 8 | SN | _ | _ |
 +| 11 | , | , | F | Fc | _ | 5 | PUNC | _ | _ |
 +| 12 | no | no | r | rn | _ | 15 | MOD | _ | _ |
 +| 13 | es | es | p | p0 | _ | 15 | PASS | _ | _ |
 +| 14 | va | anar | v | va | num=s<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p | 15 | AUX | _ | _ |
 +| 15 | aconseguir | aconseguir | v | vm | mod=n | 0 | S | _ | _ |
 +| 16 | acostar | acostar | v | vm | mod=n | 15 | SUJ | _ | _ |
 +| 17 | posicions | posició | n | nc | num=p<nowiki>|</nowiki>gen=f | 16 | SN | _ | _ |
 +| 18 | , | , | F | Fc | _ | 23 | PUNC | _ | _ |
 +| 19 | de_manera_que | de_manera_que | c | cs | _ | 23 | SUBORD | _ | _ |
 +| 20 | els | el | d | da | num=p<nowiki>|</nowiki>gen=m | 21 | ESPEC | _ | _ |
 +| 21 | treballadors | treballador | n | nc | num=p<nowiki>|</nowiki>gen=m | 23 | SUJ | _ | _ |
 +| 22 | han | haver | v | va | num=p<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p | 23 | AUX | _ | _ |
 +| 23 | decidit | decidir | v | vm | num=s<nowiki>|</nowiki>mod=p<nowiki>|</nowiki>gen=m | 15 | AO | _ | _ |
 +| 24 | anar | anar | v | vm | mod=n | 23 | CD | _ | _ |
 +| 25 | a | a | s | sp | for=s | 24 | CREG | _ | _ |
 +| 26 | la | el | d | da | num=s<nowiki>|</nowiki>gen=f | 27 | ESPEC | _ | _ |
 +| 27 | vaga | vaga | n | nc | num=s<nowiki>|</nowiki>gen=f | 25 | SN | _ | _ |
 +| 28 | . | . | F | Fp | _ | 15 | PUNC | _ | _ |
 +
 +The first sentence of the CoNLL 2009 training data:
 +
 +| 1 | El | el | el | d | d | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 2 | 2 | spec | spec | _ | _ | _ | _ | _ | _ |
 +| 2 | Tribunal_Suprem | Tribunal_Suprem | Tribunal_Suprem | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 7 | 7 | suj | suj | _ | _ | arg0-agt | _ | _ | _ |
 +| 3 | ( | ( | ( | f | f | punct=bracket<nowiki>|</nowiki>punctenclose=open | punct=bracket<nowiki>|</nowiki>punctenclose=open | 4 | 4 | f | f | _ | _ | _ | _ | _ | _ |
 +| 4 | TS | TS | TS | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 2 | 2 | sn | sn | _ | _ | _ | _ | _ | _ |
 +| 5 | ) | ) | ) | f | f | punct=bracket<nowiki>|</nowiki>punctenclose=close | punct=bracket<nowiki>|</nowiki>punctenclose=close | 4 | 4 | f | f | _ | _ | _ | _ | _ | _ |
 +| 6 | ha | haver | haver | v | v | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | 7 | 7 | v | v | _ | _ | _ | _ | _ | _ |
 +| 7 | confirmat | confirmar | confirmar | v | v | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | 0 | 0 | sentence | sentence | Y | confirmar.a32 | _ | _ | _ | _ |
 +| 8 | la | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 9 | 9 | spec | spec | _ | _ | _ | _ | _ | _ |
 +| 9 | condemna | condemna | condemna | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 7 | 7 | cd | cd | _ | _ | arg1-pat | _ | _ | _ |
 +| 10 | a | a | a | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 9 | 9 | sp | sp | _ | _ | _ | _ | _ | _ |
 +| 11 | quatre | quatre | quatre | d | d | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | 12 | 12 | spec | spec | _ | _ | _ | _ | _ | _ |
 +| 12 | anys | any | any | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 10 | 10 | sn | sn | _ | _ | _ | _ | _ | _ |
 +| 13 | d' | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 12 | 12 | sp | sp | _ | _ | _ | _ | _ | _ |
 +| 14 | inhabilitació | inhabilitació | inhabilitació | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 13 | 13 | sn | sn | _ | _ | _ | _ | _ | _ |
 +| 15 | especial | especial | especial | a | a | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | 14 | 14 | s.a | s.a | _ | _ | _ | _ | _ | _ |
 +| 16 | i | i | i | c | c | postype=coordinating | postype=coordinating | 12 | 9 | coord | coord | _ | _ | _ | _ | _ | _ |
 +| 17 | una | un | un | d | d | postype=indefinite<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=numeral<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 18 | 18 | spec | spec | _ | _ | _ | _ | _ | _ |
 +| 18 | multa | multa | multa | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 12 | 9 | sn | sn | _ | _ | _ | _ | _ | _ |
 +| 19 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 18 | 18 | sp | sp | _ | _ | _ | _ | _ | _ |
 +| 20 | 3,6 | 3.6 | 3,6 | z | n | _ | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 21 | 21 | spec | spec | _ | _ | _ | _ | _ | _ |
 +| 21 | milions | milió | milió | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 19 | 19 | sn | sn | _ | _ | _ | _ | _ | _ |
 +| 22 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 21 | 21 | sp | sp | _ | _ | _ | _ | _ | _ |
 +| 23 | pessetes | pesseta | pesseta | z | n | postype=currency | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | 22 | 22 | sn | sn | _ | _ | _ | _ | _ | _ |
 +| 24 | per | per | per | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 9 | 9 | sp | sp | _ | _ | _ | _ | _ | _ |
 +| 25 | a | a | a | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 24 | 24 | sp | sp | _ | _ | _ | _ | _ | _ |
 +| 26 | quatre | quatre | quatre | d | d | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | 27 | 27 | spec | spec | _ | _ | _ | _ | _ | _ |
 +| 27 | veterinaris | veterinari | veterinari | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 25 | 25 | sn | sn | _ | _ | _ | _ | _ | _ |
 +| 28 | gironins | gironí | gironí | a | a | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 27 | 27 | s.a | s.a | _ | _ | _ | _ | _ | _ |
 +| 29 | , | , | , | f | f | punct=comma | punct=comma | 30 | 30 | f | f | _ | _ | _ | _ | _ | _ |
 +| 30 | per | per | per | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 9 | 7 | sp | cc | _ | _ | _ | _ | _ | _ |
 +| 31 | haver | haver | haver | v | n | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>mood=infinitive | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 33 | 33 | v | v | _ | _ | _ | _ | _ | _ |
 +| 32 | -se | ell | ell | p | p | gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>person=3 | gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>person=3 | 33 | 33 | morfema.pronominal | morfema.pronominal | _ | _ | _ | _ | _ | _ |
 +| 33 | beneficiat | beneficiar | beneficiat | v | a | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>posfunction=participle | 42 | 30 | S | S | Y | beneficiar.a2 | _ | _ | _ | _ |
 +| 34 | dels | del | dels | s | s | postype=preposition<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p<nowiki>|</nowiki>contracted=yes | postype=preposition<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p<nowiki>|</nowiki>contracted=yes | 33 | 33 | creg | creg | _ | _ | _ | arg1-null | _ | _ |
 +| 35 | càrrecs | càrrec | càrrec | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 34 | 34 | sn | sn | _ | _ | _ | _ | _ | _ |
 +| 36 | públics | públic | públic | a | a | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 35 | 35 | s.a | s.a | _ | _ | _ | _ | _ | _ |
 +| 37 | que | que | que | p | p | postype=relative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=relative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 39 | 39 | cd | cd | _ | _ | _ | _ | arg1-pat | _ |
 +| 38 | _ | _ | _ | p | p | _ | _ | 39 | 39 | suj | suj | _ | _ | _ | _ | arg0-agt | _ |
 +| 39 | desenvolupaven | desenvolupar | desenvolupar | v | v | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=imperfect | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=imperfect | 35 | 35 | S | S | Y | desenvolupar.a2 | _ | _ | _ | _ |
 +| 40 | i | i | i | c | c | postype=coordinating | postype=coordinating | 42 | 33 | coord | coord | _ | _ | _ | _ | _ | _ |
 +| 41 | la_seva | el_seu | el_seu | d | d | postype=possessive<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3 | postype=possessive<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3 | 42 | 42 | spec | spec | _ | _ | _ | _ | _ | _ |
 +| 42 | relació | relació | relació | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 30 | 33 | sn | cd | _ | _ | _ | _ | _ | _ |
 +| 43 | amb | amb | amb | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 42 | 42 | sp | sp | _ | _ | _ | _ | _ | _ |
 +| 44 | les | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | 45 | 45 | spec | spec | _ | _ | _ | _ | _ | _ |
 +| 45 | empreses | empresa | empresa | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | 43 | 43 | sn | sn | _ | _ | _ | _ | _ | _ |
 +| 46 | càrniques | càrnic | càrnic | a | a | postype=qualificative<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=qualificative<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | 45 | 45 | s.a | s.a | _ | _ | _ | _ | _ | _ |
 +| 47 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 45 | 45 | sp | sp | _ | _ | _ | _ | _ | _ |
 +| 48 | la | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 49 | 49 | spec | spec | _ | _ | _ | _ | _ | _ |
 +| 49 | zona | zona | zona | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 47 | 47 | sn | sn | _ | _ | _ | _ | _ | _ |
 +| 50 | en | en | en | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 42 | 42 | sp | sp | _ | _ | _ | _ | _ | _ |
 +| 51 | oferir | oferir | oferir | v | v | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>mood=infinitive | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>mood=infinitive | 50 | 50 | S | S | Y | oferir.a32 | _ | _ | _ | _ |
 +| 52 | -los | ell | ell | p | p | postype=personal<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p<nowiki>|</nowiki>person=3 | postype=personal<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p<nowiki>|</nowiki>person=3 | 51 | 51 | ci | ci | _ | _ | _ | _ | _ | arg2-ben |
 +| 53 | serveis | servei | servei | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 51 | 51 | cd | cd | _ | _ | _ | _ | _ | arg1-pat |
 +| 54 | particulars | particular | particular | a | a | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | 53 | 53 | s.a | s.a | _ | _ | _ | _ | _ | _ |
 +| 55 | . | . | . | f | f | punct=period | punct=period | 7 | 7 | f | f | _ | _ | _ | _ | _ | _ |
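
The CoNLL 2009 rows are wider: fourteen fixed columns (gold and predicted variants of lemma, POS, FEAT, HEAD and DEPREL, then FILLPRED and PRED) followed by one argument column per predicate in the sentence. The sketch below shows one way to pair the argument columns with their predicates. It assumes tab-separated input and that the n-th row marked with Y in FILLPRED owns the n-th argument column. The names ''FIXED'', ''read_rows'' and ''predicate_arguments'' are made up for illustration, and only rows 2, 7 and 9 of the sentence above are used.

```python
FIXED = ["id", "form", "lemma", "plemma", "pos", "ppos", "feat", "pfeat",
         "head", "phead", "deprel", "pdeprel", "fillpred", "pred"]


def read_rows(lines):
    """Split tab-separated CoNLL 2009 lines into dicts with an 'apreds' list."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        row = dict(zip(FIXED, fields))
        row["apreds"] = fields[len(FIXED):]
        rows.append(row)
    return rows


def predicate_arguments(rows):
    """Map each predicate sense to its (argument form, role) pairs."""
    predicates = [r for r in rows if r["fillpred"] == "Y"]
    result = {}
    for n, pred in enumerate(predicates):
        result[pred["pred"]] = [
            (r["form"], r["apreds"][n]) for r in rows if r["apreds"][n] != "_"
        ]
    return result


proper = "postype=proper|gen=c|num=c"
participle = "postype=main|gen=m|num=s|mood=pastparticiple"
common = "postype=common|gen=f|num=s"
sample = [
    "\t".join(["2", "Tribunal_Suprem", "Tribunal_Suprem", "Tribunal_Suprem", "n", "n",
               proper, proper, "7", "7", "suj", "suj", "_", "_", "arg0-agt", "_", "_", "_"]),
    "\t".join(["7", "confirmat", "confirmar", "confirmar", "v", "v",
               participle, participle, "0", "0", "sentence", "sentence",
               "Y", "confirmar.a32", "_", "_", "_", "_"]),
    "\t".join(["9", "condemna", "condemna", "condemna", "n", "n",
               common, common, "7", "7", "cd", "cd", "_", "_", "arg1-pat", "_", "_", "_"]),
]
print(predicate_arguments(read_rows(sample)))
```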
 +
 +The first sentence of the CoNLL 2009 development data:
 +
 +| 1 | Fundació_Privada_Fira_de_Manresa | Fundació_Privada_Fira_de_Manresa | Fundació_Privada_Fira_de_Manresa | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 3 | 3 | suj | suj | _ | _ | arg0-agt |
 +| 2 | ha | haver | haver | v | v | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | 3 | 3 | v | v | _ | _ | _ |
 +| 3 | fet | fer | fer | v | v | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | 0 | 0 | sentence | sentence | Y | fer.a2 | _ |
 +| 4 | un | un | un | d | d | postype=numeral<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=numeral<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 5 | 5 | spec | spec | _ | _ | _ |
 +| 5 | balanç | balanç | balanç | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 3 | 3 | cd | cd | _ | _ | arg1-pat |
 +| 6 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 5 | 5 | sp | sp | _ | _ | _ |
 +| 7 | l' | el | el | d | d | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | 8 | 8 | spec | spec | _ | _ | _ |
 +| 8 | activitat | activitat | activitat | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 6 | 6 | sn | sn | _ | _ | _ |
 +| 9 | del | del | del | s | s | postype=preposition<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>contracted=yes | postype=preposition<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>contracted=yes | 8 | 8 | sp | sp | _ | _ | _ |
 +| 10 | Palau_Firal | Palau_Firal | Palau_Firal | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 9 | 9 | sn | sn | _ | _ | _ |
 +| 11 | durant | durant | durant | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 8 | 3 | sp | cc | _ | _ | _ |
 +| 12 | els | el | el | d | d | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 15 | 15 | spec | spec | _ | _ | _ |
 +| 13 | primers | primer | primer | a | a | postype=ordinal<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=ordinal<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 12 | 12 | a | a | _ | _ | _ |
 +| 14 | cinc | cinc | cinc | d | d | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | 12 | 12 | d | d | _ | _ | _ |
 +| 15 | mesos | mes | mes | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 11 | 11 | sn | sn | _ | _ | _ |
 +| 16 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 15 | 15 | sp | sp | _ | _ | _ |
 +| 17 | l' | el | el | d | d | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | 18 | 18 | spec | spec | _ | _ | _ |
 +| 18 | any | any | any | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 16 | 16 | sn | sn | _ | _ | _ |
 +| 19 | . | . | . | f | f | punct=period | punct=period | 3 | 3 | f | f | _ | _ | _ |
 +
 +The first sentence of the CoNLL 2009 test data:
 +
 +| 1 | El | el | el | d | d | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ |
 +| 2 | darrer | darrer | darrer | a | a | postype=ordinal<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=ordinal<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ |
 +| 3 | número | número | número | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ |
 +| 4 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ |
 +| 5 | l' | el | el | d | d | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ |
 +| 6 | Observatori_del_Mercat_de_Treball_d'_Osona | Observatori_del_Mercat_de_Treball_d'_Osona | Observatori_del_Mercat_de_Treball_d'_Osona | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ |
 +| 7 | inclou | incloure | incloure | v | v | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | _ | _ | _ | _ | Y |
 +| 8 | un | un | un | d | d | postype=numeral<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=numeral<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ |
 +| 9 | informe | informe | informe | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ |
 +| 10 | especial | especial | especial | a | a | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ |
 +| 11 | sobre | sobre | sobre | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ |
 +| 12 | la | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ |
 +| 13 | contractació | contractació | contractació | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ |
 +| 14 | a_través_de | a_través_de | a_través_de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ |
 +| 15 | les | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | _ | _ | _ | _ | _ |
 +| 16 | empreses | empresa | empresa | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | _ | _ | _ | _ | _ |
 +| 17 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ |
 +| 18 | treball | treball | treball | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ |
 +| 19 | temporal | temporal | temporal | a | a | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ |
 +| 20 | , | , | , | f | f | punct=comma | punct=comma | _ | _ | _ | _ | _ |
 +| 21 | les | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | _ | _ | _ | _ | _ |
 +| 22 | ETT | ETT | ETT | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ |
 +| 23 | . | . | . | f | f | punct=period | punct=period | _ | _ | _ | _ | _ |
 +
 +==== Parsing ====
 +
 +Nonprojectivities in AnCora-CA are very rare. Only 487 of the 435,860 tokens in the CoNLL 2007 version are attached nonprojectively (0.11%). In the CoNLL 2009 version, there are no nonprojectivities at all.
 +
 +The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Catalan:
 +
 +^ Parser (Authors) ^ LAS ^ UAS ^
 +| Titov et al. | 87.40 | 93.40 |
 +| Sagae | 88.16 | 93.34 |
 +| Malt (Nilsson et al.) | 88.70 | 93.12 |
 +| Nakagawa | 87.90 | 92.86 |
 +| Carreras | 87.60 | 92.46 |
 +| Malt (Hall et al.) | 87.74 | 92.20 |
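
For orientation, the two scores in the table can be computed as follows: UAS is the percentage of tokens that received the correct head, and LAS is the percentage that received both the correct head and the correct dependency label. The sketch counts every token, punctuation included, in line with the note above. It is not the official evaluation script. The function name is made up, the gold values are the first four tokens of the CoNLL 2007 training sentence, and the predicted values are invented for the example.

```python
def attachment_scores(gold, predicted):
    """Return (LAS, UAS) in percent.

    gold and predicted are equal-length lists of (head, deprel) pairs,
    one pair per token. Every token counts, punctuation included.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * las_hits / len(gold), 100.0 * uas_hits / len(gold)


# Gold annotation of the first four tokens of the CoNLL 2007 training sentence.
gold = [(2, "ESPEC"), (4, "SUJ"), (4, "AUX"), (0, "S")]
# Invented parser output: one wrong label (token 2) and one wrong head (token 4).
predicted = [(2, "ESPEC"), (4, "CD"), (4, "AUX"), (3, "S")]

las, uas = attachment_scores(gold, predicted)
print(f"LAS {las:.2f}  UAS {uas:.2f}")
```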
 +
 +The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]].
 +
 +The results of the CoNLL 2009 shared task are [[http://ufal.mff.cuni.cz/conll2009-st/results/results.php|available online]]. They have been published in [[http://aclweb.org/anthology/W/W09/W09-1201.pdf|(Hajič et al., 2009)]]. Unlabeled attachment score was not published. These are the best results for Catalan:
 +
 +^ Parser (Authors) ^ LAS ^
 +| Merlo | 87.86 |
 +| Che | 86.56 |
 +| Bohnet | 86.35 |
 +| Chen | 85.88 |
  
