Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks [2011/11/18 17:39]
zeman German.
user:zeman:treebanks [2011/11/19 13:14]
zeman Greek inside.
Line 1385: Line 1385:
     * [[http://www.ims.uni-stuttgart.de/projekte/TIGER/paper/|List of publications]]
   * [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/annotation/|Documentation]]
 +    * [[http://www.ims.uni-stuttgart.de/projekte/corplex/TagSets/stts-table.html|Stuttgart-Tübingen Tagset]] (part of speech)
     * Berthold Crysmann, Silvia Hansen-Schirra, George Smith, Dorothea Ziegler-Eisele: [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/annotation/tiger_scheme-morph.pdf|TIGER Morphologie-Annotationsschema]], 2005.
     * Stefanie Albert, Jan Anderssen, Regine Bader, Stephanie Becker, Tobias Bracht, Sabine Brants, Thorsten Brants, Vera Demberg, Stefanie Dipper, Peter Eisenberg, Silvia Hansen, Hagen Hirschmann, Juliane Janitzek, Carolin Kirstein, Robert Langner, Lukas Michelbacher, Oliver Plaehn, Cordula Preis, Marcus Pußel, Marco Rower, Bettina Schrader, Anne Schwartz, George Smith, Hans Uszkoreit: [[http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/annotation/tiger_scheme-syntax.pdf|TIGER Annotationsschema]] //(syntax)//, 2003.
 +    * The header of the XML version of the TIGER Treebank contains lists of various sorts of tags with brief explanation.
  
 ==== Domain ====
Line 1394: Line 1396:
 ==== Size ====
  
-The CoNLL 2007 version contains 435,860 tokens in 15125 sentences, yielding 28.82 tokens per sentence on average (CoNLL 2007 data split: 430,844 tokens / 14958 sentences training, 5016 tokens / 167 sentences test).
+According to their website, the TIGER Treebank version contains approximately 700,000 tokens in 40,000 sentences. Version 2.1 contains approximately 900,000 tokens in 50,000 sentences.
  
-The CoNLL 2009 version contains 496,672 tokens in 16786 sentences, yielding 29.59 tokens per sentence on average (CoNLL 2009 data split: 390,302 tokens / 13200 sentences training, 53015 tokens / 1724 sentences development, 53355 tokens / 1862 sentences test).
+The CoNLL 2006 version contains 705,304 tokens in 39573 sentences, yielding 17.82 tokens per sentence on average (CoNLL 2006 data split: 699,610 tokens / 39216 sentences training, 5694 tokens / 357 sentences test).
 + 
 +The CoNLL 2009 version contains 712,332 tokens in 40020 sentences, yielding 17.80 tokens per sentence on average (CoNLL 2009 data split: 648,677 tokens / 36020 sentences training, 32033 tokens / 2000 sentences development, 31622 tokens / 2000 sentences test).
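
The totals and averages quoted for the CoNLL 2006 and CoNLL 2009 versions can be re-derived from the per-split counts. The following is a minimal sketch that does this arithmetic; the counts are copied from the text above, while the names ''SPLITS'', ''totals'' and ''summary'' are made up for this illustration.

```python
# Token and sentence counts per data split, copied from the Size section.
# Each value is a (tokens, sentences) pair.
SPLITS = {
    "CoNLL 2006": {"training": (699610, 39216), "test": (5694, 357)},
    "CoNLL 2009": {
        "training": (648677, 36020),
        "development": (32033, 2000),
        "test": (31622, 2000),
    },
}


def totals(splits):
    """Sum tokens and sentences over all parts of one data split."""
    tokens = sum(t for t, _ in splits.values())
    sentences = sum(s for _, s in splits.values())
    return tokens, sentences


def summary():
    """Return {version: (tokens, sentences, tokens per sentence)}."""
    rows = {}
    for name, splits in SPLITS.items():
        tokens, sentences = totals(splits)
        rows[name] = (tokens, sentences, round(tokens / sentences, 2))
    return rows


if __name__ == "__main__":
    for name, (tokens, sentences, avg) in summary().items():
        print(f"{name}: {tokens} tokens / {sentences} sentences = {avg:.2f}")
```

Both versions come out at the totals and averages stated in the text (17.82 and 17.80 tokens per sentence).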
  
 ==== Inside ====
  
-The original morphosyntactic tags (EAGLES?) have been converted to fit into the three columns (CPOS, POS and FEAT) columns of the CoNLL 2006/7 format, resp. the two columns (POS and FEAT) of the CoNLL 2009 format. Note that the missing CPOS column is not the only difference between the two conversion schemes. [[http://clic.ub.edu/corpus/webfm_send/18|Feature names and values]] in the FEAT column are different, too.
+All versions contain //semi-automatic// part of speech tags ([[http://www.ims.uni-stuttgart.de/projekte/corplex/TagSets/stts-table.html|Stuttgart-Tübingen Tagset]], STTS) and syntactic structure. Lemmas and morphosyntactic features are available only for newer versions (TIGER Treebank version 2 and onwards, and CoNLL 2009). The parts of speech are heavily context-dependent, e.g. many words can be used both substantively (pronouns) and attributively (determiners), which is distinguished by different POS tags.
  
-The morphosyntactic tags have been disambiguated manually. The CoNLL 2009 version also contains automatically disambiguated tags.
+It is not clear what the //semi-automatic// annotation means (probably first auto-tagging, then manual correction?) and whether it also applies to the morphosyntactic annotation. The CoNLL 2009 version also contains automatically disambiguated lemmas, tags and features.
  
-Multi-word expressions have been collapsed into one token, using underscore as the joining character. This includes named entities (e.g. La_Garrotxa, Ajuntament_de_Manresa, dilluns_4_de_juny) and prepositional compounds (pel_que_fa_al, d'_acord_amb, la_seva, a_més_de). Empty (underscore) tokens have been inserted to represent missing subjects (Catalan is a pro-drop language).
+The original treebank is phrase-based. The dependencies in the CoNLL versions must have thus been drawn using a head-selection procedure. Besides CoNLL data, the TIGER project also provides a subset of the TIGER Treebank in a dependency format.
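
To illustrate what a head-selection procedure does, here is a toy sketch over the phrase structure of the first TIGER sentence (//Ross Perot wäre vielleicht ein prächtiger Diktator//). It is **not** the procedure used to produce the CoNLL data, which this page does not describe. The head rule (take the child labelled ''HD'', otherwise the rightmost child with a real label, otherwise the first phrase child) is an assumption made only for this example, as are the names ''WORDS'', ''PHRASES'', ''head_child'', ''lexical_head'' and ''dependencies''.

```python
# Toy head selection: turn a phrase-structure tree into token-to-token
# dependencies. The head rule is an assumption for illustration only.

WORDS = {1: "``", 2: "Ross", 3: "Perot", 4: "wäre", 5: "vielleicht",
         6: "ein", 7: "prächtiger", 8: "Diktator", 9: "''"}

# Phrase nodes as lists of (edge label, child); an int child is a token id,
# a str child is another phrase node.
PHRASES = {
    "PN": [("PNC", 2), ("PNC", 3)],
    "NP": [("NK", 6), ("NK", 7), ("NK", 8)],
    "S": [("SB", "PN"), ("HD", 4), ("MO", 5), ("PD", "NP")],
    "VROOT": [("--", 1), ("--", "S"), ("--", 9)],
}


def head_child(children):
    """Pick the head child of a phrase (assumed rule, see lead-in)."""
    explicit = [c for label, c in children if label == "HD"]
    if explicit:
        return explicit[0]
    labelled = [c for label, c in children if label != "--"]
    if labelled:
        return labelled[-1]
    phrases = [c for _, c in children if isinstance(c, str)]
    return phrases[0] if phrases else children[-1][1]


def lexical_head(node):
    """Follow head children down to a token id."""
    if isinstance(node, int):
        return node
    return lexical_head(head_child(PHRASES[node]))


def dependencies(root="VROOT"):
    """Return {token id: head token id}, with 0 for the sentence root."""
    heads = {lexical_head(root): 0}
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, int):
            continue
        governor = lexical_head(node)
        for _, child in PHRASES[node]:
            dependent = lexical_head(child)
            if dependent != governor:
                heads[dependent] = governor
            stack.append(child)
    return heads


if __name__ == "__main__":
    for token, head in sorted(dependencies().items()):
        print(token, WORDS[token], head)
```

Under these assumed rules the finite verb //wäre// becomes the root, and every non-head child of a phrase attaches to the lexical head of that phrase.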
  
 ==== Sample ====
  
-The first sentence of the CoNLL 2007 training data:
+The first sentence of TIGER Treebank 2.1 in the TIGER-XML format:
  
-| 1 | L' | el | d | da | num=s<nowiki>|</nowiki>gen=c | 2 | ESPEC | _ | _ |
-| 2 | Ajuntament_de_Manresa | Ajuntament_de_Manresa | n | np | _ | 4 | SUJ | _ | _ |
-| 3 | ha | haver | v | va | num=s<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p | 4 | AUX | _ | _ |
-| 4 | posat_en_funcionament | posar_en_funcionament | v | vm | num=s<nowiki>|</nowiki>mod=p<nowiki>|</nowiki>gen=m | 0 | S | _ | _ |
-| 5 | tot | tot | d | di | num=s<nowiki>|</nowiki>gen=m | 7 | ESPEC | _ | _ |
-| 6 | un_seguit_de | un_seguit_de | d | di | num=p<nowiki>|</nowiki>gen=c | 5 | DET | _ | _ |
-| 7 | mesures | mesura | n | nc | num=p<nowiki>|</nowiki>gen=f | 4 | CD | _ | _ |
-| 8 | , | , | F | Fc | _ | 10 | PUNC | _ | _ |
-| 9 | la | el | d | da | num=s<nowiki>|</nowiki>gen=f | 10 | ESPEC | _ | _ |
-| 10 | majoria | majoria | n | nc | num=s<nowiki>|</nowiki>gen=f | 7 | _ | _ | _ |
-| 11 | informatives | informatiu | a | aq | num=p<nowiki>|</nowiki>gen=f | 10 | _ | _ | _ |
-| 12 | , | , | F | Fc | _ | 10 | PUNC | _ | _ |
-| 13 | que | que | p | pr | num=n<nowiki>|</nowiki>gen=c | 14 | SUJ | _ | _ |
-| 14 | tenen | tenir | v | vm | num=p<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p | 7 | SF | _ | _ |
-| 15 | com_a | com_a | s | sp | for=s | 14 | CPRED | _ | _ |
-| 16 | finalitat | finalitat | n | nc | num=s<nowiki>|</nowiki>gen=f | 15 | SN | _ | _ |
-| 17 | minimitzar | minimitzar | v | vm | mod=n | 14 | CD | _ | _ |
-| 18 | els | el | d | da | num=p<nowiki>|</nowiki>gen=m | 19 | ESPEC | _ | _ |
-| 19 | efectes | efecte | n | nc | num=p<nowiki>|</nowiki>gen=m | 17 | SN | _ | _ |
-| 20 | de | de | s | sp | for=s | 19 | SP | _ | _ |
-| 21 | la | el | d | da | num=s<nowiki>|</nowiki>gen=f | 22 | ESPEC | _ | _ |
-| 22 | vaga | vaga | n | nc | num=s<nowiki>|</nowiki>gen=f | 20 | SN | _ | _ |
-| 23 | . | . | F | Fp | _ | 4 | PUNC | _ | _ |
+<code xml><s id="s1">
+  <graph root="s1_VROOT">
+    <terminals>
+      <t id="s1_1" word="``" lemma="--" pos="$(" morph="--" case="--" number="--" gender="--" person="--" degree="--" tense="--" mood="--" />
+      <t id="s1_2" word="Ross" lemma="Ross" pos="NE" morph="Nom.Sg.Masc" case="Nom" number="Sg" gender="Masc" person="--" degree="--" tense="--" mood="--" />
+      <t id="s1_3" word="Perot" lemma="Perot" pos="NE" morph="Nom.Sg.Masc" case="Nom" number="Sg" gender="Masc" person="--" degree="--" tense="--" mood="--" />
+      <t id="s1_4" word="wäre" lemma="sein" pos="VAFIN" morph="3.Sg.Past.Subj" case="--" number="Sg" gender="--" person="3" degree="--" tense="Past" mood="Subj" />
+      <t id="s1_5" word="vielleicht" lemma="vielleicht" pos="ADV" morph="--" case="--" number="--" gender="--" person="--" degree="--" tense="--" mood="--" />
+      <t id="s1_6" word="ein" lemma="ein" pos="ART" morph="Nom.Sg.Masc" case="Nom" number="Sg" gender="Masc" person="--" degree="--" tense="--" mood="--" />
+      <t id="s1_7" word="prächtiger" lemma="prächtig" pos="ADJA" morph="Pos.Nom.Sg.Masc" case="Nom" number="Sg" gender="Masc" person="--" degree="Pos" tense="--" mood="--" />
+      <t id="s1_8" word="Diktator" lemma="Diktator" pos="NN" morph="Nom.Sg.Masc" case="Nom" number="Sg" gender="Masc" person="--" degree="--" tense="--" mood="--" />
+      <t id="s1_9" word="''" lemma="--" pos="$(" morph="--" case="--" number="--" gender="--" person="--" degree="--" tense="--" mood="--" />
+    </terminals>
+    <nonterminals>
+      <nt id="s1_500" cat="PN">
+        <edge label="PNC" idref="s1_2" />
+        <edge label="PNC" idref="s1_3" />
+      </nt>
+      <nt id="s1_501" cat="NP">
+        <edge label="NK" idref="s1_6" />
+        <edge label="NK" idref="s1_7" />
+        <edge label="NK" idref="s1_8" />
+      </nt>
+      <nt id="s1_502" cat="S">
+        <edge label="SB" idref="s1_500" />
+        <edge label="HD" idref="s1_4" />
+        <edge label="MO" idref="s1_5" />
+        <edge label="PD" idref="s1_501" />
+      </nt>
+      <nt id="s1_VROOT" cat="VROOT">
+        <edge label="--" idref="s1_1" />
+        <edge label="--" idref="s1_502" />
+        <edge label="--" idref="s1_9" />
+      </nt>
+    </nonterminals>
+  </graph>
+</s></code>
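
A sentence in this TIGER-XML layout can be read with a generic XML parser. The sketch below uses Python's standard ''xml.etree.ElementTree'' on an abridged copy of the sample (fewer tokens and attributes, so that it stays short). The element and attribute names come from the sample above; ''SAMPLE'' and ''read_sentence'' are names invented for this illustration, and a complete corpus file has additional structure around each sentence (such as the header with the tag lists) that is not handled here.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the sample sentence: three tokens, a subset of attributes.
SAMPLE = """<s id="s1">
  <graph root="s1_VROOT">
    <terminals>
      <t id="s1_2" word="Ross" lemma="Ross" pos="NE" morph="Nom.Sg.Masc" />
      <t id="s1_3" word="Perot" lemma="Perot" pos="NE" morph="Nom.Sg.Masc" />
      <t id="s1_4" word="wäre" lemma="sein" pos="VAFIN" morph="3.Sg.Past.Subj" />
    </terminals>
    <nonterminals>
      <nt id="s1_500" cat="PN">
        <edge label="PNC" idref="s1_2" />
        <edge label="PNC" idref="s1_3" />
      </nt>
      <nt id="s1_502" cat="S">
        <edge label="SB" idref="s1_500" />
        <edge label="HD" idref="s1_4" />
      </nt>
      <nt id="s1_VROOT" cat="VROOT">
        <edge label="--" idref="s1_502" />
      </nt>
    </nonterminals>
  </graph>
</s>"""


def read_sentence(xml_text):
    """Return (root id, tokens, phrases) for one <s> element.

    tokens  : list of (id, word, pos) in document order
    phrases : {nonterminal id: (category, [(edge label, child id), ...])}
    """
    sentence = ET.fromstring(xml_text)
    graph = sentence.find("graph")
    tokens = [(t.get("id"), t.get("word"), t.get("pos")) for t in graph.iter("t")]
    phrases = {}
    for nt in graph.iter("nt"):
        edges = [(e.get("label"), e.get("idref")) for e in nt.findall("edge")]
        phrases[nt.get("id")] = (nt.get("cat"), edges)
    return graph.get("root"), tokens, phrases


if __name__ == "__main__":
    root_id, tokens, phrases = read_sentence(SAMPLE)
    print("root:", root_id)
    for token in tokens:
        print(token)
    for nt_id, (cat, edges) in phrases.items():
        print(nt_id, cat, edges)
```

Keeping the edges as ''(label, idref)'' pairs preserves the function labels (''SB'', ''HD'', ''PD'' ...), which are the information a later conversion to dependencies would need.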
  
-The first sentence of the CoNLL 2007 test data:
+The first sentence of the CoNLL 2006 training data:
  
-| 1 | Tot_i_que tot_i_que cs | _ | SUBORD +| 1 | `` $( $( | _ | PUNC PUNC 
-| 2 | ahir ahir rg | _ | CC +| 2 | Ross NE NE | _ | SB SB 
-| 3 | hi hi pp num=n<nowiki>|</nowiki>per=3<nowiki>|</nowiki>gen=c MORF | _ | _ +| 3 | Perot NE NE PNC PNC 
-| 4 | va anar va num=s<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p 5 | AUX | _ | _ +| 4 | wäre VAFIN VAFIN ROOT ROOT 
-| 5 | haver haver va mod=n 15 AO +| 5 | vielleicht ADV ADV MO MO 
-| 6 | una un di num=s<nowiki>|</nowiki>gen=f ESPEC _ | _ +| 6 | ein ART ART NK NK 
-| 7 | reunió reunió nc num=s<nowiki>|</nowiki>gen=f CD _ | _ +| 7 | prächtiger ADJA ADJA NK NK 
-| 8 | de de sp for=s SP +| 8 | Diktator NN NN PD PD 
-| 9 | darrera | darrer | a | ao | num=s<nowiki>|</nowiki>gen=f 10 SADJ | _ | +| 9 | <nowiki>''</nowiki>$( $( | _ | PUNC PUNC | 
-10 hora hora nc num=s<nowiki>|</nowiki>gen=f SN | _ | | + 
-11 Fc | _ | PUNC | _ | | +The first sentence of the CoNLL 2006 test data: 
-12 no no rn | _ | 15 MOD | _ | | + 
-13 es es p0 | _ | 15 PASS | _ | | +Zwei CARD CARD | _ | NK NK | 
-14 va anar va num=s<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p 15 AUX +Themen | _ | NN NN | _ | 14 SB 14 SB | 
-15 aconseguir aconseguir vm mod=n +| _ | $, $, | _ | PUNC PUNC | 
-16 acostar acostar vm mod=n 15 SUJ +die | _ | PRELS PRELS | _ | OA OA | 
-17 posicions posició nc num=p<nowiki>|</nowiki>gen=f 16 SN _ | _ +Perot NE NE SB SB 
-18 | , | , | F | Fc | _ | 23 | PUNC | +immer ADV ADV MO MO 
-19 de_manera_que de_manera_que cs | _ | 23 SUBORD +wieder ADV ADV MO MO 
-20 els el da num=p<nowiki>|</nowiki>gen=m 21 ESPEC _ | _ +anspricht VVFIN VVFIN RC RC 
-21 treballadors treballador nc num=p<nowiki>|</nowiki>gen=m 23 SUJ _ | _ +| , | _ | $, | $, | _ | | PUNC | PUNC 
-22 han haver va num=p<nowiki>|</nowiki>per=3<nowiki>|</nowiki>mod=i<nowiki>|</nowiki>ten=p 23 AUX | _ | +10 Rezession NN NN | _ | APP APP 
-23 decidit decidir vm num=s<nowiki>|</nowiki>mod=p<nowiki>|</nowiki>gen=m | 15 | AO | _ | | +11 und KON KON 10 CD 10 CD 
-24 anar anar vm mod=n 23 CD | _ | | +12 Bürokratie NN NN 10 CJ 10 CJ 
-25 sp for=s 24 CREG | _ | | +13 $, $, 14 PUNC 14 PUNC | 
-26 la el da num=s<nowiki>|</nowiki>gen=f | 27 ESPEC | _ | | +14 | machen | _ | VVFIN VVFIN ROOT ROOT | 
-27 vaga vaga nc num=s<nowiki>|</nowiki>gen=f 25 | SN | _ | _ | +| 15 | ihnen | _ | PPER PPER 18 DA 18 DA | 
-| 28 | . | . | F | Fp | _ | 15 | PUNC | |+16 besonders | _ | ADV ADV 18 MO 18 MO | 
 +17 zu | _ | PTKZU PTKZU 18 PM 18 PM | 
 +18 schaffen | _ | VVINF VVINF 14 OC 14 OC | 
 +19 | _ | $. | $. | _ | 14 | PUNC | 14 PUNC |
  
 The first sentence of the CoNLL 2009 training data:
  
-| 1 | El | el | el | d | d | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 2 | 2 | spec | spec | _ | _ | _ | _ | _ | _ | +| 1 | `` | _ | `` $( | $( | _ | _ | 4 | 4 | PUNC PUNC | _ | _ | 
-| 2 | Tribunal_Suprem | Tribunal_Suprem | Tribunal_Suprem | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 7 | 7 | suj | suj | _ | _ | arg0-agt | _ | _ | _ | +Ross Ross Roß NE NN Nom<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Masc | _ | 3 | 3 | PNC PNC | _ | _ | 
-| 3 | ( | ( | ( | f | f | punct=bracket<nowiki>|</nowiki>punctenclose=open | punct=bracket<nowiki>|</nowiki>punctenclose=open | 4 | 4 | f | f | _ | _ | _ | _ | _ | _ | +| 3 | Perot Perot Perot NE NE Nom<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Masc | _ | SB SB | _ | _ | 
-| 4 | TS | TS | TS | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 2 | 2 | sn | sn | _ | _ | _ | _ | _ | _ | +wäre sein sein VAFIN VAFIN | 3<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Past<nowiki>|</nowiki>Subj *<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Past<nowiki>|</nowiki>Subj ROOT ROOT | _ | _ | 
-| 5 | ) | ) | ) | f | f | punct=bracket<nowiki>|</nowiki>punctenclose=close | punct=bracket<nowiki>|</nowiki>punctenclose=close | 4 | 4 | f | _ | _ | _ | _ | _ | _ | +vielleicht vielleicht vielleicht ADV ADV | _ | _ | MO MO | _ | _ | 
-ha haver haver postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | 7 | 7 | v | v | _ | _ | _ | _ | _ | _ | +ein ein ein ART ART Nom<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Masc *<nowiki>|</nowiki>Sg<nowiki>|</nowiki>NK NK | _ | _ | 
-| 7 | confirmat | confirmar | confirmar | v | v | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | 0 | 0 | sentence | sentence | Y | confirmar.a32 | _ | _ | _ | _ | +prächtiger prächtig prächtig ADJA ADJA Pos<nowiki>|</nowiki>Nom<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Masc *<nowiki>|</nowiki>*<nowiki>|</nowiki>*<nowiki>|</nowiki>NK NK | _ | _ | 
-| 8 | la | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 9 | 9 | spec | spec | _ | _ | _ | _ | _ | _ | +Diktator Diktator Diktator NN NN Nom<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Masc *<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Masc PD PD | _ | _ | 
-| 9 | condemna | condemna | condemna | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 7 | 7 | cd | cd | _ | _ | arg1-pat | _ | _ | _ | +| <nowiki>''</nowiki> | _ | <nowiki>''</nowiki>$( $( | _ | _ | PUNC PUNC | _ | _ |
-| 10 | a | a | a | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 9 | 9 | sp | sp | _ | _ | _ | _ | _ | _ | +
-| 11 | quatre | quatre | quatre | d | d | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | 12 | 12 | spec | spec | _ | _ | _ | _ | _ | _ | +
-| 12 | anys | any | any | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 10 | 10 | sn | sn | _ | _ | _ | _ | _ | _ | +
-| 13 | d' | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 12 | 12 | sp | sp | _ | _ | _ | _ | _ | _ | +
-| 14 | inhabilitació | inhabilitació | inhabilitació | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 13 | 13 | sn | sn | _ | _ | _ | _ | _ | _ | +
-| 15 | especial | especial | especial | a | a | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | 14 | 14 | s.a | s.a | _ | _ | _ | _ | _ | _ | +
-| 16 | i | i | i | c | c | postype=coordinating | postype=coordinating | 12 | 9 | coord | coord | _ | _ | _ | _ | _ | _ | +
-| 17 | una | un | un | d | d | postype=indefinite<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=numeral<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 18 | 18 | spec | spec | _ | _ | _ | _ | _ | _ | +
-| 18 | multa | multa | multa | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 12 | 9 | sn | sn | _ | _ | _ | _ | _ | _ | +
-| 19 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 18 | 18 | sp | sp | _ | _ | _ | _ | _ | _ | +
-| 20 | 3,6 | 3.6 3,6 z | n | _ | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 21 | 21 | spec | spec | _ | _ | _ | _ | _ | _ | +
-21 | milions | milió | milió | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 19 | 19 | sn | sn | _ | _ | _ | _ | _ | _ | +
-| 22 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 21 | 21 | sp | sp | _ | _ | _ | _ | _ | _ | +
-| 23 | pessetes | pesseta | pesseta | z | n | postype=currency | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | 22 | 22 | sn | sn | _ | _ | _ | _ | _ | _ | +
-| 24 | per | per | per | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 9 | 9 | sp | sp | _ | _ | _ | _ | _ | _ | +
-| 25 | a | a | a | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 24 | 24 | sp | sp | _ | _ | _ | _ | _ | _ | +
-| 26 | quatre | quatre | quatre | d | d | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | 27 | 27 | spec | spec | _ | _ | _ | _ | _ | _ | +
-| 27 | veterinaris | veterinari | veterinari | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 25 | 25 | sn | sn | _ | _ | _ | _ | _ | _ | +
-| 28 | gironins | gironí | gironí | a | a | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 27 | 27 | s.a | s.a | _ | _ | _ | _ | _ | _ | +
-| 29 | , | , | , | f | f | punct=comma | punct=comma | 30 | 30 | f | f | _ | _ | _ | _ | _ | _ | +
-| 30 | per | per | per | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 9 | 7 | sp | cc | _ | _ | _ | _ | _ | _ | +
-| 31 | haver | haver | haver | v | n | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>mood=infinitive | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 33 | 33 | v | v | _ | _ | _ | _ | _ | _ | +
-| 32 | -se | ell | ell | p | p | gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>person=3 | gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>person=3 | 33 | 33 | morfema.pronominal | morfema.pronominal | _ | _ | _ | _ | _ | _ | +
-| 33 | beneficiat | beneficiar beneficiat postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>posfunction=participle | 42 | 30 | S | S | Y | beneficiar.a2 | _ | _ | _ | +
-| 34 | dels | del | dels | s | s | postype=preposition<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p<nowiki>|</nowiki>contracted=yes | postype=preposition<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p<nowiki>|</nowiki>contracted=yes | 33 | 33 creg creg | _ | _ | _ | arg1-null | _ | _ | +
-35 càrrecs càrrec càrrec postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 34 | 34 | sn | sn | _ | _ | _ | _ | _ | _ | +
-| 36 | públics | públic | públic | a | a | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=qualificative<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 35 | 35 | s.a | s.a | _ | _ | _ | _ | _ | _ | +
-| 37 | que | que | que | p | p | postype=relative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=relative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 39 | 39 | cd | cd | _ | _ | _ | _ | arg1-pat | _ | +
-| 38 | _ | _ | _ | p | p | _ | _ | 39 | 39 | suj | suj | _ | _ | _ | _ | arg0-agt | _ | +
-| 39 | desenvolupaven | desenvolupar | desenvolupar | v | v | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=imperfect | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=imperfect 35 35 S | Y | desenvolupar.a2 | _ | _ | _ | _ | +
-40 c | postype=coordinating | postype=coordinating | 42 | 33 | coord | coord | _ | _ | _ | _ | +
-| 41 | la_seva | el_seu | el_seu | d | d | postype=possessive<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3 | postype=possessive<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3 | 42 | 42 | spec spec | _ | _ | _ | _ | _ | _ | +
-42 relació relació relació postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s 30 33 sn cd | _ | _ | _ | _ | _ | _ | +
-43 amb amb amb postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 42 | 42 | sp | sp | _ | _ | _ | _ | _ | _ | +
-| 44 | les | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p 45 45 spec | spec | _ | _ | _ | _ | _ | _ | +
-45 empreses empresa empresa postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p 43 43 sn sn | _ | _ | _ | _ | _ | _ | +
-46 càrniques | càrnic | càrnic | a | a | postype=qualificative<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | postype=qualificative<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | 45 | 45 | s.a | s.a | _ | _ | _ | _ | _ | _ | +
-| 47 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 45 | 45 | sp | sp | _ | _ | _ | +
-| 48 | la | el | el | d | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 49 | 49 | spec | spec | _ | _ | _ | _ | _ | _ | +
-| 49 | zona | zona | zona | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 47 | 47 | sn | sn | _ | _ | _ | _ | _ | _ | +
-| 50 | en | en | en | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 42 | 42 | sp | sp | _ | _ | _ | _ | _ | _ | +
-| 51 | oferir | oferir | oferir | v | v | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>mood=infinitive | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c<nowiki>|</nowiki>mood=infinitive | 50 | 50 | S | S | Y | oferir.a32 | _ | _ | _ | _ | +
-| 52 | -los | ell | ell | p | p | postype=personal<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p<nowiki>|</nowiki>person=3 | postype=personal<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p<nowiki>|</nowiki>person=3 | 51 | 51 | ci | ci | _ | _ | _ | _ | _ | arg2-ben | +
-| 53 | serveis | servei | servei | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 51 | 51 | cd | cd | _ | _ | _ | _ | _ | arg1-pat | +
-| 54 | particulars | particular | particular | a | a | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p | 53 | 53 | s.a | s.a | _ | _ | _ | _ | _ | _ | +
-| 55 | . | . | . | f | f | punct=period | punct=period | 7 | 7 | f | f | _ | _ | _ | _ | _ | _ |+
  
 The first sentence of the CoNLL 2009 development data:
  
-| 1 | Fundació_Privada_Fira_de_Manresa Fundació_Privada_Fira_de_Manresa Fundació_Privada_Fira_de_Manresa postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 3 | 3 | suj | suj | _ | _ | arg0-agt | +| 1 | Maschinenbau Maschinenbau Maschinenbau NN NN Nom<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Masc *<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Masc | 0 | 4 | ROOT NK | _ | _ | 
-| 2 | ha | haver | haver | v | v | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | postype=auxiliary<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | 3 | 3 | v | v | _ | _ | _ | +| / | _ | / | $( $( | _ | _ | PUNC PUNC | _ | _ | 
-| 3 | fet | fer | fer | v | v | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | postype=main<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>mood=pastparticiple | 0 | 0 | sentence | sentence | Y | fer.a2 | _ | +| 3 | | _ | $( $( | _ | _ | PUNC PUNC | _ | _ | 
-| 4 | un un | un | d | d | postype=numeral<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=numeral<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 5 | 5 | spec | spec | _ | _ | _ | +Zusammenfassung Zusammenfassung Zusammenfassung NN NN Nom<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem *<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem ROOT ROOT | _ | _ | 
-balanç | balanç | balanç | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 3 | 3 | cd | cd | _ | _ | arg1-pat | +| _ | $( $( | _ | _ | PUNC PUNC | _ | _ |
-| 6 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 5 | 5 | sp | sp | _ | _ | _ | +
-| 7 | l' | el | el | d | d | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | 8 spec spec | _ | _ | _ | +
-| 8 | activitat | activitat | activitat | n | n | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | 6 | 6 | sn | sn | _ | _ | _ | +
-| 9 | del | del | del | s | s | postype=preposition<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>contracted=yes | postype=preposition<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>contracted=yes | 8 | 8 | sp | sp | _ | _ | _ | +
-| 10 | Palau_Firal | Palau_Firal | Palau_Firal | n | n | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 9 | 9 | sn | sn | _ | _ | _ | +
-| 11 | durant | durant | durant | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 8 | 3 | sp | cc | _ | _ | _ | +
-| 12 | els | el | el | d | d | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 15 | 15 spec spec | _ | _ | | +
-13 primers | primer | primer | a | a | postype=ordinal<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=ordinal<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 12 | 12 | a | a | _ | _ | _ | +
-14 cinc cinc cinc postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p postype=numeral<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=p 12 12 d | _ | _ | _ | +
-15 mesos | mes | mes | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=p | 11 | 11 | sn | sn | _ | | +
-| 16 | de | de | de | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | 15 | 15 | sp | sp | _ | _ | | +
-17 l' | el | el | d | d | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | 18 | 18 | spec | spec | _ | _ | _ | +
-| 18 | any | any | any | n | n | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | 16 | 16 | sn | sn | _ | _ | _ | +
-| 19 | . | . | . | f | f | punct=period | punct=period | 3 | 3 | f | f | _ | _ | _ |+
  
 The first sentence of the CoNLL 2009 test data:
  
-| 1 | El el el postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | postype=article<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | +| 1 | Gegen gegen gegen APPR APPR | _ | _ | _ | _ | _ | 
-| 2 | darrer darrer darrer postype=ordinal<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s postype=ordinal<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | +| 2 | eine ein ein ART ART Acc<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem *<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem | _ | _ | _ | _ | _ | 
-| 3 | número número número postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | +| 3 | Erweiterung Erweiterung Erweiterung NN NN Acc<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem *<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem | _ | _ | _ | _ | _ | 
-| 4 | de de de postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ | +| 4 | ihrer ihr ihr PPOSAT PPOSAT Gen<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem *<nowiki>|</nowiki>*<nowiki>|</nowiki>| _ | _ | _ | _ | _ | 
-| 5 | l' el el postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s postype=article<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | +| 5 | Organisation Organisation Organisation NN NN Gen<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem *<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem | _ | _ | _ | _ | _ | 
-| 6 | Observatori_del_Mercat_de_Treball_d'_Osona Observatori_del_Mercat_de_Treball_d'_Osona Observatori_del_Mercat_de_Treball_d'_Osona postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ | +| 6 | zu | zu | zu | APPR | APPR | _ | _ | _ | _ | _ | _ | _ | 
-inclou incloure incloure postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | postype=main<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s<nowiki>|</nowiki>person=3<nowiki>|</nowiki>mood=indicative<nowiki>|</nowiki>tense=present | _ | _ | _ | _ | Y +| 7 | einem ein ein ART ART Dat<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Neut Dat<nowiki>|</nowiki>Sg<nowiki>|</nowiki>| _ | _ | _ | _ | _ | 
-un un un postype=numeral<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s postype=numeral<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | +sicherheitspolitischen sicherheitspolitisch sicherheitspolitisch ADJA ADJA Pos<nowiki>|</nowiki>Dat<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Neut Pos<nowiki>|</nowiki>*<nowiki>|</nowiki>*<nowiki>|</nowiki>| _ | _ | _ | _ | 
-informe informe informe postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | +Forum Forum Forum NN NN Dat<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Neut *<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Neut | _ | _ | _ | _ | _ | 
-10 especial especial especial | a | a | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s _ | _ | _ | _ | +10 sprachen sprechen sprechen VVFIN VVFIN 3<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Past<nowiki>|</nowiki>Ind *<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Past<nowiki>|</nowiki>Ind | _ | _ | _ | _ | Y 
-| 11 | sobre | sobre | sobre | s | s | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ | +11 sich sich er<nowiki>|</nowiki>es<nowiki>|</nowiki>sie<nowiki>|</nowiki>Sie PRF PRF 3<nowiki>|</nowiki>Acc<nowiki>|</nowiki>Pl *<nowiki>|</nowiki>*<nowiki>|</nowiki>| _ | _ | _ | _ | _ | 
-| 12 | la el | el | d | postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | +| 12 | die der | d | ART | ART Nom<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Masc *<nowiki>|</nowiki>*<nowiki>|</nowiki>| _ | _ | _ | _ | _ | 
-| 13 | contractació contractació contractació postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | +| 13 | meisten meister meist PIAT PIAT Nom<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Masc *<nowiki>|</nowiki>*<nowiki>|</nowiki>| _ | _ | _ | _ | _ | 
-| 14 | a_través_de a_través_de a_través_de postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ | +| 14 | Staaten Staat Staat NN NN Nom<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Masc *<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Masc | _ | _ | _ | _ | _ | 
-| 15 | les el el postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | _ | _ | _ | _ | _ | +| 15 | beim bei beim APPRART APPRART Dat<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Neut Dat<nowiki>|</nowiki>Sg<nowiki>|</nowiki>| _ | _ | _ | _ | _ | 
-| 16 | empreses empresa empresa postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p postype=common<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | _ | _ | _ | _ | _ | +| 16 | Gipfeltreffen Gipfeltreffen Gipfeltreffen NN NN Dat<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Neut *<nowiki>|</nowiki>*<nowiki>|</nowiki>Neut | _ | _ | _ | _ | _ | 
-| 17 | de de de postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | postype=preposition<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ | +| 17 | für für für APPR APPR | _ | _ | _ | _ | _ | 
-| 18 | treball treball treball postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s postype=common<nowiki>|</nowiki>gen=m<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | +| 18 | Asiatisch-Pazifische asiatisch-pazifisch Asiatisch-Pazifische ADJA NN Pos<nowiki>|</nowiki>Acc<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem *<nowiki>|</nowiki>*<nowiki>|</nowiki>| _ | _ | _ | _ | _ | 
-| 19 | temporal temporal temporal postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s postype=qualificative<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=s | _ | _ | _ | _ | _ | +| 19 | Wirtschaftskooperation Wirtschaftskooperation Wirtschaftskooperation NN NN Acc<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem *<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem | _ | _ | _ | _ | _ | 
-| 20 | punct=comma punct=comma | _ | _ | _ | _ | _ | +| 20 | $( $( | _ | _ | _ | _ | _ | 
-| 21 | les el el postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p postype=article<nowiki>|</nowiki>gen=f<nowiki>|</nowiki>num=p | _ | _ | _ | _ | _ | +| 21 | Apec Apec NE NE Nom<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Fem _ | _ | _ | _ | 
-22 ETT ETT ETT postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c postype=proper<nowiki>|</nowiki>gen=c<nowiki>|</nowiki>num=c | _ | _ | _ | _ | _ | +| 22 | ) | _ | ) | $( | $( | _ | _ | _ | _ | _ | _ | _ | 
-23 | . | . | . | punct=period | punct=period | _ | _ | _ | _ | _ |+23 | in | in | in | APPR | APPR | _ | _ | _ | _ | _ | _ | _ | 
 +| 24 Osaka Osaka Osaka NE NE Dat<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Neut *<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Neut | _ | _ | _ | _ | _ | 
 +25 | aus | aus | aus | PTKVZ | PTKVZ | _ | _ | _ | _ | _ | _ | _ | 
 +| 26 | . | _ | . | $. | $. | _ | _ | _ | _ | _ |
  
 ==== Parsing ====
  
-Nonprojectivities in AnCora-CA are very rare. Only 487 of the 435,860 tokens in the CoNLL 2007 version are attached nonprojectively (0.11%). In the CoNLL 2009 version, there are no nonprojectivities at all. +TIGER is a mildly nonprojective treebank. 15875 of the 680,710 tokens in the CoNLL 2009 training+development datasets are attached nonprojectively (2.33%).
  
-The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Catalan: +The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for German:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| Titov et al. | 87.40 | 93.40 | +| MST (McDonald et al.) | 87.34 | 90.38 |
-| Sagae | 88.16 | 93.34 | +| Riedel et al. | 86.24 | 89.76 |
-| Malt (Nilsson et al.) | 88.70 | 93.12 | +| Basis (O'Neil) | 85.36 | 89.16 |
-| Nakagawa | 87.90 | 92.86 | +| Malt (Nivre et al.) | 85.82 | 88.76 |
-| Carreras | 87.60 | 92.46 | +
-| Malt (Hall et al.) | 87.74 | 92.20 | +
  
-The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]]. +The results of the CoNLL 2009 shared task are [[http://ufal.mff.cuni.cz/conll2009-st/results/results.php|available online]]. They have been published in [[http://aclweb.org/anthology/W/W09/W09-1201.pdf|(Hajič et al., 2009)]]. Unlabeled attachment score was not published. These are the best results for German:
- +
-The results of the CoNLL 2009 shared task are [[http://ufal.mff.cuni.cz/conll2009-st/results/results.php|available online]]. They have been published in [[http://aclweb.org/anthology/W/W09/W09-1201.pdf|(Hajič et al., 2009)]]. Unlabeled attachment score was not published. These are the best results for Catalan:+
  
 ^ Parser (Authors) ^ LAS ^
-| Merlo | 87.86 | +| Bohnet | 87.48 |
-| Che | 86.56 | +| Merlo | 87.29 |
-| Bohnet | 86.35 | +| Chen | 86.24 |
-| Chen | 85.88 | +| Che | 86.19 |
 + 
 +===== Greek (el) ===== 
 + 
 +Greek Dependency Treebank (GDT) 
 + 
 +==== Versions ==== 
 + 
 +  * CoNLL 2007 
 + 
 +==== Obtaining and License ==== 
 + 
 +There does not seem to be any regular distribution channel for the Greek Dependency Treebank. The CoNLL 2007 version had a restricted license for the duration of the shared task only. Republication of the CoNLL version through the LDC is planned but has not happened yet. In the meantime, one can ask Prokopis Prokopidis (prokopis (at) ilsp (dot) gr) about the availability of the corpus.
 + 
 +GDT was created by members of the [[http://www.ilsp.gr/|Institute for Language and Speech Processing]] (Ινστιτούτο Επεξεργασίας του Λόγου, ILSP/ΙΕΛ), Επιδαύρου & Αρτέμιδος 6, Παράδεισος Αμαρουσίου, GR-15125 Αθήνα, Greece. 
 + 
 +==== References ==== 
 + 
 +  * Website 
 +    * //no website dedicated to the treebank// 
 +  * Data 
 +    * //no separate citation// 
 +  * Principal publications 
 +    * Prokopis Prokopidis, Elina Desipri, Maria Koutsombogera, Harris Papageorgiou, Stelios Piperidis: [[http://www.ilsp.gr/homepages/prokopidis/documents/gdt_tlt2005.pdf|Theoretical and Practical Issues in the Construction of a Greek Dependency Corpus]]. In: Montserrat Civit, Sandra Kübler, M. Antònia Martí (eds.), Proceedings of The Fourth Workshop on Treebanks and Linguistic Theories (TLT 2005), pp. 149-160, Barcelona, Spain, 2005.
 +  * Documentation 
 +    * Description of tags and feature values is provided in the ''doc/README'' file in the CoNLL 2007 data distribution. 
 + 
 +==== Domain ==== 
 + 
 +Mixed (“GDT consists of randomly selected textual fragments and texts in three domains: politics (current affairs, manual transcripts and minutes of European parliamentary sessions), health, and travel.”) 
 + 
 +==== Size ==== 
 + 
 +The CoNLL 2007 version contains 70223 tokens in 2902 sentences, yielding 24.20 tokens per sentence on average (CoNLL 2007 data split: 65419 tokens / 2705 sentences training, 4804 tokens / 197 sentences test). 
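The size figures can be rechecked with a few lines of Python. This is only a sketch: it assumes the data files follow the usual CoNLL layout (one token per line, sentences separated by blank lines), and the function names ''corpus_size'' and ''tokens_per_sentence'' are illustrative, not part of any distributed tool.

```python
def corpus_size(conll_text):
    """Return (tokens, sentences) for text in the one-token-per-line
    layout, where sentences are separated by blank lines (assumed)."""
    tokens = 0
    sentences = 0
    in_sentence = False
    for line in conll_text.splitlines():
        if line.strip():
            tokens += 1
            in_sentence = True
        elif in_sentence:
            # A blank line closes the sentence that was being read.
            sentences += 1
            in_sentence = False
    if in_sentence:
        # The last sentence may not be followed by a blank line.
        sentences += 1
    return tokens, sentences


def tokens_per_sentence(tokens, sentences):
    """Average sentence length, rounded to two decimals."""
    return round(tokens / sentences, 2)


# Figures quoted above for the CoNLL 2007 version of GDT.
train_tokens, train_sentences = 65419, 2705
test_tokens, test_sentences = 4804, 197
total_tokens = train_tokens + test_tokens
total_sentences = train_sentences + test_sentences
print(total_tokens, total_sentences,
      tokens_per_sentence(total_tokens, total_sentences))
```

The training and test portions add up to the stated totals, and the quotient rounds to the stated average.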
 + 
 +==== Inside ==== 
 + 
 +The syntactic annotation style and the tagset for dependency relations (analytical functions) in GDT have been modeled after the [[http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/a-layer/html/index.html|Prague Dependency Treebank]].
 + 
 +==== Sample ==== 
 + 
 +The first sentence of the CoNLL 2007 training data: 
 + 
 +| 1 | " | " | PUNCT | PUNCT | _ | 10 | AuxG | _ | _ | 
 +| 2 | Τα | ο | At | AtDf | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Nm | 3 | Atr | _ | _ | 
 +| 3 | αντισώματα | αντίσωμα | No | NoCm | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Nm | 5 | Sb | _ | _ | 
 +| 4 | IgG | IgG | Rg | RgFwOr | _ | 3 | Atr | _ | _ | 
 +| 5 | είναι | είμαι | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 10 | Obj_Co | _ | _ | 
 +| 6 | σαν | σαν | Ad | Ad | Ba | 5 | Adv | _ | _ | 
 +| 7 | μακροπρόθεσμη | μακροπρόθεσμος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 8 | Atr | _ | _ | 
 +| 8 | μνήμη | μνήμη | No | NoCm | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 6 | Adv | _ | _ | 
 +| 9 | , | , | PUNCT | PUNCT | _ | 10 | AuxX | _ | _ | 
 +| 10 | ενώ | ενώ | Cj | CjCo | _ | 26 | Coord | _ | _ | 
 +| 11 | το | ο | At | AtDf | Ne<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 12 | Atr | _ | _ | 
 +| 12 | IgA | IgA | Rg | RgFwOr | _ | 15 | Sb | _ | _ | 
 +| 13 | πιστεύεται | πιστεύεται | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 10 | Obj_Co | _ | _ | 
 +| 14 | ότι | ότι | Cj | CjSb | _ | 13 | AuxC | _ | _ | 
 +| 15 | είναι | είμαι | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 14 | Sb | _ | _ | 
 +| 16 | ένας | ένας | At | AtId | Ma<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 18 | Atr | _ | _ | 
 +| 17 | συγκεκριμένος | συγκεκριμένος | Aj | Aj | Ba<nowiki>|</nowiki>Ma<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 18 | Atr | _ | _ | 
 +| 18 | δείκτης | δείκτης | No | NoCm | Ma<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 15 | Pnom | _ | _ | 
 +| 19 | για | για | AsPp | AsPpSp | _ | 18 | AuxP | _ | _ | 
 +| 20 | πρόσφατες | πρόσφατος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 21 | Atr_Co | _ | _ | 
 +| 21 | ή | ή | Cj | CjCo | _ | 23 | Coord | _ | _ | 
 +| 22 | χρόνιες | χρόνιος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 21 | Atr_Co | _ | _ | 
 +| 23 | λοιμώξεις | λοίμωξη | No | NoCm | Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 19 | Atr | _ | _ | 
 +| 24 | " | " | PUNCT | PUNCT | _ | 10 | AuxG | _ | _ | 
 +| 25 | , | , | PUNCT | PUNCT | _ | 10 | AuxX | _ | _ | 
 +| 26 | εξηγεί | εξηγώ | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Av<nowiki>|</nowiki>Xx | 0 | Pred | _ | _ | 
 +| 27 | η | ο | At | AtDf | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 28 | Atr | _ | _ | 
 +| 28 | Δρ | Δρ | Rg | RgFwTr | _ | 26 | Sb | _ | _ | 
 +| 29 | Αρκάρι | Αρκάρι | No | NoCm | Ne<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 28 | Atr | _ | _ | 
 +| 30 | . | . | PUNCT | PUNCT | _ | 0 | AuxK | _ | _ | 
 + 
 +The first sentence of the CoNLL 2007 test data: 
 + 
 +| 1 | Η | ο | At | AtDf | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 2 | Atr | _ | _ | 
 +| 2 | Σίφνος | Σίφνος | No | NoPr | Fe<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Nm | 3 | Sb | _ | _ | 
 +| 3 | φημίζεται | φημίζομαι | Vb | VbMn | Id<nowiki>|</nowiki>Pr<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Xx<nowiki>|</nowiki>Ip<nowiki>|</nowiki>Pv<nowiki>|</nowiki>Xx | 0 | Pred | _ | _ | 
 +| 4 | και | και | Cj | CjCo | _ | 5 | AuxY | _ | _ | 
 +| 5 | για | για | AsPp | AsPpSp | _ | 3 | AuxP | _ | _ | 
 +| 6 | τα | ο | At | AtDf | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 8 | Atr | _ | _ | 
 +| 7 | καταγάλανα | καταγάλανος | Aj | Aj | Ba<nowiki>|</nowiki>Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 8 | Atr | _ | _ | 
 +| 8 | νερά | νερό | No | NoCm | Ne<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ac | 5 | Obj | _ | _ | 
 +| 9 | των | ο | At | AtDf | Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ge | 11 | Atr | _ | _ | 
 +| 10 | πανέμορφων | πανέμορφος | Aj | Aj | Ba<nowiki>|</nowiki>Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ge | 11 | Atr | _ | _ | 
 +| 11 | ακτών | ακτή | No | NoCm | Fe<nowiki>|</nowiki>Pl<nowiki>|</nowiki>Ge | 8 | Atr | _ | _ | 
 +| 12 | της | μου | Pn | PnPo | Fe<nowiki>|</nowiki>03<nowiki>|</nowiki>Sg<nowiki>|</nowiki>Ge<nowiki>|</nowiki>Xx | 11 | Atr | _ | _ | 
 +| 13 | . | . | PUNCT | PUNCT | _ | 0 | AuxK | _ | _ | 
 + 
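The rows in the samples above correspond to the ten columns of the CoNLL 2006/2007 format (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). A minimal sketch of reading one such row follows. It assumes that the distributed files are tab-separated, whereas the wiki tables above use ''|'' only for display; the names ''Token'' and ''parse_conll_line'' are made up for this illustration.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """The first eight columns of a CoNLL 2006/2007 row."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: List[str]
    head: int
    deprel: str


def parse_conll_line(line: str) -> Token:
    """Parse one tab-separated token line (assumed 10 columns)."""
    cols = line.rstrip("\n").split("\t")
    # FEATS is a "|"-separated list, or "_" when there are no features.
    feats = [] if cols[5] == "_" else cols[5].split("|")
    return Token(int(cols[0]), cols[1], cols[2], cols[3], cols[4],
                 feats, int(cols[6]), cols[7])


# Second token of the first test sentence shown above.
example = "2\tΣίφνος\tΣίφνος\tNo\tNoPr\tFe|Sg|Nm\t3\tSb\t_\t_"
token = parse_conll_line(example)
print(token.form, token.postag, token.feats, token.head, token.deprel)
```

The last two columns (PHEAD, PDEPREL) are empty in the samples, so the sketch simply ignores them.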
 +==== Parsing ==== 
 + 
 +Nonprojectivities in BTB are rare. Only 747 of the 196,151 tokens in the CoNLL 2006 version are attached nonprojectively (0.38%). 
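What it means for a token to be attached nonprojectively can be made concrete with a short sketch. It assumes the common definition: a token is attached nonprojectively if some token lying between it and its head is not a descendant of that head, with the artificial root 0 counting as an ancestor of every token. The function name ''count_nonprojective'' is illustrative, and the counts quoted on this page may have been obtained with a different tool or a slightly different definition. The two head lists are copied from the HEAD column of the Greek sample sentences above.

```python
def count_nonprojective(heads):
    """Count tokens whose attachment is nonprojective.

    heads[i] is the head of token i + 1 (ids are 1-based);
    0 stands for the artificial root.
    """
    def descends_from(node, ancestor):
        # The artificial root dominates every token.
        if ancestor == 0:
            return True
        steps = 0
        # Walk up the tree; the step limit guards against cycles.
        while node != 0 and steps < len(heads):
            node = heads[node - 1]
            if node == ancestor:
                return True
            steps += 1
        return False

    count = 0
    for dep, head in enumerate(heads, start=1):
        lo, hi = min(dep, head), max(dep, head)
        # Every token strictly between dep and head must descend from head.
        if any(not descends_from(k, head) for k in range(lo + 1, hi)):
            count += 1
    return count


# HEAD column of the first training sentence (30 tokens).
train_heads = [10, 3, 5, 3, 10, 5, 8, 6, 10, 26, 12, 15, 10, 13, 14,
               18, 18, 15, 18, 21, 23, 21, 19, 10, 10, 0, 28, 26, 28, 0]
# HEAD column of the first test sentence (13 tokens).
test_heads = [2, 3, 0, 5, 3, 8, 8, 5, 11, 11, 8, 11, 0]

print(count_nonprojective(train_heads))
print(count_nonprojective(test_heads))
```

Under this definition the training sentence contains one nonprojective attachment: token 12 (IgA) depends on token 15, but token 13 (πιστεύεται), which lies between them, is attached to token 10. The test sentence is fully projective.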
 + 
 +The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Bulgarian: 
 + 
 +^ Parser (Authors) ^ LAS ^ UAS ^ 
 +| MST (McDonald et al.) | 87.57 | 92.04 | 
 +| Malt (Nivre et al.) | 87.41 | 91.72 | 
 +| Nara (Yuchang Cheng) | 86.34 | 91.30 |
  
