
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


user:zeman:treebanks [2011/11/20 18:25]
zeman English inside.
user:zeman:treebanks [2011/11/20 18:53]
zeman Somehow it does not fit here anymore.
Line 1: Line 1:
 ====== Treebanks for Various Languages ======
 +
 +  * [[user:zeman:treebanks:ar|Arabic (ar)]]
 +  * [[user:zeman:treebanks:bg|Bulgarian (bg)]]
 +  * [[user:zeman:treebanks:bn|Bengali (bn)]]
 +  * [[user:zeman:treebanks:ca|Catalan (ca)]]
 +  * [[user:zeman:treebanks:cs|Czech (cs)]]
 +  * [[user:zeman:treebanks:da|Danish (da)]]
 +  * [[user:zeman:treebanks:de|German (de)]]
 +  * [[user:zeman:treebanks:el|Greek (el)]]
 +  * [[user:zeman:treebanks:en|English (en)]]
  
 ===== Arabic (ar) =====
Line 1719: Line 1729:
 Conversion for CoNLL 2007: Many function tags were removed from the non-terminals in the phrase-structure representation. The phrase structures were converted to dependency structures using the procedure described in [[http://dspace.utlib.ee/dspace/bitstream/handle/10062/2560/reg-Johansson-10.pdf;jsessionid=BB8432D9BAE4FCF9DD9BD746704E796F?sequence=1|(Johansson and Nugues, 2007)]].
  
-The original Penn Treebank contains non-terminal labels, function tags and part-of-speech tags, all assigned manually. The CoNLL 2009 version contains manual and automatic disambiguation. See above for documentation of the part-of-speech tags. Use [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl?tagset=en::penn|DZ Interset]] to inspect the tagset. +The original Penn Treebank contains non-terminal labels, function tags and part-of-speech tags, all assigned manually. The CoNLL 2009 version contains manual and automatic disambiguation. See above for documentation of the part-of-speech tags. Use [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl?tagset=en::penn|DZ Interset]] to inspect the tagset. The original treebank and the CoNLL 2007 version do not contain lemmas. The CoNLL 2009 version includes some lemmas, but most of the time they are just lowercased word forms (e.g. nouns are not converted to singular). Nevertheless, there is some base-form normalization of verbs.
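To check the remark about CoNLL 2009 lemmas on a local copy of the data, one can count how often the LEMMA column equals the lowercased FORM column. This is a minimal sketch assuming the standard tab-separated CoNLL 2009 layout (ID, FORM, LEMMA, PLEMMA and so on); the function name and the toy lines are made up for illustration:

<code python>
def lowercase_lemma_share(lines):
    """Share of tokens whose LEMMA (3rd column) equals the lowercased FORM (2nd column).

    `lines` is any iterable of CoNLL 2009 lines, e.g. an open file.
    Blank lines between sentences are skipped.
    """
    same = total = 0
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:  # sentence boundary (empty line)
            continue
        total += 1
        same += cols[2] == cols[1].lower()
    return same / total if total else 0.0


# Toy input (hypothetical): a plural noun left as is, a verb normalized to its base form.
toy = ["1\tDogs\tdogs\n", "2\tran\trun\n", "\n"]
print(lowercase_lemma_share(toy))  # 0.5
</code>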
- +
-==== Sample ==== +
- +
-The first sentence of the PDT 1.0 training data: +
- +
-<code xml><csts lang=cs> +
-<h> +
-<source>Českomoravský profit</source> +
-<markup> +
-<mauth>js +
-<mdate>1996-2000 +
-<mdesc>Manual analytical annotation +
-</markup> +
-<markup> +
-<mauth>kk,lk +
-<mdate>1996-2000 +
-<mdesc>Manual morphological annotation +
-</markup> +
-</h> +
-<doc file="s/inf/j/1994/cmpr9406" id="001"> +
-<a> +
-<mod>+
-<txtype>inf +
-<genre>mix +
-<med>+
-<temp>1994 +
-<authname>+
-<opus>cmpr9406 +
-<id>001 +
-</a> +
-<c> +
-<p n=1> +
-<s id="cmpr9406:001-p1s1"> +
-<p n=2> +
-<s id="cmpr9406:001-p2s1"> +
-<f cap>Třikrát<l>třikrát`3<t>Cv-------------<MDl src="a">třikrát`3<MDt src="a">Cv-------------<MDl src="b">třikrát`3<MDt src="b">Cv-------------<A>Adv<r>1<g>2 +
-<f>rychlejší<l>rychlý<t>AAFS1----2A----<MDl src="a">rychlý<MDt src="a">AANS1----2A----<MDl src="b">rychlý<MDt src="b">AAFS1----2A----<A>ExD<r>2<g>0 +
-<f>než<l>než-2<t>J,-------------<MDl src="a">než-2<MDt src="a">J,-------------<MDl src="b">než-2<MDt src="b">J,-------------<A>AuxC<r>3<g>2 +
-<f>slovo<l>slovo<t>NNNS1-----A----<MDl src="a">slovo<MDt src="a">NNNS4-----A----<MDl src="b">slovo<MDt src="b">NNNS1-----A----<A>ExD<r>4<g>3</code> +
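CSTS is SGML rather than XML: elements such as ''<f>'', ''<l>'', ''<t>'', ''<A>'', ''<r>'' and ''<g>'' are not closed, and each value runs up to the next ''<''. Judging from the samples, ''<f>'' (or ''<d>'' for punctuation) holds the word form, ''<l>''/''<t>'' the manual lemma and tag, ''<MDl>''/''<MDt>'' the alternatives from sources a and b, ''<A>'' the analytical function, ''<r>'' the word-order position and ''<g>'' the position of the governor. A minimal regex-based sketch for one token line (the function name is hypothetical; the separate ''<D>'' lines are not token lines and should be skipped before calling it):

<code python>
import re

# Each CSTS element is an unclosed SGML tag; its value runs up to the next "<".
TAG_RE = re.compile(r"<(\w+)([^>]*)>([^<]*)")


def parse_csts_token(line):
    """Extract form, lemma, tag, afun, ord and governor from one CSTS token line.

    For repeated element names (e.g. the <MDl>/<MDt> pairs from sources a and b)
    only the first occurrence is kept. Lines without <l>/<t>, as in the d-test
    and e-test samples, yield None for lemma and tag.
    """
    fields = {}
    for name, _attrs, value in TAG_RE.findall(line):
        fields.setdefault(name, value.strip())
    return {
        "form": fields.get("f", fields.get("d")),  # <d> holds punctuation
        "lemma": fields.get("l"),
        "tag": fields.get("t"),
        "afun": fields.get("A"),
        "ord": int(fields["r"]),
        "head": int(fields["g"]),
    }


line = ('<f>slovo<l>slovo<t>NNNS1-----A----<MDl src="a">slovo<MDt src="a">NNNS4-----A----'
        '<MDl src="b">slovo<MDt src="b">NNNS1-----A----<A>ExD<r>4<g>3')
tok = parse_csts_token(line)
print(tok["afun"], tok["ord"], tok["head"])  # ExD 4 3
</code>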
- +
-The first two sentences of the PDT 1.0 d-test data: +
- +
-<code xml><csts lang=cs> +
-<h> +
-<source>Lidové noviny</source> +
-<markup> +
-<mauth>zu +
-<mdate>1996-2000 +
-<mdesc>Manual analytical annotation +
-</markup> +
-</h> +
-<doc file="s/pub/nws/1994/ln94206" id="1"> +
-<a> +
-<mod>+
-<txtype>pub +
-<genre>mix +
-<med>nws +
-<temp>1994 +
-<authname>+
-<opus>ln94206 +
-<id>+
-</a> +
-<c> +
-<p n=1> +
-<s id="ln94206:1-p1s1"> +
-<i>ti +
-<f cap>Lidé<MDl src="a">člověk<MDt src="a">NNMP1-----A---1<MDl src="b">člověk<MDt src="b">NNMP1-----A---1<A>ExD<r>1<g>+
-<p n=2> +
-<s id="ln94206:1-p2s1"> +
-<f upper.abbr>ING<MDl src="a">Ing-1_:B_^(inženýr)<MDt src="a">NNMXX-----A---8<MDl src="b">Ing-1_:B_^(inženýr)<MDt src="b">NNMXX-----A---8<A>Atr<r>1<g>+
-<D> +
-<d>.<MDl src="a">.<MDt src="a">Z:-------------<MDl src="b">.<MDt src="b">Z:-------------<A>AuxG<r>2<g>+
-<f upper>PETR<MDl src="a">Petr_;Y<MDt src="a">NNMS1-----A----<MDl src="b">Petr_;Y<MDt src="b">NNMS1-----A----<A>Atr<r>3<g>+
-<f upper>KARAS<MDl src="a">karas<MDt src="a">NNMS1-----A----<MDl src="b">karas<MDt src="b">NNMS1-----A----<A>Sb_Ap<r>4<g>11 +
-<D> +
-<d>,<MDl src="a">,<MDt src="a">Z:-------------<MDl src="b">,<MDt src="b">Z:-------------<A>AuxX<r>5<g>+
-<f mixed>CSc<MDl src="a">CSc-1_:B_^(kandidát_věd)<MDt src="a">NNMXX-----A---8<MDl src="b">CSc-1_:B_^(kandidát_věd)<MDt src="b">NNMXX-----A---8<A>Atr<r>6<g>+
-<D> +
-<d>.<MDl src="a">.<MDt src="a">Z:-------------<MDl src="b">.<MDt src="b">Z:-------------<A>AuxG<r>7<g>+
-<d>(<MDl src="a">(<MDt src="a">Z:-------------<MDl src="b">(<MDt src="b">Z:-------------<A>ExD<r>8<g>+
-<D> +
-<f num>53<MDl src="a">53<MDt src="a">C=-------------<MDl src="b">53<MDt src="b">C=-------------<A>ExD_Pa<r>9<g>+
-<D> +
-<d>)<MDl src="a">)<MDt src="a">Z:-------------<MDl src="b">)<MDt src="b">Z:-------------<A>ExD<r>10<g>+
-<D> +
-<d>,<MDl src="a">,<MDt src="a">Z:-------------<MDl src="b">,<MDt src="b">Z:-------------<A>Apos<r>11<g>20 +
-<f>generální<MDl src="a">generální<MDt src="a">AAMS1----1A----<MDl src="b">generální<MDt src="b">AAMS1----1A----<A>Atr<r>12<g>13 +
-<f>ředitel<MDl src="a">ředitel<MDt src="a">NNMS1-----A----<MDl src="b">ředitel<MDt src="b">NNMS1-----A----<A>Sb_Co<r>13<g>15 +
-<f upper>ČEZ<MDl src="a">ČEZ-1_:B_;K_^(České_energetické_závody)<MDt src="a">NNIPX-----A---8<MDl src="b">ČEZ-1_:B_;K_^(České_energetické_závody)<MDt src="b">NNIPX-----A---8<A>Atr<r>14<g>13 +
-<f>a<MDl src="a">a-1<MDt src="a">J^-------------<MDl src="b">a-1<MDt src="b">J^-------------<A>Coord_Ap<r>15<g>11 +
-<f>předseda<MDl src="a">předseda<MDt src="a">NNMS1-----A----<MDl src="b">předseda<MDt src="b">NNMS1-----A----<A>Sb_Co<r>16<g>15 +
-<f>jeho<MDl src="a">jeho_^(přivlast.)<MDt src="a">PSXXXZS3-------<MDl src="b">jeho_^(přivlast.)<MDt src="b">PSXXXZS3-------<A>Atr<r>17<g>18 +
-<f>představenstva<MDl src="a">představenstvo<MDt src="a">NNNS2-----A----<MDl src="b">představenstvo<MDt src="b">NNNS2-----A----<A>Atr<r>18<g>16 +
-<D> +
-<d>,<MDl src="a">,<MDt src="a">Z:-------------<MDl src="b">,<MDt src="b">Z:-------------<A>AuxX<r>19<g>11 +
-<f>je<MDl src="a">být<MDt src="a">VB-S---3P-AA---<MDl src="b">být<MDt src="b">VB-S---3P-AA---<A>Pred<r>20<g>+
-<f>absolventem<MDl src="a">absolvent<MDt src="a">NNMS7-----A----<MDl src="b">absolvent<MDt src="b">NNMS7-----A----<A>Pnom<r>21<g>20 +
-<f>elektrotechnické<MDl src="a">elektrotechnický<MDt src="a">AAFS2----1A----<MDl src="b">elektrotechnický<MDt src="b">AAFS2----1A----<A>Atr<r>22<g>23 +
-<f>fakulty<MDl src="a">fakulta<MDt src="a">NNFS2-----A----<MDl src="b">fakulta<MDt src="b">NNFS2-----A----<A>Atr_Co<r>23<g>25 +
-<f upper>ČVUT<MDl src="a">ČVUT-1_:B_;K_^(České_vysoké_učení_technické)<MDt src="a">NNNXX-----A---8<MDl src="b">ČVUT-1_:B_;K_^(České_vysoké_učení_technické)<MDt src="b">NNNXX-----A---8<A>Atr<r>24<g>23 +
-<f>a<MDl src="a">a-1<MDt src="a">J^-------------<MDl src="b">a-1<MDt src="b">J^-------------<A>Coord<r>25<g>21 +
-<f>postgraduálního<MDl src="a">postgraduální<MDt src="a">AANS2----1A----<MDl src="b">postgraduální<MDt src="b">AANS2----1A----<A>Atr<r>26<g>27 +
-<f>studia<MDl src="a">studium<MDt src="a">NNNS2-----A----<MDl src="b">studium<MDt src="b">NNNS2-----A----<A>Atr_Co<r>27<g>25 +
-<f>v<MDl src="a">v-1<MDt src="a">RR--6----------<MDl src="b">v-1<MDt src="b">RR--6----------<A>AuxP<r>28<g>29 +
-<f>oboru<MDl src="a">obor_^(lidské_činnosti)<MDt src="a">NNIS6-----A----<MDl src="b">obor_^(lidské_činnosti)<MDt src="b">NNIS6-----A----<A>AuxP<r>29<g>27 +
-<f>metod<MDl src="a">metoda<MDt src="a">NNFP2-----A----<MDl src="b">metoda<MDt src="b">NNFP2-----A----<A>Atr<r>30<g>29 +
-<f>operační<MDl src="a">operační<MDt src="a">AAFS2----1A----<MDl src="b">operační<MDt src="b">AAFS2----1A----<A>Atr<r>31<g>32 +
-<f>analýzy<MDl src="a">analýza<MDt src="a">NNFS2-----A----<MDl src="b">analýza<MDt src="b">NNFS2-----A----<A>Atr<r>32<g>30 +
-<D> +
-<d>.<MDl src="a">.<MDt src="a">Z:-------------<MDl src="b">.<MDt src="b">Z:-------------<A>AuxK<r>33<g>0</code> +
- +
-The first sentence of the PDT 1.0 e-test data: +
- +
-<code xml><csts lang=cs> +
-<h> +
-<source>Lidové noviny</source> +
-<markup> +
-<mauth>zu +
-<mdate>1996-2000 +
-<mdesc>Manual analytical annotation +
-</markup> +
-</h> +
-<doc file="s/pub/nws/1994/ln94209" id="1"> +
-<a> +
-<mod>+
-<txtype>pub +
-<genre>mix +
-<med>nws +
-<temp>1994 +
-<authname>+
-<opus>ln94209 +
-<id>+
-</a> +
-<c> +
-<p n=1> +
-<s id="ln94209:1-p1s1"> +
-<f cap>Přádelny<MDl src="a">přádelna<MDt src="a">NNFP1-----A----<MDl src="b">přádelna<MDt src="b">NNFP1-----A----<A>Sb<r>1<g>+
-<f>mají<MDl src="a">mít<MDt src="a">VB-P---3P-AA---<MDl src="b">mít<MDt src="b">VB-P---3P-AA---<A>Pred<r>2<g>+
-<f>dvojnásob<MDl src="a">dvojnásob<MDt src="a">Db-------------<MDl src="b">dvojnásob<MDt src="b">Db-------------<A>Obj<r>3<g>+
-<f>vad<MDl src="a">vada<MDt src="a">NNFP2-----A----<MDl src="b">vada<MDt src="b">NNFP2-----A----<A>Atr<r>4<g>3</code> +
- +
-Morphological annotation of the first amw training file of the PDT 2.0: +
- +
-<code xml><mdata xmlns="http://ufal.mff.cuni.cz/pdt/pml/"> +
- <head> +
-  <schema href="mdata_schema.xml" /> +
-  <references> +
-   <reffile id="w" name="wdata" href="cmpr9406_001.w.gz" /> +
-  </references> +
- </head> +
- <meta> +
-  <lang>cs</lang> +
-  <annotation_info id="manual"> +
-   <desc>Manual annotation</desc> +
-  </annotation_info> +
- </meta> +
- <s id="m-cmpr9406-001-p2s1"> +
-  <m id="m-cmpr9406-001-p2s1w1"> +
-   <src.rf>manual</src.rf> +
-   <w.rf>w#w-cmpr9406-001-p2s1w1</w.rf> +
-   <form>Třikrát</form> +
-   <lemma>třikrát`3</lemma> +
-   <tag>Cv-------------</tag> +
-  </m> +
-  <m id="m-cmpr9406-001-p2s1w2"> +
-   <src.rf>manual</src.rf> +
-   <w.rf>w#w-cmpr9406-001-p2s1w2</w.rf> +
-   <form>rychlejší</form> +
-   <lemma>rychlý</lemma> +
-   <tag>AAFS1----2A----</tag> +
-  </m> +
-  <m id="m-cmpr9406-001-p2s1w3"> +
-   <src.rf>manual</src.rf> +
-   <w.rf>w#w-cmpr9406-001-p2s1w3</w.rf> +
-   <form>než</form> +
-   <lemma>než-2</lemma> +
-   <tag>J,-------------</tag> +
-  </m> +
-  <m id="m-cmpr9406-001-p2s1w4"> +
-   <src.rf>manual</src.rf> +
-   <w.rf>w#w-cmpr9406-001-p2s1w4</w.rf> +
-   <form>slovo</form> +
-   <lemma>slovo</lemma> +
-   <tag>NNNS1-----A----</tag> +
-  </m> +
- </s></code> +
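The PDT 2.0 m-file is namespaced XML, so it can be read with Python's standard ''xml.etree.ElementTree'' by passing the namespace from the sample's root element. A minimal sketch (function names are hypothetical; the gzipped file name in ''load_m_file'' follows the reference in the sample's header and is only illustrative). The embedded fragment repeats the first two tokens above, closed so that it is well-formed:

<code python>
import gzip
import xml.etree.ElementTree as ET

NS = {"pml": "http://ufal.mff.cuni.cz/pdt/pml/"}


def read_m_layer(root):
    """Yield (sentence id, [(m id, form, lemma, tag), ...]) for each <s> of an m-file."""
    for s in root.findall("pml:s", NS):
        tokens = [(m.get("id"),
                   m.findtext("pml:form", namespaces=NS),
                   m.findtext("pml:lemma", namespaces=NS),
                   m.findtext("pml:tag", namespaces=NS))
                  for m in s.findall("pml:m", NS)]
        yield s.get("id"), tokens


def load_m_file(path):
    """Parse a gzipped m-file, e.g. load_m_file("cmpr9406_001.m.gz")."""
    with gzip.open(path) as f:
        return ET.parse(f).getroot()


SAMPLE = """<mdata xmlns="http://ufal.mff.cuni.cz/pdt/pml/">
 <s id="m-cmpr9406-001-p2s1">
  <m id="m-cmpr9406-001-p2s1w1"><form>Třikrát</form><lemma>třikrát`3</lemma><tag>Cv-------------</tag></m>
  <m id="m-cmpr9406-001-p2s1w2"><form>rychlejší</form><lemma>rychlý</lemma><tag>AAFS1----2A----</tag></m>
 </s>
</mdata>"""

for sid, tokens in read_m_layer(ET.fromstring(SAMPLE)):
    print(sid, [tag for *_, tag in tokens])
# m-cmpr9406-001-p2s1 ['Cv-------------', 'AAFS1----2A----']
</code>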
- +
-Analytical (surface-syntactic) annotation of the first amw training file of the PDT 2.0: +
- +
-<code xml><adata xmlns="http://ufal.mff.cuni.cz/pdt/pml/"> +
- <head> +
-  <schema href="adata_schema.xml" /> +
-  <references> +
-   <reffile id="m" name="mdata" href="cmpr9406_001.m.gz" /> +
-   <reffile id="w" name="wdata" href="cmpr9406_001.w.gz" /> +
-  </references> +
- </head> +
- <meta> +
-  <annotation_info> +
-   <desc>Manual annotation</desc> +
-  </annotation_info> +
- </meta> +
- <trees> +
-  <LM id="a-cmpr9406-001-p2s1"> +
-   <s.rf>m#m-cmpr9406-001-p2s1</s.rf> +
-   <ord>0</ord> +
-   <children> +
-    <LM id="a-cmpr9406-001-p2s1w2"> +
-     <m.rf>m#m-cmpr9406-001-p2s1w2</m.rf> +
-     <afun>ExD</afun> +
-     <ord>2</ord> +
-     <children> +
-      <LM id="a-cmpr9406-001-p2s1w1"> +
-       <m.rf>m#m-cmpr9406-001-p2s1w1</m.rf> +
-       <afun>Adv</afun> +
-       <ord>1</ord> +
-      </LM> +
-      <LM id="a-cmpr9406-001-p2s1w3"> +
-       <m.rf>m#m-cmpr9406-001-p2s1w3</m.rf> +
-       <afun>AuxC</afun> +
-       <ord>3</ord> +
-       <children> +
-        <LM id="a-cmpr9406-001-p2s1w4"> +
-         <m.rf>m#m-cmpr9406-001-p2s1w4</m.rf> +
-         <afun>ExD</afun> +
-         <ord>4</ord> +
-        </LM> +
-       </children> +
-      </LM> +
-     </children> +
-    </LM> +
-   </children> +
-  </LM></code> +
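In the a-file each node is an ''LM'' element with its word-order position in ''ord'', its analytical function in ''afun'' and its dependents under ''children''; the technical root has ''ord'' 0. Flattening the tree into (position, afun, head position) triples gives the same attachment as the CoNLL 2006/2007 rows for this sentence (heads 2, 0, 2, 3). A minimal sketch with ''xml.etree.ElementTree'' (function names are hypothetical). It also assumes, to be verified on real files, that a one-member ''children'' list might be written without the ''LM'' wrapper:

<code python>
import xml.etree.ElementTree as ET

NS = {"pml": "http://ufal.mff.cuni.cz/pdt/pml/"}

SAMPLE = """<adata xmlns="http://ufal.mff.cuni.cz/pdt/pml/"><trees>
 <LM id="a-cmpr9406-001-p2s1"><s.rf>m#m-cmpr9406-001-p2s1</s.rf><ord>0</ord><children>
  <LM id="a-cmpr9406-001-p2s1w2"><m.rf>m#m-cmpr9406-001-p2s1w2</m.rf><afun>ExD</afun><ord>2</ord><children>
   <LM id="a-cmpr9406-001-p2s1w1"><m.rf>m#m-cmpr9406-001-p2s1w1</m.rf><afun>Adv</afun><ord>1</ord></LM>
   <LM id="a-cmpr9406-001-p2s1w3"><m.rf>m#m-cmpr9406-001-p2s1w3</m.rf><afun>AuxC</afun><ord>3</ord><children>
    <LM id="a-cmpr9406-001-p2s1w4"><m.rf>m#m-cmpr9406-001-p2s1w4</m.rf><afun>ExD</afun><ord>4</ord></LM>
   </children></LM>
  </children></LM>
 </children></LM>
</trees></adata>"""


def child_nodes(node):
    """Return the nodes listed under <children>, or [] for a leaf."""
    children = node.find("pml:children", NS)
    if children is None:
        return []
    members = children.findall("pml:LM", NS)
    # Assumption: a one-member list may omit <LM>; then <children> is the child itself.
    return members if members else [children]


def tree_to_arcs(root):
    """Return (ord, m.rf, afun, head ord) for every non-root node, sorted by ord."""
    arcs = []
    stack = [root]
    while stack:
        node = stack.pop()
        head = int(node.findtext("pml:ord", namespaces=NS))
        for child in child_nodes(node):
            arcs.append((int(child.findtext("pml:ord", namespaces=NS)),
                         child.findtext("pml:m.rf", namespaces=NS),
                         child.findtext("pml:afun", namespaces=NS),
                         head))
            stack.append(child)
    return sorted(arcs)


tree = ET.fromstring(SAMPLE).find("pml:trees/pml:LM", NS)
for ord_, m_rf, afun, head in tree_to_arcs(tree):
    print(ord_, m_rf, afun, head)
# 1 m#m-cmpr9406-001-p2s1w1 Adv 2
# 2 m#m-cmpr9406-001-p2s1w2 ExD 0
# 3 m#m-cmpr9406-001-p2s1w3 AuxC 2
# 4 m#m-cmpr9406-001-p2s1w4 ExD 3
</code>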
- +
-The first two sentences of the CoNLL 2006 and 2007 training data: +
- +
-^ ID ^ FORM ^ LEMMA ^ CPOSTAG ^ POSTAG ^ FEATS ^ HEAD ^ DEPREL ^ PHEAD ^ PDEPREL ^ +
-| 1 | Třikrát | třikrát`3 | C | v | _ | 2 | Adv | _ | _ | +
-| 2 | rychlejší | rychlý | A | A | Gen=F<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Gra=2<nowiki>|</nowiki>Neg=A | 0 | ExD | _ | _ | +
-| 3 | než | než-2 | J | , | _ | 2 | AuxC | _ | _ | +
-| 4 | slovo | slovo | N | N | Gen=N<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Neg=A | 3 | ExD | _ | _ | +
-| |||||||||| +
-| 1 | Faxu | fax | N | N | Gen=I<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=3<nowiki>|</nowiki>Neg=A | 2 | Obj | _ | _ | +
-| 2 | škodí | škodit | V | B | Num=P<nowiki>|</nowiki>Per=3<nowiki>|</nowiki>Ten=P<nowiki>|</nowiki>Neg=A<nowiki>|</nowiki>Voi=A | 0 | Pred | _ | _ | +
-| 3 | především | především | D | b | _ | 6 | AuxZ | _ | _ | +
-| 4 | přetížené | přetížený | A | A | Gen=F<nowiki>|</nowiki>Num=P<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Gra=1<nowiki>|</nowiki>Neg=A | 6 | Atr | _ | _ | +
-| 5 | telefonní | telefonní | A | A | Gen=F<nowiki>|</nowiki>Num=P<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Gra=1<nowiki>|</nowiki>Neg=A | 6 | Atr | _ | _ | +
-| 6 | linky | linka | N | N | Gen=F<nowiki>|</nowiki>Num=P<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Neg=A | 2 | Sb | _ | _ | +
-| 7 | * | * | Z | : | _ | 2 | AuxG | _ | _ | +
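The CoNLL 2006/2007 (CoNLL-X) files are tab-separated, with ten columns per token and a blank line after each sentence; the escaped pipes in the tables above are only needed because ''|'' delimits DokuWiki table cells, while in the files the FEATS values are joined by a plain ''|''. A minimal reader sketch (names are hypothetical), fed with the first sample sentence:

<code python>
CONLLX_COLUMNS = ("id", "form", "lemma", "cpostag", "postag", "feats",
                  "head", "deprel", "phead", "pdeprel")


def read_conllx(lines):
    """Yield sentences (lists of token dicts) from CoNLL-X lines; a blank line ends a sentence."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        token = dict(zip(CONLLX_COLUMNS, line.split("\t")))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        # FEATS is "_" or attribute=value pairs joined by "|", e.g. Gen=F|Num=S|Cas=1.
        feats = token["feats"]
        token["feats"] = {} if feats == "_" else dict(f.split("=", 1) for f in feats.split("|"))
        sentence.append(token)
    if sentence:  # the file may not end with a blank line
        yield sentence


SAMPLE = (
    "1\tTřikrát\ttřikrát`3\tC\tv\t_\t2\tAdv\t_\t_\n"
    "2\trychlejší\trychlý\tA\tA\tGen=F|Num=S|Cas=1|Gra=2|Neg=A\t0\tExD\t_\t_\n"
    "3\tnež\tnež-2\tJ\t,\t_\t2\tAuxC\t_\t_\n"
    "4\tslovo\tslovo\tN\tN\tGen=N|Num=S|Cas=1|Neg=A\t3\tExD\t_\t_\n"
    "\n"
)

for sentence in read_conllx(SAMPLE.splitlines()):
    print([(t["id"], t["head"], t["deprel"]) for t in sentence])
# [(1, 2, 'Adv'), (2, 0, 'ExD'), (3, 2, 'AuxC'), (4, 3, 'ExD')]
</code>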
- +
-The first sentence of the CoNLL 2006 test data: +
- +
-^ ID ^ FORM ^ LEMMA ^ CPOSTAG ^ POSTAG ^ FEATS ^ HEAD ^ DEPREL ^ PHEAD ^ PDEPREL ^ +
-| 1 | Podobně | podobně | D | g | Gra=1<nowiki>|</nowiki>Neg=A | 5 | Adv | _ | _ | +
-| 2 | , | , | Z | : | _ | 3 | AuxX | _ | _ | +
-| 3 | myslím | myslit | V | B | Num=S<nowiki>|</nowiki>Per=1<nowiki>|</nowiki>Ten=P<nowiki>|</nowiki>Neg=A<nowiki>|</nowiki>Voi=A | 5 | Pred_Pa | _ | _ | +
-| 4 | , | , | Z | : | _ | 3 | AuxX | _ | _ | +
-| 5 | postupuje | postupovat | V | B | Num=S<nowiki>|</nowiki>Per=3<nowiki>|</nowiki>Ten=P<nowiki>|</nowiki>Neg=A<nowiki>|</nowiki>Voi=A | 0 | Pred | _ | _ | +
-| 6 | většina | většina | N | N | Gen=F<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Neg=A | 5 | Sb | _ | _ | +
-| 7 | českých | český | A | A | Gen=F<nowiki>|</nowiki>Num=P<nowiki>|</nowiki>Cas=2<nowiki>|</nowiki>Gra=1<nowiki>|</nowiki>Neg=A | 8 | Atr | _ | _ | +
-| 8 | bank | banka | N | N | Gen=F<nowiki>|</nowiki>Num=P<nowiki>|</nowiki>Cas=2<nowiki>|</nowiki>Neg=A | 6 | Atr | _ | _ | +
-| 9 | , | , | Z | : | _ | 11 | AuxX | _ | _ | +
-| 10 | zejména | zejména | D | b | _ | 12 | AuxZ | _ | _ | +
-| 11 | v | v-1 | R | R | Cas=6 | 5 | AuxP | _ | _ | +
-| 12 | případech | případ | N | N | Gen=I<nowiki>|</nowiki>Num=P<nowiki>|</nowiki>Cas=6<nowiki>|</nowiki>Neg=A | 11 | Adv | _ | _ | +
-| 13 | , | , | Z | : | _ | 17 | AuxX | _ | _ | +
-| 14 | kdy | kdy | D | b | _ | 17 | Adv | _ | _ | +
-| 15 | by | být | V | c | Num=X<nowiki>|</nowiki>Per=3 | 17 | AuxV | _ | _ | +
-| 16 | se | se | P | 7 | Num=X<nowiki>|</nowiki>Cas=4 | 18 | AuxT | _ | _ | +
-| 17 | mělo | mít | V | p | Gen=N<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Per=X<nowiki>|</nowiki>Ten=R<nowiki>|</nowiki>Neg=A<nowiki>|</nowiki>Voi=A | 12 | Atr | _ | _ | +
-| 18 | jednat | jednat | V | f | Neg=A | 17 | Obj | _ | _ | +
-| 19 | o | o-1 | R | R | Cas=4 | 18 | AuxP | _ | _ | +
-| 20 | větší | velký | A | A | Gen=F<nowiki>|</nowiki>Num=P<nowiki>|</nowiki>Cas=4<nowiki>|</nowiki>Gra=2<nowiki>|</nowiki>Neg=A | 21 | Atr | _ | _ | +
-| 21 | částky | částka | N | N | Gen=F<nowiki>|</nowiki>Num=P<nowiki>|</nowiki>Cas=4<nowiki>|</nowiki>Neg=A | 19 | Obj | _ | _ | +
-| 22 | . | . | Z | : | _ | 0 | AuxK | _ | _ | +
- +
-The first sentence of the CoNLL 2007 test data: +
- +
-^ ID ^ FORM ^ LEMMA ^ CPOSTAG ^ POSTAG ^ FEATS ^ HEAD ^ DEPREL ^ PHEAD ^ PDEPREL ^ +
-| 1 | Proč | proč | D | b | _ | 2 | Adv | _ | _ | +
-| 2 | mají | mít | V | B | Num=P<nowiki>|</nowiki>Per=3<nowiki>|</nowiki>Ten=P<nowiki>|</nowiki>Neg=A<nowiki>|</nowiki>Voi=A | 0 | Pred | _ | _ | +
-| 3 | každý | každý | A | A | Gen=I<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=4<nowiki>|</nowiki>Gra=1<nowiki>|</nowiki>Neg=A | 4 | Atr | _ | _ | +
-| 4 | rok | rok | N | N | Gen=I<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=4<nowiki>|</nowiki>Neg=A | 5 | Adv | _ | _ | +
-| 5 | fasovat | fasovat | V | f | Neg=A | 2 | Obj | _ | _ | +
-| 6 | speciální | speciální | A | A | Gen=F<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=4<nowiki>|</nowiki>Gra=1<nowiki>|</nowiki>Neg=A | 7 | Atr | _ | _ | +
-| 7 | taxu | taxa | N | N | Gen=F<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=4<nowiki>|</nowiki>Neg=A | 5 | Obj | _ | _ | +
-| 8 | na | na | R | R | Cas=4 | 7 | AuxP | _ | _ | +
-| 9 | oblečení | oblečení | N | N | Gen=N<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=4<nowiki>|</nowiki>Neg=A | 8 | AtrAdv | _ | _ | +
-| 10 | ? | ? | Z | : | _ | 0 | AuxK | _ | _ | +
- +
-The first sentence of the CoNLL 2009 training data: +
- +
-^ ID ^ FORM ^ LEMMA ^ PLEMMA ^ POS ^ PPOS ^ FEAT ^ PFEAT ^ HEAD ^ PHEAD ^ DEPREL ^ PDEPREL ^ FILLPRED ^ PRED ^ APRED1 ^ APRED2 ^ APRED3 ^ +
-| 1 | Celní | celní | celní | A | A | SubPOS=A<nowiki>|</nowiki>Gen=F<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Gra=1<nowiki>|</nowiki>Neg=A | SubPOS=A<nowiki>|</nowiki>Gen=F<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Gra=1<nowiki>|</nowiki>Neg=A | 2 | 2 | Atr | Atr | Y | celní | _ | RSTR | _ | +
-| 2 | unie | unie | unie | N | N | SubPOS=N<nowiki>|</nowiki>Gen=F<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Neg=A | SubPOS=N<nowiki>|</nowiki>Gen=F<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Neg=A | 0 | 0 | ExD | ExD | Y | unie | _ | _ | _ | +
-| 3 | v | v | v | R | R | SubPOS=R<nowiki>|</nowiki>Cas=6 | SubPOS=R<nowiki>|</nowiki>Cas=6 | 2 | 2 | AuxP | AuxP | _ | _ | _ | _ | _ | +
-| 4 | ohrožení | ohrožení | ohrožení | N | N | SubPOS=N<nowiki>|</nowiki>Gen=N<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=6<nowiki>|</nowiki>Neg=A | SubPOS=N<nowiki>|</nowiki>Gen=N<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=6<nowiki>|</nowiki>Neg=A | 3 | 3 | Atr | Atr | Y | v-w3017f1 | _ | _ | _ | +
- +
-The first sentence of the CoNLL 2009 development data: +
- +
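-^ ID ^ FORM ^ LEMMA ^ PLEMMA ^ POS ^ PPOS ^ FEAT ^ PFEAT ^ HEAD ^ PHEAD ^ DEPREL ^ PDEPREL ^ FILLPRED ^ PRED ^ APRED1 ^ APRED2 ^ +
-| 1 | <nowiki>|</nowiki> | <nowiki>|</nowiki> | <nowiki>|</nowiki> | Z | Z | SubPOS=: | SubPOS=: | 0 | 3 | ExD | AuxG | _ | _ | _ | _ | +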
-| 2 | Daňový | daňový | daňový | A | A | SubPOS=A<nowiki>|</nowiki>Gen=M<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Gra=1<nowiki>|</nowiki>Neg=A | SubPOS=A<nowiki>|</nowiki>Gen=M<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Gra=1<nowiki>|</nowiki>Neg=A | 3 | 3 | Atr | Atr | Y | daňový | _ | RSTR | +
-| 3 | poradce | poradce | poradce | N | N | SubPOS=N<nowiki>|</nowiki>Gen=M<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Neg=A | SubPOS=N<nowiki>|</nowiki>Gen=M<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Neg=A | 0 | 0 | ExD | ExD | Y | poradce | _ | _ | +
-| 4 | <nowiki>|</nowiki> | <nowiki>|</nowiki> | <nowiki>|</nowiki> | Z | Z | SubPOS=: | SubPOS=: | 0 | 3 | AuxK | AuxG | _ | _ | _ | _ | +
- +
-The first sentence of the CoNLL 2009 test data: +
- +
-^ ID ^ FORM ^ LEMMA ^ PLEMMA ^ POS ^ PPOS ^ FEAT ^ PFEAT ^ HEAD ^ PHEAD ^ DEPREL ^ PDEPREL ^ FILLPRED ^ +
-| 1 | Názor | názor | názor | N | N | SubPOS=N<nowiki>|</nowiki>Gen=I<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Neg=A | SubPOS=N<nowiki>|</nowiki>Gen=I<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=1<nowiki>|</nowiki>Neg=A | _ | _ | _ | _ | Y | +
-| 2 | experta | expert | expert | N | N | SubPOS=N<nowiki>|</nowiki>Gen=M<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=2<nowiki>|</nowiki>Neg=A | SubPOS=N<nowiki>|</nowiki>Gen=M<nowiki>|</nowiki>Num=S<nowiki>|</nowiki>Cas=2<nowiki>|</nowiki>Neg=A | _ | _ | _ | _ | Y | +
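In the CoNLL 2009 files each token line has 14 fixed tab-separated columns (ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL, FILLPRED, PRED) followed by one APRED column per predicate of the sentence. The P-prefixed columns hold automatically predicted values next to the manual ones; in the test sample above HEAD, PHEAD, DEPREL and PDEPREL are all ''_''. A minimal sketch that names the columns (function names are hypothetical), applied to the first development row above, where the manual and predicted attachment differ:

<code python>
# Fixed CoNLL 2009 columns; every further column is an APRED column,
# one per predicate in the sentence.
CONLL09_COLUMNS = ("id", "form", "lemma", "plemma", "pos", "ppos", "feat", "pfeat",
                   "head", "phead", "deprel", "pdeprel", "fillpred", "pred")


def parse_conll09_token(line):
    """Split one tab-separated CoNLL 2009 token line into named fields."""
    cols = line.rstrip("\n").split("\t")
    token = dict(zip(CONLL09_COLUMNS, cols))
    token["apreds"] = cols[len(CONLL09_COLUMNS):]
    return token


def gold_and_predicted_syntax(token):
    """Return ((HEAD, DEPREL), (PHEAD, PDEPREL)) as strings; "_" means no value."""
    return (token["head"], token["deprel"]), (token["phead"], token["pdeprel"])


row = "\t".join(["1", "|", "|", "|", "Z", "Z", "SubPOS=:", "SubPOS=:",
                 "0", "3", "ExD", "AuxG", "_", "_", "_", "_"])
tok = parse_conll09_token(row)
print(gold_and_predicted_syntax(tok))  # (('0', 'ExD'), ('3', 'AuxG'))
</code>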
- +
-==== Parsing ==== +
- +
-PDT is a mildly nonprojective treebank. 8,351 of the 437,020 tokens in the CoNLL 2007 version are attached nonprojectively (1.91%). +
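The figure above counts tokens whose incoming arc is nonprojective. Under the usual definition, the arc from head h to dependent d is projective iff every token lying strictly between h and d in word order is a descendant of h. A minimal sketch of that count for one sentence, assuming CoNLL-style HEAD values (1-based token ids, 0 for the artificial root); the function name is hypothetical. On the CoNLL 2006 test sentence shown above it flags the particle "zejména" and the reflexive "se":

<code python>
def nonprojective_tokens(heads):
    """Return ids of tokens whose incoming arc is nonprojective.

    heads[i - 1] is the HEAD of token i (1-based ids, 0 = root).
    An arc head -> dep is projective iff every token strictly between
    them descends from head. Assumes a well-formed tree (no cycles).
    """
    def descends_from(node, ancestor):
        # Climb from node towards the root, looking for ancestor.
        while node != 0:
            node = heads[node - 1]
            if node == ancestor:
                return True
        return False

    result = []
    for dep, head in enumerate(heads, start=1):
        lo, hi = sorted((head, dep))
        if not all(descends_from(k, head) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


# HEAD column of the CoNLL 2006 test sentence "Podobně, myslím, postupuje ...".
heads = [5, 3, 5, 3, 0, 5, 8, 6, 11, 12, 5, 11, 17, 17, 17, 18, 12, 17, 18, 21, 19, 0]
print(nonprojective_tokens(heads))  # [10, 16]
print(f"{8351 / 437020:.2%}")       # 1.91%
</code>

Summing ''len(nonprojective_tokens(...))'' over all sentences and dividing by the token count gives the percentage quoted above.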
- +
-There is an [[http://ufal.mff.cuni.cz/czech-parsing/|online summary]] of known results in Czech parsing. +
- +
-The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Czech: +
- +
-^ Parser (Authors) ^ LAS ^ UAS ^ +
-| MST (McDonald et al.) | 80.18 | 87.30 | +
-| Basis (O'Neil) | 76.60 | 85.58 | +
-| Malt (Nivre et al.) | 78.42 | 84.80 | +
-| Nara (Yuchang Cheng) | 76.24 | 83.40 | +
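LAS (labeled attachment score) is the percentage of scored tokens that receive both the correct head and the correct dependency label; UAS only requires the correct head. The sketch below shows how the punctuation setting changes the score for the same parser output. It assumes a token counts as punctuation when its form consists only of Unicode punctuation characters, which approximates, but may not exactly match, the rule of the official CoNLL scorers; function names and the toy prediction are hypothetical:

<code python>
import unicodedata


def is_punctuation(form):
    """True if the form is non-empty and made only of Unicode punctuation (categories P*)."""
    return bool(form) and all(unicodedata.category(ch).startswith("P") for ch in form)


def attachment_scores(gold, pred, exclude_punct):
    """Return (LAS, UAS) in percent.

    gold and pred are aligned lists of (form, head, deprel) triples.
    exclude_punct=True mimics the CoNLL 2006 setting, False the CoNLL 2007 one.
    """
    total = las_hits = uas_hits = 0
    for (form, g_head, g_rel), (_, p_head, p_rel) in zip(gold, pred):
        if exclude_punct and is_punctuation(form):
            continue
        total += 1
        if g_head == p_head:
            uas_hits += 1
            las_hits += g_rel == p_rel
    if total == 0:
        return 0.0, 0.0
    return 100 * las_hits / total, 100 * uas_hits / total


# Toy data (made up): one wrong label and one wrongly attached question mark.
gold = [("Proč", 2, "Adv"), ("mají", 0, "Pred"), ("?", 0, "AuxK")]
pred = [("Proč", 2, "Obj"), ("mají", 0, "Pred"), ("?", 2, "AuxK")]
print(attachment_scores(gold, pred, exclude_punct=True))                       # (50.0, 100.0)
print([round(x, 2) for x in attachment_scores(gold, pred, exclude_punct=False)])  # [33.33, 66.67]
</code>

With punctuation excluded, the misattached ''?'' is not scored at all, which is why scores from the 2006 and 2007 tables are not directly comparable.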
- +
-The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Czech: +
- +
-^ Parser (Authors) ^ LAS ^ UAS ^ +
-| Nakagawa | 80.19 | 86.28 | +
-| Carreras | 78.60 | 85.16 | +
-| Titov et al. | 77.94 | 84.19 | +
-| Malt (Nilsson et al.) | 77.98 | 83.59 | +
-| Attardi et al. | 77.37 | 83.40 | +
-| Malt (Hall et al.) | 77.22 | 82.35 | +
- +
-The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]]. +
- +
-The results of the CoNLL 2009 shared task are [[http://ufal.mff.cuni.cz/conll2009-st/results/results.php|available online]]. They have been published in [[http://aclweb.org/anthology/W/W09/W09-1201.pdf|(Hajič et al., 2009)]]. Unlabeled attachment score was not published. These are the best results for Czech: +
- +
-^ Parser (Authors) ^ LAS ^ +
-| Merlo (Gesmundo et al.) | 80.38 | +
-| Bohnet | 80.11 | +
-| Che et al. | 80.01 |+
  
