Institute of Formal and Applied Linguistics Wiki

Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:it [2012/01/03 15:33]
zeman Size.
user:zeman:treebanks:it [2012/01/03 15:48]
zeman Parsing results.
Line 42:
 ==== Inside ====
  
-The original Szeged Treebank is a phrase-based treebank and it is distributed in XML-based, TEI-compliant format. The CoNLL 2007 version is dependency-based (i.e. the head of each phrase was identified), distributed in the CoNLL 2006/2007 format.
+The original ISST is a phrase-based treebank. The CoNLL 2007 version is dependency-based (i.e. the head of each phrase was identified), distributed in the CoNLL 2006/2007 format.
  
-Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. The tagset used in SzTB seems to be same or similar to [[http://nl.ijs.si/ME/V4/msd/html/msd-hu.html|Multext-East]]. In the CoNLL version, tags were decomposed into CPOS column, POS column and the list of feature-value pairs in the FEAT column.
+Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. In the CoNLL version, tags were decomposed into CPOS column, POS column and the list of feature-value pairs in the FEAT column.
  
-Personal names have been collapsed into one token, using underscore as the joining character (e.g. Torgyán_József).
+Multi-word expressions have been collapsed into one token, using underscore as the joining character (e.g. a_causa_di).
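
The column layout described above can be sketched in a few lines of Python. This is only an illustration, not part of the treebank distribution: it assumes the standard tab-separated CoNLL 2006/2007 layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL). The names `Token`, `parse_feats`, `parse_row` and `split_mwe` are hypothetical, and the CPOSTAG value `S` in the example row is an assumption, since that cell is not legible in the sample.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    """The first eight columns of one CoNLL 2006/2007 row."""
    id: int
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: Dict[str, str]
    head: int
    deprel: str


def parse_feats(field: str) -> Dict[str, str]:
    """Turn 'gen=M|num=S' into {'gen': 'M', 'num': 'S'}; '_' means no features."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def parse_row(line: str) -> Token:
    """Parse one tab-separated row; PHEAD and PDEPREL (columns 9-10) are ignored."""
    cols = line.rstrip("\n").split("\t")
    return Token(int(cols[0]), cols[1], cols[2], cols[3], cols[4],
                 parse_feats(cols[5]), int(cols[6]), cols[7])


def split_mwe(form: str) -> List[str]:
    """Split an underscore-joined token back into its words.

    A form consisting only of underscores is returned unchanged.
    """
    return [part for part in form.split("_") if part] or [form]


# Illustrative row; the CPOSTAG 'S' is assumed, the rest follows the test sample.
row = "1\tLONDRA\tlondra\tS\tSP\tgen=N|num=N\t0\tROOT\t_\t_"
token = parse_row(row)
print(token.feats)              # {'gen': 'N', 'num': 'N'}
print(split_mwe("a_causa_di"))  # ['a', 'causa', 'di']
```

Keeping FEATS as a dictionary makes it easy to compare individual features (e.g. `gen`, `num`) across tokens without re-parsing the string.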
  
 ==== Sample ====
Line 52:
 The first sentence of the CoNLL 2007 training data:
  
-| 1 | Az az Tf | <nowiki>def=yes</nowiki>DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| 1 | Non non | <nowiki>_</nowiki>mod | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 2 | elmúlt elmúlt Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| 2 | ci ci PQ | <nowiki>gen=N|num=P|per=1</nowiki>clit | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 3 | nyolc nyolc Mc | <nowiki>n=singular|case=nominative</nowiki> 4 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| 3 | rendiamo rendere | <nowiki>num=P|per=1|mod=I|tmp=P</nowiki>ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 4 | hónapban | hónap | N | Nc | <nowiki>n=singular|case=inessive|proper=no</nowiki>16 INE | <nowiki>_</nowiki> | <nowiki>_</nowiki>+conto conto | <nowiki>gen=M|num=S</nowiki>| <nowiki>ogg_d</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-<nowiki>,</nowiki> <nowiki>_</nowiki> WPUNCT WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+del di | <nowiki>gen=M|num=S</nowiki>mod | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 6 | amelyből | amely | P | Pr | <nowiki>p=3rd|n=singular|case=elative</nowiki> | 11 | ELA | <nowiki>_</nowiki> | <nowiki>_</nowiki>+lavoro lavoro | <nowiki>gen=M|num=S</nowiki>prep | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 7 | összesen | összesen | R | Rx | <nowiki>_</nowiki> | 8 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki>+psicologico psicologico | A | <nowiki>gen=M|num=S</nowiki>mod | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 8 | hatot | hat | M | Mc | <nowiki>n=singular|case=accusative</nowiki>11 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| <nowiki>,</nowiki> | <nowiki>,</nowiki>PU PU | <nowiki>_</nowiki>con | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 9 | kényszerűségből | kényszerűség | N | Nc | <nowiki>n=singular|case=elative|proper=no</nowiki> | 11 | ELA | <nowiki>_</nowiki> | <nowiki>_</nowiki>+dei di | <nowiki>gen=M|num=P</nowiki>cong | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-10 szabadságon szabadság Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki>11 SUP | <nowiki>_</nowiki> | <nowiki>_</nowiki>+10 prodigi prodigio | <nowiki>gen=M|num=P</nowiki>prep | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-11 töltött tölt Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki>16 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+11 di di | <nowiki>_</nowiki>10 mod | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-12 T | Tf | <nowiki>def=yes</nowiki> | 14 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+12 equilibrio equilibrio | <nowiki>gen=M|num=S</nowiki>11 prep | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 13 | parlamenti | parlamenti | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>14 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+13 | <nowiki>,</nowiki> | <nowiki>,</nowiki>PU PU | <nowiki>_</nowiki>11 con | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-14 | ellenzék | ellenzék | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 11 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+14 di di | <nowiki>_</nowiki>11 cong | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 15 | <nowiki>,</nowiki><nowiki>_</nowiki> | WPUNCT WPUNCT | <nowiki>_</nowiki>16 PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+15 diplomazia diplomazia | <nowiki>gen=F|num=S</nowiki>14 prep | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-16 megváltozott megváltozik Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki>ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+16 che che PR | <nowiki>gen=N|num=N</nowiki>17 | <nowiki>ogg_d</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-17 itthon itthon Rx | <nowiki>_</nowiki> | 16 | LOCY | <nowiki>_</nowiki> | <nowiki>_</nowiki>+17 fanno fare | V | | <nowiki>num=P|per=3|mod=I|tmp=P</nowiki>| <nowiki>mod_rel</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 18 | a | a | T | Tf | <nowiki>def=yes</nowiki> 19 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+18 per per | <nowiki>_</nowiki>17 mod | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 19 | hatalommegosztás | hatalommegosztás | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki>22 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+19 noi noi PQ | <nowiki>gen=N|num=P|per=1</nowiki>18 prep | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-20 <nowiki>1990-ben</nowiki> 1990 | M | Mc | <nowiki>n=singular|case=inessive</nowiki> 21 ATT | <nowiki>_</nowiki><nowiki>_</nowiki>+20 | <nowiki>.</nowiki> | <nowiki>.</nowiki>PU PU | <nowiki>_</nowiki>19 punc | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 21 | kialakított | kialakított | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 22 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-22 rendszere rendszer Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki>16 SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-23 | <nowiki>:</nowiki> | <nowiki>_</nowiki>WPUNCT WPUNCT | <nowiki>_</nowiki>16 PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-24 az az Tf | <nowiki>def=yes</nowiki>26 DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-25 Pd | <nowiki>p=3rd|n=singular|case=nominative</nowiki>26 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-26 héten hét Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 27 | audienciát | audiencia | N | Nc | <nowiki>n=singular|case=accusative|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 28 | tartó | tartó | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 29 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 29 | kormányfő | kormányfő | | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki>31 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 30 | gyakorlatilag | gyakorlati | A | Af | <nowiki>deg=positive|n=singular|case=essive</nowiki> | 31 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-31 kivonta kivon | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=yes</nowiki>16 | CP | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 32 | magát | maga | P | Px | <nowiki>p=3rd|n=singular|case=accusative</nowiki> | 31 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-33 az az Tf | <nowiki>def=yes</nowiki>34 DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-34 Országgyűlés Országgyűlés Np | <nowiki>n=singular|case=nominative|proper=yes</nowiki> | 35 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 35 | ellenőrzése | ellenőrzés | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki>36 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-36 | alól | alól | S | St | <nowiki>_</nowiki> | 31 | PP | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 37 | <nowiki>.</nowiki><nowiki>_</nowiki> SPUNCT | SPUNCT | <nowiki>_</nowiki>16 PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
  
-The first sentence of the CoNLL 2007 test data:
+The first two sentences of the CoNLL 2007 test data:
  
-| 1 | Tf | <nowiki>def=yes</nowiki> | 2 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| 1 | LONDRA londra SP | <nowiki>gen=N|num=N</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 2 | bankokkal | bank | N | Nc | <nowiki>n=plural|case=instrumental|proper=no</nowiki> | 4 | INS | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| <nowiki>.</nowiki> | <nowiki>.</nowiki>PU PU | <nowiki>_</nowiki>punc | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 3 | kell | kell | V | Vm | <nowiki>mood=indicative|t=present|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| |||||||||| 
-4 | egyezkedniük | egyezkedik | V | Vm | <nowiki>mood=infinitive|t=present|p=3rd|n=plural</nowiki> | 3 | INF | <nowiki>_</nowiki><nowiki>_</nowiki>+Gas gas | <nowiki>gen=M|num=N</nowiki>ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 5 | azoknak | az | P | Pd | <nowiki>p=3rd|n=plural|case=dative</nowiki> | 8 ATT | <nowiki>_</nowiki><nowiki>_</nowiki> | +dalla da | <nowiki>gen=F|num=S</nowiki>mod | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 6 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 8 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+statua statua | <nowiki>gen=F|num=S</nowiki>prep | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-mezőgazdasági mezőgazdasági Af <nowiki>deg=positive|n=singular|case=nominative</nowiki> ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> +Evacuata evacuare | <nowiki>gen=F|num=S|mod=P|tmp=R</nowiki>mod | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-termelőknek termelő Nc | <nowiki>n=plural|case=dative|proper=no</nowiki> | 4 | DAT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| 5 | la lo | R | RD | <nowiki>gen=F|num=S</nowiki>det | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 9 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 3 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+Tate tate SP | <nowiki>gen=N|num=N</nowiki>mod | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 10 | akik | aki | P | Pr | <nowiki>p=3rd|n=plural|case=nominative</nowiki> | 21 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+Gallery gallery SP | <nowiki>gen=N|num=N</nowiki>ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 11 | egy | egy | T | Ti | <nowiki>def=no</nowiki> | 19 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+| <nowiki>.</nowiki> | <nowiki>.</nowiki>PU PU | <nowiki>_</nowiki>punc | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 12 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 19 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 13 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 15 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 14 | múlt | múlt | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 15 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 15 | héten | hét | | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki>16 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-16 megjelent megjelent Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>19 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-17 földművelésügyi földművelésügyi Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>18 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-18 minisztériumi minisztériumi Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> 19 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 19 | rendelet | rendelet | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki>20 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 20 | alapján | alap | N | Nc | <nowiki>n=singular|case=superessive|proper=no|pperson=3rd|pnumber=singular</nowiki> | 21 | SUP | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 21 | kérik | kér | V | Vm | <nowiki>mood=indicative|t=present|p=3rd|n=plural|def=yes</nowiki> | 5 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 22 | ősszel ősszel | R | Rx | <nowiki>_</nowiki> | 23 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 23 | lejáró | lejáró | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>27 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-24 <nowiki>,</nowiki> <nowiki>_</nowiki> WPUNCT WPUNCT | <nowiki>_</nowiki> | 27 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 25 | éven | év | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki>26 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-26 belüli belüli Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 27 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 27 | hiteleik | hitel | N | Nc | <nowiki>n=plural|case=nominative|proper=no|pperson=3rd|pnumber=plural</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-| 28 | átütemezését | átütemezés | | Nc | <nowiki>n=singular|case=accusative|proper=no|pperson=3rd|pnumber=singular</nowiki>21 OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>+
-29 | <nowiki>.</nowiki> | <nowiki>_</nowiki>SPUNCT SPUNCT | <nowiki>_</nowiki>PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
  
 ==== Parsing ====
  
-SzTB is a mildly nonprojective treebank: 4032 of the 139,143 tokens of the CoNLL 2007 version are attached nonprojectively (2.9%).
+Nonprojectivities in ISST-CoNLL are rare: 354 of the 76295 tokens of the CoNLL 2007 version are attached nonprojectively (0.46%).
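
A count like the one above can be reproduced with a short script. The sketch below is an assumption about the definition, not the procedure actually used for these figures: it treats a token as nonprojectively attached when some token lying strictly between it and its head is not a descendant of that head, with the artificial root (index 0) dominating everything. The function name `nonprojective_tokens` is hypothetical.

```python
from typing import List


def nonprojective_tokens(heads: List[int]) -> List[int]:
    """Return the 1-based ids of tokens whose incoming arc is nonprojective.

    heads[i - 1] is the head of token i; 0 denotes the artificial root.
    """
    n = len(heads)

    def is_descendant(node: int, ancestor: int) -> bool:
        # Walk up the head chain from `node`; the artificial root dominates all.
        steps = 0
        while node != 0:
            if node == ancestor:
                return True
            node = heads[node - 1]
            steps += 1
            if steps > n:
                raise ValueError("cycle in head assignments")
        return ancestor == 0

    result = []
    for dep in range(1, n + 1):
        head = heads[dep - 1]
        lo, hi = sorted((head, dep))
        # The arc is nonprojective if any token in between is not under `head`.
        if any(not is_descendant(k, head) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


# Toy tree: token 2 depends on token 4, but token 3 (in between) hangs on the root.
print(nonprojective_tokens([3, 4, 0, 3]))   # [2]

# The percentages quoted above follow directly from the token counts.
print(round(100 * 354 / 76295, 2))          # 0.46
print(round(100 * 4032 / 139143, 1))        # 2.9
```

Other definitions exist (for example, counting crossing arcs rather than tokens), so results may differ slightly depending on the convention.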
  
-The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Hungarian:
+The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Italian:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| Malt (Nilsson et al.) | 80.27 | 83.55 |
-| Sagae | 79.53 | 83.51 |
-| Nakagawa | 76.74 | 82.49 |
-| Titov et al. | 77.94 | 82.18 |
+| Nakagawa | 83.61 | 87.91 |
+| Malt (Nilsson et al.) | 84.40 | 87.77 |
+| Sagae | 83.91 | 87.68 |
+| Carreras | 83.46 | 87.19 |
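
For readers unfamiliar with the two metrics in the table, the following sketch shows how labeled and unlabeled attachment scores (LAS, UAS) could be computed when punctuation tokens are included, as in the 2007 evaluation. It is not the official scorer; the function name `attachment_scores` and the toy data are hypothetical.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head id, dependency label) for one token


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (LAS, UAS) as percentages over all tokens, punctuation included."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    total = len(gold)
    # UAS: the head is correct. LAS: head and label are both correct.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * las_hits / total, 100.0 * uas_hits / total


# Toy four-token sentence: one wrong head (token 3), one wrong label (token 4).
gold = [(2, "mod"), (0, "ROOT"), (2, "punc"), (2, "prep")]
pred = [(2, "mod"), (0, "ROOT"), (1, "punc"), (2, "mod")]
las, uas = attachment_scores(gold, pred)
print(las, uas)   # 50.0 75.0
```

Since every correctly labeled arc also has a correct head, LAS can never exceed UAS, which matches the pattern in the table.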
  
 The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]].
  
