
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


user:zeman:treebanks:it [2012/01/03 15:33]
zeman Size.
user:zeman:treebanks:it [2012/01/03 15:43]
zeman Inside.
Line 42: Line 42:
 ==== Inside ==== ==== Inside ====
  
-The original Szeged Treebank is a phrase-based treebank and it is distributed in XML-based, TEI-compliant format. The CoNLL 2007 version is dependency-based (i.e. the head of each phrase was identified), distributed in the CoNLL 2006/2007 format.+The original ISST is a phrase-based treebank. The CoNLL 2007 version is dependency-based (i.e. the head of each phrase was identified), distributed in the CoNLL 2006/2007 format.
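The CoNLL 2006/2007 format mentioned above stores one token per line. Each line has ten tab-separated columns: ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD and PDEPREL. A blank line follows each sentence, and `_` marks an unspecified value. Below is a minimal Python sketch of a reader. It assumes UTF-8 input with exactly ten columns per token line. The function name and the file name in the usage comment are illustrative assumptions, not part of any distributed tool.

```python
from typing import Dict, Iterable, List

# The ten columns of the CoNLL 2006/2007 (CoNLL-X) format, in file order.
COLUMNS = ["id", "form", "lemma", "cpos", "pos", "feats",
           "head", "deprel", "phead", "pdeprel"]

Token = Dict[str, str]


def read_conll(lines: Iterable[str]) -> List[List[Token]]:
    """Group tab-separated token lines into sentences.

    A blank line ends a sentence; '_' in a cell means "unspecified".
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():          # blank line = sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}: {line!r}")
        current.append(dict(zip(COLUMNS, fields)))
    if current:                       # the file may not end with a blank line
        sentences.append(current)
    return sentences


# Usage (the file name is hypothetical):
# with open("italian_isst_train.conll", encoding="utf-8") as f:
#     sentences = read_conll(f)
```

All values are kept as strings. HEAD in particular still has to be converted with `int()` before it is used as an index.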
  
-Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. The tagset used in SzTB seems to be same or similar to [[http://nl.ijs.si/ME/V4/msd/html/msd-hu.html|Multext-East]]. In the CoNLL version, tags were decomposed into CPOS column, POS column and the list of feature-value pairs in the FEAT column.+Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. In the CoNLL version, tags were decomposed into CPOS column, POS column and the list of feature-value pairs in the FEAT column.
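In the FEAT column, feature-value pairs are written as `name=value` and separated by vertical bars, with `_` for a token without features, as in the samples on this page. Below is a small sketch that turns such a cell back into a dictionary. It assumes that every pair contains an `=` and that no feature name is repeated within one cell.

```python
from typing import Dict


def parse_feats(feats: str) -> Dict[str, str]:
    """Split a FEAT cell such as 'gen=M|num=S' into a dict.

    '_' (no features) yields an empty dict.
    """
    if feats == "_":
        return {}
    result: Dict[str, str] = {}
    for item in feats.split("|"):
        name, sep, value = item.partition("=")
        if not sep:                   # every feature is expected to be name=value
            raise ValueError(f"feature without '=': {item!r}")
        result[name] = value
    return result


# FEAT value of 'rendiamo' in the Italian training sample on this page:
print(parse_feats("num=P|per=1|mod=I|tmp=P"))
# {'num': 'P', 'per': '1', 'mod': 'I', 'tmp': 'P'}
```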
  
-Personal names have been collapsed into one token, using underscore as the joining character (e.g. Torgyán_József).+Multi-word expressions have been collapsed into one token, using underscore as the joining character (e.g. a_causa_di).
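Because a multi-word expression occupies a single FORM cell, a tool that needs the individual words has to split the token on the underscore. Below is a minimal sketch. It assumes that `_` occurs inside a FORM only as the joining character, and it treats a lone `_` as the CoNLL placeholder. The helper names are hypothetical.

```python
from typing import List


def is_mwe(form: str) -> bool:
    """True for an underscore-joined multi-word token such as 'a_causa_di'."""
    # A lone '_' is the CoNLL "empty value" placeholder, not a joined token.
    return "_" in form.strip("_")


def split_mwe(form: str) -> List[str]:
    """Return the component words of a joined token, or the token itself."""
    if not is_mwe(form):
        return [form]
    return [part for part in form.split("_") if part]


print(split_mwe("a_causa_di"))   # ['a', 'causa', 'di']
print(split_mwe("conto"))        # ['conto']
```

The sketch only splits the surface string. Turning the parts into separate tokens would also require new IDs and a decision about their heads, which it does not attempt.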
  
 ==== Sample ==== ==== Sample ====
Line 52: Line 52:
 The first sentence of the CoNLL 2007 training data: The first sentence of the CoNLL 2007 training data:
  
-| 1 | Az | az | T | Tf | <nowiki>def=yes</nowiki> |  | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 1 | Non | non |  |  | <nowiki>_</nowiki> |  | mod | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 2 | elmúlt | elmúlt | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> |  | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 2 | ci | ci |  | PQ | <nowiki>gen=N|num=P|per=1</nowiki> |  | clit | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 3 | nyolc | nyolc | M | Mc | <nowiki>n=singular|case=nominative</nowiki> | 4 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 3 | rendiamo | rendere |  |  | <nowiki>num=P|per=1|mod=I|tmp=P</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 4 | hónapban | hónap | N | Nc | <nowiki>n=singular|case=inessive|proper=no</nowiki> | 16 | INE | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 4 | conto | conto |  |  | <nowiki>gen=M|num=S</nowiki> |  | <nowiki>ogg_d</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 5 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 5 | del | di |  |  | <nowiki>gen=M|num=S</nowiki> |  | mod | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 6 | amelyből | amely | P | Pr | <nowiki>p=3rd|n=singular|case=elative</nowiki> | 11 | ELA | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 6 | lavoro | lavoro |  |  | <nowiki>gen=M|num=S</nowiki> |  | prep | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 7 | összesen | összesen | R | Rx | <nowiki>_</nowiki> | 8 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 7 | psicologico | psicologico | A |  | <nowiki>gen=M|num=S</nowiki> |  | mod | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 8 | hatot | hat | M | Mc | <nowiki>n=singular|case=accusative</nowiki> | 11 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 8 | <nowiki>,</nowiki> | <nowiki>,</nowiki> | PU | PU | <nowiki>_</nowiki> |  | con | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 9 | kényszerűségből | kényszerűség | N | Nc | <nowiki>n=singular|case=elative|proper=no</nowiki> | 11 | ELA | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 9 | dei | di |  |  | <nowiki>gen=M|num=P</nowiki> |  | cong | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 10 | szabadságon | szabadság | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 11 | SUP | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 10 | prodigi | prodigio |  |  | <nowiki>gen=M|num=P</nowiki> |  | prep | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 11 | töltött | tölt | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki> | 16 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 11 | di | di |  |  | <nowiki>_</nowiki> | 10 | mod | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 12 |  |  | T | Tf | <nowiki>def=yes</nowiki> | 14 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 12 | equilibrio | equilibrio |  |  | <nowiki>gen=M|num=S</nowiki> | 11 | prep | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 13 | parlamenti | parlamenti | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 14 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 13 | <nowiki>,</nowiki> | <nowiki>,</nowiki> | PU | PU | <nowiki>_</nowiki> | 11 | con | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 14 | ellenzék | ellenzék | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 11 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 14 | di | di |  |  | <nowiki>_</nowiki> | 11 | cong | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 15 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 15 | diplomazia | diplomazia |  |  | <nowiki>gen=F|num=S</nowiki> | 14 | prep | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 16 | megváltozott | megváltozik | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 16 | che | che |  | PR | <nowiki>gen=N|num=N</nowiki> | 17 | <nowiki>ogg_d</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 17 | itthon | itthon | R | Rx | <nowiki>_</nowiki> | 16 | LOCY | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 17 | fanno | fare | V |  | <nowiki>num=P|per=3|mod=I|tmp=P</nowiki> |  | <nowiki>mod_rel</nowiki> | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 18 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 19 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 18 | per | per |  |  | <nowiki>_</nowiki> | 17 | mod | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 19 | hatalommegosztás | hatalommegosztás | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 22 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 19 | noi | noi |  | PQ | <nowiki>gen=N|num=P|per=1</nowiki> | 18 | prep | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 20 | <nowiki>1990-ben</nowiki> | 1990 | M | Mc | <nowiki>n=singular|case=inessive</nowiki> | 21 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 20 | <nowiki>.</nowiki> | <nowiki>.</nowiki> | PU | PU | <nowiki>_</nowiki> | 19 | punc | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 21 | kialakított | kialakított | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 22 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 22 | rendszere | rendszer | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 16 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 23 | <nowiki>:</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 24 | az | az | T | Tf | <nowiki>def=yes</nowiki> | 26 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 25 |  |  | P | Pd | <nowiki>p=3rd|n=singular|case=nominative</nowiki> | 26 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 26 | héten | hét | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 27 | audienciát | audiencia | N | Nc | <nowiki>n=singular|case=accusative|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 28 | tartó | tartó | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 29 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 29 | kormányfő | kormányfő | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 31 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 30 | gyakorlatilag | gyakorlati | A | Af | <nowiki>deg=positive|n=singular|case=essive</nowiki> | 31 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 31 | kivonta | kivon | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=yes</nowiki> | 16 | CP | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 32 | magát | maga | P | Px | <nowiki>p=3rd|n=singular|case=accusative</nowiki> | 31 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 33 | az | az | T | Tf | <nowiki>def=yes</nowiki> | 34 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 34 | Országgyűlés | Országgyűlés | N | Np | <nowiki>n=singular|case=nominative|proper=yes</nowiki> | 35 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 35 | ellenőrzése | ellenőrzés | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 36 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 36 | alól | alól | S | St | <nowiki>_</nowiki> | 31 | PP | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 37 | <nowiki>.</nowiki> | <nowiki>_</nowiki> | SPUNCT | SPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
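A sample like the table above can be produced by reading a CoNLL file up to its first blank line. Below is a short sketch using only the standard library. The function name and the file name in the usage comment are illustrative assumptions.

```python
from itertools import dropwhile, takewhile
from typing import Iterable, List


def first_sentence(lines: Iterable[str]) -> List[List[str]]:
    """Return the token rows (each a list of column values) of the first sentence."""
    rows = (line.rstrip("\r\n") for line in lines)
    rows = dropwhile(lambda row: not row.strip(), rows)        # skip leading blank lines
    sentence = takewhile(lambda row: bool(row.strip()), rows)  # stop at the next blank line
    return [row.split("\t") for row in sentence]


# Usage (the file name is hypothetical):
# with open("italian_isst_train.conll", encoding="utf-8") as f:
#     for fields in first_sentence(f):
#         print(fields[0], fields[1], fields[7])   # ID, FORM, DEPREL
```

Because `takewhile` stops at the first blank line, the rest of the file is never read, so this works on large training files without loading them fully.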
  
-The first sentence of the CoNLL 2007 test data:+The first two sentences of the CoNLL 2007 test data:
  
-| 1 |  |  | T | Tf | <nowiki>def=yes</nowiki> | 2 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 1 | LONDRA | londra |  | SP | <nowiki>gen=N|num=N</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 2 | bankokkal | bank | N | Nc | <nowiki>n=plural|case=instrumental|proper=no</nowiki> | 4 | INS | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 2 | <nowiki>.</nowiki> | <nowiki>.</nowiki> | PU | PU | <nowiki>_</nowiki> |  | punc | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 3 | kell | kell | V | Vm | <nowiki>mood=indicative|t=present|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+|  |  |  |  |  |  |  |  |  |  |
-| 4 | egyezkedniük | egyezkedik | V | Vm | <nowiki>mood=infinitive|t=present|p=3rd|n=plural</nowiki> | 3 | INF | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 1 | Gas | gas |  |  | <nowiki>gen=M|num=N</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 5 | azoknak | az | P | Pd | <nowiki>p=3rd|n=plural|case=dative</nowiki> | 8 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 2 | dalla | da |  |  | <nowiki>gen=F|num=S</nowiki> |  | mod | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 6 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 8 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 3 | statua | statua |  |  | <nowiki>gen=F|num=S</nowiki> |  | prep | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 7 | mezőgazdasági | mezőgazdasági | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> |  | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 4 | Evacuata | evacuare |  |  | <nowiki>gen=F|num=S|mod=P|tmp=R</nowiki> |  | mod | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 8 | termelőknek | termelő | N | Nc | <nowiki>n=plural|case=dative|proper=no</nowiki> | 4 | DAT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 5 | la | lo | R | RD | <nowiki>gen=F|num=S</nowiki> |  | det | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 9 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 3 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 6 | Tate | tate |  | SP | <nowiki>gen=N|num=N</nowiki> |  | mod | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 10 | akik | aki | P | Pr | <nowiki>p=3rd|n=plural|case=nominative</nowiki> | 21 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 7 | Gallery | gallery |  | SP | <nowiki>gen=N|num=N</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 11 | egy | egy | T | Ti | <nowiki>def=no</nowiki> | 19 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+| 8 | <nowiki>.</nowiki> | <nowiki>.</nowiki> | PU | PU | <nowiki>_</nowiki> |  | punc | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 12 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 19 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 13 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 15 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 14 | múlt | múlt | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 15 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 15 | héten | hét | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 16 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 16 | megjelent | megjelent | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 19 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 17 | földművelésügyi | földművelésügyi | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 18 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 18 | minisztériumi | minisztériumi | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 19 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 19 | rendelet | rendelet | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 20 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 20 | alapján | alap | N | Nc | <nowiki>n=singular|case=superessive|proper=no|pperson=3rd|pnumber=singular</nowiki> | 21 | SUP | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 21 | kérik | kér | V | Vm | <nowiki>mood=indicative|t=present|p=3rd|n=plural|def=yes</nowiki> | 5 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 22 | ősszel | ősszel | R | Rx | <nowiki>_</nowiki> | 23 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 23 | lejáró | lejáró | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 27 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 24 | <nowiki>,</nowiki> | <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki> | 27 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 25 | éven | év | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 26 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 26 | belüli | belüli | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 27 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 27 | hiteleik | hitel | N | Nc | <nowiki>n=plural|case=nominative|proper=no|pperson=3rd|pnumber=plural</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 28 | átütemezését | átütemezés | N | Nc | <nowiki>n=singular|case=accusative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 21 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
-| 29 | <nowiki>.</nowiki> | <nowiki>_</nowiki> | SPUNCT | SPUNCT | <nowiki>_</nowiki> |  | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |+
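In these samples the dependency trees exist only through the HEAD column. A quick sanity check on a sentence is therefore to confirm two things: every head points to an existing token or to 0, and following heads upward never loops. Below is a minimal sketch. It assumes the HEAD values of one sentence have already been converted to integers, and its function name is hypothetical. It reports the tokens attached to the artificial root rather than insisting on exactly one, so multi-root sentences are not rejected.

```python
from typing import List


def check_tree(heads: List[int]) -> List[int]:
    """heads[i] is the HEAD of token i+1 (0 = artificial root).

    Raise ValueError if a head is out of range or the arcs contain a cycle;
    otherwise return the IDs of the tokens attached directly to the root.
    """
    n = len(heads)
    for tok, head in enumerate(heads, start=1):
        if not 0 <= head <= n:
            raise ValueError(f"token {tok}: head {head} out of range")
    for tok in range(1, n + 1):
        seen = set()
        node = tok
        while node != 0:              # climb towards the root
            if node in seen:
                raise ValueError(f"cycle through token {node}")
            seen.add(node)
            node = heads[node - 1]
    return [tok for tok, head in enumerate(heads, start=1) if head == 0]


print(check_tree([3, 3, 0]))   # [3]
```

The upward walk is quadratic in the worst case, which is harmless for sentence-length inputs.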
  
 ==== Parsing ==== ==== Parsing ====
