
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:hu [2011/12/13 13:04]
zeman Size.
user:zeman:treebanks:hu [2011/12/13 22:52] (current)
zeman Personal names.
Line 54:
 ==== Inside ====
  
-Both versions (CoNLL 2007 and BDT-II) are in the CoNLL 2006/2007 format.
+The original Szeged Treebank is a phrase-based treebank distributed in an XML-based, TEI-compliant format. The CoNLL 2007 version is dependency-based (i.e. the head of each phrase was identified) and is distributed in the CoNLL 2006/2007 format.
  
-The syntactic guidelines (structure and labels) are described in Spanish in this [[http://ixa.si.ehu.es/Ixa/Argitalpenak/Barne_txostenak/1068549887/publikoak/guia.pdf|technical report]]. See Appendix 3 for some lists of tags.
+Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. The tagset used in SzTB seems to be the same as or similar to [[http://nl.ijs.si/ME/V4/msd/html/msd-hu.html|Multext-East]]. In the CoNLL version, tags were decomposed into the CPOS column, the POS column, and a list of feature-value pairs in the FEAT column.
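The column layout described above can be illustrated with a minimal sketch. It assumes the usual tab-separated, ten-column CoNLL 2006/2007 layout (ID, FORM, LEMMA, CPOS, POS, FEAT, HEAD, DEPREL, PHEAD, PDEPREL). The names ''Token'' and ''parse_conll_row'' are hypothetical helpers, not part of any distributed tool. The example row is token 8 of the training sentence shown in the Sample section.

```python
from typing import Dict, NamedTuple


class Token(NamedTuple):
    """One row of a CoNLL 2006/2007 file (PHEAD and PDEPREL are ignored here)."""
    id: int
    form: str
    lemma: str
    cpos: str
    pos: str
    feats: Dict[str, str]
    head: int
    deprel: str


def parse_conll_row(line: str) -> Token:
    """Split one tab-separated row and decompose FEAT into name/value pairs."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, got {len(cols)}")
    feats: Dict[str, str] = {}
    if cols[5] != "_":  # "_" marks an empty feature list
        for pair in cols[5].split("|"):
            name, _, value = pair.partition("=")
            feats[name] = value
    return Token(int(cols[0]), cols[1], cols[2], cols[3], cols[4],
                 feats, int(cols[6]), cols[7])


# Token 8 of the first training sentence ("hatot").
row = "8\thatot\that\tM\tMc\tn=singular|case=accusative\t11\tOBJ\t_\t_"
tok = parse_conll_row(row)
print(tok.cpos, tok.pos, tok.feats)
```

In the complete rows of the sample, the CPOS value is the first letter of the POS value (M and Mc here), which matches the description of the tags being decomposed.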
  
-Multi-word expressions have been collapsed into one token, using underscore as the joining character (e.g. Espainia_Poliziak, iduri_zait).
+Personal names have been collapsed into one token, using underscore as the joining character (e.g. Torgyán_József).
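If the individual name parts are needed, a collapsed token can be split again on the joining character. This is only a sketch: ''split_collapsed_name'' is a hypothetical helper, and it assumes that an underscore inside a FORM value always marks a join, while a bare ''_'' is the empty-value placeholder.

```python
from typing import List


def split_collapsed_name(form: str) -> List[str]:
    """Split a collapsed token such as 'Torgyán_József' into its parts."""
    if form == "_":  # a bare underscore is the empty-value placeholder, not a join
        return [form]
    return form.split("_")


parts = split_collapsed_name("Torgyán_József")
print(parts)
```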
  
 ==== Sample ====
Line 64:
 The first sentence of the CoNLL 2007 training data:
  
-| 1 | espainiako_poliziak Espainia_Poliziak IZE IZE_LIB PLU-<nowiki>|</nowiki>ENTI_LOC | 4 | ncsubj | _ | _ | +| 1 | Az az Tf | <nowiki>def=yes</nowiki> | 4 | DET <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 2 | hiru hiru DET DET_DZH NMGP detmod | _ | _ | +| 2 | elmúlt elmúlt Af <nowiki>deg=positive|n=singular|case=nominative</nowiki> 4 | ATT | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 3 | gazte gazte IZE IZE_ARR ABS<nowiki>|</nowiki>MG | 4 | ncobj | _ | _ | +| 3 | nyolc nyolc Mc | <nowiki>n=singular|case=nominative</nowiki> | 4 | ATT <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 4 | atxilotu atxilotu ADI ADI_SIN PART<nowiki>|</nowiki>BURU lot | _ | _ | +| 4 | hónapban hónap Nc | <nowiki>n=singular|case=inessive|proper=no</nowiki>16 INE <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 5 | ditu *edun ADL ADL A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>NK_HARK auxmod | _ | _ | +| 5 | <nowiki>,</nowiki> <nowiki>_</nowiki> WPUNCT WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT <nowiki>_</nowiki> <nowiki>_</nowiki>
-atarrabian Atarrabia IZE IZE_LIB PLU-<nowiki>|</nowiki>INE<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC ncmod | _ | _ | +| 6 | amelyből | amely | P | Pr | <nowiki>p=3rd|n=singular|case=elative</nowiki>11 ELA <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| , | PUNC PUNC_KOMA | _ | PUNC | _ | _ | +összesen összesen Rx | <nowiki>_</nowiki> | 8 | ADV <nowiki>_</nowiki> <nowiki>_</nowiki>
-eta eta LOT LOT_JNT | 0 | ROOT | _ | _ | +| 8 | hatot | hat | M | Mc <nowiki>n=singular|case=accusative</nowiki> | 11 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
-madrilera Madril IZE IZE_LIB PLU-<nowiki>|</nowiki>ALA<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC 10 ncmod | _ | _ | +| 9 | kényszerűségből | kényszerűség | N | Nc | <nowiki>n=singular|case=elative|proper=no</nowiki>11 ELA <nowiki>_</nowiki> <nowiki>_</nowiki> 
-10 eraman eraman ADI ADI_SIN PART<nowiki>|</nowiki>BURU lot | _ | _ | +10 szabadságon szabadság | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 11 | SUP | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-11 ditu *edun ADL ADL A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>NK_HARK 10 auxmod | _ | _ | +| 11 | töltött | tölt | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki> | 16 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-12 | . | PUNC PUNC_PUNC | _ | 11 PUNC | _ | _ |+| 12 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 14 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 13 | parlamenti | parlamenti | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 14 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 14 | ellenzék | ellenzék | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 11 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 15 | <nowiki>,</nowiki> <nowiki>_</nowiki> WPUNCT WPUNCT | <nowiki>_</nowiki> 16 PUNCT <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +16 megváltozott megváltozik Vm <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +17 itthon itthon Rx <nowiki>_</nowiki> | 16 | LOCY | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 18 | a | a | T | Tf | <nowiki>def=yes</nowiki> | 19 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 19 | hatalommegosztás | hatalommegosztás | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 22 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 20 | <nowiki>1990-ben</nowiki> | 1990 | M | Mc | <nowiki>n=singular|case=inessive</nowiki> | 21 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
 +| 21 | kialakított | kialakított | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 22 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 
 +22 rendszere | rendszer | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 16 | SUBJ | <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +23 <nowiki>:</nowiki> <nowiki>_</nowiki> WPUNCT WPUNCT | <nowiki>_</nowiki> | 16 | PUNCT | <nowiki>_</nowiki><nowiki>_</nowiki> | 
 +24 | az | az | T | Tf | <nowiki>def=yes</nowiki> | 26 | DET | <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +25 Pd | <nowiki>p=3rd|n=singular|case=nominative</nowiki> | 26 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 
 +26 héten | hét | N | Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +27 audienciát audiencia | N | Nc | <nowiki>n=singular|case=accusative|proper=no</nowiki> | 28 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 28 | tartó | tartó | A | Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 29 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 29 | kormányfő | kormányfő | N | Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki> | 31 | SUBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 30 | gyakorlatilag | gyakorlati | A | Af | <nowiki>deg=positive|n=singular|case=essive</nowiki> | 31 | ADV | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 31 | kivonta | kivon | V | Vm | <nowiki>mood=indicative|t=past|p=3rd|n=singular|def=yes</nowiki> | 16 | CP | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 32 | magát | maga | P | Px | <nowiki>p=3rd|n=singular|case=accusative</nowiki> | 31 | OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 33 | az | az | T | Tf | <nowiki>def=yes</nowiki> | 34 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 34 | Országgyűlés | Országgyűlés | N | Np | <nowiki>n=singular|case=nominative|proper=yes</nowiki> | 35 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 35 | ellenőrzése | ellenőrzés | N | Nc | <nowiki>n=singular|case=nominative|proper=no|pperson=3rd|pnumber=singular</nowiki> | 36 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 36 | alól | alól | S | St | <nowiki>_</nowiki> | 31 | PP | <nowiki>_</nowiki> | <nowiki>_</nowiki>
 +| 37 | <nowiki>.</nowiki> <nowiki>_</nowiki> SPUNCT SPUNCT | <nowiki>_</nowiki> 16 PUNCT <nowiki>_</nowiki> <nowiki>_</nowiki> |
  
 The first sentence of the CoNLL 2007 test data:
  
-| 1 | epaileek epaile IZE IZE_ARR BIZ+<nowiki>|</nowiki>ERG<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM +| 1 | Tf | <nowiki>def=yes</nowiki> | 2 | DET | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 2 | diote esan ADT ADT PNT<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HURA<nowiki>|</nowiki>NK_HAIEK-K +| 2 | bankokkal bank Nc | <nowiki>n=plural|case=instrumental|proper=no</nowiki> | 4 | INS | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 3 | eaeko EAE IZE IZE_LIB SIG<nowiki>|</nowiki>GEL<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC +| 3 | kell kell Vm | <nowiki>mood=indicative|t=present|p=3rd|n=singular|def=no</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 4 | parlamentarioek parlamentario ADJ ADJ_ARR IZAUR-<nowiki>|</nowiki>ERG<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM +| 4 | egyezkedniük egyezkedik Vm | <nowiki>mood=infinitive|t=present|p=3rd|n=plural</nowiki> | 3 | INF | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 5 | eaetik_kanpo EAE SIG SIG- DEK<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>DEK<nowiki>|</nowiki>ABL_kanpo_ABS<nowiki>|</nowiki>ENTI_LOC<nowiki>|</nowiki>POS +| 5 | azoknak az Pd | <nowiki>p=3rd|n=plural|case=dative</nowiki> | 8 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 6 | eginiko egin ADI ADI_SIN PART<nowiki>|</nowiki>GEL | +| 6 | Tf | <nowiki>def=yes</nowiki>DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-delituak | delitu | IZE | IZE_ARR | BIZ-<nowiki>|</nowiki>ABS<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM +mezőgazdasági mezőgazdasági Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>ATT | <nowiki>_</nowiki> <nowiki>_</nowiki>
-ikertzea ikertu ADI ADI_SIN ADIZE<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>ABS | +termelőknek termelő Nc | <nowiki>n=plural|case=dative|proper=no</nowiki>DAT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-eta | eta | LOT | LOT_JNT | - | +| <nowiki>,</nowiki> <nowiki>_</nowiki> | WPUNCT | WPUNCT | <nowiki>_</nowiki>PUNCT | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 10 | epaitzea | epaitu | ADI | ADI_SIN | ADIZE<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>ABS +10 akik aki Pr | <nowiki>p=3rd|n=plural|case=nominative</nowiki>21 SUBJ | <nowiki>_</nowiki> <nowiki>_</nowiki>
-11 auzitegi_gorenari auzitegi_gora ADJ ADJ_IZO DEK<nowiki>|</nowiki>GEN<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>DEK<nowiki>|</nowiki>DAT<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM<nowiki>|</nowiki>ENTI_LOC +11 egy egy Ti | <nowiki>def=no</nowiki> | 19 | DET | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-12 dagokiola | egon | ADT | ADT | PNT<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HURA<nowiki>|</nowiki>NI_HARI | +12 | <nowiki>,</nowiki> <nowiki>_</nowiki>WPUNCT WPUNCT | <nowiki>_</nowiki>19 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-13 , | , | PUNC | PUNC_KOMA | _ | +13 Tf | <nowiki>def=yes</nowiki> | 15 | DET | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 14 | baina | baina | LOT | LOT_JNT | AURK | +14 múlt múlt Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>15 ATT | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 15 | atzerrian | atzerri | IZE | IZE_ARR | INE<nowiki>|</nowiki>NUMS<nowiki>|</nowiki>MUGM +15 héten hét Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki> | 16 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-16 izaniko izan ADI ADI_SIN PART<nowiki>|</nowiki>GEL | +16 megjelent megjelent Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>19 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-17 kontaktu | kontaktu | IZE | IZE_ARR | _ | +17 földművelésügyi földművelésügyi Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>18 ATT | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 18 | horiek | horiek | DET | DET_ERKARR | ABS<nowiki>|</nowiki>NUMP<nowiki>|</nowiki>MUGM +18 minisztériumi minisztériumi Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>19 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-19 ezin_direla ezin_izan ADI ADI_ADK PNT<nowiki>|</nowiki>KONPL<nowiki>|</nowiki>A1<nowiki>|</nowiki>NR_HAIEK<nowiki>|</nowiki>MWCorrect +19 rendelet rendelet Nc | <nowiki>n=singular|case=nominative|proper=no</nowiki>20 ATT | <nowiki>_</nowiki> <nowiki>_</nowiki>
-20 | delitutzat | delitu | IZE | IZE_ARR BIZ-<nowiki>|</nowiki>PRO<nowiki>|</nowiki>MG | +20 alapján alap Nc | <nowiki>n=singular|case=superessive|proper=no|pperson=3rd|pnumber=singular</nowiki>21 SUP | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 21 | hartu | hartu | ADI | ADI_SIN | PART | +21 kérik kér Vm | <nowiki>mood=indicative|t=present|p=3rd|n=plural|def=yes</nowiki>ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 22 | . | . | PUNC | PUNC_PUNC | _ | +22 ősszel ősszel Rx | <nowiki>_</nowiki>23 ADV | <nowiki>_</nowiki> <nowiki>_</nowiki>
- +23 lejáró lejáró Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki> | 27 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-The first sentence of the BDT-II training data: +24 | <nowiki>,</nowiki> | <nowiki>_</nowiki>WPUNCT WPUNCT | <nowiki>_</nowiki>27 | PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
- +25 éven év Nc | <nowiki>n=singular|case=superessive|proper=no</nowiki>26 | ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-| 1 | Estatu_Batuetako_DEAko | Estatu_Batuak_DEA IZE LIB PLU:+<nowiki>|</nowiki>IZAUR:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>MW:B<nowiki>|</nowiki>ENT:Erakundea | 2 | ncmod | _ | _ +26 belüli belüli Af | <nowiki>deg=positive|n=singular|case=nominative</nowiki>27 ATT | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-buru buru IZE ARR _ | 4 | ncsubj | _ | _ | +27 hiteleik hitel Nc | <nowiki>n=plural|case=nominative|proper=no|pperson=3rd|pnumber=plural</nowiki>28 ATT | <nowiki>_</nowiki> <nowiki>_</nowiki>
-| 3 | ohiak | ohi | ADJ | ARR | IZAUR:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ +28 átütemezését átütemezés Nc | <nowiki>n=singular|case=accusative|proper=no|pperson=3rd|pnumber=singular</nowiki>21 OBJ | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-aztertuko aztertu ADI SIN ADM:PART<nowiki>|</nowiki>ASP:GERO 0 | ROOT | _ | _ | +29 | <nowiki>.</nowiki> <nowiki>_</nowiki>SPUNCT SPUNCT | <nowiki>_</nowiki>PUNCT | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
-| 5 | du | *edun | ADL ADL MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 4 | auxmod | _ | _ +
-RUCen RUC IZE IZB MTKAT:SIG<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Erakundea | 7 | ncmod | _ | _ +
-erreforma erreforma IZE ARR KAS:ABS<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M ncobj _ | _ | +
-| 8 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 7 | PUNC | _ | _ | +
- +
-The first sentence of the BDT-II development data: +
- +
-| 1 | Irakaskuntzan | irakaskuntza | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ +
-jardun jardun ADI SIN ADM:PART<nowiki>|</nowiki>ASP:BURU ROOT _ | _ | +
-| 3 | zuen | *edun | ADL | ADL | MDN:B1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 2 | auxmod | _ | _ +
-Miel Miel IZE IZB PLU:-<nowiki>|</nowiki>ENT:Pertsona entios _ | _ | +
-| 5 | Anjel_Elustondok | Anjel_Elustondo | IZE | IZB | PLU:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Pertsona | 2 | ncsubj | _ | _ +
-6 | 1980 | 1980 | IZE | ZKI | _ | 7 | ncmod | _ | _ | +
-| 7 urtetik urte IZE ARR BIZ:-<nowiki>|</nowiki>KAS:ABL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:ncmod | _ | _ | +
-| 8 | 1992ra | 1992 | IZE | ZKI | KAS:ALA<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ +
-PUNT_MARKA | PUNT_KOMA | _ | 8 | PUNC | _ | _ | +
-| 10 | hauetatik | hauek | DET ERKARR KAS:ABL<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M 16 ncmod _ | _ | +
-| 11 | hamar | hamar | DET | DZH | NMG:P | 12 | detmod | _ | _ | +
-| 12 | urtez | urte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INS<nowiki>|</nowiki>MUG:MG | 16 | lot | _ | _ +
-13 Azpeitiko Azpeitia IZE LIB PLU:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia 14 | ncmod | _ | _ | +
-| 14 | ikastolan | ikastola | IZE ARR BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | _ | _ +
-15 irakasle irakasle IZE ARR KAS:ABS<nowiki>|</nowiki>MUG:MG 16 ncpred _ | _ | +
-| 16 | eta | eta | LOT | JNT | ERL:EMEN | 8 | aponcmod | _ | _ | +
-| 17 | beste | beste | DET | DZG | _ | 18 | detmod | _ | _ | +
-| 18 | biak | bi | IZE | ZKI | KAS:ABS<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M | 16 | lot | _ | _ +
-19 PUNT_MARKA PUNT_KOMA _ | 18 | PUNC | _ | _ | +
-| 20 | Arabako | Araba | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:GEL<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia | 21 | ncmod | _ | _ +
-21 | ikastolen | ikastola | IZE | ARR BIZ:-<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:P<nowiki>|</nowiki>MUG:M 22 ncmod _ | _ | +
-| 22 | elkartean | elkarte | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 16 | ncmod | | _ | +
-| 23 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 22 | PUNC | _ | _ | +
- +
-The first sentence of the BDT-II test data: +
- +
-| 1 | Hegoaldean | hegoalde | IZE | ARR | KAS:INE<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 2 | ncmod | _ | _ +
-iduri_zait iduri_izan ADI ADK ASP:PNT<nowiki>|</nowiki>MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORI:NIRI<nowiki>|</nowiki>MW:B | 0 | ROOT | | _ | +
-| 3 | euskararen | euskara | IZE | ARR | BIZ:-<nowiki>|</nowiki>KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | ncmod | _ | _ +
-mundu mundu IZE ARR BIZ:- | 7 | ncsubj | _ | _ | +
-| 5 | hau | hau | DET | ERKARR | KAS:ABS<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M | 4 | detmod | _ | _ | +
-| 6 | adi-adi | adi-adi | ADB | ARR | _ | 7 | ncmod | _ | _ | +
-| 7 | dagola | egon ADT ADT ASP:PNT<nowiki>|</nowiki>ERL:KONPL<nowiki>|</nowiki>MDN:A3<nowiki>|</nowiki>NOR:HURA | 2 | ccomp_obj | _ | _ +
-PUNT_MARKA PUNT_KOMA _ | 7 | PUNC | _ | _ | +
-| 9 | Euskaltzaindiak | Euskaltzaindia | IZE | LIB | PLU:-<nowiki>|</nowiki>KAS:ERG<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M<nowiki>|</nowiki>ENT:Tokia 11 | ncsubj | _ | _ | +
-| 10 | zer | zer | DET NOLGAL NMG:MG<nowiki>|</nowiki>KAS:ABS<nowiki>|</nowiki>MUG:MG | 11 | ncobj | _ | _ +
-11 erranen erran ADI SIN ADM:PART<nowiki>|</nowiki>ASP:GERO 13 menos _ | _ | +
-| 12 | duen | *edun | ADL | ADL | ERL:ZHG<nowiki>|</nowiki>MDN:A1<nowiki>|</nowiki>NOR:HURA<nowiki>|</nowiki>NORK:HARK | 11 | auxmod | _ | _ +
-13 | zain | zain | ADB | ARR | _ | 7 | cmod | _ | _ | +
-| 14 | , | , | PUNT_MARKA | PUNT_KOMA | _ | 13 | PUNC | _ | _ | +
-| 15 | haren | hura | DET | ERKARR KAS:GEN<nowiki>|</nowiki>NUM:S<nowiki>|</nowiki>MUG:M 16 ncmod _ | _ | +
-| 16 | arauen | arau | IZE | ARR | KAS:ABS<nowiki>|</nowiki>MUG:MG 18 ncmod _ | _ | +
-| 17 | berehala | berehala | ADB | ARR | _ | 18 | ncmod | _ | _ | +
-| 18 | betetzeko | bete | ADI | SIN | ADM:ADIZE<nowiki>|</nowiki>ERL:HELB<nowiki>|</nowiki>KAS:ABS<nowiki>|</nowiki>MUG:MG | 7 | xmod | _ | _ | +
-| 19 | . | . | PUNT_MARKA | PUNT_PUNT | _ | 18 | PUNC | _ | _ |+
  
 ==== Parsing ====
  
-BDT is a mildly nonprojective treebank. 1925 of the 151,604 tokens of combined BDT-II training and test sets are attached nonprojectively (1.27%).
+SzTB is a mildly nonprojective treebank. 4032 of the 139,143 tokens of the CoNLL 2007 version are attached nonprojectively (2.9%).
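A count like this can be obtained by testing every dependency arc for projectivity. The sketch below uses one common definition, which is an assumption and may differ from the procedure behind the figure above: a token is attached nonprojectively if some token lying strictly between it and its head is not dominated by that head. The function name ''nonprojective_tokens'' is hypothetical, and the input is assumed to be a well-formed tree with ''0'' as the artificial root.

```python
from typing import List


def nonprojective_tokens(heads: List[int]) -> List[int]:
    """Return 1-based ids of tokens whose arc to the head is nonprojective.

    heads[i] is the head of token i + 1; 0 denotes the artificial root.
    """
    n = len(heads)

    def dominated_by(node: int, ancestor: int) -> bool:
        # Walk up the head chain; the step limit guards against cyclic input.
        steps = 0
        while node != 0 and steps <= n:
            if node == ancestor:
                return True
            node = heads[node - 1]
            steps += 1
        return node == ancestor  # true only when ancestor is the root (0)

    result = []
    for dep in range(1, n + 1):
        head = heads[dep - 1]
        lo, hi = sorted((dep, head))
        # The arc is nonprojective if a token between its ends escapes the head.
        if any(not dominated_by(k, head) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


# Toy tree (not from the treebank): token 2 depends on token 4 across token 3.
toy = nonprojective_tokens([3, 4, 0, 3])
print(toy)

# Share reported for the CoNLL 2007 version of SzTB.
share = round(100 * 4032 / 139143, 1)
print(share)
```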
  
-The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Greek:
+The results of the CoNLL 2007 shared task are [[http://nextens.uvt.nl/depparse-wiki/AllScores|available online]]. They have been published in [[http://aclweb.org/anthology-new/D/D07/D07-1096.pdf|(Nivre et al., 2007)]]. The evaluation procedure was changed to include punctuation tokens. These are the best results for Hungarian:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^
-| Malt (Nilsson et al.) | 76.94 | 82.84 |
-| Titov et al. | 75.49 | 81.93 |
-| Sagae | 74.64 | 81.19 |
-| Carreras | 75.75 | 81.11 |
-| Nakagawa | 72.56 | 81.04 |
-| Malt (J. Hall et al.) | 74.99 | 80.61 |
-| Johansson et al. | 75.08 | 80.43 |
+| Malt (Nilsson et al.) | 80.27 | 83.55 |
+| Sagae | 79.53 | 83.51 |
+| Nakagawa | 76.74 | 82.49 |
+| Titov et al. | 77.94 | 82.18 |
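For readers unfamiliar with the two metrics, the following sketch shows how LAS and UAS can be computed when punctuation tokens are counted like any other token. It is not the official shared-task scorer. The function name ''attachment_scores'' is hypothetical, the gold values are the first four tokens of the test sentence above, and the predicted values are invented for illustration.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (HEAD, DEPREL) of one token


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (LAS, UAS) in percent over all tokens, punctuation included."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")
    n = len(gold)
    # UAS counts tokens with the correct head; LAS also requires the correct label.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return 100 * las_hits / n, 100 * uas_hits / n


# Gold arcs of tokens 1-4 of the test sentence; the predictions are made up.
gold = [(2, "DET"), (4, "INS"), (0, "ROOT"), (3, "INF")]
predicted = [(2, "DET"), (4, "ATT"), (0, "ROOT"), (2, "INF")]
las, uas = attachment_scores(gold, predicted)
print(las, uas)
```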
  
 The two Malt parser results of 2007 (single malt and blended) are described in [[http://aclweb.org/anthology-new/D/D07/D07-1097.pdf|(Hall et al., 2007)]] and the details about the parser configuration are described [[http://w3.msi.vxu.se/users/jha/conll07/|here]].
  
-Parsing results on BDT-II have been published in Kepa Bengoetxea, Koldo Gojenola: [[http://aclweb.org/anthology-new/W/W10/W10-1404.pdf|Application of Different Techniques to Dependency Parsing of Basque]]. In: Proceedings of the First Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL 2010), NAACL Workshop, Los Angeles, California, USA, 2010. They report only Labeled Attachment Score (LAS) and their best system achieved LAS = 78.98%. 
