Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:sl [2012/01/16 13:26]
zeman Size.
user:zeman:treebanks:sl [2012/01/16 13:38]
zeman Sample.
Line 32: Line 32:
     * Tomaž Erjavec, Peter Holozan, Vojko Gorjanc, Marko Stabej: [[http://nl.ijs.si/ME/V3/msd/html/msd.html#SECTION05600000000000000000|Morphosyntactic tagset specification for Slovene]]
     * [[http://ufal.mff.cuni.cz/pdt/Corpora/PDT_1.0/Doc/anal.html|The analytical layer of the Prague Dependency Treebank]]
 +    * Morphological and syntactic tags are also documented directly inside the TEI XML data file.
  
 ==== Domain ====
Line 51: Line 52:
 ==== Sample ====
  
-The first three sentences of the CoNLL 2006 training data:
 +The first sentence of the treebank in the TEI-compliant XML format:
  
-| 1 | Глава | _ | N | Nc | _ | 0 | ROOT | 0 | ROOT |
-| 2 | трета | _ | M | Mo | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | | mod | 1 | mod |
-| ||||||||||
-| 1 | НАРОДНО | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | | mod | | mod |
-| 2 | СЪБРАНИЕ | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT |
-| ||||||||||
-| 1 | Народното | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 2 | mod | 2 | mod |
-| 2 | събрание | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 3 | subj | 3 | subj |
-| 3 | осъществява | _ | V | Vpi | trans=t<nowiki>|</nowiki>mood=i<nowiki>|</nowiki>tense=r<nowiki>|</nowiki>pers=3<nowiki>|</nowiki>num=s | 0 | ROOT | 0 | ROOT |
-| 4 | законодателната | _ | A | Af | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 5 | mod | 5 | mod |
-| 5 | власт | _ | N | Nc | _ | 3 | obj | 3 | obj |
-| 6 | и | _ | C | Cp | _ | 3 | conj | 3 | conj |
-| 7 | упражнява | _ | V | Vpi | trans=t<nowiki>|</nowiki>mood=i<nowiki>|</nowiki>tense=r<nowiki>|</nowiki>pers=3<nowiki>|</nowiki>num=s | 3 | conjarg | 3 | conjarg |
-| 8 | парламентарен | _ | A | Am | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 9 | mod | 9 | mod |
-| 9 | контрол | _ | N | Nc | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 7 | obj | 7 | obj |
-| 10 | . | _ | Punct | Punct | _ | 3 | punct | 3 | punct |
 +<code xml>    <text id="Osl." lang="sl">
 +      <body>
 +        <div type="part" id="Osl.1">
 +          <div type="chapter" id="Osl.1.2">
 +            <p id="Osl.1.2.2">
 +              <s id="Osl.1.2.2.1">
 +                <w id="s1t1" afun="Pred" parallel="Co" dep="s1t8" lemma="biti" ana="Vcps-sma">Bil</w>
 +                <w id="s1t2" afun="AuxV" dep="s1t1" lemma="biti" ana="Vcip3s--n">je</w>
 +                <w id="s1t3" afun="Atr" parallel="Co" dep="s1t4" lemma="jasen" ana="Afpmsnn">jasen</w>
 +                <c id="s1t4" afun="Coord" dep="s1t7">,</c>
 +                <w id="s1t5" afun="Atr" parallel="Co" dep="s1t4" lemma="mrzel" ana="Afpmsnn">mrzel</w>
 +                <w id="s1t6" afun="Atr" dep="s1t7" lemma="aprilski" ana="Aopmsn">aprilski</w>
 +                <w id="s1t7" afun="Sb" dep="s1t1" lemma="dan" ana="Ncmsn">dan</w>
 +                <w id="s1t8" afun="Coord" dep="root" lemma="in" ana="Ccs">in</w>
 +                <w id="s1t9" afun="Sb" dep="s1t11" lemma="ura" ana="Ncfpn">ure</w>
 +                <w id="s1t10" afun="AuxV" dep="s1t11" lemma="biti" ana="Vcip3p--n">so</w>
 +                <w id="s1t11" afun="Pred" parallel="Co" dep="s1t8" lemma="biti" ana="Vmps-pfa">bile</w>
 +                <w id="s1t12" afun="Obj" dep="s1t11" lemma="trinajst" ana="Mcnpnl">trinajst</w>
 +                <c id="s1t13" afun="AuxK" dep="root">.</c>
 +              </s></code>
  
-The first three sentences of the CoNLL 2006 test data:
 +The first sentence of the CoNLL 2006 training data:
  
-| 1 | Единственото | | | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def= | | mod | | mod |
-| 2 | решение | _ | | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | | ROOT | | ROOT |
-| ||||||||||
-| 1 | Ерик | _ | | Np | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def= | 0 | ROOT | | ROOT |
-| 2 | Франк | | | Np | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def= | | mod | | mod |
-| 3 | Ръсел | _ | | Hm | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def= | | mod | | mod |
-| ||||||||||
-| 1 | Пълен | | | Am | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def= | | mod | | mod |
-| 2 | мрак | | | Nc | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def= | | ROOT | 0 | ROOT |
-| 3 | и | _ | | Cp | _ | | conj | | conj |
-| 4 | пълна | _ | | Af | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def= | | mod | | mod |
-| 5 | самота | _ | | Nc | _ | | conjarg | | conjarg |
-| 6 | . | | Punct | Punct | _ | | punct | | punct |
 +| 1 | Bil | biti | Verb | <nowiki>Verb-copula</nowiki> | <nowiki>VForm=participle|Tense=past|Number=singular|Gender=masculine|Voice=active</nowiki> | 8 | Pred | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 2 | je | biti | Verb | <nowiki>Verb-copula</nowiki> | <nowiki>VForm=indicative|Tense=present|Person=third|Number=singular|Negative=no</nowiki> | 1 | AuxV | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 3 | jasen | jasen | Adjective | <nowiki>Adjective-qualificative</nowiki> | <nowiki>Degree=positive|Gender=masculine|Number=singular|Case=nominative|Definiteness=no</nowiki> | 4 | Atr | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 4 | <nowiki>,</nowiki> | <nowiki>,</nowiki> | PUNC | PUNC | <nowiki>_</nowiki> | 7 | Coord | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 5 | mrzel | mrzel | Adjective | <nowiki>Adjective-qualificative</nowiki> | <nowiki>Degree=positive|Gender=masculine|Number=singular|Case=nominative|Definiteness=no</nowiki> | 4 | Atr | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 6 | aprilski | aprilski | Adjective | <nowiki>Adjective-ordinal</nowiki> | <nowiki>Degree=positive|Gender=masculine|Number=singular|Case=nominative</nowiki> | 7 | Atr | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 7 | dan | dan | Noun | <nowiki>Noun-common</nowiki> | <nowiki>Gender=masculine|Number=singular|Case=nominative</nowiki> | 1 | Sb | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 8 | in | in | Conjunction | <nowiki>Conjunction-coordinating</nowiki> | <nowiki>Formation=simple</nowiki> | 0 | Coord | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 9 | ure | ura | Noun | <nowiki>Noun-common</nowiki> | <nowiki>Gender=feminine|Number=plural|Case=nominative</nowiki> | 11 | Sb | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 10 | so | biti | Verb | <nowiki>Verb-copula</nowiki> | <nowiki>VForm=indicative|Tense=present|Person=third|Number=plural|Negative=no</nowiki> | 11 | AuxV | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 11 | bile | biti | Verb | <nowiki>Verb-main</nowiki> | <nowiki>VForm=participle|Tense=past|Number=plural|Gender=feminine|Voice=active</nowiki> | 8 | Pred | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 12 | trinajst | trinajst | Numeral | <nowiki>Numeral-cardinal</nowiki> | <nowiki>Gender=neuter|Number=plural|Case=nominative|Form=letter</nowiki> | 11 | Obj | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 13 | <nowiki>.</nowiki> | <nowiki>.</nowiki> | PUNC | PUNC | <nowiki>_</nowiki> | 0 | AuxK | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 + 
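The relationship between the TEI sample and the CoNLL training rows can be illustrated with a minimal Python sketch. It assumes that token ids always have the form `s<sentence>t<position>`, that `dep="root"` corresponds to HEAD 0, and that punctuation elements (`<c>`) carry no `lemma` attribute, so the form is reused as the lemma. The function names `token_number` and `tei_to_rows` are hypothetical and are not part of any treebank tool; the sentence string repeats the sample with every closing tag written out in full.

```python
import xml.etree.ElementTree as ET

# First sentence of the treebank, copied from the TEI sample.
SENTENCE = """<s id="Osl.1.2.2.1">
  <w id="s1t1" afun="Pred" parallel="Co" dep="s1t8" lemma="biti" ana="Vcps-sma">Bil</w>
  <w id="s1t2" afun="AuxV" dep="s1t1" lemma="biti" ana="Vcip3s--n">je</w>
  <w id="s1t3" afun="Atr" parallel="Co" dep="s1t4" lemma="jasen" ana="Afpmsnn">jasen</w>
  <c id="s1t4" afun="Coord" dep="s1t7">,</c>
  <w id="s1t5" afun="Atr" parallel="Co" dep="s1t4" lemma="mrzel" ana="Afpmsnn">mrzel</w>
  <w id="s1t6" afun="Atr" dep="s1t7" lemma="aprilski" ana="Aopmsn">aprilski</w>
  <w id="s1t7" afun="Sb" dep="s1t1" lemma="dan" ana="Ncmsn">dan</w>
  <w id="s1t8" afun="Coord" dep="root" lemma="in" ana="Ccs">in</w>
  <w id="s1t9" afun="Sb" dep="s1t11" lemma="ura" ana="Ncfpn">ure</w>
  <w id="s1t10" afun="AuxV" dep="s1t11" lemma="biti" ana="Vcip3p--n">so</w>
  <w id="s1t11" afun="Pred" parallel="Co" dep="s1t8" lemma="biti" ana="Vmps-pfa">bile</w>
  <w id="s1t12" afun="Obj" dep="s1t11" lemma="trinajst" ana="Mcnpnl">trinajst</w>
  <c id="s1t13" afun="AuxK" dep="root">.</c>
</s>"""


def token_number(ref):
    """Map a reference such as 's1t8' to the integer 8; 'root' maps to 0."""
    if ref == "root":
        return 0
    # Assumption: the token position follows the last 't' in the id.
    return int(ref.rsplit("t", 1)[1])


def tei_to_rows(sentence_xml):
    """Return (ID, FORM, LEMMA, HEAD, DEPREL) tuples for one <s> element."""
    rows = []
    for tok in ET.fromstring(sentence_xml):
        rows.append((
            token_number(tok.get("id")),
            tok.text,
            tok.get("lemma", tok.text),  # <c> has no lemma: reuse the form
            token_number(tok.get("dep")),
            tok.get("afun"),
        ))
    return rows


rows = tei_to_rows(SENTENCE)
for row in rows:
    print("\t".join(str(field) for field in row))
```

The HEAD and DEPREL values produced this way can be compared column by column with the CoNLL training rows for the same sentence; the morphological columns are not derived here because the mapping from `ana` codes to feature strings is defined in the tagset specification, not in the sample.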
 +The first sentence of the CoNLL 2006 test data: 
 + 
 +| 1 | Na | na | Adposition | <nowiki>Adposition-preposition</nowiki> | <nowiki>Formation=simple|Case=locative</nowiki> | 5 | AuxP | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 2 | hrbtu | hrbet | Noun | <nowiki>Noun-common</nowiki> | <nowiki>Gender=masculine|Number=singular|Case=locative</nowiki> | | Adv | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 3 | je | biti | Verb | <nowiki>Verb-copula</nowiki> | <nowiki>VForm=indicative|Tense=present|Person=third|Number=singular|Negative=no</nowiki> | 5 | AuxV | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 4 | lahko | lahko | Adverb | <nowiki>Adverb-general</nowiki> | <nowiki>Degree=positive</nowiki> | 5 | AuxY | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 5 | čutil | čutiti | Verb | <nowiki>Verb-main</nowiki> | <nowiki>VForm=participle|Tense=past|Number=singular|Gender=masculine|Voice=active</nowiki> | 0 | Pred | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 6 | <nowiki>,</nowiki> | <nowiki>,</nowiki> | PUNC | PUNC | <nowiki>_</nowiki> | | AuxX | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 7 | da | da | Conjunction | <nowiki>Conjunction-subordinating</nowiki> | <nowiki>Formation=simple</nowiki> | 5 | AuxC | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 8 | vsi | ves | Pronoun | <nowiki>Pronoun-general</nowiki> | <nowiki>Gender=masculine|Number=plural|Case=nominative|Syntactic-Type=nominal</nowiki> | 9 | Sb | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 9 | upirajo | upirati | Verb | <nowiki>Verb-main</nowiki> | <nowiki>VForm=indicative|Tense=present|Person=third|Number=plural|Negative=no</nowiki> | 7 | Obj | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 10 | oči | oči | Noun | <nowiki>Noun-common</nowiki> | <nowiki>Gender=feminine|Number=plural|Case=accusative</nowiki> | 9 | Obj | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 11 | v | v | Adposition | <nowiki>Adposition-preposition</nowiki> | <nowiki>Formation=simple|Case=accusative</nowiki> | | AuxP | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 12 | njegov | njegov | Pronoun | <nowiki>Pronoun-possessive</nowiki> | <nowiki>Person=third|Gender=masculine|Number=singular|Case=accusative|Owner-Number=singular|Owner-Gender=masculine|Syntactic-Type=adjectival|Animate=no</nowiki> | 14 | Atr | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 13 | modri | moder | Adjective | <nowiki>Adjective-qualificative</nowiki> | <nowiki>Degree=positive|Gender=masculine|Number=singular|Case=accusative|Definiteness=yes|Animate=no</nowiki> | 14 | Atr | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 14 | kombinezon | kombinezon | Noun | <nowiki>Noun-common</nowiki> | <nowiki>Gender=masculine|Number=singular|Case=accusative|Animate=no</nowiki> | 11 | Adv | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
 +| 15 | <nowiki>.</nowiki> | <nowiki>.</nowiki> | PUNC | PUNC | <nowiki>_</nowiki> | | AuxK | <nowiki>_</nowiki> | <nowiki>_</nowiki> |
  
 ==== Parsing ====