Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

Next revision
Previous revision
user:zeman:treebanks:nl [2012/01/10 10:27]
zeman created
user:zeman:treebanks:nl [2012/01/11 11:32] (current)
zeman Typo.
Line 7: Line 7:
   * The Alpino Treebank 1.0 (2002) in an XML-based format   * The Alpino Treebank 1.0 (2002) in an XML-based format
   * CoNLL 2006   * CoNLL 2006
 +  * [[http://odur.let.rug.nl/~vannoord/Lassy/|Lassy]] (2007 and later) builds upon Alpino, is much larger but not under the same license
  
 ==== Obtaining and License ==== ==== Obtaining and License ====
  
-DDT is available under the [[http://www.gnu.org/licenses/gpl-2.0.html|GNU General Public License version 2]]. Download the original distribution (DTAG + TIGER-XML formats) from http://www.buch-kromann.dk/matthias/treebank/. Download the CoNLL 2006 conversion from http://ilk.uvt.nl/conll/free_data.html. The license in short:+Alpino is available under the [[http://www.gnu.org/licenses/gpl-2.0.html|GNU General Public License]]. Download the original distribution (DTAG + TIGER-XML formats) from http://odur.let.rug.nl/~vannoord/ftp/AlpinoCDROM/. Download the CoNLL 2006 conversion from http://ilk.uvt.nl/conll/free_data.html. The license in short:
  
   * any usage, commercial or not   * any usage, commercial or not
Line 16: Line 17:
   * citation in publications not required (but it is common decency)   * citation in publications not required (but it is common decency)
  
-DDT was created by members of the [[http://www.cbs.dk/en/Research/Departments-Centres/Institutter/ISV|Department of International Language Studies and Computational Linguistics]], Copenhagen Business School (Handelshøjskolen i København), Dalgas Have 15, DK-2000 Frederiksberg, Denmark. The underlying [[http://korpus.dsl.dk/e-resurser/vilkaar.php?lang=|PAROLE]] corpus (morphologically annotated) was created by the [[http://www.dsl.dk/|Society for Danish Language and Literature]] (Det Danske Sprog- og Litteraturselskab), Christians Brygge 1, DK-1219 København K, Denmark.+Alpino was created by members of the [[http://www.rug.nl/let/onderwijs/afdelingen/informatiekunde/index|Alfa-informatica]], Faculty of Arts (Faculteit der Letteren), University of Groningen (Rijksuniversiteit Groningen), Oude Kijk in 't Jatstraat 26, NL-9712 EK Groningen, The Netherlands.
  
 ==== References ==== ==== References ====
  
   * Website   * Website
-    * http://www.buch-kromann.dk/matthias/treebank/ (the old and no longer accessible website from <nowiki>http://www.id.cbs.dk/~mtk/</nowiki> has been moved here)+    * http://odur.let.rug.nl/~vannoord/trees/ (Alpino) 
 +    * http://odur.let.rug.nl/~vannoord/Lassy/ (Lassy) 
 +    * http://ilk.uvt.nl/conll/free_data.html (CoNLL 2006)
   * Data   * Data
     * //no separate citation//     * //no separate citation//
   * Principal publications   * Principal publications
-    * Matthias Trautner Kromann: [[http://www.buch-kromann.dk/matthias/files/030730-tlt-norfa.pdf|The Danish Dependency Treebank and the DTAG Treebank Tool]]. In: Proceedings of Treebanks and Linguistic Theories, Växjö, Sweden, 2003.+    * Robert Malouf, Gertjan van Noord: [[http://www-tsujii.is.s.u-tokyo.ac.jp/bsa/papers/malouf.pdf|Wide Coverage Parsing with Stochastic Attribute Value Grammars]]. In: Proceedings of Beyond Shallow Analyses – Formalisms and Statistical Modeling for Deep Analyses Workshop, IJCNLP, Sanya, Hainan, China, 2004. 
 +    * Leonoor van der Beek, Gosse Bouma, Jan Daciuk, Tanja Gaustad, Robert Malouf, Gertjan van Noord, Robbert Prins, Begoña Villada: [[http://odur.let.rug.nl/~vannoord/trees/Papers/report_ch5.pdf|Algorithms for Linguistic Processing NWO PIONIER Progress Report]]. Groningen, Netherlands, 2002.
   * Documentation   * Documentation
-    * //see the left-hand-side links at the treebank website, eg.:// +    * The files {{:user:zeman:treebanks:nl-tagset.txt|doc/tagset.txt}}, ''doc/syn_prot.pdf'' and ''doc/diffs.pdf'' in the CoNLL 2006 distribution.
-    * [[http://www.buch-kromann.dk/matthias/treebank/theory.html|Dependency theory and list of dependency relation labels]] +
-    * Britt Keson: [[http://www.buch-kromann.dk/matthias/treebank/PAROLE-manual.pdf|Vejledning til det danske morfosyntaktisk taggede PAROLE-korpus]] (morphosyntactic tags). Det Danske Sprog- og Litteraturselskab (DSL).+
  
 ==== Domain ==== ==== Domain ====
  
-full cdbl (newspaper) part of the Eindhoven corpus +Newspaper. The Alpino Treebank consists of “the full cdbl (newspaper) part of the Eindhoven corpus.”
- +
-Unknown (the underlying PAROLE corpus “consists of quotations of 150-250 words from a wide range of randomly selected linguistically representative Danish texts from 1983-1992.”)+
  
 ==== Size ==== ==== Size ====
  
-The CoNLL 2006 version contains 100,238 tokens in 5512 sentences, yielding 18.19 tokens per sentence on average (CoNLL 2006 data split: 94386 tokens / 5190 sentences training, 5852 tokens / 322 sentences test).+The CoNLL 2006 version contains 200,654 tokens in 13735 sentences, yielding 14.61 tokens per sentence on average (CoNLL 2006 data split: 195,069 tokens / 13349 sentences training, 5585 tokens / 386 sentences test).
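
The size figures for the Dutch data can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch; the helper name ''tokens_per_sentence'' is made up for illustration and is not part of any treebank tool.

```python
def tokens_per_sentence(tokens, sentences):
    """Average sentence length, rounded to two decimals as on this page."""
    return round(tokens / sentences, 2)

# (tokens, sentences) of the CoNLL 2006 Dutch data split, as quoted above
train = (195069, 13349)
test = (5585, 386)

total_tokens = train[0] + test[0]
total_sentences = train[1] + test[1]
average = tokens_per_sentence(total_tokens, total_sentences)

print(total_tokens, total_sentences, average)
```

The training and test parts add up to the quoted totals, and the average agrees with the 14.61 tokens per sentence stated in the text.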
  
 ==== Inside ==== ==== Inside ====
  
-The original morphosyntactic tags have been converted to fit into the three columns (CPOS, POS and FEAT) of the CoNLL format. There //should// be a 1-1 mapping between the [[http://www.buch-kromann.dk/matthias/treebank/PAROLE-manual.pdf|DDT positional tags]] and the CoNLL 2006 annotation. Use [[http://quest.ms.mff.cuni.cz/cgi-bin/interset/index.pl?tagset=da::conll|DZ Interset]] to inspect the CoNLL tagset.+In the CoNLL version, the original POS tags from the Alpino Treebank were replaced by POS tags from the Memory-based part-of-speech tagger using the WOTAN tagset, which is described in the file ''tagset.txt''. The morphological annotation includes lemmas. The syntactic annotation is mostly identical to that of the Corpus Gesproken Nederlands (CGN, Spoken Dutch Corpus) as described in the file ''syn_prot.pdf'' (Dutch only). An attempt to describe a number of differences between the CGN and Alpino annotation practice is given in the file ''diff.pdf'' (which is heavily out of date, but the number of differences has been reduced). Conversion issues: head selection, multi-word units, discourse units.
  
-The morphological analysis in the CoNLL 2006 version does not include lemmas (the original DTAG version does contain them). The morphosyntactic tags have been assigned (probably) manually. +Multi-word expressions have been concatenated into one token, using underscore as the joining character (e.g. "Economische_en_Monetaire_Unie"). They have the special part-of-speech tag ''MWU''; their sub-parts of speech and features may describe the individual parts of the unit. E.g. "aan_het" has CPOS ''MWU'', (sub)POS ''Prep_Art'' and features ''voor_bep|onzijd|neut''.
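
The multi-word convention described for the Dutch data can be undone with a small helper. The following is a minimal sketch, assuming that the underscore-separated parts of the form line up one-to-one with the underscore-separated parts of the (sub)POS, as in the "aan_het" example; the function name ''split_mwu'' is hypothetical and the features column is deliberately left alone because its internal structure is not specified here.

```python
def split_mwu(form, cpos, pos):
    """Split a concatenated multi-word unit into (word, tag) pairs.

    Assumption: for CPOS 'MWU', the form and the (sub)POS use '_' as the
    joining character and have the same number of parts. When the counts
    differ, the words are returned without tags instead of guessing.
    """
    if cpos != "MWU":
        return [(form, pos)]
    words = form.split("_")
    tags = pos.split("_")
    if len(words) != len(tags):
        return [(word, None) for word in words]
    return list(zip(words, tags))


print(split_mwu("aan_het", "MWU", "Prep_Art"))
print(split_mwu("zag", "V", "V"))
```

Returning ''None'' tags on a length mismatch keeps the sketch from inventing an alignment that the data does not provide.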
- +
-Some multi-word expressions have been collapsed into one token, using underscore as the joining character. This includes adverbially used prepositional phrases (e.g. i_lørdags = on Saturdays) but not named entities.
  
 ==== Sample ==== ==== Sample ====
  
-The first sentence of DDT 1.0 in the DTAG format: +The first two sentences of the CoNLL 2006 training data:
- +
-<code xml><tei.2> +
-  <teiHeader type=text> +
-    <fileDesc> +
-      <titleStmt> +
-        <title>Tagged sample of: 'Jeltsins skæbnetime'</title> +
-      </titleStmt> +
-      <extent words=158>158 running words</extent> +
-      <publicationStmt> +
-         <distributor>PAROLE-DK</distributor> +
-         <address><addrline>Christians Brygge 1,1., DK-1219 Copenhagen K.</address> +
-         <date>1998-06-02</date> +
-         <availability status=restricted><p>by agreement with distributor</availability> +
-      </publicationStmt> +
-      <sourceDesc> +
-        <biblStruct> +
-          <analytic> +
-            <title>Jeltsins skæbnetime</title> +
-            <author gender=m born=1925>Nikulin, Leon</author> +
-          </analytic> +
-          <monogr> +
-            <imprint><pubPlace>Denmark</pubPlace> +
-              <publisher>Det Fri Aktuelt</publisher> +
-              <date>1992-12-01</date> +
-            </imprint> +
-          </monogr> +
-        </biblStruct> +
-      </sourceDesc> +
-    </fileDesc> +
-    <profileDesc> +
-      <creation>1992-12-01</creation> +
-      <langUsage><language>Danish</langUsage> +
-      <textClass> +
-        <catRef target="P.M2"> +
-        <catRef target="P.G4.8"> +
-        <catRef target="P.T9.3"> +
-      </textClass> +
-    </profileDesc> +
-  </teiHeader> +
-<text id=AJK> +
-<body> +
-<div1 type=main> +
-<p> +
-<s> +
-<W lemma="to" msd="AC---U=--" in="9:subj" out="1:mod|2:mod|3:nobj|5:appr">To</W> +
-<W lemma="kendt" msd="ANP[CN]PU=[DI]U" in="-1:mod" out="">kendte</W> +
-<W lemma="russisk" msd="ANP[CN]PU=[DI]U" in="-2:mod" out="">russiske</W> +
-<W lemma="historiker" msd="NCCPU==I" in="-3:nobj" out="">historikere</W> +
-<W lemma="Andronik" msd="NP--U==-" in="1:namef" out="">Andronik</W> +
-<W lemma="Mirganjan" msd="NP--U==-" in="-5:appr" out="-1:namef|1:coord">Mirganjan</W> +
-<W lemma="og" msd="CC" in="-1:coord" out="2:conj">og</W> +
-<W lemma="Igor" msd="NP--U==-" in="1:namef" out="">Igor</W> +
-<W lemma="Klamkin" msd="NP--U==-" in="-2:conj" out="-1:namef">Klamkin</W> +
-<W lemma="tro" msd="VADR=----A-" in="" out="-9:subj|1:mod|2:pnct|3:dobj|12:pnct">tror</W> +
-<W lemma="ikke" msd="RGU" in="-1:mod" out="">ikke</W> +
-<W lemma="," msd="XP" in="-2:pnct" out="">,</W> +
-<W lemma="at" msd="CS" in="-3:dobj" out="2:vobj">at</W> +
-<W lemma="Rusland" msd="NP--U==-" in="1:subj|2:[subj]" out="">Rusland</W> +
-<W lemma="kunne" msd="VADR=----A-" in="-2:vobj" out="-1:subj|1:vobj|2:mod">kan</W> +
-<W lemma="udvikle" msd="VAF-=----P-" in="-1:vobj" out="-2:[subj]">udvikles</W> +
-<W lemma="uden" msd="SP" in="-2:mod" out="1:nobj">uden</W> +
-<W lemma="en" msd="PI-CSU--U" in="-1:nobj" out="2:nobj">en</W> +
-<W lemma="&quot;" msd="XP" in="1:pnct" out="">"</W> +
-<W lemma="jernnæve" msd="NCCSU==I" in="-2:nobj" out="-1:pnct|1:pnct">jernnæve</W> +
-<W lemma="&quot;" msd="XP" in="-1:pnct" out="">"</W> +
-<W lemma="." msd="XP" in="-12:pnct" out="">.</W> +
-</s></code> +
- +
-The first sentence of the CoNLL 2006 training data:+
  
-| 1 | Samme AN degree=pos<nowiki>|</nowiki>gender=common/neuter<nowiki>|</nowiki>number=sing/plur<nowiki>|</nowiki>case=unmarked<nowiki>|</nowiki>def=def/indef<nowiki>|</nowiki>transcat=unmarked ROOT | _ | _ | +| 1 | Cathy Cathy | <nowiki>eigen|ev|neut</nowiki> | 2 | su | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
-cifre | N | NC gender=neuter<nowiki>|</nowiki>number=plur<nowiki>|</nowiki>case=unmarked<nowiki>|</nowiki>def=indef nobj | _ | _ | +| 2 | zag | zie | V | V | <nowiki>trans|ovt|1of2of3|ev</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
-XP pnct +| 3 | hen | hen | Pron | Pron | <nowiki>per|3|mv|datofacc</nowiki> | 2 | obj1 | <nowiki>_</nowiki> | <nowiki>_</nowiki>
-de PD gender=common/neuter<nowiki>|</nowiki>number=plur<nowiki>|</nowiki>case=unmarked<nowiki>|</nowiki>register=unmarked subj | +| 4 | wild | wild | Adj | Adj | <nowiki>attr|stell|onverv</nowiki>mod <nowiki>_</nowiki> <nowiki>_</nowiki> 
-norske | _ AN degree=pos<nowiki>|</nowiki>gender=common/neuter<nowiki>|</nowiki>number=plur<nowiki>|</nowiki>case=unmarked<nowiki>|</nowiki>def=def/indef<nowiki>|</nowiki>transcat=unmarked mod | _ | _ | +zwaaien zwaai | N | | <nowiki>soort|mv|neut</nowiki> | 2 | vc | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
-piger | N | NC gender=common<nowiki>|</nowiki>number=plur<nowiki>|</nowiki>case=unmarked<nowiki>|</nowiki>def=indef nobj | _ | _ | +| 6 | <nowiki>.</nowiki> | <nowiki>.</nowiki>Punc Punc punt | 5 | punct | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 7 | tabte | V | VA mood=indic<nowiki>|</nowiki>tense=past<nowiki>|</nowiki>voice=active | 1 | rel | _ | _ | +| |||||||||| 
-| 8 | med SP SP pobj | _ | _ | +Ze ze Pron Pron | <nowiki>per|3|evofmv|nom</nowiki> | 2 | su | <nowiki>_</nowiki> <nowiki>_</nowiki> | 
-| 9 | i_lørdags RG RG degree=unmarked mod | _ | _ | +had heb <nowiki>trans|ovt|1of2of3|ev</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
-| 10 | mod SP SP pobj | _ | _ | +| 3 | met | met | Prep | Prep | voor | 8 | mod | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 
-| 11 | VMs NP case=gen 10 nobj | _ | _ | +| 4 | haar | haar | Pron | Pron | <nowiki>bez|3|ev|neut|attr</nowiki>det <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 12 | værtsnation | N | NC gender=common<nowiki>|</nowiki>number=sing<nowiki>|</nowiki>case=unmarked<nowiki>|</nowiki>def=indef | 11 | possd | _ | _ +moeder moeder | N | | <nowiki>soort|ev|neut</nowiki> | 3 | obj1 | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
-| 13 | . | XP pnct | _ | _ |+| 6 | kunnen | kan | V | V | <nowiki>hulp|ott|1of2of3|mv</nowiki>vc <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +| 7 | gaan ga | V | | <nowiki>hulp|inf</nowiki> | 6 | vc | <nowiki>_</nowiki><nowiki>_</nowiki> 
 +| 8 | winkelen winkel <nowiki>intrans|inf</nowiki> 11 cnj | <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +| 9 | <nowiki>,</nowiki> <nowiki>,</nowiki> Punc Punc komma punct <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +| 10 | zwemmen zwem <nowiki>intrans|inf</nowiki> 11 cnj | <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +| 11 | of of Conj Conj neven vc <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +| 12 | terrassen terras | N | | <nowiki>soort|mv|neut</nowiki> | 11 | cnj | <nowiki>_</nowiki> <nowiki>_</nowiki>
 +| 13 | <nowiki>.</nowiki> <nowiki>.</nowiki> Punc Punc punt 12 punct <nowiki>_</nowiki> <nowiki>_</nowiki> |
  
-The first sentence of the CoNLL 2006 test data:+The first two sentences of the CoNLL 2006 test data:
  
-| 1 | To AC case=unmarked 10 subj | _ | _ | +| 1 | BASISTAKENPAKKET <nowiki>basis_taken_pakket</nowiki> Prep Prep voor ROOT <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 2 | kendte AN degree=pos<nowiki>|</nowiki>gender=common/neuter<nowiki>|</nowiki>number=plur<nowiki>|</nowiki>case=unmarked<nowiki>|</nowiki>def=def/indef<nowiki>|</nowiki>transcat=unmarked mod | _ | _ | +| 2 | JEUGDGEZONDHEIDSZORG <nowiki>jeugd_gezondheid_zorg</nowiki> | <nowiki>eigen|ev|neut</nowiki> | 0 | ROOT | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-russiske AN degree=pos<nowiki>|</nowiki>gender=common/neuter<nowiki>|</nowiki>number=plur<nowiki>|</nowiki>case=unmarked<nowiki>|</nowiki>def=def/indef<nowiki>|</nowiki>transcat=unmarked mod | _ | _ | +| 3 | <nowiki>0-19</nowiki> | <nowiki>0-19</nowiki> | Num | Num | <nowiki>hoofd|bep|attr|onverv</nowiki>det <nowiki>_</nowiki> <nowiki>_</nowiki> 
-historikere NC gender=common<nowiki>|</nowiki>number=plur<nowiki>|</nowiki>case=unmarked<nowiki>|</nowiki>def=indef nobj | _ | _ | +JAAR JAAR | <nowiki>eigen|ev|neut</nowiki> | 0 | ROOT | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
-| 5 | Andronik | N | NP case=unmarked namef | _ | _ | +| |||||||||| 
-| 6 | Mirganjan NP case=unmarked appr | _ | _ | +| 1 | Daarvoor | daarvoor | Adv | Adv | <nowiki>pron|aanw</nowiki> | 3 | pc | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 
-| 7 | og CC | 6 | coord | _ | _ | +| 2 | is | ben | V | V | <nowiki>hulpofkopp|ott|3|ev</nowiki>ROOT <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 8 | Igor NP case=unmarked namef | _ | _ | +gekozen kies | <nowiki>trans|verldw|onverv</nowiki> | 2 | vc | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
-| 9 | Klamkin NP case=unmarked conj | _ | _ | +| 4 | omdat | omdat | Conj | Conj | <nowiki>onder|metfin</nowiki>mod <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 10 | tror _ | V | VA | mood=indic<nowiki>|</nowiki>tense=present<nowiki>|</nowiki>voice=active ROOT | _ | _ | +| 5 | gemeenten gemeente | N | <nowiki>soort|mv|neut</nowiki> 11 | su | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 11 | ikke RG RG degree=unmarked 10 mod | _ | _ | +| 6 | bij bij Prep Prep voor 12 mod <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 12 | XP 10 pnct | _ | _ | +| 7 | uitstek uitstek <nowiki>soort|ev|neut</nowiki> | 6 | obj1 <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 13 | at CS 10 dobj | _ | _ | +| 8 | het het Art Art <nowiki>bep|onzijd|neut</nowiki> 10 | det | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 14 | Rusland NP case=unmarked 15 subj | _ | _ | +| 9 | lokale lokaal Adj Adj <nowiki>attr|stell|vervneut</nowiki> 10 | mod | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-| 15 | kan | _ | VA mood=indic<nowiki>|</nowiki>tense=present<nowiki>|</nowiki>voice=active | 13 | vobj | _ | _ | +| 10 | gezondheidsbeleid | <nowiki>gezondheid_beleid</nowiki> | N | N | <nowiki>soort|ev|neut</nowiki>12 obj1 <nowiki>_</nowiki> <nowiki>_</nowiki> 
-16 udvikles VA mood=infin<nowiki>|</nowiki>voice=passive 15 vobj | _ | _ | +| 11 | kunnen kan <nowiki>hulp|inf</nowiki> body | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-17 uden SP SP 15 | mod | _ | _ | +| 12 | toespitsen <nowiki>spits_toe</nowiki> <nowiki>refl|inf</nowiki> 11 vc | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-18 en PI gender=common<nowiki>|</nowiki>number=sing<nowiki>|</nowiki>case=unmarked<nowiki>|</nowiki>register=unmarked 17 nobj | _ | _ | +| 13 | op op Prep Prep voor 12 pc <nowiki>_</nowiki> <nowiki>_</nowiki> 
-19 XP | 20 | pnct | _ | _ | +| 14 | de de Art Art <nowiki>bep|zijdofmv|neut</nowiki> 16 | det | <nowiki>_</nowiki> <nowiki>_</nowiki> 
-20 jernnæve NC gender=common<nowiki>|</nowiki>number=sing<nowiki>|</nowiki>case=unmarked<nowiki>|</nowiki>def=indef 18 nobj | _ | _ | +| 15 | specifieke specifiek | Adj | Adj | <nowiki>attr|stell|vervneut</nowiki> | 16 | mod | <nowiki>_</nowiki> <nowiki>_</nowiki> | 
-21 XP 20 pnct | _ | _ | +| 16 | gezondheidssituatie | <nowiki>gezondheid_situatie</nowiki> | N | N | <nowiki>soort|ev|neut</nowiki> | 17 | cnj | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
-22 | . | XP 10 pnct | _ | _ |+| 17 | en | en | Conj | Conj | neven | 13 | obj1 <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +18 zorgbehoeften <nowiki>zorg_behoefte</nowiki> | <nowiki>soort|mv|neut</nowiki>17 cnj <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +19 van van Prep Prep voor 16 | mod | <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +20 kinderen kind | <nowiki>soort|mv|neut</nowiki> | 21 | cnj | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
 +| 21 | en | en | Conj | Conj | neven | 19 | obj1 | <nowiki>_</nowiki> | <nowiki>_</nowiki> | 
 +22 jongeren | jongere | Adj | Adj | <nowiki>zelfst|vergr|vervneut</nowiki> | 21 | cnj | <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +23 in in Prep Prep voor | 20 | mod <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +24 de de Art Art | <nowiki>bep|zijdofmv|neut</nowiki> | 26 | det | <nowiki>_</nowiki> | <nowiki>_</nowiki> 
 +| 25 | eigen | eigen | Pron | Pron | <nowiki>aanw|neut|attr|weigen</nowiki>26 mod <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +26 gemeente gemeente <nowiki>soort|ev|neut</nowiki> 23 | obj1 | <nowiki>_</nowiki> <nowiki>_</nowiki> 
 +27 <nowiki>.</nowiki> <nowiki>.</nowiki> Punc Punc punt 26 punct <nowiki>_</nowiki> <nowiki>_</nowiki> |
  
 ==== Parsing ==== ==== Parsing ====
  
-Nonprojectivities in DDT are not frequent. Only 988 of the 100,238 tokens in the CoNLL 2006 version are attached nonprojectively (0.99%).+Nonprojectivities in Alpino are quite frequent. 10858 of the 200,654 tokens in the CoNLL 2006 version are attached nonprojectively (5.41%).
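
To make the notion of a nonprojectively attached token concrete, here is a minimal sketch. It assumes one common definition (a token is attached nonprojectively if some token lying strictly between it and its head is not dominated by that head) and takes the HEAD column of one sentence as a plain list; the page does not state which exact counting procedure produced its figures, and the function names are illustrative only.

```python
def nonprojective_tokens(heads):
    """Return the 1-based IDs of tokens attached nonprojectively.

    heads[i - 1] is the HEAD value of token i; 0 denotes the artificial root.
    """
    n = len(heads)

    def dominated_by(node, ancestor):
        # The artificial root dominates every token.
        if ancestor == 0:
            return True
        steps = 0
        # Walk up the tree; the step limit guards against cyclic input.
        while node != 0 and steps <= n:
            if node == ancestor:
                return True
            node = heads[node - 1]
            steps += 1
        return False

    result = []
    for dep in range(1, n + 1):
        head = heads[dep - 1]
        lo, hi = sorted((dep, head))
        if any(not dominated_by(k, head) for k in range(lo + 1, hi)):
            result.append(dep)
    return result


def share(part, whole):
    """Percentage rounded to two decimals, as used on this page."""
    return round(100.0 * part / whole, 2)


# Toy tree (not taken from the treebank): token 2 hangs on token 4,
# but token 3 in between is attached to token 1.
print(nonprojective_tokens([0, 4, 1, 1]))
print(share(10858, 200654))
```

Applied to every sentence of a CoNLL file and summed, a count of this kind divided by the total number of tokens gives a percentage comparable to the one quoted above, subject to the stated assumption about the definition.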
  
-The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Danish:+The results of the CoNLL 2006 shared task are [[http://ilk.uvt.nl/conll/results.html|available online]]. They have been published in [[http://aclweb.org/anthology-new/W/W06/W06-2920.pdf|(Buchholz and Marsi, 2006)]]. The evaluation procedure was non-standard because it excluded punctuation tokens. These are the best results for Dutch:
  
 ^ Parser (Authors) ^ LAS ^ UAS ^ ^ Parser (Authors) ^ LAS ^ UAS ^
-| MST (McDonald et al.) | 84.79 | 90.58 |
-| Malt (Nivre et al.) | 84.77 | 89.80 |
-| Riedel et al. | 83.63 | 89.66 |
+| MST (McDonald et al.) | 79.19 | 83.57 |
+| Riedel et al. | 78.59 | 82.91 |
+| Basis (John O'Neil) | 77.51 | 81.73 |
+| Malt (Nivre et al.) | 78.59 | 81.35 |
  
