Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:turecka-morfologie [2013/06/17 22:20]
zeman Tree Tagger.
user:zeman:turecka-morfologie [2013/06/20 10:34]
zeman Turkish tags are now being adapted to Schmid's RFTagger.
Line 20:
 cat /net/data/conll/2007/tr/train.conll | prepare_lexicon_from_conll.pl --type openclass > openclass.tr.txt
 cat /net/data/conll/2007/tr/test.conll | prepare_lexicon_from_conll.pl --type test > test.tr.txt
-bin/train-tree-tagger lexicon.tr.txt openclass.tr.txt train.tr.txt tr.par -st 'Punc|Punc|_'
+cat /net/data/conll/2007/tr/test.conll | prepare_lexicon_from_conll.pl --type train > gold.tr.txt
+bin/train-tree-tagger lexicon.tr.txt openclass.tr.txt train.tr.txt tr.par -st 'Punc.Punc._'
 bin/tree-tagger -token -lemma tr.par < test.tr.txt > tagged.tr.txt
+eval_tree_tagger.pl tagged.tr.txt gold.tr.txt
 </code>
+
+The results on the Turkish CoNLL 2007 treebank are as follows:
+
+3610 total tokens.
+1221 unknown tokens (33.822715 %).
+2656 correct tags (73.573407 %).
+2200 correct tags of known words (92.088740 %).
+456 correct tags of unknown words (37.346437 %).
+3199 correct parts of speech (88.614958 %).
+2270 correct parts of speech of known words (95.018836 %).
+929 correct parts of speech of unknown words (76.085176 %).
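
The breakdown above into known and unknown words, and into full tags versus parts of speech, can be illustrated with a small Python sketch. This is not the code of `eval_tree_tagger.pl`, whose internals are not shown here. It assumes that each token is available as a hypothetical `(gold_tag, predicted_tag, is_known)` triple and that the part of speech is the first dot-separated field of the tag (as in `Punc.Punc._`). The function names `pos_of`, `score` and `percent` are made up for illustration.

```python
from collections import Counter


def pos_of(tag, sep="."):
    # Assumption: the part of speech is the first field of the tag.
    return tag.split(sep, 1)[0]


def score(records, sep="."):
    """Count correct tags and parts of speech, split by known/unknown words.

    records: iterable of (gold_tag, predicted_tag, is_known) triples
    (a hypothetical input format, not the tagger's own output format).
    """
    counts = Counter()
    for gold, pred, known in records:
        group = "known" if known else "unknown"
        counts["total"] += 1
        counts[group] += 1
        if gold == pred:
            counts["tag_ok"] += 1
            counts["tag_ok_" + group] += 1
        if pos_of(gold, sep) == pos_of(pred, sep):
            counts["pos_ok"] += 1
            counts["pos_ok_" + group] += 1
    return counts


def percent(part, whole):
    # Percentage of part in whole; 0.0 when the group is empty.
    return 100.0 * part / whole if whole else 0.0


if __name__ == "__main__":
    # Toy records with illustrative tags, not taken from the treebank.
    toy = [
        ("Noun.Noun.A3sg", "Noun.Noun.A3sg", True),
        ("Verb.Verb.Pos", "Verb.Verb.Neg", True),
        ("Punc.Punc._", "Punc.Punc._", True),
        ("Noun.Noun.A3sg", "Adj.Adj._", False),
    ]
    c = score(toy)
    print(c["total"], "total tokens.")
    print(c["tag_ok"], "correct tags (%f %%)." % percent(c["tag_ok"], c["total"]))
    print(c["pos_ok"], "correct parts of speech (%f %%)." % percent(c["pos_ok"], c["total"]))
```

The same `percent` helper reproduces the reported ratios from the raw counts. For example, the known-word figures use 3610 - 1221 = 2389 known tokens as the denominator.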
  
 ===== Turkish Wikipedia =====