Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
user:zeman:self-training [2008/07/08 17:53] zeman created (moved from the CLIP wiki and converted from MediaWiki to DokuWiki) |
user:zeman:self-training [2008/07/09 16:43] (current) zeman |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | This page describes an experiment conducted by [[User: | + | This page describes an experiment conducted by [[User: |
- | I am trying to repeat the experiment of David McClosky, Eugene Charniak, and Mark Johnson ([http:// | + | I am trying to repeat the experiment of David McClosky, Eugene Charniak, and Mark Johnson ([[http:// |
Once the original self-training experiment works as expected, we are going to use a similar scheme for [[Parser Adaptation|parser adaptation]] to a new language. | Once the original self-training experiment works as expected, we are going to use a similar scheme for [[Parser Adaptation|parser adaptation]] to a new language. | ||
Line 9: | Line 9: | ||
Note: I am going to move around some stuff, especially that in my home folder. | Note: I am going to move around some stuff, especially that in my home folder. | ||
- | * '' | + | * '' |
* ''/ | * ''/ | ||
* ''/ | * ''/ | ||
Line 57: | Line 57: | ||
</ | </ | ||
- | + | | Section | 22 || 23 || | |
- | | Section | + | | Parser | Charniak | Brown | Charniak | Brown | |
- | + | | Precision | 90.54 | 92.81 | 90.43 | 92.35 | | |
- | | Parser | + | | Recall | 90.43 | 91.92 | 90.21 | 91.61 | |
- | + | | F-score | 90.48 | 92.36 | 90.32 | 91.98 | | |
- | | Precision | + | | Tagging | 96.15 | 92.41 | 96.78 | 92.33 | |
- | + | | Crossing | 0.66 | 0.49 | 0.72 | 0.59 | | |
- | | Recall | + | |
- | + | ||
- | | F-score | + | |
- | + | ||
- | | Tagging | + | |
- | + | ||
- | | Crossing | + | |
Line 76: | Line 69: | ||
See [[North American News Text Corpus]] for more information on the data and its preparation. | See [[North American News Text Corpus]] for more information on the data and its preparation. | ||
+ | |||
=====Parsing NANTC using P< | =====Parsing NANTC using P< | ||
- | See [[Parsers|here]] for more information on the Brown Reranking Parser. We parsed the LATWP part of NANTC on the C cluster using the following command: | + | See [[:Parsery|here]] for more information on the Brown Reranking Parser. We parsed the LATWP part of NANTC on the C cluster using the following command: |
< | < | ||
cd / | cd / | ||
- | $PARSINGROOT/ | + | $PARSINGROOT/ |
-o latwp.05a.brown.penn -w workdir05 -k | -o latwp.05a.brown.penn -w workdir05 -k | ||
</ | </ | ||
Line 91: | Line 85: | ||
=====Retraining the first-stage parser===== | =====Retraining the first-stage parser===== | ||
- | The following command trains the Charniak parser on 5 copies of the sections 02-21 of the Penn Treebank Wall Street Journal, and 1 copy of the parsed part of NANTC (<span style=" | + | The following command trains the Charniak parser on 5 copies of the sections 02-21 of the Penn Treebank Wall Street Journal, and 1 copy of the parsed part of NANTC (< |
< | < | ||
- | $PARSINGROOT/ | + | $PARSINGROOT/ |
</ | </ | ||
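The training mix described in this hunk (5 copies of the WSJ sections plus 1 copy of the automatically parsed NANTC trees) can be sketched as a small shell function. This is only an illustration, not the command used on this page: the function name `build_training_set` and its arguments are hypothetical, and it assumes that each file holds one tree per line, as the `head -1750000` call elsewhere on the page suggests.

```shell
# Hypothetical helper, not part of $PARSINGROOT.
# Usage: build_training_set WSJ_FILE COPIES AUTO_FILE N_AUTO
# Prints COPIES copies of WSJ_FILE followed by the first N_AUTO lines
# of AUTO_FILE (assumption: one tree per line in both files).
build_training_set() {
    wsj=$1; copies=$2; auto=$3; n_auto=$4
    i=0
    while [ "$i" -lt "$copies" ]; do
        cat "$wsj"          # one more copy of the hand-annotated trees
        i=$((i + 1))
    done
    head -n "$n_auto" "$auto"   # a single copy of the self-training trees
}
```

Repeating the hand-annotated data is a simple way to weight it more heavily than the automatically parsed data without changing the trainer itself.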
Line 103: | Line 97: | ||
< | < | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
- | | + | |
- | | + | |
$PARSINGROOT/ | $PARSINGROOT/ | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
- | | + | |
- | | + | |
$PARSINGROOT/ | $PARSINGROOT/ | ||
</ | </ | ||
- | + | | Section | 22 | 23 | | |
- | | Section | + | | Precision | 87.74 | 88.26 | |
- | + | | Recall | 88.65 | 88.54 | | |
- | | Precision | + | | F-score | 88.19 | 88.40 | |
- | + | | Tagging | 92.67 | 92.84 | | |
- | | Recall | + | | Crossing | 0.80 | 0.91 | |
- | + | ||
- | | F-score | + | |
- | + | ||
- | | Tagging | + | |
- | + | ||
- | | Crossing | + | |
Line 133: | Line 121: | ||
< | < | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
- | | + | |
</ | </ | ||
Line 140: | Line 128: | ||
< | < | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
- | | + | |
-o ptbwsj22.br.ptb+latwp3000.penn | -o ptbwsj22.br.ptb+latwp3000.penn | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
- | | + | |
-o ptbwsj23.br.ptb+latwp3000.penn | -o ptbwsj23.br.ptb+latwp3000.penn | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
</ | </ | ||
- | + | | Section | 22 | 23 | | |
- | | Section | + | | Precision | 90.39 | 90.68 | |
- | + | | Recall | 90.30 | 90.24 | | |
- | | Precision | + | | F-score | 90.34 | 90.46 | |
- | + | | Tagging | 93.43 | 93.65 | | |
- | | Recall | + | | Crossing | 0.61 | 0.71 | |
- | + | ||
- | | F-score | + | |
- | + | ||
- | | Tagging | + | |
- | + | ||
- | | Crossing | + | |
Line 169: | Line 151: | ||
< | < | ||
- | head -1750000 latwp.05a.brown.penn | + | head -1750000 latwp.05a.brown.penn |
</ | </ | ||
Line 175: | Line 157: | ||
< | < | ||
- | $PARSINGROOT/ | + | $PARSINGROOT/ |
</ | </ | ||
Line 182: | Line 164: | ||
< | < | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
- | | + | |
</ | </ | ||
Line 189: | Line 171: | ||
< | < | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
- | | + | |
-o ptbwsj22.ec.ptb+latwp1750.penn | -o ptbwsj22.ec.ptb+latwp1750.penn | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
- | | + | |
-o ptbwsj23.ec.ptb+latwp1750.penn | -o ptbwsj23.ec.ptb+latwp1750.penn | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
- | | + | |
-o ptbwsj22.br.ptb+latwp1750.penn | -o ptbwsj22.br.ptb+latwp1750.penn | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
- | | + | |
-o ptbwsj23.br.ptb+latwp1750.penn | -o ptbwsj23.br.ptb+latwp1750.penn | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
Line 214: | Line 196: | ||
< | < | ||
- | $PARSINGROOT/ | + | $PARSINGROOT/ |
</ | </ | ||
Line 221: | Line 203: | ||
< | < | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
- | | + | |
-o ptbwsj22.ec.5ptb.penn | -o ptbwsj22.ec.5ptb.penn | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
- | | + | |
-o ptbwsj23.ec.5ptb.penn | -o ptbwsj23.ec.5ptb.penn | ||
$PARSINGROOT/ | $PARSINGROOT/ | ||
Line 237: | Line 219: | ||
Remember that Charniak parser means without reranker, Brown parser means with reranker. PTB WSJ (or WSJ) in training means sections 02-21. 50k NANTC means 50,000 sentences of NANTC LATWP. | Remember that Charniak parser means without reranker, Brown parser means with reranker. PTB WSJ (or WSJ) in training means sections 02-21. 50k NANTC means 50,000 sentences of NANTC LATWP. | ||
- | + | | Parsing NANTC using || Parsing test using || | |
- | |colspan=2 rowspan=2| Parsing NANTC using ||colspan=2 rowspan=2| Parsing test using ||colspan=4 align=center| Section | + | | || || |
- | + | | parser | trained on | parser | trained on | McClosky | Zeman | McClosky | Zeman | | |
- | |colspan=2 align=center| 22 ||colspan=2 align=center| 23 | + | | | | Stanford | PTB WSJ | | | | 86.5 | |
- | + | | | | Charniak | PTB WSJ | 90.3 | 90.5 | 89.7 | 90.3 | | |
- | | parser | + | | Brown | PTB WSJ | Charniak | WSJ + 50k NANTC | 90.7 |
- | + | | Brown | PTB WSJ | Charniak | WSJ + 250k NANTC | 90.7 | 91.0 | | 90.9 | | |
- | | || || Stanford | + | | Brown | PTB WSJ | Charniak | WSJ + 500k NANTC | 90.9 |
- | + | | Brown | PTB WSJ | Charniak | WSJ + 750k NANTC | 91.0 | |
- | | || || Charniak | + | | Brown | PTB WSJ | Charniak | WSJ + 1000k NANTC | 90.8 |
- | + | | Brown | PTB WSJ | Charniak | WSJ + 1500k NANTC | 90.8 | |
- | | Brown || PTB WSJ || Charniak | + | | Brown | PTB WSJ | Charniak | WSJ + 2000k NANTC | 91.0 |
- | + | | Brown | PTB WSJ | Charniak | 5 × WSJ | 84.7 | |
- | | Brown || PTB WSJ || Charniak | + | | Brown | PTB WSJ | Charniak | 5 × WSJ + 1750k NANTC | | 87.6 | 91.0 | 87.9 | |
- | + | | Brown | PTB WSJ | Charniak | 5 × WSJ + 3143k NANTC | | 88.2 | | 88.4 | | |
- | | Brown || PTB WSJ || Charniak | + | | | | Brown | PTB WSJ | | 92.4 | 91.3 | 92.0 | |
- | + | | Brown | PTB WSJ | Brown | WSJ + 50k NANTC | 92.4 | |
- | | Brown || PTB WSJ || Charniak | + | | Brown | PTB WSJ | Brown | WSJ + 250k NANTC | 92.3 | 92.2 | | 92.3 | |
- | + | | Brown | PTB WSJ | Brown | WSJ + 500k NANTC | 92.4 | |
- | | Brown || PTB WSJ || Charniak | + | | Brown | PTB WSJ | Brown | WSJ + 750k NANTC | 92.4 |
- | + | | Brown | PTB WSJ | Brown | WSJ + 1000k NANTC | 92.2 | |
- | | Brown || PTB WSJ || Charniak | + | | Brown | PTB WSJ | Brown | WSJ + 1500k NANTC | 92.1 |
- | + | | Brown | PTB WSJ | Brown | WSJ + 2000k NANTC | 92.0 | |
- | | Brown || PTB WSJ || Charniak | + | | Brown | PTB WSJ | Brown | 5 × WSJ + 1750k NANTC | | 89.9 | 92.1 | 90.0 | |
- | + | | Brown | PTB WSJ | Brown | 5 × WSJ + 3143k NANTC | | 90.3 | | 90.5 | | |
- | | Brown || PTB WSJ || Charniak | + | |
- | + | ||
- | | Brown || PTB WSJ || Charniak | + | |
- | + | ||
- | | Brown || PTB WSJ || Charniak | + | |
- | + | ||
- | | || || Brown || PTB WSJ || ||align=center| | + | |
- | + | ||
- | | Brown || PTB WSJ || Brown || WSJ + 50k NANTC ||align=center| | + | |
- | + | ||
- | | Brown || PTB WSJ || Brown || WSJ + 250k NANTC ||align=center| | + | |
- | + | ||
- | | Brown || PTB WSJ || Brown || WSJ + 500k NANTC ||align=center| | + | |
- | + | ||
- | | Brown || PTB WSJ || Brown || WSJ + 750k NANTC ||align=center| | + | |
- | + | ||
- | | Brown || PTB WSJ || Brown || WSJ + 1000k NANTC ||align=center| | + | |
- | + | ||
- | | Brown || PTB WSJ || Brown || WSJ + 1500k NANTC ||align=center| | + | |
- | + | ||
- | | Brown || PTB WSJ || Brown || WSJ + 2000k NANTC ||align=center| | + | |
- | + | ||
- | | Brown || PTB WSJ || Brown || 5 × WSJ + 1750k NANTC || ||align=center| | + | |
- | + | ||
- | | Brown || PTB WSJ || Brown || 5 × WSJ + 3143k NANTC || ||align=center| | + | |
- | + | ||
- | + | ||
- | [[Category: | + | |
- | [[Category: | + | |
- | [[Category: | + | |
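The F-score rows in the evaluation tables above appear to be the harmonic mean of the Precision and Recall rows. A minimal sketch for checking a row by hand follows; the helper name `fscore` is hypothetical and is not one of the scripts under `$PARSINGROOT`.

```shell
# Hypothetical helper: harmonic mean of precision and recall (both in
# percent), printed with two decimals.
fscore() {
    awk -v p="$1" -v r="$2" 'BEGIN {
        if (p + r == 0) { print "0.00"; exit }   # avoid division by zero
        printf "%.2f\n", 2 * p * r / (p + r)
    }'
}

fscore 90.54 90.43   # Charniak parser, section 22
fscore 92.81 91.92   # Brown parser, section 22
```

With the section 22 values from the first table, the two calls print 90.48 and 92.36, which agrees with the F-score row reported there.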