Differences
This shows you the differences between two versions of the page.
| user:zeman:self-training [2008/07/08 17:53] zeman created (transferred from the CLIP wiki and converted from MediaWiki to DokuWiki) | user:zeman:self-training [2008/07/09 16:43] (current) zeman | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | This page describes an experiment conducted by [[User: | + | This page describes an experiment conducted by [[User: | 
| - | I am trying to repeat the experiment of David McClosky, Eugene Charniak, and Mark Johnson ([http:// | + | I am trying to repeat the experiment of David McClosky, Eugene Charniak, and Mark Johnson ([[http:// | 
| Once the original self-training experiment works as expected, we are going to use a similar scheme for [[Parser Adaptation|parser adaptation]] to a new language. | Once the original self-training experiment works as expected, we are going to use a similar scheme for [[Parser Adaptation|parser adaptation]] to a new language. | ||
| Line 9: | Line 9: | ||
| Note: I am going to move some files around, especially those in my home folder. | Note: I am going to move some files around, especially those in my home folder. | ||
| - | * '' | + | * '' | 
| * ''/ | * ''/ | ||
| * ''/ | * ''/ | ||
| Line 57: | Line 57: | ||
| </ | </ | ||
| - | + | | Section | 22 || 23 || | |
| - | | Section | + | | Parser | Charniak | Brown | Charniak | Brown | | 
| - | + | | Precision | 90.54 | 92.81 | 90.43 | 92.35 | | |
| - | |  Parser | + | | Recall | 90.43 | 91.92 | 90.21 | 91.61 | | 
| - | + | | F-score | 90.48 | 92.36 | 90.32 | 91.98 | | |
| - | |  Precision | + | | Tagging | 96.15 | 92.41 | 96.78 | 92.33 | | 
| - | + | | Crossing | 0.66 | 0.49 | 0.72 | 0.59 | | |
| - | |  Recall | + | |
| - | + | ||
| - | |  F-score | + | |
| - | + | ||
| - | |  Tagging | + | |
| - | + | ||
| - | |  Crossing | + | |
| Line 76: | Line 69: | ||
| See [[North American News Text Corpus]] for more information on the data and its preparation. | See [[North American News Text Corpus]] for more information on the data and its preparation. | ||
| + | |||
| =====Parsing NANTC using P< | =====Parsing NANTC using P< | ||
| - | See [[Parsers|here]] for more information on the Brown Reranking Parser. We parsed the LATWP part of NANTC on the C cluster using the following command: | + | See [[:Parsery|here]] for more information on the Brown Reranking Parser. We parsed the LATWP part of NANTC on the C cluster using the following command: | 
| < | < | ||
| cd / | cd / | ||
| - | $PARSINGROOT/ | + | $PARSINGROOT/ | 
| -o latwp.05a.brown.penn -w workdir05 -k | -o latwp.05a.brown.penn -w workdir05 -k | ||
| </ | </ | ||
| Line 91: | Line 85: | ||
| =====Retraining the first-stage parser===== | =====Retraining the first-stage parser===== | ||
| - | The following command trains the Charniak parser on 5 copies of the sections 02-21 of the Penn Treebank Wall Street Journal, and 1 copy of the parsed part of NANTC (<span style=" | + | The following command trains the Charniak parser on 5 copies of the sections 02-21 of the Penn Treebank Wall Street Journal, and 1 copy of the parsed part of NANTC (< | 
| < | < | ||
| - | $PARSINGROOT/ | + | $PARSINGROOT/ | 
| </ | </ | ||
| Line 103: | Line 97: | ||
| < | < | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| - |  | + |  | 
| - |  | + |  | 
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| - |  | + |  | 
| - |  | + |  | 
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| </ | </ | ||
| - | + | | Section | 22 | 23 | | |
| - | |  Section | + | | Precision | 87.74 | 88.26 | | 
| - | + | | Recall | 88.65 | 88.54 | | |
| - | |  Precision | + | | F-score | 88.19 | 88.40 | | 
| - | + | | Tagging | 92.67 | 92.84 | | |
| - | |  Recall | + | | Crossing | 0.80 | 0.91 | | 
| - | + | ||
| - | |  F-score | + | |
| - | + | ||
| - | |  Tagging | + | |
| - | + | ||
| - | |  Crossing | + | |
| Line 133: | Line 121: | ||
| < | < | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| - |  | + |  | 
| </ | </ | ||
| Line 140: | Line 128: | ||
| < | < | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| - |  | + |  | 
| -o ptbwsj22.br.ptb+latwp3000.penn | -o ptbwsj22.br.ptb+latwp3000.penn | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| - |  | + |  | 
| -o ptbwsj23.br.ptb+latwp3000.penn | -o ptbwsj23.br.ptb+latwp3000.penn | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| </ | </ | ||
| - | + | | Section | 22 | 23 | | |
| - | |  Section | + | | Precision | 90.39 | 90.68 | | 
| - | + | | Recall | 90.30 | 90.24 | | |
| - | |  Precision | + | | F-score | 90.34 | 90.46 | | 
| - | + | | Tagging | 93.43 | 93.65 | | |
| - | |  Recall | + | | Crossing | 0.61 | 0.71 | | 
| - | + | ||
| - | |  F-score | + | |
| - | + | ||
| - | |  Tagging | + | |
| - | + | ||
| - | |  Crossing | + | |
| Line 169: | Line 151: | ||
| < | < | ||
| - | head -1750000 latwp.05a.brown.penn | + | head -1750000 latwp.05a.brown.penn | 
| </ | </ | ||
| Line 175: | Line 157: | ||
| < | < | ||
| - | $PARSINGROOT/ | + | $PARSINGROOT/ | 
| </ | </ | ||
| Line 182: | Line 164: | ||
| < | < | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| - |  | + |  | 
| </ | </ | ||
| Line 189: | Line 171: | ||
| < | < | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| - |  | + |  | 
| -o ptbwsj22.ec.ptb+latwp1750.penn | -o ptbwsj22.ec.ptb+latwp1750.penn | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| - |  | + |  | 
| -o ptbwsj23.ec.ptb+latwp1750.penn | -o ptbwsj23.ec.ptb+latwp1750.penn | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| - |  | + |  | 
| -o ptbwsj22.br.ptb+latwp1750.penn | -o ptbwsj22.br.ptb+latwp1750.penn | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| - |  | + |  | 
| -o ptbwsj23.br.ptb+latwp1750.penn | -o ptbwsj23.br.ptb+latwp1750.penn | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| Line 214: | Line 196: | ||
| < | < | ||
| - | $PARSINGROOT/ | + | $PARSINGROOT/ | 
| </ | </ | ||
| Line 221: | Line 203: | ||
| < | < | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| - |  | + |  | 
| -o ptbwsj22.ec.5ptb.penn | -o ptbwsj22.ec.5ptb.penn | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| - |  | + |  | 
| -o ptbwsj23.ec.5ptb.penn | -o ptbwsj23.ec.5ptb.penn | ||
| $PARSINGROOT/ | $PARSINGROOT/ | ||
| Line 237: | Line 219: | ||
| Remember that "Charniak parser" means the parser without the reranker and "Brown parser" means the parser with the reranker. PTB WSJ (or WSJ) in training means sections 02-21. 50k NANTC means 50,000 sentences of NANTC LATWP. | Remember that "Charniak parser" means the parser without the reranker and "Brown parser" means the parser with the reranker. PTB WSJ (or WSJ) in training means sections 02-21. 50k NANTC means 50,000 sentences of NANTC LATWP. | ||
| - | + | | Parsing NANTC using || Parsing test using || | |
| - | |colspan=2 rowspan=2| Parsing NANTC using ||colspan=2 rowspan=2| Parsing test using ||colspan=4 align=center| Section | + | |  ||  || | 
| - | + | | parser | trained on | parser | trained on | McClosky | Zeman | McClosky | Zeman | | |
| - | |colspan=2 align=center| 22 ||colspan=2 align=center| 23 | + | | | | Stanford | PTB WSJ | | | | 86.5 | | 
| - | + | | | | Charniak | PTB WSJ | 90.3 | 90.5 | 89.7 | 90.3 | | |
| - | |  parser | + | | Brown | PTB WSJ | Charniak | WSJ + 50k NANTC |  90.7 | 
| - | + | | Brown | PTB WSJ | Charniak | WSJ + 250k NANTC | 90.7 | 91.0 | | 90.9 | | |
| - | | || || Stanford | + | | Brown | PTB WSJ | Charniak | WSJ + 500k NANTC |  90.9 | 
| - | + | | Brown | PTB WSJ | Charniak | WSJ + 750k NANTC |  91.0 | |
| - | | || || Charniak | + | | Brown | PTB WSJ | Charniak | WSJ + 1000k NANTC |  90.8 | 
| - | + | | Brown | PTB WSJ | Charniak | WSJ + 1500k NANTC |  90.8 | |
| - | | Brown || PTB WSJ || Charniak | + | | Brown | PTB WSJ | Charniak | WSJ + 2000k NANTC |  91.0 | 
| - | + | | Brown | PTB WSJ | Charniak | 5 × WSJ |  84.7 | |
| - | | Brown || PTB WSJ || Charniak | + | | Brown | PTB WSJ | Charniak | 5 × WSJ + 1750k NANTC | | 87.6 | 91.0 | 87.9 | | 
| - | + | | Brown | PTB WSJ | Charniak | 5 × WSJ + 3143k NANTC | | 88.2 | | 88.4 | | |
| - | | Brown || PTB WSJ || Charniak | + | | | | Brown | PTB WSJ | | 92.4 | 91.3 | 92.0 | | 
| - | + | | Brown | PTB WSJ | Brown | WSJ + 50k NANTC |  92.4 | |
| - | | Brown || PTB WSJ || Charniak | + | | Brown | PTB WSJ | Brown | WSJ + 250k NANTC | 92.3 | 92.2 | | 92.3 | | 
| - | + | | Brown | PTB WSJ | Brown | WSJ + 500k NANTC |  92.4 | |
| - | | Brown || PTB WSJ || Charniak | + | | Brown | PTB WSJ | Brown | WSJ + 750k NANTC |  92.4 | 
| - | + | | Brown | PTB WSJ | Brown | WSJ + 1000k NANTC |  92.2 | |
| - | | Brown || PTB WSJ || Charniak | + | | Brown | PTB WSJ | Brown | WSJ + 1500k NANTC |  92.1 | 
| - | + | | Brown | PTB WSJ | Brown | WSJ + 2000k NANTC |  92.0 | |
| - | | Brown || PTB WSJ || Charniak | + | | Brown | PTB WSJ | Brown | 5 × WSJ + 1750k NANTC | | 89.9 | 92.1 | 90.0 | | 
| - | + | | Brown | PTB WSJ | Brown | 5 × WSJ + 3143k NANTC | | 90.3 | | 90.5 | | |
| - | | Brown || PTB WSJ || Charniak | + | |
| - | + | ||
| - | | Brown || PTB WSJ || Charniak | + | |
| - | + | ||
| - | | Brown || PTB WSJ || Charniak | + | |
| - | + | ||
| - | | || || Brown || PTB WSJ || ||align=center| | + | |
| - | + | ||
| - | | Brown || PTB WSJ || Brown || WSJ + 50k NANTC ||align=center| | + | |
| - | + | ||
| - | | Brown || PTB WSJ || Brown || WSJ + 250k NANTC ||align=center| | + | |
| - | + | ||
| - | | Brown || PTB WSJ || Brown || WSJ + 500k NANTC ||align=center| | + | |
| - | + | ||
| - | | Brown || PTB WSJ || Brown || WSJ + 750k NANTC ||align=center| | + | |
| - | + | ||
| - | | Brown || PTB WSJ || Brown || WSJ + 1000k NANTC ||align=center| | + | |
| - | + | ||
| - | | Brown || PTB WSJ || Brown || WSJ + 1500k NANTC ||align=center| | + | |
| - | + | ||
| - | | Brown || PTB WSJ || Brown || WSJ + 2000k NANTC ||align=center| | + | |
| - | + | ||
| - | | Brown || PTB WSJ || Brown || 5 × WSJ + 1750k NANTC || ||align=center| | + | |
| - | + | ||
| - | | Brown || PTB WSJ || Brown || 5 × WSJ + 3143k NANTC || ||align=center| | + | |
| - | + | ||
| - | + | ||
| - | [[Category: | + | |
| - | [[Category: | + | |
| - | [[Category: | + | |
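
The retraining steps on this page build first-stage training data by repetition. They use 5 copies of PTB WSJ sections 02-21 plus the first N parsed sentences of NANTC LATWP (for example, `head -1750000 latwp.05a.brown.penn` for 1750k sentences). The following is a minimal sketch of assembling such a file. It assumes the parsed NANTC file holds one tree per line, as that `head` call suggests. The function `build_selftrain_corpus` and the file names in the example are hypothetical, and the sketch does not reproduce the actual training command.

```shell
#!/bin/sh
# Sketch: build a self-training corpus of K copies of gold WSJ trees plus the
# first N self-parsed NANTC trees. Assumes the NANTC file holds one tree per
# line. Function and file names are hypothetical.
build_selftrain_corpus() {
  wsj=$1      # gold trees, e.g. PTB WSJ sections 02-21
  nantc=$2    # parser output on NANTC LATWP, one tree per line
  n=$3        # number of NANTC trees to keep, e.g. 1750000
  k=$4        # number of WSJ copies, e.g. 5
  out=$5      # resulting training file
  : > "$out" || return 1                 # create or truncate the output file
  i=0
  while [ "$i" -lt "$k" ]; do            # append k copies of the gold data
    cat "$wsj" >> "$out" || return 1
    i=$((i + 1))
  done
  head -n "$n" "$nantc" >> "$out"        # append the first n self-parsed trees
}

# Example (hypothetical file names):
# build_selftrain_corpus wsj02-21.mrg latwp.05a.brown.penn 1750000 5 train.5wsj+latwp1750.mrg
```

Repeating the gold data is a simple way to weight it against the much larger self-parsed set. Whether the trainer accepts a single concatenated file is an assumption of this sketch.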

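The F-score rows in the evaluation tables above agree with the harmonic mean of precision and recall, F = 2PR / (P + R). For example, for the Charniak parser on section 22, 2 × 90.54 × 90.43 / (90.54 + 90.43) ≈ 90.48. Here is a small shell check that uses `awk` for the floating-point arithmetic. `fscore` is a hypothetical helper, not a tool from the parser distribution.

```shell
#!/bin/sh
# Harmonic mean of precision P and recall R (both in percent): F = 2PR / (P + R).
# `fscore` is a hypothetical helper used only to check the tables on this page.
fscore() {
  awk -v p="$1" -v r="$2" 'BEGIN { printf "%.2f\n", 2 * p * r / (p + r) }'
}

fscore 90.54 90.43   # Charniak parser, section 22: prints 90.48
fscore 92.81 91.92   # Brown parser, section 22: prints 92.36
fscore 87.74 88.65   # retrained first-stage parser, section 22: prints 88.19
```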