
Institute of Formal and Applied Linguistics Wiki



user:zeman:self-training — revision 2008/07/09 16:43 (current) by zeman
This page describes an experiment conducted by [[User:Zeman:start|Dan Zeman]] in November and December 2006.
  
I am trying to repeat the experiment of David McClosky, Eugene Charniak, and Mark Johnson ([[http://www.cog.brown.edu/~mj/papers/naacl06-self-train.pdf|NAACL 2006, New York]]) with self-training a parser. The idea is that you train a parser on small data, run it over big data, re-train it on its own output for the big data, and have a better-performing parser. The folks at Brown University used Charniak's reranking parser, i.e. a parser-reranker sequence. The big data was parsed by the whole reranking parser but only the first-stage parser was retrained on it. The reranker only saw the small data.
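The self-training loop described above can be sketched as follows. This is a minimal toy illustration, not the actual Charniak/Brown toolchain: ''train_parser'' here is a hypothetical stand-in that merely memorizes (sentence, tree) pairs, and the gold-data weighting mirrors the "5 × WSJ + NANTC" mixtures used in the results below.

```python
def train_parser(treebank):
    """Toy 'trainer': memorizes (sentence, tree) pairs and falls back to a
    flat bracketing for unseen sentences. A stand-in for retraining the
    first-stage parser, not the real Charniak training procedure."""
    model = dict(treebank)
    return lambda sentence: model.get(sentence, "(X %s)" % sentence)

def self_train(small_treebank, big_raw_sentences, gold_weight=5):
    # 1. Train the first-stage parser P0 on the small gold treebank.
    p0 = train_parser(small_treebank)
    # 2. Parse the big unlabeled corpus with P0.
    auto_trees = [(s, p0(s)) for s in big_raw_sentences]
    # 3. Retrain on weighted gold data plus the parser's own output
    #    (cf. the 5 x WSJ + NANTC mixtures in the results table).
    return train_parser(small_treebank * gold_weight + auto_trees)

gold = [("a b", "(S (A a) (B b))")]
p1 = self_train(gold, ["a b", "c d"])
print(p1("c d"))  # unseen sentence gets the fallback bracketing: (X c d)
```

In the real experiment, step 2 used the full parser-reranker pipeline while step 3 retrained only the first-stage parser; this sketch collapses both stages into one toy function.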
Note: I am going to move around some stuff, especially that in my home folder.
  
  * ''$PARSINGROOT'' - working copy of the parsers and related scripts. See [[:parsery|Parsing]] on how to create your own.
  * ''/fs/clip-corpora/ptb/processed'' - [[Penn Treebank]] (referred to as ''$PTB'')
  * ''/fs/clip-corpora/north_american_news'' - [[North American News Text Corpus]], including everything I made of it
  
See [[North American News Text Corpus]] for more information on the data and its preparation.
  
=====Parsing NANTC using P<sub>0</sub>=====
  
See [[:Parsery|here]] for more information on the Brown Reranking Parser. We parsed the LATWP part of NANTC on the C cluster using the following command:
  
<code>
| Brown | PTB WSJ | Brown | 5 × WSJ + 1750k NANTC |  |  89.9  |  92.1  |  90.0  |
| Brown | PTB WSJ | Brown | 5 × WSJ + 3143k NANTC |  |  90.3  |  |  90.5  |
  
