parsery [2007/10/16 18:49] zeman Brown Reranking Parser.
parsery [2007/10/16 21:55] (current) zeman N-best parsing with Charniak.
make all
</code>
Our SVN version of the Brown parser has some advantages over the standard distribution:
  * The number of output sentences equals the number of input sentences. If the parse of a sentence fails, the output line is "<nowiki>__FAILED__</nowiki>...something", which can easily be fixed by one of our scripts. The original parser did not tell you //where// it failed, which made failures very difficult to fix.
  * The parser no longer just says //Segmentation fault// when it hits the vocabulary size limit. Moreover, the limit has been raised from 50,000 words to 1,000,000 words.
  * The reranker has been freed from its dependency on the Penn Treebank. The various data relations, originally wired deeply into their Makefile, are now generalized to the extent that we can call a training script and supply the training data as standard input. Not all parts of the Makefile have been generalized yet.
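Because the output has exactly one line per input sentence, failed parses can be located by line number. The following is only a minimal sketch, not one of our scripts: the function name ''failed_sentence_numbers'' is hypothetical, and it assumes that a failed line begins with the ''<nowiki>__FAILED__</nowiki>'' marker described above.

<code python>
# Sketch only: assumes one output line per input sentence and that a
# failed parse is reported as a line starting with the marker below.
FAILED_MARKER = "__FAILED__"


def failed_sentence_numbers(output_lines):
    """Return the 1-based numbers of the sentences whose parse failed."""
    return [
        number
        for number, line in enumerate(output_lines, start=1)
        if line.lstrip().startswith(FAILED_MARKER)
    ]
</code>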
Both folders, ''charniak-parser'' and ''brown-reranking-parser'', have a ''scripts'' subfolder with the basic set of ''parse.pl'', ''cluster-parse.pl'', and ''train.pl''. These scripts are invoked in much the same fashion as for the Stanford parser (see above).

The ''parse.pl'' and ''cluster-parse.pl'' scripts of the Charniak parser accept the ''-Nbest'' option in addition to the standard options of these scripts. ''-Nbest 50'' translates to ''-N50'' on the command line of Charniak's ''parseIt''. It asks the parser to output the N (here 50) best parses instead of just one. The output format for N>1 differs from the default: each set of parses is preceded by a line with the number of parses and the ID (number) of the sentence, and every parse is preceded by a line with its weight (log probability). This option only applies to ''charniak-parser''; it is ignored by ''brown-reranking-parser''.
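
The N-best output format described above could be read back with something like the following. This is a hedged sketch rather than part of our scripts: the function name ''read_nbest'' is hypothetical, and the code assumes that the header fields are separated by whitespace, that every parse fits on a single line, and that blank lines carry no information.

<code python>
def read_nbest(lines):
    """Yield (sentence_id, [(log_prob, parse), ...]) for each sentence.

    Assumed layout (see the description above):
        <number of parses> <sentence ID>
        <log probability of parse 1>
        <parse 1>
        <log probability of parse 2>
        <parse 2>
        ...
    """
    # Drop blank lines and surrounding whitespace (an assumption of this sketch).
    rows = [line.strip() for line in lines if line.strip()]
    pos = 0
    while pos < len(rows):
        fields = rows[pos].split()
        count, sentence_id = int(fields[0]), fields[1]
        pos += 1
        parses = []
        for _ in range(count):
            # Each parse needs two rows: its weight and the parse itself.
            if pos + 1 >= len(rows):
                raise ValueError("truncated N-best list for sentence " + sentence_id)
            parses.append((float(rows[pos]), rows[pos + 1]))
            pos += 2
        yield sentence_id, parses
</code>

Since the weights are log probabilities, the most probable parse of a sentence is the one with the highest weight, e.g. ''max(parses)'' on the list of pairs yielded above.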
=== Training ===