Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

Old revision: parsery [2007/10/16 18:36] by zeman
New revision: parsery [2007/10/16 19:03] by zeman, edit summary: Unwanted wiki markup.

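For context on the edit summary: in DokuWiki syntax, text between double underscores is rendered as underlined, so the old revision displayed the parser's marker __FAILED__ as an underlined word "FAILED" with the underscores gone. The new revision wraps the marker in <nowiki>...</nowiki>, which switches wiki formatting off. The following Python sketch does the same escaping mechanically. The helper `escape_underline` is hypothetical (not part of DokuWiki or of the parser repository), and its regular expression only handles single tokens without spaces or inner underscores, a simplification of DokuWiki's real underline rule.

```python
import re

# Simplified DokuWiki underline span: "__", a token without spaces or
# underscores, then "__". Real DokuWiki underlining is more permissive.
UNDERLINE = re.compile(r"__[^_\s]+__")


def escape_underline(text: str) -> str:
    """Wrap every __token__ in <nowiki> so DokuWiki shows it literally."""
    return UNDERLINE.sub(lambda m: "<nowiki>" + m.group(0) + "</nowiki>", text)


if __name__ == "__main__":
    line = 'the output line will be "__FAILED__...something"'
    print(escape_underline(line))
    # the output line will be "<nowiki>__FAILED__</nowiki>...something"
```

Note that the sketch is not idempotent: running it on text that is already wrapped in <nowiki> would wrap the token a second time.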
Line 77 (old) / line 77 (new):
 make all
 </code>
+
+
 
 ==== Brown Reranking Parser ====
Line 89 (old) / line 91 (new):
 
 Our SVN version of the Brown parser has some advantages over the standard distribution:
-  * Number of output sentences equals to the number of input sentences. If parse of a sentence failed, the output line will be "__FAILED__...something" which can be easily fixed by one of our scripts. The original parser did not tell you //where// it failed, which was very difficult to fix.
+  * Number of output sentences equals to the number of input sentences. If parse of a sentence failed, the output line will be "<nowiki>__FAILED__</nowiki>...something" which can be easily fixed by one of our scripts. The original parser did not tell you //where// it failed, which was very difficult to fix.
   * The parser does not say just //Segmentation fault// when it hits the vocabulary size limit. Moreover, the limit has been pushed from 50,000 words to 1,000,000 words.
   * The reranker has been freed from its dependency on the Penn Treebank. The various data relations, originally wired deeply into their Makefile, are now generalized to the extent that we can call a training script and supply the training data as standard input. Not all parts of the Makefile have been generalized, yet.
Line 112 (old) / line 114 (new):
 
 Options:
-''-nick da-delex''
-Assigns a nickname (''da-delex'' in this example) to the intermediate files the training procedure creates in the $PARSINGROOT subtree. Ensures that older models do not get overwritten. Is only needed if you want to reuse the intermediate files or if you want to train two different models in parallel. The resulting tgzipped model appears anyway on the standard output and it is your responsibility to save it.
-''-reuse''
-Reuse old intermediate files, if available and up-to-date (make-wise). In other words, do not perform ''make clean'' in the beginning.
+  * ''-nick da-delex''
+    Assigns a nickname (''da-delex'' in this example) to the intermediate files the training procedure creates in the $PARSINGROOT subtree. Ensures that older models do not get overwritten. Is only needed if you want to reuse the intermediate files or if you want to train two different models in parallel. The resulting tgzipped model appears anyway on the standard output and it is your responsibility to save it.
+  ''-reuse''
+    Reuse old intermediate files, if available and up-to-date (make-wise). In other words, do not perform ''make clean'' in the beginning.
 
 If you have two tgzipped models and want to use the first-stage parser from the first model, plus the reranker from the second model, call

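The first bullet of the second hunk relies on the modified parser writing exactly one output line per input sentence, so a failed parse can be located by its position alone. A minimal Python sketch of that lookup, assuming one sentence per input line and that failed parses are exactly the output lines beginning with __FAILED__ (what follows the marker, "...something" above, is not specified on this page). The function `find_failed` and the sample data are hypothetical; this is not the group's own repair script.

```python
from typing import List, Tuple

# Marker named on the wiki page; the rest of a failed line is unspecified.
FAILED_PREFIX = "__FAILED__"


def find_failed(inputs: List[str], outputs: List[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, input sentence) for every failed parse.

    Depends on the one-output-line-per-input-sentence guarantee; a length
    mismatch means that guarantee does not hold, so we refuse to guess.
    """
    if len(inputs) != len(outputs):
        raise ValueError(
            f"{len(inputs)} input sentences but {len(outputs)} output lines"
        )
    return [
        (i, sentence)
        for i, (sentence, parse) in enumerate(zip(inputs, outputs), start=1)
        if parse.startswith(FAILED_PREFIX)
    ]


if __name__ == "__main__":
    # Made-up sentences and placeholder parses, for illustration only.
    sentences = ["The cat sat .", "Colorless green ideas .", "Dogs bark ."]
    parses = ["(S1 ...)", "__FAILED__ ...", "(S1 ...)"]
    print(find_failed(sentences, parses))
    # [(2, 'Colorless green ideas .')]
```

Raising on a length mismatch, rather than silently pairing lines, keeps the positional mapping trustworthy: with the original parser, which dropped failed sentences, the same pairing would have been silently wrong.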