
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


ufal:tasks [2012/01/19 12:01]
ufal
ufal:tasks [2012/01/19 12:13]
ufal
Line 13: Line 13:
   * **efficiency**: NA    * **efficiency**: NA 
   * **reference**:    * **reference**: 
 +
 +  @inproceedings{Koehn:2005,
 +  author = {Philipp Koehn},
 +  booktitle = {{Conference Proceedings: the tenth Machine Translation Summit}},
 +  pages = {79--86},
 +  title = {{Europarl: A Parallel Corpus for Statistical Machine Translation}},
 +  address = {Phuket, Thailand},
 +  year = {2005}}
 +
   * **contact:**   * **contact:**
 +
 +
 +=== Europarl tokenizer ===
 +| **description:** | A sample rule-based tokenizer. It can use a list of nonbreaking prefixes, i.e. abbreviations that are usually followed by a period which does not end the sentence. Distributed as part of the Europarl tools. |
 +| **version:** | v6 (Jan 2012)  |
 +| **author:** | Philipp Koehn and Josh Schroeder |
 +| **licence:** | free |
 +| **url:** | http://www.statmt.org/europarl/ |
 +| **languages:** | in principle applicable to all languages using space-separated words; nonbreaking prefixes available for DE, EL, EN, ES, FR, IT, PT, SV. |
 +| **efficiency**: | NA  |
 +| **reference**: |
 +  @inproceedings{Koehn:2005,
 +  author = {Philipp Koehn},
 +  booktitle = {{Conference Proceedings: the tenth Machine Translation Summit}},
 +  pages = {79--86},
 +  title = {{Europarl: A Parallel Corpus for Statistical Machine Translation}},
 +  address = {Phuket, Thailand},
 +  year = {2005}}
 +|
 +| **contact:** | |
 +
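The nonbreaking-prefix idea from the description can be illustrated with a minimal Python sketch. This is not the code of the Europarl tools; it is a simplified illustration under stated assumptions. The function name `tokenize` and the tiny prefix set `NONBREAKING_PREFIXES` are hypothetical, and the actual per-language prefix lists shipped with the tools are far longer.

```python
import re

# Hypothetical, tiny sample of English nonbreaking prefixes (assumption);
# real per-language lists are much longer.
NONBREAKING_PREFIXES = {"Mr", "Mrs", "Dr", "Prof", "etc", "vs"}

# Leading punctuation, the word core, and trailing punctuation.
WORD_RE = re.compile(r"^([\"'(\[]*)(.*?)([\"')\].,;:!?]*)$")


def tokenize(text, prefixes=NONBREAKING_PREFIXES):
    """Split on whitespace, then detach punctuation from each word.

    A period directly after a known nonbreaking prefix stays attached
    to that word (e.g. "Mr."); any other trailing period becomes its
    own token.
    """
    tokens = []
    for word in text.split():
        lead, core, trail = WORD_RE.match(word).groups()
        # Each leading punctuation character becomes a separate token.
        tokens.extend(lead)
        if trail.startswith(".") and core in prefixes:
            # Keep the period on the abbreviation.
            core += "."
            trail = trail[1:]
        if core:
            tokens.append(core)
        # Each trailing punctuation character becomes a separate token.
        tokens.extend(trail)
    return tokens


if __name__ == "__main__":
    print(tokenize("Mr. Smith arrived, etc. and left."))
```

In this sketch the period after "Mr" and "etc" stays attached because both appear in the prefix set, while the final period after "left" is split off as a separate token.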
  
 ===== Language Identification ===== ===== Language Identification =====
