Differences

This shows you the differences between two versions of the page.

--- ufal:tasks [2012/01/18 15:38]
ufal
+++ ufal:tasks [2012/01/23 10:54]
ufal
@@ Line 5: / Line 5: @@
 === Europarl tokenizer ===
-  * **info:** A sample rule-based tokenizer, can use a list of prefixes which are usually followed by a dot but don't break a sentence. Distributed as a part of the Europarl tools.
+  * **description:** A sample rule-based tokenizer; it can use a list of prefixes that are usually followed by a dot but do not end a sentence. Distributed as part of the Europarl tools.
   * **version:** v6 (Jan 2012)
   * **author:** Philipp Koehn and Josh Schroeder
@@ Line 12: / Line 12: @@
   * **languages:** in principle applicable to all languages using space-separated words; nonbreaking prefixes available for DE, EL, EN, ES, FR, IT, PT, SV.
   * **efficiency**: NA
+  * **reference**:
+  @inproceedings{Koehn:2005,
+  author = {Philipp Koehn},
+  booktitle = {{Conference Proceedings: the tenth Machine Translation Summit}},
+  pages = {79--86},
+  title = {{Europarl: A Parallel Corpus for Statistical Machine Translation}},
+  address = {Phuket, Thailand},
+  year = {2005}}
   * **contact:**
+=== Europarl tokenizer ===
+| **description:** | A sample rule-based tokenizer; it can use a list of prefixes that are usually followed by a dot but do not end a sentence. Distributed as part of the Europarl tools. |
+| **version:** | v6 (Jan 2012)  |
+| **author:** | Philipp Koehn and Josh Schroeder |
+| **licence:** | free |
+| **url:** | http://www.statmt.org/europarl/ |
+| **languages:** | in principle applicable to all languages using space-separated words; nonbreaking prefixes available for DE, EL, EN, ES, FR, IT, PT, SV. |
+| **efficiency**: | NA  |
+| **reference**: |
+  @inproceedings{Koehn:2005,
+  author = {Philipp Koehn},
+  booktitle = {{Conference Proceedings: the tenth Machine Translation Summit}},
+  pages = {79--86},
+  title = {{Europarl: A Parallel Corpus for Statistical Machine Translation}},
+  address = {Phuket, Thailand},
+  year = {2005}}
+|
+| **contact:** | |
 ===== Language Identification =====
@@ Line 23: / Line 54: @@
 ===== Part-of-Speech Tagging =====
+=== POS Taggers integrated in Treex ===
+  * Featurama
+  * Morce
+  * MxPost tagger
+  * Tree tagger
+  * TnT tagger
+  * Jan Hajič's tagger
+  * a number of toy tagger prototypes (students' assignments) for about ten languages
+=== Details on Czech Tagging ===
+A Guide to Czech Language Tagging at UFAL: http://ufal.mff.cuni.cz/czech-tagging/
 ===== Lemmatization =====
+=== Lemmatizers integrated in Treex ===
+  * Martin Popel's lemmatizer for English
+  * a number of toy lemmatizers for about ten languages (students' assignments)
+  * for Czech, lemmatization is traditionally treated as a part of POS disambiguation, so almost all Czech taggers are capable of lemmatization
 ===== Analytical Parsing =====
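
The Europarl tokenizer entry in the diff above describes a rule-based tokenizer that consults a list of prefixes which are usually followed by a dot but do not end a sentence. The idea can be sketched roughly as follows. This is a minimal illustration only, not the Europarl tools' actual code: the function name ''tokenize'', the tiny prefix set and the punctuation handling are all assumptions made for the example.

```python
import re

# Hypothetical, tiny prefix list for illustration only. The Europarl tools
# ship their own per-language prefix lists, which are not reproduced here.
NONBREAKING_PREFIXES = {"Mr", "Mrs", "Dr", "Prof", "etc", "e.g", "i.e"}


def tokenize(text, prefixes=NONBREAKING_PREFIXES):
    """Split space-separated text into word and punctuation tokens.

    A trailing dot stays attached to a word when the word without the dot
    is a known nonbreaking prefix; otherwise the dot becomes its own token.
    """
    tokens = []
    for chunk in text.split():
        # Put spaces around every punctuation mark except the dot, so that
        # commas, quotes, brackets and so on become separate pieces.
        chunk = re.sub(r"([^\w\s.])", r" \1 ", chunk)
        for piece in chunk.split():
            if piece.endswith(".") and len(piece) > 1:
                stem = piece[:-1]
                if stem in prefixes:
                    tokens.append(piece)         # e.g. "Dr." stays whole
                else:
                    tokens.extend([stem, "."])   # sentence-final dot
            else:
                tokens.append(piece)
    return tokens


print(tokenize("Dr. Smith arrived, finally."))
# ['Dr.', 'Smith', 'arrived', ',', 'finally', '.']
```

The sketch is deliberately crude: it also splits apostrophes and hyphens, and it ignores numbers and ellipses, which a production tokenizer would need to treat separately.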

Institute of Formal and Applied Linguistics Wiki
