Differences
This shows you the differences between two versions of the page.
ufal:tasks [2012/01/18 15:38] ufal | ufal:tasks [2012/01/23 10:54] ufal | ||
---|---|---|---|
Line 5: | Line 5: | ||
=== Europarl tokenizer === | === Europarl tokenizer === | ||
- | * **info:** A sample rule-based tokenizer. It can use a list of prefixes that are usually followed by a dot but do not end a sentence. Distributed as part of the Europarl tools. | + | * **description:** A sample rule-based tokenizer. It can use a list of prefixes that are usually followed by a dot but do not end a sentence. Distributed as part of the Europarl tools. |
* **version: | * **version: | ||
* **author:** Philipp Koehn and Josh Schroeder | * **author:** Philipp Koehn and Josh Schroeder | ||
Line 12: | Line 12: | ||
* **languages: | * **languages: | ||
* **efficiency**: | * **efficiency**: | ||
+ | * **reference**: | ||
+ | |||
+ | @inproceedings{Koehn: | ||
+ | author = {Philipp Koehn}, | ||
+ | booktitle = {{Conference Proceedings: | ||
+ | pages = {79--86}, | ||
+ | title = {{Europarl: A Parallel Corpus for Statistical Machine Translation}}, | ||
+ | address = {Phuket, Thailand}, | ||
+ | year = {2005}} | ||
+ | |||
* **contact: | * **contact: | ||
+ | |||
+ | |||
+ | === Europarl tokenizer === | ||
+ | | **description: | ||
+ | | **version: | ||
+ | | **author:** | Philipp Koehn and Josh Schroeder | | ||
+ | | **licence: | ||
+ | | **url:** | http:// | ||
+ | | **languages: | ||
+ | | **efficiency**: | ||
+ | | **reference**: | ||
+ | @inproceedings{Koehn: | ||
+ | author = {Philipp Koehn}, | ||
+ | booktitle = {{Conference Proceedings: | ||
+ | pages = {79--86}, | ||
+ | title = {{Europarl: A Parallel Corpus for Statistical Machine Translation}}, | ||
+ | address = {Phuket, Thailand}, | ||
+ | year = {2005}} | ||
+ | | | ||
+ | | **contact: | ||
+ | |||
===== Language Identification ===== | ===== Language Identification ===== | ||
Line 23: | Line 54: | ||
===== Part-of-Speech Tagging ===== | ===== Part-of-Speech Tagging ===== | ||
+ | |||
+ | === POS Taggers integrated in Treex === | ||
+ | * Featurama | ||
+ | * Morce | ||
+ | * MxPost tagger | ||
+ | * Tree tagger | ||
+ | * TnT tagger | ||
+ | * Jan Hajič' | ||
+ | * a number of toy tagger prototypes (students' | ||
+ | |||
+ | === Details on Czech Tagging === | ||
+ | A Guide to Czech Language Tagging at UFAL http:// | ||
===== Lemmatization ===== | ===== Lemmatization ===== | ||
+ | |||
+ | === Lemmatizers integrated in Treex === | ||
+ | |||
+ | * Martin Popel' | ||
+ | * a number of toy lemmatizers for about ten languages (students' | ||
+ | * for Czech, lemmatization is traditionally treated as a part of POS disambiguation, | ||
===== Analytical Parsing ===== | ===== Analytical Parsing ===== |