
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


ufal:tasks [2012/01/23 10:54]
ufal
ufal:tasks [2012/01/23 11:15] (current)
ufal
Line 44: Line 44:
 | **contact:** | | | **contact:** | |
  
 +=== Tokenizers integrated in Treex ===
 +* rule-based (reg.exp.) tokenizers
 +* trainable tokenizer TextSeg
  
 ===== Language Identification ===== ===== Language Identification =====
 +Martin Majliš's language identifier (covers about 100 languages) http://wiki.ufal.ms.mff.cuni.cz/~majlis/publications/master-thesis.pdf
  
 ===== Sentence Segmentation ===== ===== Sentence Segmentation =====
 +=== Segmenters integrated in Treex ===
 +* rule-based segmenters
 +* TextSeg (trainable)
  
 ===== Morphological Segmentation ===== ===== Morphological Segmentation =====
  
 ===== Morphological Analysis ===== ===== Morphological Analysis =====
 +=== Morphological Analyzers integrated in Treex ===
 +* Jan Hajič's Czech morphological analyzer
 +* toy analyzers for about ten languages (student homework assignments)
  
 ===== Part-of-Speech Tagging ===== ===== Part-of-Speech Tagging =====
Line 68: Line 78:
  
 ===== Lemmatization ===== ===== Lemmatization =====
- 
 === Lemmatizers integrated in Treex === === Lemmatizers integrated in Treex ===
- 
 * Martin Popel's lemmatizer for English * Martin Popel's lemmatizer for English
 * a number of toy lemmatizers for about ten languages (student homework assignments) * a number of toy lemmatizers for about ten languages (student homework assignments)
Line 76: Line 84:
  
 ===== Analytical Parsing ===== ===== Analytical Parsing =====
 +=== Analytical parsers integrated in Treex ===
 +* Ryan McDonald's MST parser
 +* Rudolf Rosa's MST parser
 +* MALT parser
 +* ZPar
 +* Stanford parser
 +
 +=== Details on Czech parsing ===
 +A Complete Guide to Czech Language Parsing http://ufal.mff.cuni.cz/czech-parsing/
 +
  
 ===== Tectogrammatical Parsing ===== ===== Tectogrammatical Parsing =====
 +=== Conversion of analytical trees to tectogrammatical trees integrated in Treex ===
 +* a scenario for rule-based tree transformation
 +* Ondřej Dušek's tools for functor assignment trained on PDT and PCEDT
  
 ===== Named Entity Recognition ===== ===== Named Entity Recognition =====
 +=== NE recognizers integrated in Treex ===
 +* Jana Straková's SVM-based recognizer for Czech http://www.aclweb.org/anthology/W/W09/W09-3538.pdf
 +* Stanford Named Entity Recognizer for Czech
  
 ===== Machine Translation ===== ===== Machine Translation =====
 +
 +=== MT implemented in Treex ===
 +* elaborate English->Czech tecto-based translation
 +* prototype of Czech->English tecto-based translation
  
 ===== Coreference resolution ===== ===== Coreference resolution =====
 +=== Coreference resolvers integrated in Treex ===
 +* simple rule-based baseline resolvers for Czech and English
 +* Michal Novák's trainable resolvers
 +* Ngụy Giang Linh's trainable (perceptron-based) resolver
  
 ===== Spell Checking ===== ===== Spell Checking =====
