
Institute of Formal and Applied Linguistics Wiki






Overview of NLP/CL tools available at UFAL

Tokenization (word segmentation)

Segmentation of text into tokens (words, punctuation marks, etc.). For languages that separate words with spaces (English, Czech, etc.), the task is relatively easy. For other languages (Chinese, Japanese, etc.), the task is much more difficult.
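For a space-separated language, a basic tokenizer can be sketched with a single regular expression that keeps words (including apostrophe contractions) whole and splits off punctuation marks. This is an illustrative sketch only, not the behavior of any specific UFAL tool:

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens.

    Words (optionally containing an internal apostrophe, as in
    "isn't") are kept whole; each punctuation mark becomes its
    own token.
    """
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

print(tokenize("Hello, world! Isn't tokenization easy?"))
# → ['Hello', ',', 'world', '!', "Isn't", 'tokenization', 'easy', '?']
```

Real tokenizers (such as the Europarl tokenizer below) additionally handle abbreviations, numbers, URLs, and language-specific conventions, which a regex this simple cannot capture.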

Europarl tokenizer

Language Identification

Sentence Segmentation

Morphological Segmentation

Morphological Analysis

Part-of-Speech Tagging

Lemmatization

Analytical Parsing

Tectogrammatical Parsing

Named Entity Recognition

Machine Translation

Coreference resolution

Spell Checking

Text Similarity

Recasing

Diacritics Restoration

