Institute of Formal and Applied Linguistics Wiki



Table of Contents

Overview of NLP/CL tools available at UFAL

Tokenization (word segmentation)

Segmentation of text into tokens (words, punctuation marks, etc.). For languages using space-separated words (English, Czech, etc.), the task is relatively easy. For other languages (Chinese, Japanese, etc.) the task is much more difficult.
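For space-separated languages, a minimal sketch of such a tokenizer can be a single regular expression that splits off punctuation from words (this is an illustrative assumption, not how the Europarl tokenizer below is implemented; languages like Chinese or Japanese need a dictionary- or model-based segmenter instead):

```python
import re

def tokenize(text):
    """Split space-delimited text into word and punctuation tokens."""
    # \w+ matches runs of word characters; [^\w\s] matches each
    # punctuation character separately, so "world!" becomes two tokens.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

A real tokenizer also has to handle abbreviations, numbers, URLs and language-specific clitics, which is why dedicated tools such as the Europarl tokenizer exist.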

Europarl tokenizer

Language Identification

Sentence Segmentation

Morphological Segmentation

Morphological Analysis

Part-of-Speech Tagging

Lemmatization

Analytical Parsing

Tectogrammatical Parsing

Named Entity Recognition

Machine Translation

Coreference resolution

Spell Checking

Text Similarity

Recasing

Diacritic Reconstruction

Other tasks

Word Sense Disambiguation
Relationship Extraction
Topic Segmentation
Information Retrieval
Information Extraction
Text Summarization
Speech Reconstruction
Question Answering
Sentiment Analysis
