
Institute of Formal and Applied Linguistics Wiki





Tag Set Drivers

This is an overview of the existing tag set drivers, describing the issues specific to each tag set or language.

Chinese

The only corpus covered so far is the Sinica Treebank, converted to the CoNLL format. The tag set lacks comprehensive documentation: almost none is supplied with the CoNLL data, and only a little can be found on the web. The tags do not encode any morphological features. Instead, there is a comprehensive (but undocumented) hierarchy of word classes and subclasses. Most of the information encoded in the tags cannot be mapped to Interset.

Pronouns are special cases of nouns. Numerals are special cases of determiners.

There are many kinds of particles, some of which have dedicated tags (e.g. DE).
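The class hierarchy above (pronouns as a subclass of nouns, numerals as a subclass of determiners) could be decoded roughly as follows. This is a minimal Python sketch, not the actual Interset driver: it assumes that each subclass tag extends its parent's tag as a prefix, and the tags `N`, `Nh`, `Ne`, `Neu` and `DE`, the feature names and values, and the function `decode` are hypothetical, chosen only for illustration.

```python
from typing import Dict

# Hypothetical excerpt of a tag hierarchy. Tag names and feature values are
# illustrative assumptions, not the real driver's tables.
FEATURES: Dict[str, Dict[str, str]] = {
    "N":   {"pos": "noun"},
    "Nh":  {"prontype": "prs"},                  # pronoun: a special kind of noun
    "Ne":  {"pos": "det"},                       # determiner
    "Neu": {"pos": "num", "numtype": "card"},    # numeral: a special kind of determiner
    "DE":  {"pos": "part"},                      # particle with its own tag
}


def decode(tag: str) -> Dict[str, str]:
    """Merge the features of every known prefix of `tag`, general to specific."""
    features: Dict[str, str] = {}
    for end in range(1, len(tag) + 1):
        prefix = tag[:end]
        if prefix in FEATURES:
            # A more specific prefix overrides values set by its ancestors.
            features.update(FEATURES[prefix])
    return features


if __name__ == "__main__":
    for tag in ("Nh", "Neu", "DE"):
        print(tag, decode(tag))
```

A finer subclass that the table does not list (for instance a hypothetical `Nhaa`) falls back to its nearest known ancestor, which mirrors the fact that most fine-grained distinctions have no Interset counterpart.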

Work started: 21.10.2007
Work finished: 5.3.2008
Total work time: 21:30 h

Most of the time was dedicated to extracting, transcribing and translating examples in an effort to understand the tag classes.

