[ Skip to the content ]

Institute of Formal and Applied Linguistics Wiki


[ Back to the navigation ]

This is an old revision of the document!


Table of Contents

To do

Infrastructure

Strict encoding

The encoder should be able to work in two modes:

Although this requirement has been known since the very beginning of this work, no driver so far implements strict encoding. It would almost double the work needed to write the encode() function and I have not needed the strict conversions so far.

The process of strict encoding could be automated using the list() function of the driver. A service function could decode every possible tag (as listed by list()) and build a graph of feature values. The strict encoding would correspond to a path through the graph. Every node in the graph would correspond to a partially encoded tag. The set of edges leaving the node would be pruned so as not to allow any unexpected combination of features.

The graph would not be constructed in full. Such a process would be too costly. Instead, every feature would be given a numeric priority, and there would be a total ordering of the features according to their priorities. The encoding process would always consider the features with higher priority first, thus the graph would be a tree with as many leaves as there are tags in the tagset.

How will the strict encoding look in practice? A service function will read the feature tree and a set of feature values to be encoded. The function will traverse the tree and adjust the feature values to match a tag from the list. Then the normal encode() function of the driver will be called. A prerequisite is that the driver keeps the condition encode(decode(x))=x. This has not been tested so far but the drivers indeed should conform to it.

2 druhy striktního kódování:

Service functions

Features and values

Specific drivers

Paper notes


[ Back to the navigation ] [ Back to the content ]