
Institute of Formal and Applied Linguistics Wiki



user:zeman:interset:common-problems, revision [2008/03/05 12:03] by zeman (edit summary: 的.)
user:zeman:interset:common-problems, revision [2008/04/04 13:54] by zeman
Chinese 的 (//de//) has a part of speech of its own in the Sinica treebank, ''DE''. The easiest approach is to decode it as a particle and record its special nature either in a new ''subpos'' value or simply in the ''other'' feature. However, the usage of //de// is comparable to that of conjunctions. (It is not a coordinating conjunction, though: it connects two elements with different roles, often a possessor and the possessed object, e.g. 我的腦海 "my mind".)
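
A minimal sketch of the two options in Perl. It assumes Interset-style feature hashes with ''part'' as the particle value; the function ''decode_sinica_de'' and the ''subpos'' value ''de'' are hypothetical, chosen for illustration, and are not taken from an actual Interset driver.

<code perl>
use strict;
use warnings;

# Hypothetical decoder for the Sinica tag DE (Chinese 的).
# Option 1: mark its special nature with a new subpos value ("de" is assumed).
# Option 2: leave subpos alone and keep the original tag in the "other" feature.
sub decode_sinica_de {
    my ($use_subpos) = @_;
    my %f = (pos => 'part');    # assumed Interset value for particles
    if ($use_subpos) {
        $f{subpos} = 'de';
    } else {
        $f{other} = 'DE';
    }
    return \%f;
}

my $f1 = decode_sinica_de(1);
my $f2 = decode_sinica_de(0);
print join(', ', map { "$_=$f1->{$_}" } sort keys %$f1), "\n";    # pos=part, subpos=de
print join(', ', map { "$_=$f2->{$_}" } sort keys %$f2), "\n";    # other=DE, pos=part
</code>

The first option makes //de// queryable through a regular feature; the second keeps the feature inventory unchanged but hides the distinction in ''other''.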
  
===== Combinations of values in one feature structure =====

The current version allows an array of values to be stored in one feature. For instance, we can say that a word is either in the nominative or in the accusative by assigning

<code perl>$f{case} = ["nom", "acc"];</code>
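
Code that reads such a structure must accept both a plain string and an array reference in any feature. A minimal sketch, where ''feature_admits'' is a hypothetical helper (not part of Interset) and ''pos => "noun"'' is only an illustrative value:

<code perl>
use strict;
use warnings;

# A feature holds either a single value (string) or a disjunction of values
# (array reference). Hypothetical helper: does feature $name admit $value?
sub feature_admits {
    my ($f, $name, $value) = @_;
    my $v = $f->{$name};
    return 0 unless defined $v;
    my @values = ref($v) eq 'ARRAY' ? @$v : ($v);
    return scalar grep { $_ eq $value } @values;
}

my %f = (pos => 'noun');
$f{case} = ['nom', 'acc'];    # nominative or accusative

print feature_admits(\%f, 'case', 'acc')  ? "yes\n" : "no\n";    # yes
print feature_admits(\%f, 'case', 'gen')  ? "yes\n" : "no\n";    # no
print feature_admits(\%f, 'pos',  'noun') ? "yes\n" : "no\n";    # yes
</code>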

However, we cannot restrict how the values of different features combine. For instance, if we assign

<code perl>$f{gender} = ["fem", "neut"];
$f{number} = ["sing", "plu"];</code>

all four combinations of the gender and number values are permitted. We cannot properly decode a tag that applies to either ''fem+sing'' or ''neut+plu'' but to neither ''fem+plu'' nor ''neut+sing'' (a real example taken from ''cs::pdt''). The only way to encode this would be to leave our one-tag-at-a-time scope and return two parallel feature structures as the result of decoding. That would complicate the use of the feature structure(s) by the user, as well as subsequent encoding into a physical tagset. Even the arrays that are already implemented make the system quite complex.
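
The following sketch makes the problem concrete. The hypothetical helper ''combinations'' (not part of Interset) expands a feature structure into the value combinations it admits; with independent arrays it yields all four pairs, whereas two parallel feature structures yield exactly the two intended ones.

<code perl>
use strict;
use warnings;

# Expand a feature structure into the value combinations it admits for the
# given features. A scalar counts as a one-element disjunction; arrays in
# different features are combined independently (Cartesian product).
sub combinations {
    my ($f, @features) = @_;
    my @combos = ([]);
    for my $name (@features) {
        my $v = $f->{$name} // '';
        my @values = ref($v) eq 'ARRAY' ? @$v : ($v);
        my @next;
        for my $c (@combos) {
            push @next, [@$c, $_] for @values;
        }
        @combos = @next;
    }
    return map { join('+', @$_) } @combos;
}

my %f;
$f{gender} = ['fem', 'neut'];
$f{number} = ['sing', 'plu'];
print join(' ', combinations(\%f, 'gender', 'number')), "\n";
# fem+sing fem+plu neut+sing neut+plu

# The tag means only fem+sing or neut+plu; stating that exactly needs
# two parallel feature structures instead of one.
my @parallel = (
    { gender => 'fem',  number => 'sing' },
    { gender => 'neut', number => 'plu' },
);
print join(' ', map { combinations($_, 'gender', 'number') } @parallel), "\n";
# fem+sing neut+plu
</code>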

The inability to describe value combinations also matters when one feature value of the physical tagset has to be decomposed into values of multiple Interset features and the decomposed value should be one of several values in an array. For instance, the ''cs::pdt'' gender ''I'' is decoded as ''gender = "masc", animateness = "inan"''. How, then, should we decode the physical gender ''T'', which means the disjunction of the physical genders ''I'' and ''F'' (masculine inanimate or feminine)? ''gender = ["masc", "fem"], animateness = ["inan", ""]'' does not describe this exactly: because the two arrays are independent, the structure also admits ''fem'' with ''inan'' and ''masc'' with empty animateness.
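
A similar sketch for the ''T'' case, assuming the array-based decoding quoted above and writing the intended meaning as a hypothetical list of complete (gender, animateness) alternatives. It lists the pairs admitted by the array-based structure and those that the tag does not actually allow:

<code perl>
use strict;
use warnings;

# Intended meaning of cs::pdt gender T: masculine inanimate OR feminine,
# written as complete (gender, animateness) alternatives.
my @intended = (['masc', 'inan'], ['fem', '']);

# The array-based decoding discussed above.
my %f = (gender => ['masc', 'fem'], animateness => ['inan', '']);

# Arrays in different features are independent, so every pair is admitted.
my @admitted;
for my $gender (@{ $f{gender} }) {
    for my $anim (@{ $f{animateness} }) {
        push @admitted, "$gender+$anim";
    }
}

# Pairs admitted by the structure that the tag does not allow.
my %allowed;
$allowed{ join('+', @$_) } = 1 for @intended;
my @spurious = grep { !$allowed{$_} } @admitted;

print "admitted: @admitted\n";    # admitted: masc+inan masc+ fem+inan fem+
print "spurious: @spurious\n";    # spurious: masc+ fem+inan
</code>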
