user:zeman:interset:common-problems, revision [2008/04/04 15:48] by zeman: Proposal for future versions. Previous revision: [2007/03/06 22:18] by zeman.

Even if you know that your own decode() always sets ''
===== Chinese particles =====

Chinese 的 (de) has a part of speech of its own in the Sinica treebank, ''
===== Combinations of values in one feature structure =====

The current version of Interset allows storing an array of values in a single feature. For instance, we can say that a word is in either the nominative or the accusative case by assigning

<code perl></code>

However, we cannot define combinations of values across different features. For instance, if we assign
<code perl>
$f{number} = ["
</code>

all four combinations of the gender and number values are permitted. We cannot properly decode a tag that applies to either ''
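A minimal Perl sketch makes the over-generation concrete. The helper ''expand_structure'' is hypothetical (not part of Interset): it lists every single-valued structure that a feature structure with array values admits. The value strings ''masc'', ''fem'', ''sing'' and ''plu'' are assumptions chosen for illustration, as is the idea that the tag should mean only "masc sing" or "fem plu".

<code perl>
use strict;
use warnings;

# Hypothetical helper (not part of Interset): expand a feature structure whose
# values may be array references into all single-valued structures it admits.
sub expand_structure {
    my ($f) = @_;
    my @result = ({});
    foreach my $feature (sort keys %{$f}) {
        my $value = $f->{$feature};
        # A scalar value is treated as a one-element list of alternatives.
        my @alternatives = ref($value) eq 'ARRAY' ? @{$value} : ($value);
        my @extended;
        foreach my $partial (@result) {
            foreach my $alternative (@alternatives) {
                push @extended, +{ %{$partial}, $feature => $alternative };
            }
        }
        @result = @extended;
    }
    return @result;
}

# Assumed values: the tag means "masc sing" or "fem plu", but arrays in two
# independent features cannot express that restriction.
my %f;
$f{gender} = ['masc', 'fem'];
$f{number} = ['sing', 'plu'];

foreach my $s (expand_structure(\%f)) {
    print "$s->{gender} $s->{number}\n";
}
</code>

Under these assumptions the loop prints four lines (masc sing, masc plu, fem sing, fem plu); the middle two are exactly the combinations the tag was meant to exclude.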

The inability to describe value combinations also matters when one feature value of the physical tagset has to be decomposed into values of several Interset features, and the decomposed value is only one of several alternatives stored in an array. For instance, the ''

A similar situation occurs in ''

The correct solution would be to decode such a tag into multiple parallel feature structures. Each structure would contain only single values, no arrays. This would remove one level of complexity inside the structures but add another level around them. We can consider making this change in a future version of Interset. The decoding function could have two interfaces: one would output an array of (references to) feature structures; the other would output (a reference to) just one feature structure, with an additional feature referring to the next feature structure. The encoder would select the structure that requires the least modification to fit the target tagset. If the user can deal with more than one target tag, they would ask for each structure to be encoded separately. If the target tagset could accommodate alternate values in some features, the encoder could look at several structures at a time; it is unclear how this would be done.
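The proposal can be sketched in Perl under several assumptions, none of which comes from an Interset release: the target tagset is described by a hash of permitted values per feature (''%permitted''), "least modification" is approximated by counting the features whose value the target tagset cannot express, and the chained interface uses a feature named ''next''. The helper names ''modification_cost'', ''select_structure'', ''chain_structures'' and ''unchain_structures'' are hypothetical.

<code perl>
use strict;
use warnings;

# Hypothetical description of a target tagset: permitted values per feature.
my %permitted = (
    pos    => { noun => 1, adj => 1 },
    gender => { masc => 1, fem => 1 },
    number => { sing => 1 },
);

# Assumed cost measure: count the features whose value the target tagset
# cannot express (each of them would have to be modified).
sub modification_cost {
    my ($f, $permitted) = @_;
    my $cost = 0;
    foreach my $feature (keys %{$f}) {
        my $allowed = $permitted->{$feature};
        $cost++ unless $allowed && $allowed->{$f->{$feature}};
    }
    return $cost;
}

# The encoder picks the parallel structure that needs the fewest modifications.
sub select_structure {
    my ($alternatives, $permitted) = @_;
    my ($best, $best_cost);
    foreach my $f (@{$alternatives}) {
        my $cost = modification_cost($f, $permitted);
        if (!defined($best_cost) || $cost < $best_cost) {
            ($best, $best_cost) = ($f, $cost);
        }
    }
    return $best;
}

# Second interface: link shallow copies of the structures through a
# hypothetical 'next' feature and return the first one.
sub chain_structures {
    my @structures = map { +{ %{$_} } } @_;
    for my $i (0 .. $#structures - 1) {
        $structures[$i]{next} = $structures[$i + 1];
    }
    return $structures[0];
}

# Walk such a chain back into a list of structures.
sub unchain_structures {
    my ($f) = @_;
    my @list;
    while (defined $f) {
        push @list, $f;
        $f = $f->{next};
    }
    return @list;
}

# First interface: decoding yields an array of single-valued structures.
my @decoded = (
    { pos => 'noun', gender => 'masc', number => 'plu' },
    { pos => 'noun', gender => 'fem',  number => 'sing' },
);
my $chosen = select_structure(\@decoded, \%permitted);
print "$chosen->{gender} $chosen->{number}\n";
my @again = unchain_structures(chain_structures(@decoded));
print scalar(@again), "\n";
</code>

With these example values the script prints fem sing, because the masc plu alternative would need its number changed, and then 2, the length of the chain. A real encoder would probably weigh features differently; the flat count only keeps the sketch short.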

Multiple alternate feature structures can also be stored in a packed form: there is only one structure, and it has an additional feature called ''

<code perl>%f =
(
    '
    '
    '
    '
    '
    '
    '
    '
    [
        {'
        {'
    ]
);</code>
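A packed structure can be expanded into parallel structures, and parallel structures packed again, roughly as in the following Perl sketch. It assumes the additional feature is called ''alternatives'' (a hypothetical name) and holds a list of hashes with the features that differ between the alternatives; the helper names ''unpack_structure'' and ''pack_structures'' and the example values are assumptions too.

<code perl>
use strict;
use warnings;

# Hypothetical name of the feature that holds the alternatives.
my $ALT = 'alternatives';

# Unpack: every alternative starts from a copy of the shared features and then
# adds (or overrides) its own features.
sub unpack_structure {
    my ($packed) = @_;
    my %shared = %{$packed};
    my $alternatives = delete $shared{$ALT};
    # Without alternatives, the packed structure is already a single structure.
    return (+{ %shared }) unless $alternatives && @{$alternatives};
    return map { +{ %shared, %{$_} } } @{$alternatives};
}

# Pack: a feature with the same defined value in all structures becomes shared;
# all other features stay in the per-alternative hashes.
sub pack_structures {
    my @structures = @_;
    my %all_keys;
    foreach my $s (@structures) {
        $all_keys{$_} = 1 foreach keys %{$s};
    }
    my %shared;
    foreach my $key (sort keys %all_keys) {
        my @values = map { $_->{$key} } @structures;
        my $same = defined($values[0])
            && !(grep { !defined($_) || $_ ne $values[0] } @values);
        $shared{$key} = $values[0] if $same;
    }
    my @alternatives;
    foreach my $s (@structures) {
        my %own;
        foreach my $key (keys %{$s}) {
            $own{$key} = $s->{$key} unless exists $shared{$key};
        }
        push @alternatives, \%own;
    }
    # Store the alternatives only if at least one of them is non-empty.
    $shared{$ALT} = \@alternatives if grep { scalar(keys %{$_}) } @alternatives;
    return \%shared;
}

# Example with assumed values: the tag means "masc sing" or "fem plu".
my %f = (
    pos  => 'noun',
    $ALT => [
        { gender => 'masc', number => 'sing' },
        { gender => 'fem',  number => 'plu' },
    ],
);
my @unpacked = unpack_structure(\%f);
foreach my $s (@unpacked) {
    print join(' ', map { "$_=$s->{$_}" } sort keys %{$s}), "\n";
}
my $repacked = pack_structures(@unpacked);
print join(' ', sort keys %{$repacked}), "\n";
</code>

Under these assumptions the first loop prints gender=masc number=sing pos=noun and gender=fem number=plu pos=noun, i.e. only the two intended combinations, and packing them again keeps just ''pos'' in the shared part (the last line prints alternatives pos).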