
Institute of Formal and Applied Linguistics Wiki






Interset Features and Values

pos

Part of speech.

Value Description
noun noun
adj adjective
det determiner
pron pronoun
num numeral, number
verb verb
adv adverb
prep preposition
inf infinitive marker: English “to”, Danish “at”, Swedish “att”. Source tag sets variously tag it as a particle, as a conjunction, or with a tag of its own.
conj conjunction
part particle
int interjection
punc punctuation
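To make the value inventory concrete, the following minimal sketch stores a word's features as a Python dictionary and checks the pos value against the list above. The names POS_VALUES and validate_pos are hypothetical, and the dictionary layout is an assumption for illustration, not the internal format of the Interset drivers.

```python
# Hypothetical sketch: an Interset-style feature structure as a plain dict.
# The value inventory is copied from the pos table; the dict layout is assumed.
POS_VALUES = {
    "noun", "adj", "det", "pron", "num", "verb", "adv",
    "prep", "inf", "conj", "part", "int", "punc",
}


def validate_pos(features: dict) -> str:
    """Return the pos value of features, or raise ValueError if it is unknown."""
    pos = features.get("pos")
    if pos not in POS_VALUES:
        raise ValueError(f"unknown pos value: {pos!r}")
    return pos


# English "to" before a verb gets the dedicated infinitive-marker value.
print(validate_pos({"pos": "inf"}))  # inf
```

A closed set like this catches misspelled values early; whether a real driver performs such a check is not stated on this page.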

subpos

Detailed part of speech.

Value Main pos Description
prop noun proper noun (“George”, “Bush”, “Paris”)
pdt adj predeterminer (adjectival word that can stand before an article, such as “all” in “all the flowers”)
pers pron personal pronoun
clit pron clitic personal pronoun (Czech “mě”, “ti”, “mu”, “se”, “si”…)
recip pron reciprocal pronoun (Danish “hinanden”)
digit num number written using digits
roman num number written using Roman numerals (“XIV”)
card num cardinal number
ord num ordinal number
mult num multiplier number (“five times”)
frac num fraction (“one fifth”)
aux verb, part auxiliary verb used to construct complex verb forms (Czech “být”, English “have”, “will”)
mod verb modal verb (German “dürfen”, “können”, “mögen”, “müssen”, “sollen”, “wollen”, “wissen”; Czech “muset”, “mít”, “moci”, “smět”, “umět”, “chtít”; English “must”, “can”, “shall”); note that adverbs and particles have their own mod subpos
intr verb intransitive verb (takes no object)
tran verb transitive verb (takes an object)
verbconj verb finite verb with the enclitic “-ť” (Czech “neboť” = “because”)
man adv adverb of manner
loc adv adverb of location
tim adv adverb of time
deg adv adverb of quantity or degree
cau adv adverb of cause (“why”)
mod adv, part modal particle (Bulgarian “май” = “possibly”, “нека” = “let”; Czech “ať”, “kéž”, “nechť”) or adverb of modal nature (Bulgarian “апропо”); note that verbs have their own mod subpos
ex adv existential “there” in English
voc prep vocalized preposition (Czech “ve” as opposed to base form “v”)
preppron prep preposition and pronoun in one word (Czech “proň” = “pro něj”, “nač” = “na co”)
comprep prep first part of compound preposition (Czech “nehledě na”, “vzhledem k”)
coor conj coordinating conjunction
sub conj subordinating conjunction
emp part particle of emphasis (Bulgarian “даже” = “even”)
sent punc artificial sentence root node, beginning of sentence
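Some subpos values (aux, mod) are listed under more than one main pos, so code consuming these features may want to check which combinations are valid. The sketch below transcribes the table into a lookup; SUBPOS_MAIN_POS and subpos_fits are hypothetical names, and treating the table as a strict validity check is an assumption.

```python
# Hypothetical sketch: which main pos values each subpos value may combine with.
# The pairs are transcribed from the subpos table; the data layout is assumed.
SUBPOS_MAIN_POS = {
    "prop": {"noun"},
    "pdt": {"adj"},
    "pers": {"pron"}, "clit": {"pron"}, "recip": {"pron"},
    "digit": {"num"}, "roman": {"num"}, "card": {"num"},
    "ord": {"num"}, "mult": {"num"}, "frac": {"num"},
    "aux": {"verb", "part"},
    # "mod" is shared by modal verbs and by modal adverbs/particles.
    "mod": {"verb", "adv", "part"},
    "intr": {"verb"}, "tran": {"verb"}, "verbconj": {"verb"},
    "man": {"adv"}, "loc": {"adv"}, "tim": {"adv"},
    "deg": {"adv"}, "cau": {"adv"}, "ex": {"adv"},
    "voc": {"prep"}, "preppron": {"prep"}, "comprep": {"prep"},
    "coor": {"conj"}, "sub": {"conj"},
    "emp": {"part"},
    "sent": {"punc"},
}


def subpos_fits(pos: str, subpos: str) -> bool:
    """True if subpos is listed in the table for the main part of speech pos."""
    return pos in SUBPOS_MAIN_POS.get(subpos, set())


print(subpos_fits("verb", "mod"))  # True: modal verb
print(subpos_fits("part", "mod"))  # True: modal particle
print(subpos_fits("noun", "mod"))  # False
```

Unknown subpos values simply return False here; a stricter design could raise an error instead.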

punclass

Punctuation class.

Value Description
peri period at the end of a sentence; in the Penn tagset, this tag also covers question and exclamation marks
qest question mark
excl exclamation mark
quot quotation marks
brck bracket
comm comma
colo colon; in the Penn tagset, “:” is in fact the tag for miscellaneous other punctuation
semi semicolon
dash dash

puncside

Distinguishes between the initial and final forms of paired punctuation (brackets, quotation marks). “Initial” and “final” are better terms than “left” and “right”, which would be confusing in languages written from right to left, such as Arabic.

Value Description
ini initial (left bracket in English texts)
fin final (right bracket in English texts)
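As an illustration, the sketch below assigns punclass and puncside to a few paired characters. The character table and the name punct_features are hypothetical. The table keys on each character's opening or closing role rather than on its visual side, which is the distinction that ini and fin are meant to capture.

```python
# Hypothetical sketch: deriving punclass and puncside for paired punctuation.
# The character table is an assumption for illustration.
PAIRED = {
    "(": ("brck", "ini"), ")": ("brck", "fin"),
    "[": ("brck", "ini"), "]": ("brck", "fin"),
    "\u201c": ("quot", "ini"), "\u201d": ("quot", "fin"),  # curly double quotes
}


def punct_features(char: str) -> dict:
    """Return Interset-style features for one punctuation character."""
    if char not in PAIRED:
        return {"pos": "punc"}  # not paired: no puncside to assign
    punclass, puncside = PAIRED[char]
    return {"pos": "punc", "punclass": punclass, "puncside": puncside}


print(punct_features("("))  # {'pos': 'punc', 'punclass': 'brck', 'puncside': 'ini'}
```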

synpos

Does the pronoun or numeral behave syntactically as a noun, adjective, or adverb?

Value Description
subst substantive (like a noun)
attr attributive (like an adjective)
adv adverbial (like an adverb)

poss

Is this a possessive adjective or pronoun?

Value Description
poss possessive

reflex

Is this a reflexive pronoun?

Value Description
reflex reflexive

negativeness

Also distinguishes negative pronouns such as “nothing”.

Value Description
pos positive, affirmative
neg negative

definiteness

Also distinguishes determinative (“this”) and indefinite (“some”) pronouns.

Value Description
col collective: all, every
ind indefinite
def definite
red reduced: used for the construct state in Arabic. If two nouns are in a genitive relation, the first one has “reduced definiteness” and the second one is the genitive.
wh interrogative or relative (deprecated!)
int only interrogative
rel only relative

subjobj

Distinguishes subject and object forms of pronouns in, e.g., Swedish.

Value Description
subj subject
obj object

foreign

Value Description
foreign foreign word (not a loan word but a citation in a foreign language — e.g., the title of a foreign book)

gender, possgender

possgender is the possessor's gender.

Value Description
masc masculine
fem feminine
com common, utrum
neut neuter

animateness

Value Description
anim animate
inan inanimate

number, possnumber

possnumber is the possessor's number.

Value Description
sing singular
dual dual
plu plural
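The difference between gender/number and possgender/possnumber shows up in possessive adjectives. The sketch below is a hand-written feature dictionary for the Czech possessive adjective “matčin” (“mother's”, as in “matčin dům”, “mother's house”); it is an illustration, not the output of a real driver, and possessor is a hypothetical helper.

```python
# Hypothetical, hand-written feature dict for Czech "matčin" ("mother's")
# in the phrase "matčin dům" ("mother's house").
matcin = {
    "pos": "adj",
    "poss": "poss",
    "gender": "masc",      # agrees with the possessed noun "dům"
    "number": "sing",
    "case": "nom",
    "possgender": "fem",   # the possessor (mother) is feminine
    "possnumber": "sing",  # there is one possessor
}


def possessor(features: dict) -> tuple:
    """Return (gender, number) of the possessor, or (None, None) if unset."""
    return features.get("possgender"), features.get("possnumber")


print(possessor(matcin))  # ('fem', 'sing')
```

Agreement features describe the possessed noun, while the poss* features describe the owner, so the two can legitimately disagree.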

case

Value Description
nom nominative
gen genitive
dat dative
acc accusative or oblique
voc vocative
loc locative
ins instrumental

compdeg

Degree of comparison.

Value Description
norm non-comparative, first degree (we hesitate to call it “positive”, since negative properties can be compared, too)
comp comparative, second degree
sup superlative, third degree
abs absolute superlative

person

Value Description
1 first (I, we)
2 second (you)
3 third (he, she, it, they)

politeness

Value Description
inf informal (Czech “ty/vy”, German “du/ihr”, Spanish “tú/vosotros”)
pol polite (Czech “vy”, German “Sie”, Spanish “usted”)

subcat

Some tag sets (e.g. the Bulgarian CoNLL tag set) classify verbs as intransitive or transitive.

Value Description
intr intransitive verb
tran transitive verb

verbform

Value Description
fin finite
inf infinitive
sup supine (used with verbs of motion: “go do something”; languages without a supine use the infinitive here)
part participle: present (“doing”), past (“done”), passive (Czech “udělán”, distinguished from the adjective “udělaný” by variant=short); also gerundive
trans transgressive, adverbial participle (modifies other verbs, behaves like adverb; Czech present “dělaje”, past “udělav”)
ger gerund (verbal noun)

mood

Value Description
ind indicative
imp imperative
sub subjunctive, conditional
jus jussive (Czech “přací”, i.e. “wishing”)

tense

Value Description
past past
pres present
fut future

subtense

Finer classification of tenses; not available in all languages. (In many languages, these tenses are formed with auxiliaries rather than with special morphemes.) Keeping them separate from the main past-present-future distinction means that drivers for tag sets with only one past tense need not check for aorist or imperfect.

Note that, unfortunately, the imperfect tense is not always the same as past tense + imperfective aspect. For instance, Bulgarian has lexical aspect, inherent in the verb's meaning, and grammatical aspect, which need not match the lexical one. In main clauses, imperfective verbs can have imperfect tense and perfective verbs have perfect tense. However, both rules can be violated in embedded clauses. The aorist is aspect-neutral and can freely appear with both imperfective and perfective verbs.

Value Description
aor aorist
imp imperfect
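The benefit of separating subtense from tense can be sketched with two encoders: one for a tag set that has a single past tense and never looks at subtense, and one that does distinguish aorist from imperfect. Both functions and their output tags (V-PAST, V-AOR, and so on) are invented for illustration; only the feature names and values come from the tables above.

```python
# Hypothetical encoders; tag strings and function names are invented.


def encode_simple_past(features: dict) -> str:
    """Tag set with a single past tense: subtense can be ignored entirely."""
    if features.get("tense") == "past":
        return "V-PAST"
    return "V-OTHER"


def encode_rich_past(features: dict) -> str:
    """Tag set that distinguishes aorist and imperfect via subtense."""
    if features.get("tense") != "past":
        return "V-OTHER"
    subtense = features.get("subtense")
    if subtense == "aor":
        return "V-AOR"
    if subtense == "imp":
        return "V-IMPF"
    return "V-PAST"  # past tense without a finer classification


aorist = {"pos": "verb", "tense": "past", "subtense": "aor"}
print(encode_simple_past(aorist))  # V-PAST
print(encode_rich_past(aorist))    # V-AOR
```

Had aorist been a value of tense itself, the simple encoder would have to list every kind of past tense explicitly.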

aspect

Value Description
imp imperfective
perf perfective

voice

Value Description
act active
pass passive

abbr

Is this an abbreviation?

Value Description
abbr abbreviation

hyph

Is this a part of a hyphenated compound?

Value Description
hyph hyphenated prefix (“anglo-” in “anglo-saxon”)

style

Value Description
arch archaic, obsolete, rare
form formal, literary
norm normal, neutral
coll colloquial

variant

Allows for distinguishing between word forms that otherwise would share values of all features.

Value Description
short short form
long long form
0 variant form 0
1 variant form 1
2 variant form 2
3 variant form 3
4 variant form 4
5 variant form 5
6 variant form 6
7 variant form 7
8 variant form 8
9 variant form 9
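One way to see when variant is needed is to look for word forms whose remaining features are all identical. The sketch below groups forms by their feature dictionaries and numbers each colliding group with the digit variants 0 to 9. The function assign_variants, the numbering policy (sorted order, starting at 0) and the placeholder forms are all hypothetical; it also assumes feature values are hashable strings.

```python
from collections import defaultdict

# Hypothetical sketch: number forms that share all features with digit variants.


def assign_variants(lexicon: dict) -> dict:
    """Map form -> variant digit for forms that would otherwise be indistinguishable."""
    groups = defaultdict(list)
    for form, features in lexicon.items():
        groups[tuple(sorted(features.items()))].append(form)
    variants = {}
    for forms in groups.values():
        if len(forms) < 2:
            continue  # unique feature set: no variant needed
        if len(forms) > 10:
            raise ValueError("more colliding forms than variant values 0-9")
        for index, form in enumerate(sorted(forms)):
            variants[form] = str(index)
    return variants


lexicon = {  # placeholder forms, not real words
    "form_a": {"pos": "pron", "case": "acc", "person": "1"},
    "form_b": {"pos": "pron", "case": "acc", "person": "1"},
    "form_c": {"pos": "pron", "case": "dat", "person": "1"},
}
print(assign_variants(lexicon))  # {'form_a': '0', 'form_b': '1'}
```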

tagset, other

The tagset feature identifies the source tag set driver. Its value should be identical to the name of the driver that filled in the feature values. It works together with the “other” feature.

Feature “other”
Any value, or a reference to an array or hash, is allowed. The feature preserves information for the case where the same driver that decoded a tag is later used to encode it. No other driver should expect anything meaningful here.
Only information that cannot be stored in other features should be stored here.
The seemingly easiest approach, storing the complete original tag, would not work if the user needs to change feature values between decode() and encode(), because the stored tag would no longer match the modified features.
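The interplay of tagset and other can be sketched with a toy driver for an invented tag set “xx”, whose tags are N or V optionally followed by a * flag that no Interset feature can hold. The driver name, the tag format, and the Python decode()/encode() signatures are assumptions for illustration, not the real driver interface.

```python
# Hypothetical driver for an invented tag set "xx": tags are "N" or "V",
# optionally followed by "*", a flag with no Interset equivalent.
DRIVER = "xx"


def decode(tag: str) -> dict:
    """Turn an "xx" tag into a feature dict, parking the "*" flag in "other"."""
    features = {"tagset": DRIVER, "pos": "noun" if tag.startswith("N") else "verb"}
    if tag.endswith("*"):
        features["other"] = {"star": True}
    return features


def encode(features: dict) -> str:
    """Build an "xx" tag; read "other" only if this driver filled it in."""
    tag = "N" if features.get("pos") == "noun" else "V"
    other = features.get("other") if features.get("tagset") == DRIVER else None
    if isinstance(other, dict) and other.get("star"):
        tag += "*"
    return tag


fs = decode("N*")
print(encode(fs))    # N*  round trip keeps the flag
fs["pos"] = "verb"   # user edits a feature between decode() and encode()
print(encode(fs))    # V*  the edit is respected
fs["tagset"] = "yy"  # pretend a different driver produced the features
print(encode(fs))    # V   "other" is ignored
```

The second call shows why only the leftover flag is stored: had decode() kept the whole original tag, encode() would have returned N* and silently dropped the user's change.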

