====== Pronouns, determiners and similar words ======

Most languages and tagsets have personal pronouns, i.e. words like "I", "you", "he", "she", "it", "we", "they". Various interrogative and relative function words (wh-words in English) are often also considered pronouns. In addition, grammars of some languages distinguish //determiners// while others prefer to categorize the same words as a kind of pronoun.

According to the EAGLES definition, a **pronoun** is a function word that //replaces// a noun phrase, while a **determiner** is a function word that //modifies// a noun phrase. As a result, proper EAGLES-pronouns behave like nouns and determiners behave like adjectives. Note that possessive pronouns (i.e. "my", "your", "his", "her", "its", "our", "their"), also found in many languages, are personal possessive determiners in the sense of the EAGLES definition.

Because tagsets often disagree on what counts as a pronoun, what counts as a determiner etc., it is difficult to find a unifying approach. We decided to limit the number of major parts of speech in order to minimize the cases where a word would end up with an empty part of speech. If there were a part of speech called //determiner//, drivers of tagsets that have no determiners would either have to check whether ''pos = det'' during encoding, or they would fall back to a residual word class. On the other hand, if we tag determiners as special cases of adjectives (which is what DZ Interset does), such drivers will simply encode determiners as adjectives (provided they have adjectives, but these are much more common).

We followed the same solution for pronouns, for the following reasons:

  * Although pronouns are found in most tagsets, there is much controversy about the precise extent of that category.
  * Some tagsets allow for distinguishing between //substantive// and //attributive// pronouns (roughly those that behave like nouns and those that behave like adjectives). Assigning pronouns to nouns and adjectives respectively helps preserve that distinction.
  * Some of the features that distinguish pronouns from real nouns and adjectives (interrogativeness, for instance) are found with adverbs and numerals as well, and thus it makes sense to separate them from the part of speech.

===== Interset solution =====

The ''pos'' feature should be set to ''noun'' for real (according to the EAGLES definition) pronouns, and to ''adj'' for determiners (or attributive pronouns). The default value is ''noun''; it should be used if the correct value cannot be figured out from the source tagset (for instance, the tagset has only one tag for all pronouns and determiners; or there are words that can occur in both pronominal and determinative contexts, and the tagset does not distinguish these contexts).

The ''prontype'' feature distinguishes personal, demonstrative, interrogative, relative, indefinite, negative and other pronouns. It also distinguishes pronouns and determiners from real nouns and adjectives, from which it follows:

  * It should never be empty if the word is a pronoun or a determiner. If the source tagset does not specify the subclass of the pronoun, the default value ''prs'' (personal) should be used.
  * The use of ''prontype'' is not restricted to nouns and adjectives. Some adverbs and numerals are good candidates, too. However, the driver should treat this feature carefully because it is the driver's means of restoring pronouns when they are to be encoded. The more word classes have ''prontype'', the more difficult it is to recognize real pronouns, as defined by the driver.

Other features (''poss'', ''reflex'') are orthogonal to those mentioned above, although many tagsets encode them together with the main pronoun type. However, we can have relative possessive determiners ("whose"), reflexive possessive pronouns/determiners (Czech "svůj") etc.

==== Drawbacks ====

Our solution makes it difficult to recognize a pronoun for a target tagset that can encode it.
It is especially annoying if the tagset driver needs to query the part of speech many times in order to select the correct features. Under such circumstances, probably the best thing to do is to define a tagset-specific ''is_pronoun()'' function, e.g. as

  $f{pos} =~ m/^(noun|adj)$/ && $f{prontype} ne ""

===== Approaches taken in various tagsets =====

==== cs::pdt ====

There are no determiners, just pronouns. I.e., EAGLES-defined determiners are tagged as pronouns.

==== cs::multext ====

There are no determiners, just pronouns. I.e., EAGLES-defined determiners are tagged as pronouns.

==== bg::conll ====

One of the broadest (but also most systematic) pronoun categories. Pronouns include EAGLES-defined determiners and also interrogative and indefinite numerals and pronominal adverbs.

Subcategories: personal, possessive, demonstrative, interrogative, relative, collective, indefinite, negative.

Orthogonally to that, the tagset specifies what the pronoun refers to (entity, number, location, time...)

==== en::penn ====

  * ''DT'' = determiner ("a", "the", "some")
  * ''PDT'' = predeterminer ("all" in "all the flowers", "both" in "both his children")
  * ''PRP'' = personal pronoun ("I", "you", "he", "she", "it", "we", "they")
  * ''PRP$'' = possessive pronoun ("my", "your", "his", "her", "its", "our", "their")
  * ''WDT'' = wh-determiner ("which")
  * ''WP'' = wh-pronoun ("who")
  * ''WP$'' = possessive wh-pronoun ("whose")

There are also wh-adverbs (''WRB'', e.g. "where", "when", "how", as opposed to ordinary adverbs, ''RB'').

==== de::stts ====

  * ''PPER'' = irreflexive personal pronoun ("ich", "du", "er", ..., "mir", "mich", ...); also "meiner", if used as genitive of "ich", not as a possessive pronoun
  * ''PRF'' = reflexive personal pronoun ("mir", "mich" etc. if used reflexively as in "ich freue mich daran"; also "einander" in "sie mögen sich einander")

Other pronouns systematically distinguish substitutive usage (EAGLES-pronouns) from attributive usage (EAGLES-determiners) and adverbial usage (pronominal adverbs).
Often the same word can be tagged either substitutively or attributively depending on context. There are personal, possessive, demonstrative, interrogative, relative and indefinite pronouns. Pronominal adverbs ("wann", "wo", "warum", "worüber", ...) are categorized under pronouns, not adverbs. Articles ("der", "die", "das", ...) have their own tag ''ART'' so they are different from demonstrative pronouns.

==== da::conll ====

There are personal, possessive, reciprocal, demonstrative, indefinite and interrogative/relative pronouns. No separate category for determiners.

==== sv::hajic ====

Determiners (tag starts with ''D'') and pronouns (tag starts with ''P''). Subcategories: personal, indefinite, interrogative and possessive.

==== sv::mamba ====

  * ''EN'' = indefinite article or numeral "en", "ett" (one)
  * ''PO'' = pronoun
  * ''TP'' = totality pronoun

==== pt::conll ====

Articles ("a", "as", "o", "os", "uma", "um") are tagged ''art''.

Pronouns ("que", "outro", "ela", "certo", "o", "algum", "todo", "nós"...) have three main subclasses:

  * Personal pronouns ("ela", "elas", "ele", "eles", "eu", "nós", "se", "tu", "você", "vós")
  * Determiner-pronouns ("algo", "ambos", "bastante", "demais", "este", "menos", "nosso", "o", "que", "todo_o")
  * Independent pronouns ("algo", "aquilo", "cada_qual", "o", "o_que", "que", "todo_o_mundo", "um_pouco")

==== ar::conll ====

Pronouns are demonstrative, relative, personal/possessive. There are interrogative (''FI'') and negative (''FN'') particles but I am not sure whether and how they relate to WH pronouns in other languages.

==== zh::conll ====

Pronouns form a subclass of nouns (''Nh''). Determiners and cardinal numbers are in the same group (''Ne''):

  * ''Nep'' = anaphoric determiner ("this", "that")
  * ''Neq'' = classifying determiner ("much", "half")
  * ''Nes'' = specific determiner ("you", "shang", "ge"=every)
  * ''Neu'' = numeric determiner ("one", "two", "three")

===== Other Sources =====

Helbig-Busch, p. 357-11 (German grammar for foreigners)

  * Artikelwörter
    * Artikel: "der", "ein"
    * Adjektivische Demonstrativpronomen: "dieser", "ein solcher", "jener"
    * Adjektivische Possessivpronomen: "mein", "dessen", "wessen"
    * Adjektivische Interrogativpronomen: "welcher", "welch ein"
    * Adjektivische Indefinitpronomen: "jeder", "mancher", "aller", "kein"
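
===== Example: recognizing pronouns (sketch) =====

The conventions from the "Interset solution" and "Drawbacks" sections can be tried out in a few lines of Perl. This is only a minimal sketch under stated assumptions, not the code of any actual driver: the feature structure is taken to be a plain hash reference, ''apply_pronoun_defaults()'' is a hypothetical helper name, and the ''prontype'' values ''dem'' and ''int'' are assumed labels for demonstrative and interrogative words (this page itself only names ''prs''). The test inside ''is_pronoun()'' is the expression suggested under Drawbacks.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any real driver): fill in the defaults
# recommended on this page for a word that the source tagset marks as a
# pronoun or determiner without giving further detail.
sub apply_pronoun_defaults {
    my ($f) = @_;    # reference to a feature hash
    $f->{pos}      = 'noun' if !defined $f->{pos}      || $f->{pos}      eq '';
    $f->{prontype} = 'prs'  if !defined $f->{prontype} || $f->{prontype} eq '';
    return $f;
}

# Tagset-specific pronoun test as suggested under "Drawbacks":
# the part of speech is noun or adj AND prontype is not empty.
sub is_pronoun {
    my ($f) = @_;
    my $pos      = defined $f->{pos}      ? $f->{pos}      : '';
    my $prontype = defined $f->{prontype} ? $f->{prontype} : '';
    return ($pos =~ m/^(noun|adj)$/ && $prontype ne '') ? 1 : 0;
}

# Illustrative feature hashes; 'dem' and 'int' are assumed value names.
my @examples = (
    [ 'personal pronoun',         { pos => 'noun', prontype => 'prs' } ],
    [ 'demonstrative determiner', { pos => 'adj',  prontype => 'dem' } ],
    [ 'ordinary noun',            { pos => 'noun' } ],
    [ 'interrogative adverb',     { pos => 'adv',  prontype => 'int' } ],
);

for my $example (@examples) {
    my ($label, $f) = @{$example};
    printf "%-26s %s\n", $label,
        is_pronoun($f) ? 'pronoun/determiner' : 'other';
}
```

The interrogative adverb is deliberately rejected: it carries ''prontype'', but its part of speech is neither ''noun'' nor ''adj''. A driver whose tagset files pronominal adverbs under pronouns (as de::stts does) would probably need to widen the test accordingly.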