====== DBMT ======

//Czech-English Dependency-based Machine Translation -- Čmejrek, Cuřín, and Havelka 03//

//PCEDT BLEU dtest/etest: 0.1974 0.1704//

This is the Magenta pipeline, except that generation is rule-based (instead of statistical tree-to-tree transduction followed by an LM).

Run on the Czech translation of the Penn Treebank:

  - tokenization and tagging [Hajic 98]
  - parsing into a_trees [Hajic 98, Charniak 99]
  - afun assignment [ZZ 02]
  - a_tree -> t_tree [Bohmova 01]
  - functor assignment with C4.5 [ZZ 02]
  - dictionary built with GIZA++ [Och and Ney 02]: only the single most probable translation is kept; 1-2 mappings are treated as 1-1 multiword entries
  - generator

====== Generator ======

The generator receives TGTS without tfa. And what about coreference :?:

== 1. determining contextual boundness ==

In Czech, everything to the left of the verb is considered CB -> definite article.

Everything to the right of the verb is considered non-bound -> indefinite article.

== 2. reordering of constituents ==

Based on CB, the Sb is selected from among ACT|PAT|ADDR.

Declarative sentence: CB adjuncts + Sb + V + direct/indirect Obj + UB adjuncts

== 3. generation of verb forms ==

Passive vs. active voice is chosen according to the functor of the Sb.

== 4. insertion of prepositions and articles ==

Prepositions: chosen according to the Czech preposition and the EN noun.

Articles: definite for a postmodified NP, or an NP premodified by a superlative or an ordinal numeral.

Article prevented: uncountable and proper nouns, or predetermination by possessive and demonstrative pronouns.

== 5. morphology ==

Probably not morpha :!: They look forms up in a table:

^ word form ^ morphological tag ^ lemma ^

If no entry is found, simple rules are applied. This step also handles vocalization of the indefinite article (a/an).
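A minimal sketch of step 5 (morphology), assuming the table is indexed by lemma and tag when generating. The table rows, the Penn Treebank style tags, the suffix rules, and the names ''generate_form'' and ''indefinite_article'' are all hypothetical illustrations, not taken from the paper; the a/an choice here looks only at spelling, which is a rough approximation.

```python
# Hypothetical rows in the order used by the table: (word form, tag, lemma).
# The entries and the Penn Treebank style tags are illustrative assumptions.
TABLE_ROWS = [
    ("went", "VBD", "go"),
    ("children", "NNS", "child"),
    ("better", "JJR", "good"),
]

# For generation, index the table by (lemma, tag) -> word form.
FORMS = {(lemma, tag): form for form, tag, lemma in TABLE_ROWS}

VOWELS = "aeiou"


def _consonant_y(lemma):
    """True if the lemma ends in a consonant followed by 'y' (e.g. 'city')."""
    return len(lemma) > 1 and lemma.endswith("y") and lemma[-2] not in VOWELS


def fallback_form(lemma, tag):
    """Crude suffix rules, used only when the table has no entry (assumed)."""
    if tag in ("NNS", "VBZ"):
        if lemma.endswith(("s", "x", "z", "ch", "sh")):
            return lemma + "es"
        if _consonant_y(lemma):
            return lemma[:-1] + "ies"
        return lemma + "s"
    if tag in ("VBD", "VBN"):
        if lemma.endswith("e"):
            return lemma + "d"
        if _consonant_y(lemma):
            return lemma[:-1] + "ied"
        return lemma + "ed"
    if tag == "VBG":
        if len(lemma) > 2 and lemma.endswith("e") and not lemma.endswith("ee"):
            return lemma[:-1] + "ing"
        return lemma + "ing"
    return lemma  # unknown tag: leave the lemma unchanged


def generate_form(lemma, tag):
    """Table lookup first, simple rules as a fallback."""
    form = FORMS.get((lemma, tag))
    return form if form is not None else fallback_form(lemma, tag)


def indefinite_article(next_word):
    """Choose a/an from the first letter of the following word (spelling only)."""
    return "an" if next_word[:1].lower() in VOWELS else "a"
```

For example, ''generate_form("go", "VBD")'' is answered from the table, while ''generate_form("walk", "VBD")'' falls through to the suffix rules. A spelling-based a/an rule gets words such as "hour" or "university" wrong; a real system would need pronunciation information or an exception list.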