DBMT

Czech-English Dependency-based Machine Translation

This is the Magenta pipeline, except that generation is rule-based, replacing the statistical tree-to-tree transduction followed by an LM.

Run on the Czech translation of the Penn Treebank:

tokenization and tagging [Hajic 98]
parsing into a_trees [Hajic 98, Charniak 99]
afun assignment [ZZ 02]
a_tree → t_tree [Bohmova 01]
func assignment with C4.5 [ZZ 02]
dictionary built with GIZA++ [Och and Ney 02]: only the single most probable translation is kept; 1-2 alignments are treated as 1-1 with a multiword unit
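
The dictionary step can be illustrated with a minimal sketch. It assumes the translation table has already been flattened into (Czech, English, probability) triples; that layout, the function name `best_translations`, and the toy probabilities are all hypothetical and not taken from GIZA++ output.

```python
def best_translations(entries):
    """Keep only the single most probable English translation per Czech item.

    entries: iterable of (czech, english, probability) triples
    (a hypothetical flattened form of a translation table).
    """
    best = {}
    for cz, en, prob in entries:
        if cz not in best or prob > best[cz][1]:
            best[cz] = (en, prob)
    return {cz: en for cz, (en, _) in best.items()}


# Toy data with invented probabilities, for illustration only.
entries = [
    ("pes", "dog", 0.9),
    ("pes", "hound", 0.1),
    ("přestože", "even though", 0.7),  # 1-2 alignment kept as one multiword unit
    ("přestože", "although", 0.3),
]
lexicon = best_translations(entries)
```

A multiword English side is stored as a single string, so a 1-2 alignment behaves like any other 1-1 entry.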

====== Generator ======
The generator receives TGTS without tfa. Open question: what about coreference?

== 1. determining contextual boundness ==
in the Czech sentence, everything to the left of the verb is treated as CB → definite article
everything to the right of the verb is treated as non-bound → indefinite article
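
A minimal sketch of this heuristic, assuming each node carries its surface position in the Czech sentence (the `ord` arguments and both function names are hypothetical). Later steps may still override or suppress the article chosen here.

```python
def contextual_boundness(node_ord, verb_ord):
    """Left of the verb in the Czech surface order -> CB, otherwise NB."""
    return "CB" if node_ord < verb_ord else "NB"


def default_article(node_ord, verb_ord):
    """CB nodes default to the definite article, NB nodes to the indefinite."""
    return "the" if contextual_boundness(node_ord, verb_ord) == "CB" else "a"


# "Knihu napsal student": knihu (1) precedes the verb napsal (2), student (3) follows it.
articles = {
    "book": default_article(1, 2),
    "student": default_article(3, 2),
}
```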

== 2. reordering of constituents ==
based on CB, the Sb is selected from among ACT|PAT|ADDR
declarative sentence: CB adjuncts + Sb + V + direct/indirect Obj + UB adjuncts
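
The reordering template can be sketched as follows. Nodes are plain dicts with hypothetical keys `lemma`, `functor` and `cb`. Two points are assumptions not stated in the notes: ties between candidate subjects are broken in the order ACT, PAT, ADDR, and the remaining arguments keep their input order.

```python
ARGUMENTS = ("ACT", "PAT", "ADDR")


def choose_subject(nodes):
    """Pick the Sb among ACT|PAT|ADDR, preferring a contextually bound node."""
    args = [n for n in nodes if n["functor"] in ARGUMENTS]
    if not args:
        return None
    # CB candidates first; ties broken by ACT < PAT < ADDR (assumption).
    args.sort(key=lambda n: (not n["cb"], ARGUMENTS.index(n["functor"])))
    return args[0]


def reorder(verb, nodes):
    """CB adjuncts + Sb + V + remaining arguments (objects) + non-bound adjuncts."""
    sb = choose_subject(nodes)
    rest = [n for n in nodes if n is not sb]
    objects = [n for n in rest if n["functor"] in ARGUMENTS]
    cb_adjuncts = [n for n in rest if n["functor"] not in ARGUMENTS and n["cb"]]
    nb_adjuncts = [n for n in rest if n["functor"] not in ARGUMENTS and not n["cb"]]
    ordered = cb_adjuncts + ([sb] if sb else []) + [verb] + objects + nb_adjuncts
    return [n["lemma"] for n in ordered]


verb = {"lemma": "write"}
nodes = [
    {"lemma": "yesterday", "functor": "TWHEN", "cb": True},
    {"lemma": "student", "functor": "ACT", "cb": False},
    {"lemma": "book", "functor": "PAT", "cb": True},
    {"lemma": "library", "functor": "LOC", "cb": False},
]
order = reorder(verb, nodes)
```

In this toy input the contextually bound PAT ("book") wins over the non-bound ACT, so it becomes the subject.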

== 3. generation of verb forms ==
passive vs. active voice is chosen according to the functor of the Sb
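
One plausible reading of this rule, written as a sketch: an ACT subject yields active voice, a PAT or ADDR subject yields passive. The notes do not spell out the mapping, so it is an assumption, as is the function name `choose_voice`.

```python
def choose_voice(subject_functor):
    """Map the functor of the chosen subject to a verb voice (assumed mapping)."""
    if subject_functor == "ACT":
        return "active"
    if subject_functor in ("PAT", "ADDR"):
        return "passive"
    raise ValueError(f"unexpected subject functor: {subject_functor}")


voices = [choose_voice(f) for f in ("ACT", "PAT", "ADDR")]
```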

== 4. insertion of prepositions and articles ==
preps: chosen according to the Czech preposition and the English noun

articles: definite for a postmodified NP, or an NP premodified by a superlative or an ordinal numeral
article prevented: uncountable and proper nouns, or predetermination by possessive or demonstrative pronouns
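
The article rules can be combined into one decision function, sketched below. The feature names on the `np` dict are hypothetical. The rule precedence (prevention first, then forced definite, then the CB-based default from step 1) is an assumption; the notes list the rules without ordering them.

```python
def choose_article(np):
    """Return 'the', 'a', or None (no article) for a noun phrase.

    np: dict of boolean features (hypothetical names); missing keys count as False.
    """
    # Article prevented: uncountable or proper nouns, possessive/demonstrative determiners.
    if (np.get("uncountable") or np.get("proper")
            or np.get("possessive_det") or np.get("demonstrative_det")):
        return None
    # Definite article forced by postmodification, superlative or ordinal premodifier.
    if np.get("postmodified") or np.get("superlative_premod") or np.get("ordinal_premod"):
        return "the"
    # Otherwise fall back on contextual boundness.
    return "the" if np.get("cb") else "a"


examples = [
    choose_article({"proper": True}),
    choose_article({"superlative_premod": True}),
    choose_article({"cb": False}),
    choose_article({"postmodified": True, "possessive_det": True}),
]
```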

== 5. morphology ==
probably not morpha
forms are looked up in a table:
^ word form ^ morphological tag ^ lemma ^
if no entry is found, simple rules are applied
also vocalization of the indefinite article (a/an)
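
A sketch of table lookup with a rule-based fallback. The rows follow the column layout above, but the entries are toy data, the Penn Treebank tags are an assumption (the notes do not name the tagset), and the fallback rules are a deliberately small illustration rather than the rules actually used. The a/an choice is purely orthographic, so words such as "hour" or "university" would come out wrong.

```python
# Rows mirror the table layout: (word form, morphological tag, lemma).
ROWS = [
    ("went", "VBD", "go"),
    ("children", "NNS", "child"),
]
FORM_BY_LEMMA_TAG = {(lemma, tag): form for form, tag, lemma in ROWS}

VOWELS = "aeiou"


def fallback_form(lemma, tag):
    """Very simple regular inflection, used when the table has no entry."""
    if tag in ("NNS", "VBZ"):
        if lemma.endswith(("s", "x", "z", "ch", "sh")):
            return lemma + "es"
        if len(lemma) > 1 and lemma.endswith("y") and lemma[-2] not in VOWELS:
            return lemma[:-1] + "ies"
        return lemma + "s"
    if tag in ("VBD", "VBN"):
        return lemma + ("d" if lemma.endswith("e") else "ed")
    return lemma


def inflect(lemma, tag):
    """Table lookup first, simple rules second."""
    return FORM_BY_LEMMA_TAG.get((lemma, tag)) or fallback_form(lemma, tag)


def indefinite_article(next_word):
    """Choose a/an from the first letter of the following word (orthographic only)."""
    first = next_word[:1].lower()
    return "an" if first and first in VOWELS else "a"


forms = [inflect("go", "VBD"), inflect("box", "NNS"), inflect("city", "NNS")]
```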
