DBMT
Czech-English Dependency-based Machine Translation – Čmejrek, Cuřín, and Havelka 03
It is the Magenta pipeline, except that generation is rule-based (instead of statistical tree-to-tree transduction followed by an LM).
Run on the Czech translation of the Penn Treebank.
- tokenization and tagging [Hajic 98]
- parsing into a_trees [Hajic 98, Charniak 99]
- afun assignment [ZZ 02]
- a_tree → t_tree [Bohmova 01]
- functor assignment with C4.5 [ZZ 02]
- dictionary built with GIZA++ [Och and Ney 02]: the one most probable translation is kept; 1-2 alignments are treated as 1-1 with a multiword
- generator
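
A minimal sketch of the dictionary step, assuming the lexical probabilities are already available as a plain mapping (the input format, the function name `best_translations`, and the toy numbers are all hypothetical; this is not GIZA++ output). It keeps only the single most probable translation per Czech lemma, and a 1-2 alignment is stored as one multiword English entry.

```python
def best_translations(lex_probs):
    """Keep only the most probable English translation per Czech lemma.

    lex_probs maps a Czech lemma to {English translation: probability}.
    A multiword English translation is simply a string with a space.
    """
    return {
        cz: max(cands, key=cands.get)
        for cz, cands in lex_probs.items()
        if cands  # skip lemmas with no candidate at all
    }


# Toy probabilities invented for illustration only.
toy = {
    "dům": {"house": 0.7, "home": 0.3},
    "nádraží": {"railway station": 0.8, "station": 0.2},
}
print(best_translations(toy))
```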
====== Generator ======
The generator receives TGTS without tfa; what about coreference?
== 1. determining contextual boundness ==
In the Czech sentence, they treat everything to the left of the verb as CB → definite article
everything to the right of the verb is treated as not bound → indefinite article
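
The position heuristic can be sketched as follows. This is an illustration under assumptions: the input is a flat list of Czech words with a known verb index, the label "UB" is used for everything right of the verb, and the names `boundness_by_position` and `ARTICLE` are hypothetical.

```python
def boundness_by_position(words, verb_index):
    """Label words left of the verb CB, words right of it UB, the verb V."""
    labels = []
    for i, word in enumerate(words):
        if i == verb_index:
            labels.append((word, "V"))
        elif i < verb_index:
            labels.append((word, "CB"))
        else:
            labels.append((word, "UB"))
    return labels


# Default article suggested by the boundness label.
ARTICLE = {"CB": "the", "UB": "a"}

for word, label in boundness_by_position(["pes", "kousl", "muže"], 1):
    print(word, label, ARTICLE.get(label, "-"))
```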
== 2. reordering of constituents ==
Based on CB, the Sb is selected from ACT|PAT|ADDR.
declarative sentence: CB adjuncts + Sb + V + direct/indirect Obj + UB adjuncts
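
A rough sketch of this ordering, under assumptions: nodes are plain dicts with hypothetical keys `lemma`, `functor`, and `cb`; the Sb is taken to be a contextually bound participant if there is one, with the tie-break ACT before PAT before ADDR (the tie-break is a guess, not from the paper); objects keep their input order.

```python
PARTICIPANTS = ("ACT", "PAT", "ADDR")


def choose_subject(nodes):
    """Pick the Sb among ACT|PAT|ADDR, preferring contextually bound nodes."""
    cands = [n for n in nodes if n["functor"] in PARTICIPANTS]
    bound = [n for n in cands if n["cb"]]
    pool = bound or cands
    if not pool:
        return None
    # Assumed tie-break: ACT first, then PAT, then ADDR.
    return min(pool, key=lambda n: PARTICIPANTS.index(n["functor"]))


def linearize(verb, nodes):
    """CB adjuncts + Sb + V + objects + UB adjuncts, returned as lemmas."""
    sb = choose_subject(nodes)
    rest = [n for n in nodes if n is not sb]
    objs = [n for n in rest if n["functor"] in PARTICIPANTS]
    adjuncts = [n for n in rest if n["functor"] not in PARTICIPANTS]
    cb_adj = [n for n in adjuncts if n["cb"]]
    ub_adj = [n for n in adjuncts if not n["cb"]]
    order = cb_adj + ([sb] if sb else []) + [verb] + objs + ub_adj
    return [n["lemma"] for n in order]


nodes = [
    {"lemma": "library", "functor": "LOC", "cb": False},
    {"lemma": "book", "functor": "PAT", "cb": False},
    {"lemma": "Peter", "functor": "ACT", "cb": True},
    {"lemma": "yesterday", "functor": "TWHEN", "cb": True},
    {"lemma": "Mary", "functor": "ADDR", "cb": False},
]
print(linearize({"lemma": "give"}, nodes))
```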
== 3. generation of verb forms ==
Passive vs. active voice is decided by the functor of the Sb.
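
One plausible reading of this rule, as a sketch: if the node chosen as Sb is the ACT, the verb stays active; if the Sb is a PAT or ADDR, the verb goes passive. The mapping is an assumption of these notes, and `choose_voice` is a hypothetical name.

```python
def choose_voice(subject_functor):
    """Return the voice implied by the functor of the chosen subject."""
    # Assumption: only an ACT subject yields an active clause.
    return "active" if subject_functor == "ACT" else "passive"


print(choose_voice("ACT"), choose_voice("PAT"))
```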
== 4. insertion of prepositions and articles ==
preps: chosen according to the Czech preposition and the EN noun
articles: definite for a postmodified NP, or an NP premodified by a superlative or an ordinal numeral
article prevented: uncountable and proper nouns, or predetermination by possessive and demonstrative pronouns
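
The article rules can be sketched as a small decision function. Assumptions: the NP is a dict with hypothetical boolean or string features; prevention rules are checked before the definite rules (the order is not stated in these notes); the final fallback uses contextual boundness as in step 1.

```python
def choose_article(np):
    """Return "the", "a", or None (no article) for a noun phrase dict."""
    # Article prevented: uncountable or proper nouns.
    if np.get("proper") or np.get("uncountable"):
        return None
    # Article prevented: possessive or demonstrative predeterminer.
    if np.get("predeterminer") in ("possessive", "demonstrative"):
        return None
    # Definite: postmodified NP, or superlative / ordinal premodifier.
    if np.get("postmodified") or np.get("premodifier") in ("superlative", "ordinal"):
        return "the"
    # Fallback from step 1: CB gives definite, otherwise indefinite.
    return "the" if np.get("cb") else "a"


print(choose_article({"premodifier": "ordinal"}))
print(choose_article({"predeterminer": "possessive"}))
print(choose_article({"cb": False}))
```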
== 5. morphology ==
Probably not morpha.
They look the forms up in a table:
^ word form ^ morphological tag ^ lemma ^
If no entry is found, they fall back to simple rules.
Also vocalization for the indefinite article (a/an).
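
A minimal sketch of table lookup with a rule fallback, plus a/an selection. Everything concrete here is an assumption: the toy table `FORMS`, the use of Penn Treebank tags (NNS, VBD), the suffix rules, and the purely orthographic a/an test (which gets words like "hour" or "university" wrong).

```python
# Toy (lemma, tag) -> word form table, invented for illustration.
FORMS = {("go", "VBD"): "went", ("child", "NNS"): "children"}

VOWELS = set("aeiou")


def inflect(lemma, tag):
    """Look the form up in the table; otherwise apply simple suffix rules."""
    form = FORMS.get((lemma, tag))
    if form is not None:
        return form
    if tag == "NNS":  # regular plural
        sibilant = lemma.endswith(("s", "x", "z", "ch", "sh"))
        return lemma + ("es" if sibilant else "s")
    if tag == "VBD":  # regular past tense
        return lemma + ("d" if lemma.endswith("e") else "ed")
    return lemma  # unknown tag: leave the lemma unchanged


def indefinite_article(next_word):
    """Choose "an" before a vowel letter, "a" otherwise (spelling only)."""
    return "an" if next_word[:1].lower() in VOWELS else "a"


print(inflect("go", "VBD"), inflect("walk", "VBD"), inflect("box", "NNS"))
print(indefinite_article("apple"), indefinite_article("book"))
```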