Page user:ptacek:dbmt, current revision 2007/05/07 19:50 by ptacek (previous revision 2007/05/07 19:15 by ptacek).
====== DBMT ======
//Czech-English Dependency-based Machine Translation
//PCEDT BLEU dtest/

It is the Magenta pipeline, except that generation is rule-based (instead of statistical tree-to-tree transduction followed by a language model).

Run on the Czech translation of the Penn Treebank.

  - Tokenization and tagging
  - Parsing into a_trees [Hajic 98, Charniak 99]
  - Afun assignment [ZZ 02]
  - a_tree -> t_tree conversion [Bohmova 01]
  - Functor (func) assignment with C4.5 [ZZ 02]
  - Dictionary built with GIZA++ [Och and Ney 02] // one most probable translation,
  - Generator
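A minimal sketch of how the stages listed above chain together, assuming each stage is a function from one sentence representation to the next. The stage names and the dictionary representation are hypothetical placeholders; the stubs only record that they ran, where the real system would call the tagger, the parsers, the C4.5 classifier, the GIZA++ dictionary and the generator.

```python
from typing import Callable, Dict, List

# Hypothetical representation: a sentence is a plain dict passed between stages.
Sentence = Dict[str, object]
Stage = Callable[[Sentence], Sentence]


def make_stub(name: str) -> Stage:
    """Placeholder stage that only appends its name to the sentence's trace."""
    def stage(sentence: Sentence) -> Sentence:
        trace = list(sentence.get("trace", []))
        trace.append(name)
        return {**sentence, "trace": trace}
    return stage


# Stage order follows the list above; all names are hypothetical.
PIPELINE: List[Stage] = [
    make_stub(name)
    for name in (
        "tokenize_and_tag",
        "parse_a_tree",
        "assign_afuns",
        "a_tree_to_t_tree",
        "assign_functors",
        "translate_lemmas",
        "generate",
    )
]


def run_pipeline(sentence: Sentence, stages: List[Stage]) -> Sentence:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        sentence = stage(sentence)
    return sentence


result = run_pipeline({"text": "Ceny rostou."}, PIPELINE)
print(result["trace"])
```

Keeping every stage behind the same signature is what lets a rule-based generator replace the statistical transducer + LM without touching the earlier stages.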

====== Generator ======
Input: TGTS (tectogrammatical tree structures) without TFA (topic-focus articulation); and what about coreference? :?:

== 1. Determining contextual boundness ==
In the Czech sentence, nodes to the left of the verb are treated as CB (contextually bound) -> definite article;
nodes to the right of the verb that are not bound -> indefinite article.
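The position heuristic can be sketched as follows, assuming the nodes arrive in Czech surface order and a hypothetical `is_verb` flag marks the main verb. The `Node` class and the choice to label every non-verb node (rather than only nouns) are simplifications for illustration, not the original implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    lemma: str
    is_verb: bool = False
    cb: Optional[bool] = None       # contextually bound? None = undecided
    article: Optional[str] = None   # "definite", "indefinite" or None


def mark_contextual_boundness(nodes: List[Node]) -> List[Node]:
    """Left of the verb (Czech order) = CB -> definite article;
    right of the verb = not bound -> indefinite article.
    Assumes one main verb; the first verb found is used."""
    verb_idx = next((i for i, n in enumerate(nodes) if n.is_verb), None)
    if verb_idx is None:
        return nodes  # no verb: leave everything undecided (assumption)
    for i, node in enumerate(nodes):
        if i == verb_idx:
            continue
        node.cb = i < verb_idx
        node.article = "definite" if node.cb else "indefinite"
    return nodes


# "Vláda schválila zákon." (The government approved a law.)
sentence = [Node("vláda"), Node("schválit", is_verb=True), Node("zákon")]
for n in mark_contextual_boundness(sentence):
    print(n.lemma, n.cb, n.article)
```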


== 2. Reordering of constituents ==
The subject (Sb) is selected from ACT|PAT|ADDR according to CB.
Declarative sentence: CB adjuncts + Sb + V + direct/
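A sketch of subject selection and declarative ordering. The notes do not say exactly how CB decides among ACT, PAT and ADDR, so the rule here (the first contextually bound participant in that order, otherwise ACT) is an assumption, as is placing all remaining nodes after the verb in input order, because the source is cut off after `direct/`. `TNode`, `choose_subject` and `order_declarative` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TNode:
    lemma: str
    functor: str      # e.g. "PRED", "ACT", "PAT", "ADDR", "TWHEN"
    cb: bool = False  # contextual boundness from step 1


PARTICIPANTS = ("ACT", "PAT", "ADDR")


def choose_subject(children: List[TNode]) -> Optional[TNode]:
    """Assumed rule: first CB participant in ACT, PAT, ADDR order becomes Sb;
    if no participant is CB, fall back to ACT (if present)."""
    candidates = [n for f in PARTICIPANTS for n in children if n.functor == f]
    for n in candidates:
        if n.cb:
            return n
    return next((n for n in candidates if n.functor == "ACT"), None)


def order_declarative(verb: TNode, children: List[TNode]) -> List[TNode]:
    """CB adjuncts + Sb + V + all remaining nodes (kept in input order)."""
    subject = choose_subject(children)
    cb_adjuncts = [n for n in children if n.functor not in PARTICIPANTS and n.cb]
    placed = {id(n) for n in cb_adjuncts}  # identity, not field equality
    head = list(cb_adjuncts)
    if subject is not None:
        placed.add(id(subject))
        head.append(subject)
    rest = [n for n in children if id(n) not in placed]
    return head + [verb] + rest


verb = TNode("approve", "PRED")
children = [
    TNode("law", "PAT"),
    TNode("government", "ACT", cb=True),
    TNode("yesterday", "TWHEN", cb=True),
]
print([n.lemma for n in order_declarative(verb, children)])
```

When a CB patient wins over a non-bound actor, the chosen Sb is the PAT, which is the case where step 3 would need a passive verb form.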

== 3. Generation of verb forms ==
Passive/

== 4. Insertion of prepositions and articles ==
Prepositions: chosen according to the Czech preposition and the English noun.
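A sketch of preposition choice as a two-level lookup, assuming a table keyed on (Czech preposition, English noun) with a fallback keyed on the Czech preposition alone. The table entries and the function name are hypothetical examples, not the original tables.

```python
from typing import Dict, Optional, Tuple

# Hypothetical noun-specific rules: (Czech preposition, English noun) -> English preposition.
PREP_BY_NOUN: Dict[Tuple[str, str], str] = {
    ("v", "Monday"): "on",       # v pondělí -> on Monday
    ("na", "conference"): "at",  # na konferenci -> at the conference
}

# Hypothetical defaults used when no noun-specific rule matches.
PREP_DEFAULT: Dict[str, str] = {
    "v": "in",
    "na": "on",
}


def english_preposition(cz_prep: str, en_noun: str) -> Optional[str]:
    """Prefer the (Czech preposition, English noun) rule, then the default
    for the Czech preposition; None if neither table knows it."""
    return PREP_BY_NOUN.get((cz_prep, en_noun), PREP_DEFAULT.get(cz_prep))


print(english_preposition("v", "Monday"), english_preposition("v", "house"))
```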

Articles: definite with a postmodified NP, or with an NP premodified by a superlative or an ordinal numeral.
No article: uncountable and proper nouns, or nouns predetermined by a possessive or demonstrative pronoun.
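The article rules can be sketched as a single decision function. The `NounPhrase` flags are hypothetical, and the precedence (blocking rules first, then the definite-article rules, then contextual boundness from step 1) is an assumption, since the notes list the rules without ordering them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NounPhrase:
    noun: str
    cb: bool = False                           # from step 1
    postmodified: bool = False
    superlative_or_ordinal: bool = False       # premodifier
    uncountable: bool = False
    proper: bool = False
    possessive_or_demonstrative: bool = False  # predetermined NP


def choose_article(np: NounPhrase) -> Optional[str]:
    """Return "definite", "indefinite" or None (no article).
    Assumed precedence: blocking rules, then definite rules, then CB."""
    if np.uncountable or np.proper or np.possessive_or_demonstrative:
        return None
    if np.postmodified or np.superlative_or_ordinal:
        return "definite"
    return "definite" if np.cb else "indefinite"


print(choose_article(NounPhrase("book", cb=True)))
```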

== 5. Morphology ==
Probably not morpha :!:
Word forms are looked up in a table:
^ Word form ^ Morphological tag ^ Lemma ^
If a form is not found, simple rules are used.
Also vocalization of the indefinite article (a/an).
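A sketch of the lookup-then-fallback morphology and of article vocalization. The table excerpt, the Penn Treebank style tags and the fallback rules are illustrative assumptions; vocalization here looks only at the first letter of the following word, so exceptions such as "an hour" or "a university" would need an extra list.

```python
from typing import Dict, Tuple

# Hypothetical excerpt of the form table: (lemma, tag) -> word form.
FORMS: Dict[Tuple[str, str], str] = {
    ("go", "VBD"): "went",
    ("child", "NNS"): "children",
    ("be", "VBZ"): "is",
}


def fallback_rule(lemma: str, tag: str) -> str:
    """Very simple regular rules, used only when the table has no entry."""
    if tag in ("NNS", "VBZ"):
        if lemma.endswith(("s", "x", "z", "ch", "sh")):
            return lemma + "es"
        return lemma + "s"
    if tag in ("VBD", "VBN"):
        return lemma + "d" if lemma.endswith("e") else lemma + "ed"
    return lemma  # other tags: base form unchanged


def inflect(lemma: str, tag: str) -> str:
    """Table lookup first, regular rules as the fallback."""
    return FORMS.get((lemma, tag)) or fallback_rule(lemma, tag)


def indefinite_article(next_word: str) -> str:
    """'an' before a vowel letter, otherwise 'a' (first-letter approximation)."""
    first = next_word[:1].lower()
    return "an" if first and first in "aeiou" else "a"


print(inflect("go", "VBD"), inflect("walk", "VBD"), inflect("box", "NNS"))
```

Vocalization has to look at the word that actually follows the article, which may be an adjective rather than the noun: `indefinite_article("old")` gives "an", as in "an old house".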