user:ptacek:dbmt [2007/05/07 19:50] (current) ptacek, previous revision [2007/05/07 19:13]
====== DBMT ======
//Czech-English Dependency-based Machine Translation//
PCEDT BLEU dtest/

It is the Magenta pipeline, except that generation is rule-based (instead of statistical tree-to-tree transduction followed by a language model).

Applied to the Czech translation of the Penn Treebank.

- tokenization and tagging
- parsing into a_trees [Hajic 98, Charniak 99]
- afun assignment [ZZ 02]
- a_tree -> t_tree [Bohmova 01]
- functor assignment with C4.5 [ZZ 02]
- dictionary built with GIZA++ [Och and Ney 02] // one most probable translation,
- generator
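The dictionary step keeps a single translation per source word. A minimal sketch of that reduction follows. It assumes the GIZA++ lexical probabilities have already been parsed into (Czech word, English word, probability) triples. The function `most_probable_translations` and the sample entries are hypothetical and do not reflect the GIZA++ file format.

```python
from typing import Dict, Iterable, Tuple


def most_probable_translations(
    entries: Iterable[Tuple[str, str, float]],
) -> Dict[str, str]:
    """Keep only the highest-probability English translation per Czech word."""
    best: Dict[str, Tuple[str, float]] = {}
    for src, tgt, prob in entries:
        # Replace the stored candidate only on a strictly higher probability,
        # so ties keep the first candidate seen.
        if src not in best or prob > best[src][1]:
            best[src] = (tgt, prob)
    return {src: tgt for src, (tgt, _) in best.items()}


# Hypothetical (Czech, English, probability) triples.
table = [
    ("dům", "house", 0.7),
    ("dům", "home", 0.2),
    ("kniha", "book", 0.9),
]
print(most_probable_translations(table))  # {'dům': 'house', 'kniha': 'book'}
```

Keeping only the one-best entry makes the transfer step a plain lookup. Any lexical ambiguity is resolved once, at dictionary-building time, and never reconsidered during generation.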

====== Generator ======
It receives TGTSs without TFA (topic-focus articulation); and what about coreference? :?:

== 1. determining contextual boundness ==
In Czech, nodes to the left of the verb are treated as CB (contextually bound) -> definite article;
nodes to the right of the verb are treated as non-bound -> indefinite article.

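The position heuristic for contextual boundness can be sketched as follows. The sketch assumes a clause is given as a flat list of Czech lemmas in surface order and that the index of the finite verb is known. The function `contextual_boundness` and the example clause are hypothetical. The article returned is only a default: the article rules of step 4 may override it, and "a" is vocalized later in step 5.

```python
from typing import List, Tuple


def contextual_boundness(lemmas: List[str], verb_index: int) -> List[Tuple[str, str, str]]:
    """Label each non-verb lemma as CB or NB by its position relative to the verb.

    Returns (lemma, boundness, default_article) triples in surface order.
    """
    labelled = []
    for i, lemma in enumerate(lemmas):
        if i == verb_index:
            continue  # the verb itself receives no label
        if i < verb_index:
            labelled.append((lemma, "CB", "the"))  # left of the verb: bound
        else:
            labelled.append((lemma, "NB", "a"))  # right of the verb: non-bound
    return labelled


# "Knihu koupil student." (the book was bought by a student)
print(contextual_boundness(["kniha", "koupit", "student"], verb_index=1))
# [('kniha', 'CB', 'the'), ('student', 'NB', 'a')]
```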
== 2. reordering of constituents ==
Based on CB, the Sb (subject) is selected from ACT|PAT|ADDR.
declarative sentence: CB adjuncts + Sb + V + direct/

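A sketch of the reordering for a declarative clause, under explicit assumptions. The note does not say exactly how CB selects the Sb, so this code assumes a CB participant is preferred, scanning ACT, PAT and ADDR in that order, and otherwise falls back to the first participant in the same order. Because the ordering template breaks off after "direct/", all remaining dependents are simply placed after the verb in input order. The `Node` class and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List

PARTICIPANTS = ("ACT", "PAT", "ADDR")


@dataclass(eq=False)  # eq=False: membership tests below compare by identity
class Node:
    lemma: str
    functor: str  # tectogrammatical functor, e.g. ACT, PAT, ADDR, TWHEN
    cb: bool  # contextually bound (from step 1)


def choose_subject(deps: List[Node]) -> Node:
    # Participants ordered by functor priority (ACT, PAT, ADDR), then input order.
    participants = [n for f in PARTICIPANTS for n in deps if n.functor == f]
    if not participants:
        raise ValueError("clause has no ACT, PAT or ADDR")
    # Assumption: the first CB participant becomes Sb; otherwise the first one.
    for node in participants:
        if node.cb:
            return node
    return participants[0]


def order_declarative(verb: Node, deps: List[Node]) -> List[str]:
    """CB adjuncts + Sb + V + everything else (placeholder for the truncated part)."""
    subject = choose_subject(deps)
    cb_adjuncts = [n for n in deps if n.functor not in PARTICIPANTS and n.cb]
    rest = [n for n in deps if n is not subject and n not in cb_adjuncts]
    return [n.lemma for n in cb_adjuncts + [subject, verb] + rest]


# "Včera knihu koupil student." (yesterday the book was bought by a student)
verb = Node("koupit", "PRED", cb=False)
deps = [
    Node("včera", "TWHEN", cb=True),
    Node("kniha", "PAT", cb=True),
    Node("student", "ACT", cb=False),
]
print(order_declarative(verb, deps))  # ['včera', 'kniha', 'koupit', 'student']
```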
== 3. generation of verb forms ==
passive/

== 4. insertion of prepositions and articles ==
prepositions: chosen according to the Czech preposition and according to the English noun

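One way to realize a choice "according to the Czech preposition and the English noun" is a two-level lookup. A noun-specific entry is tried first, then a default translation of the Czech preposition. The tables, their entries and `english_preposition` are illustrative assumptions, not the system's actual resources.

```python
from typing import Dict, Optional, Tuple

# Illustrative tables (assumed, not the system's actual resources).
# Noun-specific choices: (Czech preposition, English noun) -> English preposition.
BY_CZECH_AND_NOUN: Dict[Tuple[str, str], str] = {
    ("v", "Monday"): "on",  # v pondělí -> on Monday
    ("na", "university"): "at",  # na univerzitě -> at the university
}
# Default translation of each Czech preposition.
BY_CZECH: Dict[str, str] = {
    "v": "in",
    "na": "on",
    "do": "to",
}


def english_preposition(czech_prep: str, english_noun: str) -> Optional[str]:
    """Prefer a noun-specific entry; fall back to the Czech preposition's default."""
    specific = BY_CZECH_AND_NOUN.get((czech_prep, english_noun))
    if specific is not None:
        return specific
    return BY_CZECH.get(czech_prep)  # None if the preposition is unknown


print(english_preposition("v", "Monday"))  # on (noun-specific entry)
print(english_preposition("v", "house"))  # in (Czech default)
```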
articles: definite for a postmodified NP, or an NP premodified by a superlative or an ordinal numeral
article prevented: uncountable and proper nouns, or predetermination by possessive and demonstrative pronouns

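The article rules can be combined with the CB default from step 1 in a single decision function. The notes do not give a precedence, so the one used here is an assumption. Proper nouns and possessive or demonstrative determiners are checked first, then the definite triggers, then uncountable nouns, and finally CB. Uncountable nouns come after the definite triggers so that a postmodified mass noun ("the water in the glass") still gets "the". The `NP` flags are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NP:
    head: str
    cb: bool  # contextual boundness from step 1
    countable: bool = True
    proper: bool = False
    possessive_or_demonstrative: bool = False  # "my house", "this house"
    postmodified: bool = False  # "the house that Jack built"
    superlative_or_ordinal: bool = False  # "the tallest house", "the first house"


def choose_article(np: NP) -> str:
    """Return "the", "a", or "" (no article). The rule order is an assumption."""
    if np.proper or np.possessive_or_demonstrative:
        return ""  # article prevented: proper noun or already determined
    if np.postmodified or np.superlative_or_ordinal:
        return "the"  # forced definite article
    if not np.countable:
        return ""  # uncountable noun: article prevented
    return "the" if np.cb else "a"  # default from contextual boundness


print(choose_article(NP("house", cb=False, postmodified=True)))  # the
```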
== 5. morphology ==
probably not morpha :!:
word forms are looked up in a table:
^ word form ^ morphological tag ^ lemma ^
if a form is not found, simple rules are applied
also vocalization of the indefinite article (a/an)
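A sketch of the lookup-then-rules strategy follows. It assumes the table is re-indexed by (lemma, tag) for generation and that the tags are Penn Treebank tags. `FORM_TABLE`, its entries and the fallback suffix rules are illustrative. Vocalization is done naively by the first letter, so words such as "hour" or "university" would need an exception list.

```python
from typing import Dict, Tuple

# Illustrative fragment of the (word form, tag, lemma) table, re-indexed
# by (lemma, tag) for generation; tags are assumed to be Penn Treebank tags.
FORM_TABLE: Dict[Tuple[str, str], str] = {
    ("child", "NNS"): "children",
    ("go", "VBD"): "went",
    ("be", "VBZ"): "is",
}

SIBILANT_ENDINGS = ("s", "x", "z", "ch", "sh")


def generate_form(lemma: str, tag: str) -> str:
    """Look the form up in the table; fall back to simple suffix rules."""
    form = FORM_TABLE.get((lemma, tag))
    if form is not None:
        return form
    if tag in ("NNS", "VBZ"):  # plural noun, 3rd person singular present verb
        return lemma + ("es" if lemma.endswith(SIBILANT_ENDINGS) else "s")
    if tag in ("VBD", "VBN"):  # past tense, past participle
        return lemma + ("d" if lemma.endswith("e") else "ed")
    return lemma  # e.g. NN, VB: base form


def indefinite_article(next_word: str) -> str:
    """Naive a/an vocalization by the first letter of the following word."""
    return "an" if next_word[:1].lower() in ("a", "e", "i", "o", "u") else "a"


print(indefinite_article("old"), "old", generate_form("book", "NN"))  # an old book
```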
