# Differences

This shows you the differences between two versions of the page.

courses:rg:2013:convolution-kernels [2013/03/11 18:42] dusek
courses:rg:2013:convolution-kernels [2013/03/12 11:27] (current) popel: "x was not rendered"

Line 19:
  * They are able to "generate" fake inputs, but this feat is not used very often.
  * Examples: Naive Bayes, Mixtures of Gaussians, HMM, Bayesian Networks, Markov Random Fields
- * **Discriminative models** do everything in one step -- they learn the posterior P(y|x) as a function of some features of x.
+ * **Discriminative models** do everything in one step -- they learn the posterior P(y|x) as a function of some features of x.
  * They are simpler and can use many more features, but are prone to missing inputs.
- * Examples: SVM, Logistic Regression, Neuron. sítě, k-NN, Conditional Random Fields
+ * Examples: SVM, Logistic Regression, Neural network, k-NN, Conditional Random Fields
  - Each CFG rule generates just one level of the derivation tree. Therefore, using "standard" nonterminals, it is not possible to generate e.g. this sentence:
  * ''(S (NP (PRP He)) (VP (VBD saw)(NP (PRP himself))))''

Line 33:
  - = \sum_{n_a \in N_a}\sum_{n_b \in N_b}\sum_i I_i(n_b)\cdot I_i(n_a) (change summation order)
  - = \sum_{n_a \in N_a}\sum_{n_b \in N_b}C(n_a, n_b) (definition of C)
- -
+ - Convolution is defined like this: (f*g)_k = \sum_i f_i g_{k-i}, so it measures the presence of structures that //complement// each other. Here, we have a measure of structures that are //similar//. So it is something different. But the main idea is the same -- we can combine smaller structures (kernels) into more complex ones.
+ - There is a (tiny) error in the last formula of Section 3. You cannot actually multiply tree parses, so it should read: \bar{w}^{*} \cdot h(\mathbf{x}) = \dots
+
+ ==== Report ====
+
+ We discussed the answers to the questions most of the time. Other issues raised in the discussion were:
+
+ * **Usability** -- the approach is only usable for //reranking// the output of some other parser.
+ * **Scalability** -- they only use 800 sentences and 20 candidates per sentence for training. We believe that for large data (millions of examples) this will become too complex.
+ * **Evaluation** -- it looks as if they used a non-standard evaluation metric to get "better" results. The standard here would be F1-score.
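
The derivation quoted in the diff above ends with the kernel written as a double sum of C(n_a, n_b) over all node pairs of the two trees. The page does not spell out how C is computed, so the sketch below assumes the usual recursion for subtree-counting kernels: C is 0 when the productions at the two nodes differ, 1 when they are identical pre-terminal productions, and otherwise the product over children of (1 + C(child_a, child_b)). The nested-tuple tree encoding and all function names are hypothetical, chosen only for illustration.

```python
def nodes(tree):
    """Yield every internal node (a tuple) of a tree; words (plain strings) are skipped."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from nodes(child)


def production(node):
    """CFG rule applied at a node, e.g. ('VP', ('VBD', 'NP')) or ('PRP', ('He',))."""
    label, children = node[0], node[1:]
    return label, tuple(c if isinstance(c, str) else c[0] for c in children)


def is_preterminal(node):
    """True if the node dominates only words."""
    return all(isinstance(c, str) for c in node[1:])


def common_subtrees(na, nb):
    """C(n_a, n_b) under the assumed recursion described in the lead-in."""
    if production(na) != production(nb):
        return 0
    if is_preterminal(na):
        return 1
    # Assumption: a non-pre-terminal node has only internal nodes as children.
    result = 1
    for ca, cb in zip(na[1:], nb[1:]):
        result *= 1 + common_subtrees(ca, cb)
    return result


def tree_kernel(ta, tb):
    """K(T_a, T_b) = sum over all node pairs of C(n_a, n_b)."""
    return sum(common_subtrees(na, nb) for na in nodes(ta) for nb in nodes(tb))


# The example parse from the page, plus a hypothetical second parse for comparison.
t1 = ("S", ("NP", ("PRP", "He")),
      ("VP", ("VBD", "saw"), ("NP", ("PRP", "himself"))))
t2 = ("S", ("NP", ("PRP", "He")),
      ("VP", ("VBD", "saw"), ("NP", ("PRP", "her"))))

if __name__ == "__main__":
    print(tree_kernel(t1, t1), tree_kernel(t1, t2))
```

This naive version recomputes C for repeated child pairs; memoizing `common_subtrees` on node pairs would keep the cost proportional to the number of node pairs, which is relevant to the scalability concern raised in the report.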
