      * They are able to "generate" fake inputs, but this ability is not used very often.
      * Examples: Naive Bayes, Mixtures of Gaussians, HMM, Bayesian Networks, Markov Random Fields
    * **Discriminative models** do everything in one step -- they learn the posterior <latex>P(y|x)</latex> directly as a function of some features of <latex>x</latex> (see the note after this list).
      * They are simpler and can use many more features, but are sensitive to missing input values.
      * Examples: SVM, Logistic Regression, Neural Networks, k-NN, Conditional Random Fields
  - Each CFG rule generates just one level of the derivation tree. Therefore, using "standard" nonterminals, it is not possible to generate e.g. this sentence:
    * ''(S (NP (PRP He)) (VP (VBD saw) (NP (PRP himself))))''
      * It could be modelled with an augmentation of the nonterminal labels (see the illustration after this list).
    * CFGs can't generate non-projective sentences.
      * But these can be modelled using traces.
  - The derivation is actually quite simple (see also the Python sketch after this list):
    - <latex>h(T_a)\cdot h(T_b) = \sum_i h_i(T_a) \cdot h_i(T_b)</latex> (definition of the dot product)
    - <latex>= \sum_i \left(\sum_{n_a \in N_a} I_i(n_a)\right) \left(\sum_{n_b \in N_b} I_i(n_b)\right)</latex> (from the definition of <latex>I</latex> in the paragraph above the formula)
    - <latex>= \sum_i\sum_{n_a \in N_a}\sum_{n_b \in N_b} I_i(n_a)\cdot I_i(n_b)</latex> (since <latex>(a+b)(c+d) = ac+ad+bc+bd</latex>)
    - <latex>= \sum_{n_a \in N_a}\sum_{n_b \in N_b}\sum_i I_i(n_a)\cdot I_i(n_b)</latex> (change of the summation order)
    - <latex>= \sum_{n_a \in N_a}\sum_{n_b \in N_b}C(n_a, n_b)</latex> (definition of <latex>C</latex>)
  - Convolution is defined like this: <latex>(f*g)_k = \sum_i f_i g_{k-i}</latex>, so it measures the presence of structures that //complement// each other (see the small example after this list). Here, we have a measure of structures that are //similar//, so it is something different. But the main idea is the same -- we can combine smaller structures (kernels) into more complex ones.
  - There is a (tiny) error in the last formula of Section 3. You cannot actually multiply tree parses, so it should read: <latex>\bar{w}^{*} \cdot h(\mathbf{x}) = \dots</latex>
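
A side note on the generative/discriminative distinction above (a standard textbook fact, not from the paper): a generative model has to obtain the posterior from its joint model via Bayes' rule, whereas a discriminative model parametrizes the posterior directly:

<latex>P(y|x) = \frac{P(x|y)P(y)}{\sum_{y'} P(x|y')P(y')}</latex>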
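
For the reflexive example above, a purely hypothetical augmentation of the nonterminal labels could copy the subject's person/number features onto the object NP, e.g. ''(S (NP-3sg (PRP He)) (VP-3sg (VBD saw) (NP-refl-3sg (PRP himself))))''; the label scheme here is made up for illustration only.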
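
A minimal Python sketch (not code from the paper) of the result of the derivation above: the kernel <latex>h(T_a)\cdot h(T_b)</latex> is computed as <latex>\sum_{n_a}\sum_{n_b} C(n_a, n_b)</latex>, with <latex>C</latex> following the recursive definition (0 for different productions, 1 for matching pre-terminals, a product of <latex>1 + C</latex> over child pairs otherwise). Trees are assumed to be nested tuples ''(label, child, ...)'' with words as plain strings.

<code python>
def nodes(tree):
    """Collect all non-leaf (nonterminal and pre-terminal) nodes of a tree."""
    if isinstance(tree, str):
        return []
    result = [tree]
    for child in tree[1:]:
        result.extend(nodes(child))
    return result

def production(node):
    """The CFG production at a node: (parent label, tuple of child labels)."""
    return (node[0], tuple(c if isinstance(c, str) else c[0] for c in node[1:]))

def is_preterminal(node):
    """A pre-terminal dominates exactly one word (a plain string)."""
    return len(node) == 2 and isinstance(node[1], str)

def C(n_a, n_b):
    """Number of common subtrees rooted at n_a and n_b."""
    if production(n_a) != production(n_b):
        return 0
    if is_preterminal(n_a):
        return 1
    result = 1
    for c_a, c_b in zip(n_a[1:], n_b[1:]):
        result *= 1 + C(c_a, c_b)
    return result

def tree_kernel(t_a, t_b):
    """h(T_a) . h(T_b) = sum over all node pairs of C(n_a, n_b)."""
    return sum(C(n_a, n_b) for n_a in nodes(t_a) for n_b in nodes(t_b))

# the example sentence from the notes, compared with itself
t = ('S', ('NP', ('PRP', 'He')),
          ('VP', ('VBD', 'saw'), ('NP', ('PRP', 'himself'))))
print(tree_kernel(t, t))
</code>

This counts all common subtrees with equal weight; the down-weighting of larger subtrees discussed in the paper is omitted here for brevity.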
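
And a tiny sketch of the discrete convolution formula above, just to contrast it with the similarity-style tree kernel (the input sequences are made-up examples):

<code python>
def convolve(f, g):
    """Discrete convolution: (f*g)_k = sum_i f_i * g_{k-i}."""
    out = [0.0] * (len(f) + len(g) - 1)
    for k in range(len(out)):
        for i in range(len(f)):
            if 0 <= k - i < len(g):
                out[k] += f[i] * g[k - i]
    return out

print(convolve([1, 2, 3], [0, 1, 0.5]))  # [0.0, 1.0, 2.5, 4.0, 1.5]
</code>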
