Michael Collins, Nigel Duffy: Convolution kernels for natural language

Questions

What is a generative model, what is a discriminative model and what is their main difference?
What are the “fairly strong independence assumptions” made by PCFGs? Come up with an example tree that can't be modelled by a PCFG.
Derive and explain the formula for <latex>h(T_1) \cdot h(T_2)</latex> at the bottom of page 3.
What is a convolution? Why are convolution kernels called this?
Find an error in one of the formulae in the paper.

Answers

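- Generative models use a two-step setup. They learn the class-conditional likelihood <latex>P(x|y)</latex> and the prior <latex>P(y)</latex>, then apply Bayes' rule to obtain the posterior.
  - In other words, they learn the joint distribution <latex>P(x,y) = P(x|y)\,P(y)</latex>, marginalize over <latex>y</latex> to get <latex>P(x) = \sum_y P(x,y)</latex>, and condition: <latex>P(y|x) = P(x,y) / P(x)</latex>.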
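  - They learn more than is strictly needed for classification, but cope well with partially missing input data, since missing features can be marginalized out.
  - They can “generate” synthetic inputs by sampling from the learned distribution, but this capability is rarely used in practice.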
  - Examples: Naive Bayes, Mixtures of Gaussians, HMM, Bayesian Networks, Markov Random Fields
- Discriminative models do everything in one step: they learn the posterior <latex>P(y|x)</latex> directly as a function of some features of <latex>x</latex>.
  - They are simpler and can use many more features, but are sensitive to missing inputs.
  - Examples: SVM, Logistic Regression, Neural networks, k-NN, Conditional Random Fields
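A minimal sketch of the generative recipe (not taken from the paper): the Python below fits a naive Bayes classifier on a hypothetical toy dataset. It estimates the prior <latex>P(y)</latex> and the per-feature likelihoods <latex>P(x_j|y)</latex> by counting (add-one smoothing for binary features is an assumption of this sketch), forms the joint <latex>P(x,y)</latex>, sums over <latex>y</latex> to get <latex>P(x)</latex>, and divides to get <latex>P(y|x)</latex>. A missing feature (passed as `None`) is simply left out of the product, which marginalizes it out under the naive Bayes independence assumption.

```python
from collections import Counter

# Hypothetical toy dataset: x is a tuple of two binary features, y is a class label.
data = [
    ((1, 0), "spam"), ((1, 1), "spam"), ((1, 1), "spam"),
    ((0, 0), "ham"), ((0, 1), "ham"), ((1, 0), "ham"),
]

def fit_naive_bayes(data, alpha=1.0):
    """Estimate the prior P(y) and per-feature likelihoods P(x_j | y) by counting."""
    class_counts = Counter(y for _, y in data)
    n = len(data)
    prior = {y: c / n for y, c in class_counts.items()}
    n_features = len(data[0][0])
    # feat_counts[y][j][v] = number of class-y examples whose feature j equals v
    feat_counts = {y: [Counter() for _ in range(n_features)] for y in class_counts}
    for x, y in data:
        for j, v in enumerate(x):
            feat_counts[y][j][v] += 1

    def likelihood(j, v, y):
        # Add-alpha smoothing; the "2 * alpha" assumes binary features (two possible values).
        return (feat_counts[y][j][v] + alpha) / (class_counts[y] + 2 * alpha)

    return prior, likelihood

def posterior(prior, likelihood, x):
    """Joint P(x, y) -> marginal P(x) = sum_y P(x, y) -> conditional P(y | x).
    Features given as None are skipped, i.e. marginalized out."""
    joint = {}
    for y, p_y in prior.items():
        p = p_y
        for j, v in enumerate(x):
            if v is not None:
                p *= likelihood(j, v, y)
        joint[y] = p
    evidence = sum(joint.values())  # P(x)
    return {y: p / evidence for y, p in joint.items()}

prior, likelihood = fit_naive_bayes(data)
print(posterior(prior, likelihood, (1, 1)))
print(posterior(prior, likelihood, (1, None)))  # second feature missing
```

On this toy data the first call should give roughly 0.75 for `spam`, and the call with the second feature missing still returns a valid distribution (about 2/3 for `spam`) without any imputation.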
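A minimal sketch of the discriminative recipe (not taken from the paper): logistic regression models <latex>P(y=1|x) = \sigma(w \cdot x + b)</latex> directly and never builds <latex>P(x)</latex> or <latex>P(x|y)</latex>. The toy dataset, the learning rate, the epoch count and the small L2 penalty `lam` are all assumptions of this sketch; the penalty is there so the optimum stays finite on such a tiny dataset.

```python
import math

# Hypothetical toy dataset (two binary features); labels encoded as 1 = spam, 0 = ham.
X = [(1, 0), (1, 1), (1, 1), (0, 0), (0, 1), (1, 0)]
Y = [1, 1, 1, 0, 0, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Model the posterior directly: P(y = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_logistic_regression(X, Y, lr=0.5, epochs=2000, lam=0.1):
    """Batch gradient descent on the mean log-loss plus (lam / 2) * ||w||^2.
    Only the conditional P(y | x) is fitted; the inputs x are never modelled."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0                     # the bias is not regularized
        for x, y in zip(X, Y):
            err = predict_proba(w, b, x) - y  # d(log-loss)/d(logit)
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

w, b = fit_logistic_regression(X, Y)
print(predict_proba(w, b, (1, 1)))
```

Note that `predict_proba` needs a value for every feature: a missing input would have to be imputed before prediction, which is the practical side of discriminative models being sensitive to missing inputs.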
