
# Michael Collins, Nigel Duffy: Convolution kernels for natural language

### Questions

- What is a generative model, what is a discriminative model, and what is their main difference?
- What are the “fairly strong independence assumptions” in PCFG? Come up with an example tree that can't be modelled by a PCFG.
- Derive and explain the formula for <latex>h(T_1) \cdot h(T_2)</latex> on page 3 at the bottom.
- What is a convolution? Why are “convolution” kernels called that?
- Find an error in one of the formulae in the paper.

### Answers

**Generative models** use a two-step setup: they learn the class-conditional likelihood <latex>P(x|y)</latex> and the prior <latex>P(y)</latex>, then apply Bayes' rule to obtain the posterior.

- Equivalently, they learn the joint distribution <latex>P(x,y)</latex>; the posterior follows by conditioning, <latex>P(y|x) = P(x,y) / P(x)</latex>, with <latex>P(x) = \sum_y P(x,y)</latex> obtained by marginalization.
- They learn more than is actually needed, but are robust to partially missing input data.
- They are able to “generate” fake inputs, but this ability is rarely used.
- Examples: Naive Bayes, Mixtures of Gaussians, HMMs, Bayesian Networks, Markov Random Fields
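
To make the two-step setup concrete, here is a minimal NumPy sketch of a Gaussian Naive Bayes classifier (the class name and toy data are illustrative, not from the paper): it fits the class-conditional <latex>P(x|y)</latex> and the prior <latex>P(y)</latex>, then combines them via Bayes' rule to score the posterior.

```python
import numpy as np

class GaussianNaiveBayes:
    """Generative classifier: fit P(x|y) and P(y), classify via Bayes' rule."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        # Prior P(y): relative class frequencies.
        self.log_prior = np.log(np.array([np.mean(y == c) for c in self.classes]))
        # Class-conditional P(x|y): one Gaussian per class and feature.
        self.mean = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # log P(x|y) under the independent-Gaussian assumption, summed over features.
        log_lik = -0.5 * (np.log(2 * np.pi * self.var)[None, :, :]
                          + (X[:, None, :] - self.mean[None, :, :]) ** 2
                          / self.var[None, :, :]).sum(axis=2)
        # Posterior is proportional to P(x|y) * P(y); argmax can ignore P(x).
        return self.classes[np.argmax(log_lik + self.log_prior[None, :], axis=1)]

# Toy usage: two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(GaussianNaiveBayes().fit(X, y).predict(X[:5]))
```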

**Discriminative models** do everything in one step: they learn the posterior <latex>P(y|x)</latex> directly as a function of some features of <latex>x</latex>.

- They are simpler and can use many more features, but are prone to missing inputs.
- Examples: SVM, Logistic Regression, Neural Networks, k-NN, Conditional Random Fields
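
For contrast, a minimal sketch of a discriminative counterpart (logistic regression trained by plain gradient descent; the function names and toy data are illustrative): it models the posterior <latex>P(y|x)</latex> directly and never represents <latex>P(x|y)</latex> or <latex>P(x)</latex>.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, epochs=500):
    """Discriminative classifier: model P(y=1|x) = sigmoid(w.x + b) directly."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)  # current posterior estimate P(y=1|x)
        grad = p - y            # gradient of the log-loss w.r.t. the logits
        w -= lr * X.T @ grad / n
        b -= lr * grad.mean()
    return w, b

# Toy usage: same two-blob data as above.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
w, b = fit_logistic_regression(X, y)
print(sigmoid(X[:5] @ w + b))  # posterior P(y=1|x) for a few points
```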