Michael Collins, Nigel Duffy: Convolution kernels for natural language
Questions
- What is a generative model, what is a discriminative model and what is their main difference?
- What are the “fairly strong independence assumptions” in PCFG? Come up with an example tree that can't be modelled by a PCFG.
- Derive and explain the formula for h(T1)*h(T2) on page 3 at the bottom.
- What is a convolution? Why are “convolution” kernels called that?
- Find an error in one of the formulae in the paper.
Answers
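- Generative models use a two-step setup. They learn the class-conditional likelihood <latex>P(x|y)</latex> and the prior <latex>P(y)</latex>, then use Bayes' rule to obtain the posterior <latex>P(y|x)</latex>.
- Equivalently, they learn the joint distribution <latex>P(x,y)</latex>: marginalizing it gives <latex>P(y)</latex> or <latex>P(x)</latex>, and conditioning gives <latex>P(y|x) = P(x,y) / P(x)</latex>.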
- They learn more than is strictly needed for classification, but they are robust to partially missing input data.
- They are able to “generate” synthetic inputs, but this ability is not used very often.
- Examples: Naive Bayes, Mixtures of Gaussians, HMM, Bayesian Networks, Markov Random Fields
- Discriminative models do everything in one step: they learn the posterior <latex>P(y|x)</latex> directly as a function of some features of <latex>x</latex>.
- They are simpler and can use many more features, but they cope poorly with missing inputs.
- Examples: SVM, Logistic Regression, Neural Networks, k-NN, Conditional Random Fields
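
The contrast between the two model families can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a one-dimensional input, two hypothetical classes "a" and "b", Gaussian class-conditionals with hand-picked parameters, and a logistic function whose weights are set by hand rather than trained.

```python
import math
import random

# Hypothetical toy parameters (assumptions for illustration only).
PRIOR = {"a": 0.5, "b": 0.5}                  # P(y)
LIKELIHOOD = {"a": (0.0, 1.0), "b": (2.0, 1.0)}  # (mean, std) of P(x|y)


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def generative_posterior(x):
    """Two-step route: P(y|x) = P(x|y) P(y) / P(x) via Bayes' rule."""
    joint = {y: gaussian_pdf(x, *LIKELIHOOD[y]) * PRIOR[y] for y in PRIOR}
    evidence = sum(joint.values())  # P(x), obtained by marginalizing over y
    return {y: p / evidence for y, p in joint.items()}


def generate(rng):
    """Sample a synthetic (x, y) pair: first y from P(y), then x from P(x|y)."""
    y = rng.choices(list(PRIOR), weights=list(PRIOR.values()))[0]
    mean, std = LIKELIHOOD[y]
    return rng.gauss(mean, std), y


def discriminative_posterior(x, w, b):
    """One-step route: model P(y|x) directly as a logistic function of x.

    In practice w and b would be fitted to labelled data; here they are
    supplied by hand.
    """
    p_b = 1.0 / (1.0 + math.exp(-(w * x + b)))
    return {"a": 1.0 - p_b, "b": p_b}


if __name__ == "__main__":
    rng = random.Random(0)
    print(generate(rng))
    print(generative_posterior(1.0))
    print(discriminative_posterior(1.0, w=2.0, b=-2.0))
```

With these particular toy parameters (equal priors, equal variances), the log-odds of the generative posterior work out to 2x - 2, so the logistic function with w = 2 and b = -2 gives the same posterior. The discriminative model never represents <latex>P(x|y)</latex> or <latex>P(x)</latex>, which is why it has no counterpart to `generate`.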