Institute of Formal and Applied Linguistics Wiki

Michael Collins, Nigel Duffy: Convolution kernels for natural language

Paper link


  1. What is a generative model, what is a discriminative model and what is their main difference?
  2. What are the “fairly strong independence assumptions” in a PCFG? Come up with an example tree that cannot be modelled by a PCFG.
  3. Derive and explain the formula for h(T1)*h(T2) on page 3 at the bottom.
  4. What is a convolution? Why are “convolution” kernels so named?
  5. Find an error in one of the formulae in the paper.


    • Generative models use a two-step setup: they learn the class-conditional likelihood <latex>P(x|y)</latex> and the prior <latex>P(y)</latex>, then apply Bayes' rule to obtain the posterior <latex>P(y|x)</latex>.
      • Equivalently, they learn the joint distribution <latex>P(x,y) = P(x|y) P(y)</latex>; marginalizing over <latex>y</latex> gives <latex>P(x)</latex>, and conditioning gives <latex>P(y|x) = P(x,y) / P(x)</latex>.
      • They learn more than classification strictly requires, but they cope well with partially missing input data.
      • They can “generate” synthetic inputs, but this ability is rarely used.
      • Examples: Naive Bayes, Mixtures of Gaussians, HMM, Bayesian Networks, Markov Random Fields
    • Discriminative models work in one step: they learn the posterior <latex>P(y|x)</latex> directly as a function of some features of <latex>x</latex>.
      • They are simpler and can use many more features, but they handle missing inputs poorly.
      • Examples: SVM, Logistic Regression, neural networks, k-NN, Conditional Random Fields
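
The contrast between the two model families can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and does not come from the paper: the toy data set `DATA` (one binary feature <latex>x</latex>, one binary class <latex>y</latex>), the function names, and the choice of relative-frequency estimates. The generative part estimates <latex>P(y)</latex> and <latex>P(x|y)</latex>, applies Bayes' rule, and can also sample a synthetic pair; the discriminative part is reduced to a plain conditional frequency table for <latex>P(y|x)</latex>, standing in for a real discriminative learner such as logistic regression.

```python
from collections import Counter
import random

# Hypothetical toy data: (x, y) pairs, one binary feature x, one binary class y.
DATA = [(1, 1), (1, 1), (1, 1), (0, 1), (0, 0), (0, 0), (0, 0), (1, 0)]


def fit_generative(data):
    """Estimate the prior P(y) and the likelihood P(x|y) by relative frequencies."""
    class_counts = Counter(y for _, y in data)
    pair_counts = Counter(data)
    n = len(data)
    prior = {y: c / n for y, c in class_counts.items()}
    likelihood = {
        (x, y): pair_counts[(x, y)] / class_counts[y]
        for x in (0, 1)
        for y in class_counts
    }
    return prior, likelihood


def posterior(prior, likelihood, x):
    """Bayes' rule: P(y|x) = P(x|y) P(y) / P(x), where P(x) = sum over y of P(x|y) P(y)."""
    joint = {y: likelihood[(x, y)] * prior[y] for y in prior}
    evidence = sum(joint.values())  # P(x), obtained by marginalizing out y
    return {y: p / evidence for y, p in joint.items()}


def sample(prior, likelihood, rng):
    """Generate a synthetic (x, y) pair: draw y from P(y), then x from P(x|y)."""
    y = rng.choices(list(prior), weights=list(prior.values()))[0]
    x = 1 if rng.random() < likelihood[(1, y)] else 0
    return x, y


def fit_discriminative(data):
    """Estimate P(y|x) directly, keyed by (y, x); P(y) and P(x|y) are never modelled."""
    x_counts = Counter(x for x, _ in data)
    pair_counts = Counter(data)
    return {
        (y, x): pair_counts[(x, y)] / x_counts[x]
        for x in x_counts
        for y in (0, 1)
    }


if __name__ == "__main__":
    prior, likelihood = fit_generative(DATA)
    print("generative     P(y=1|x=1):", posterior(prior, likelihood, 1)[1])
    print("discriminative P(y=1|x=1):", fit_discriminative(DATA)[(1, 1)])
    print("synthetic pair (x, y):", sample(prior, likelihood, random.Random(0)))
```

On this toy data both routes give the same posterior, because with a single feature the frequency estimates coincide. The difference is in what gets stored: only the generative model keeps <latex>P(y)</latex> and <latex>P(x|y)</latex>, which is what makes `sample` possible.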
