 * They are able to "generate" fake inputs, but this ability is not used very often.
* Examples: Naive Bayes, Mixtures of Gaussians, HMM, Bayesian Networks, Markov Random Fields
* **Discriminative models** do everything in one step -- they learn the posterior P(y|x) directly as a function of some features of x.
* They are simpler and can use many more features, but they cope poorly with missing inputs.
* Examples: SVM, Logistic Regression, Neural Networks, k-NN, Conditional Random Fields
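
The generative side of this contrast can be sketched with a tiny Naive Bayes classifier. This is only an illustrative sketch: the toy data, the labels "spam"/"ham", and all function names are assumptions made up for the example, not part of the original notes. It estimates P(y) and P(x|y), derives the posterior P(y|x) through Bayes' rule, and can also sample a fake input for a given class.

```python
import random
from collections import Counter

# Hypothetical toy data: binary feature vectors x with class labels y.
DATA = [
    ((1, 1, 0), "spam"),
    ((1, 0, 0), "spam"),
    ((1, 1, 1), "spam"),
    ((0, 0, 1), "ham"),
    ((0, 1, 1), "ham"),
    ((0, 0, 0), "ham"),
]

def train_naive_bayes(data, alpha=1.0):
    """Estimate P(y) and P(x_i = 1 | y) with add-alpha smoothing."""
    class_counts = Counter(y for _, y in data)
    n_features = len(data[0][0])
    ones = {y: [0] * n_features for y in class_counts}
    for x, y in data:
        for i, v in enumerate(x):
            ones[y][i] += v
    prior = {y: c / len(data) for y, c in class_counts.items()}
    likelihood = {
        y: [(ones[y][i] + alpha) / (class_counts[y] + 2 * alpha)
            for i in range(n_features)]
        for y in class_counts
    }
    return prior, likelihood

def joint(x, y, prior, likelihood):
    """P(x, y) = P(y) * prod_i P(x_i | y) under the Naive Bayes assumption."""
    p = prior[y]
    for v, theta in zip(x, likelihood[y]):
        p *= theta if v else 1.0 - theta
    return p

def posterior(x, prior, likelihood):
    """Second step of a generative model: P(y | x) via Bayes' rule."""
    scores = {y: joint(x, y, prior, likelihood) for y in prior}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

def sample_input(y, likelihood, rng):
    """Generate a fake input x for class y by sampling each feature."""
    return tuple(int(rng.random() < theta) for theta in likelihood[y])

prior, likelihood = train_naive_bayes(DATA)
post = posterior((1, 1, 0), prior, likelihood)
fake = sample_input("spam", likelihood, random.Random(0))
```

A discriminative model such as logistic regression would skip `joint` and `sample_input` entirely and fit P(y|x) directly, which is why it cannot generate inputs.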

Each CFG rule generates just one level of the derivation tree. Therefore, using "standard" nonterminals, it is not possible to generate e.g. this sentence:

* ''(S (NP (PRP He)) (VP (VBD saw)(NP (PRP himself))))''
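
To make the "one level per rule" point concrete, the following sketch reads the bracketed tree above and lists the CFG rules it uses. The helper names `parse_tree` and `productions` are hypothetical, written only for this illustration.

```python
def parse_tree(text):
    """Parse a bracketed tree into nested (label, children) tuples.

    Leaves (words) are kept as plain strings.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse()

def productions(tree):
    """List the rules used in the tree, in pre-order.

    Each rule pairs a node label with the labels (or words) of its
    immediate children only, i.e. exactly one level of the tree.
    """
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            rules.extend(productions(child))
    return rules

TREE = "(S (NP (PRP He)) (VP (VBD saw)(NP (PRP himself))))"
rules = productions(parse_tree(TREE))
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs))
```

The rule `PRP -> himself` appears with no reference to the subject `He`, which sits under a different `NP` several levels away; no single rule spans both words. This is one way to read the limitation described above.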

