1. Suppose you have a tagset consisting of two tags, N (noun) and X (not noun), and a training sentence:
luke/N i/X am/X your/X father/N
During training, the following best tag sequence is found for this sentence:
N N X N X
How would this result alter the values of α_{X,X,X} and α_{N,father}?
Assuming the best tag sequence does not change, what would your answer be if “father/N” were replaced by “luke/X”?
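A small sketch for checking question 1, assuming the usual perceptron update (each α changes by its count in the correct tagging minus its count in the predicted tagging) and the two feature types from the Definition section. The helper names and feature keys such as ("tri", "X", "X", "X") are hypothetical, and the two "*" start symbols are an assumed convention that does not affect the two features asked about.

```python
from collections import Counter


def features(words, tags):
    """Count tag-trigram and (tag, word) features of one tagged sentence.

    Assumes two '*' start symbols before the first tag (hypothetical convention).
    """
    padded = ["*", "*"] + list(tags)
    counts = Counter()
    for i, (word, tag) in enumerate(zip(words, tags)):
        counts[("tri", padded[i], padded[i + 1], tag)] += 1  # alpha_{x,y,z}
        counts[("emit", tag, word)] += 1                     # alpha_{t,w}
    return counts


def perceptron_update(words, gold, predicted):
    """Change of each alpha: +1 per gold occurrence, -1 per predicted occurrence."""
    delta = Counter(features(words, gold))
    delta.subtract(features(words, predicted))
    return delta


predicted = "N N X N X".split()

# Original sentence: luke/N i/X am/X your/X father/N
d1 = perceptron_update("luke i am your father".split(), "N X X X N".split(), predicted)
print(d1[("tri", "X", "X", "X")], d1[("emit", "N", "father")])  # 1 1

# Variant: father/N replaced by luke/X, same predicted sequence
d2 = perceptron_update("luke i am your luke".split(), "N X X X X".split(), predicted)
print(d2[("tri", "X", "X", "X")], d2[("emit", "N", "father")])  # 2 0
```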
2. Suppose this tagged sentence is the only entry in your training data:
a/DT boy/NN saw/VBD a/DT girl/NN with/IN a/DT nice/JJ hat/NN
How many features will the tagger from section 2.4 have if its training is identical to the one in section 2.1?
(For some reason, you want to use all 36 tags of the Penn Treebank tagset.)
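As a starting point for question 2, the sketch below counts features for the section 2.1 representation only (tag trigrams plus (tag, word) pairs), in two ways: the features that actually fire in the sentence, and the full space over all 36 tags and the sentence vocabulary. Start and stop symbols are not counted, and any extra feature templates used by the section 2.4 tagger would have to be added; count_features is a hypothetical helper, not something from the paper.

```python
def count_features(words, tags, tagset_size):
    """Count features of a tagger with tag-trigram and (tag, word) features.

    observed: features that actually fire in the training data
    full:     every tag trigram plus every (tag, word) pair over the vocabulary
    No start/stop padding symbols are included (an assumption).
    """
    trigrams = {tuple(tags[i - 2:i + 1]) for i in range(2, len(tags))}
    emissions = set(zip(tags, words))
    vocab = set(words)
    return {
        "observed": len(trigrams) + len(emissions),
        "full": tagset_size ** 3 + tagset_size * len(vocab),
    }


words = "a boy saw a girl with a nice hat".split()
tags = "DT NN VBD DT NN IN DT JJ NN".split()
print(count_features(words, tags, tagset_size=36))
# {'observed': 14, 'full': 46908}
```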
3. What is the difference between Maximum-Entropy and perceptron model training in the experiments?
4. Do you think that this task can be parallelized?
How do you think the performance of the tagger presented in the paper will change when you introduce parallelism?
(You might compare this problem to a similar one in the paper presented at RG in winter: http://aclweb.org/anthology-new/N/N10/N10-1069.pdf)
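The linked paper studies distributed training of the structured perceptron. One scheme of that kind is iterative parameter mixing: split the data into shards, run one perceptron pass per shard starting from the shared weights, then average the resulting weight vectors. The sketch below assumes uniform averaging, the feature set from the Definition section, and brute-force decoding in place of Viterbi; all function names are hypothetical.

```python
from collections import Counter
from itertools import product


def features(words, tags):
    """Tag-trigram and (tag, word) feature counts, with two '*' start symbols."""
    padded = ["*", "*"] + list(tags)
    counts = Counter()
    for i, (word, tag) in enumerate(zip(words, tags)):
        counts[("tri", padded[i], padded[i + 1], tag)] += 1
        counts[("emit", tag, word)] += 1
    return counts


def decode(weights, words, tagset):
    """Brute-force argmax over tag sequences (a stand-in for Viterbi)."""
    def score(tags):
        return sum(weights[f] * n for f, n in features(words, tags).items())
    return max(product(tagset, repeat=len(words)), key=score)


def perceptron_epoch(weights, shard, tagset):
    """One perceptron pass over a shard, starting from a copy of the shared weights."""
    local = Counter(weights)
    for words, gold in shard:
        predicted = decode(local, words, tagset)
        if list(predicted) != list(gold):
            local.update(features(words, gold))
            local.subtract(features(words, predicted))
    return local


def iterative_parameter_mixing(shards, tagset, epochs=5):
    """Train on each shard independently, then average (mix) the weight vectors."""
    weights = Counter()
    for _ in range(epochs):
        # Each call is independent, so these could run on separate workers.
        local_weights = [perceptron_epoch(weights, shard, tagset) for shard in shards]
        mixed = Counter()
        for local in local_weights:
            for f, value in local.items():
                mixed[f] += value / len(shards)
        weights = mixed
    return weights


words = "luke i am your father".split()
gold = "N X X X N".split()
shards = [[(words, gold)], [(words, gold)]]  # two identical toy shards
weights = iterative_parameter_mixing(shards, tagset=("N", "X"))
print(decode(weights, words, ("N", "X")))  # ('N', 'X', 'X', 'X', 'N')
```

With identical shards, mixing reproduces serial training; with different shards, the averaged weights can differ from what a single serial pass would produce, which is where the effect on tagger performance comes in.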
Intro
The authors state that training a perceptron is a quicker and easier solution.
Definition
We defined the structured perceptron and its parameters for a special case of it, the trigram HMM tagger.
The parameters are the log conditional probability of a tag given the two preceding tags (tag trigrams) and the log conditional probability of a word given its tag:
α_{x,y,z} = log P(z | x, y)
α_{t,w} = log P(w | t)
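Logs are used for numerical precision: the score of a tagged sentence becomes a sum of α values instead of a product of many small probabilities, which would underflow.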
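A quick illustration of the precision argument, using hypothetical per-position probabilities of 0.1 for a long sentence: multiplying them in 64-bit floating point underflows to exactly zero, while summing their logarithms (the α values) stays representable and keeps the ranking between tag sequences intact.

```python
import math

# Hypothetical per-position probabilities, e.g. P(z | x, y) * P(w | z), 400 positions
probs = [0.1] * 400

# Multiplying probabilities directly underflows float64 to exactly 0.0
product = 1.0
for p in probs:
    product *= p

# Summing log-probabilities (the alpha parameters) keeps the score representable
log_score = sum(math.log(p) for p in probs)

print(product)    # 0.0
print(log_score)  # approximately -921.03
```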
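Then the speaker showed how the learning process works: for each training sentence, the best tag sequence under the current parameters is found, and if it is wrong, the α values of features in the correct tagging are increased and those in the predicted tagging are decreased.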
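A minimal sketch of this training loop, assuming the two feature types from the Definition section and two "*" start symbols. The paper decodes with Viterbi; here a brute-force search over all tag sequences stands in for it, which is only feasible for tiny tagsets and short sentences. Refinements such as parameter averaging are omitted, and all names are hypothetical.

```python
from collections import Counter
from itertools import product


def features(words, tags):
    """Tag-trigram and (tag, word) feature counts, with two '*' start symbols."""
    padded = ["*", "*"] + list(tags)
    counts = Counter()
    for i, (word, tag) in enumerate(zip(words, tags)):
        counts[("tri", padded[i], padded[i + 1], tag)] += 1
        counts[("emit", tag, word)] += 1
    return counts


def score(weights, words, tags):
    """Linear score: sum of alpha values over every feature occurrence."""
    return sum(weights[f] * n for f, n in features(words, tags).items())


def decode(weights, words, tagset):
    """Best tag sequence by brute force (stand-in for Viterbi; tiny tagsets only)."""
    return max(product(tagset, repeat=len(words)),
               key=lambda tags: score(weights, words, tags))


def train(data, tagset, epochs=5):
    """Structured perceptron: decode, then reward gold and penalize predicted features."""
    weights = Counter()
    for _ in range(epochs):
        for words, gold in data:
            predicted = decode(weights, words, tagset)
            if list(predicted) != list(gold):
                weights.update(features(words, gold))
                weights.subtract(features(words, predicted))
    return weights


words = "luke i am your father".split()
gold = "N X X X N".split()
weights = train([(words, gold)], tagset=("N", "X"))
print(decode(weights, words, ("N", "X")))  # ('N', 'X', 'X', 'X', 'N')
```

On this single sentence the first pass predicts all N and triggers one update; from the second pass on, the gold sequence is decoded and the weights stop changing.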
The authors then deal with separable and inseparable training data.
They define a condition for the inseparable case.
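For reference, the standard perceptron mistake bounds for the two cases can be written as below. This is a hedged summary: Φ (global feature vector), GEN (candidate tag sequences), U, δ, R and D_{U,δ} are assumed notation, with R bounding ‖Φ(x_i, y_i) − Φ(x_i, z)‖ for every incorrect candidate z, so check the exact statement against the paper. In the inseparable case, D_{U,δ} measures how far the data is from being separable with margin δ.

```latex
\begin{aligned}
&\text{Separable with margin } \delta > 0:\ \exists\, U,\ \lVert U \rVert = 1,\quad
  U \cdot \Phi(x_i, y_i) - U \cdot \Phi(x_i, z) \ge \delta
  \quad \forall i,\ \forall z \in \mathrm{GEN}(x_i) \setminus \{y_i\} \\
&\Rightarrow\ \text{number of mistakes} \le R^2 / \delta^2 \\[4pt]
&\text{Inseparable: } m_i = U \cdot \Phi(x_i, y_i)
  - \max_{z \in \mathrm{GEN}(x_i) \setminus \{y_i\}} U \cdot \Phi(x_i, z),
  \qquad \epsilon_i = \max\{0,\ \delta - m_i\} \\
&D_{U,\delta} = \sqrt{\textstyle\sum_i \epsilon_i^2}
  \ \Rightarrow\ \text{mistakes in the first pass}
  \le \min_{U,\delta} \frac{(R + D_{U,\delta})^2}{\delta^2}
\end{aligned}
```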
We spent the rest of the time solving the questions; along the way, other properties of this perceptron training were shown.