1. Of the three main types of word representations described in the paper, to which type does each of the following two samples belong?

a)
dog -0.087099201783 -0.136966257697 0.106813367913 [47 more numbers]
cat -0.103287428163 -0.0066971301398 -0.0346911076188 [47 more numbers]

b)
dog 11010111010
cat 11010111010

2. Section 4.1 defines a corrupted (or noise) n-gram, but the definition contains a tiny error/typo. Nitpick and point it out.

3. Section 7.4 states that "word representations in NER brought larger gains on the out-of-domain data than on the in-domain data." Try to guess the reason.

4. Consider the feature set used in the NER task (Section 7.4).
a) Does it contain any feature with word representations?
b) Does it contain any compound feature with word representations?
c) Give an example of a possible compound feature with word representations for the NER task.

5. Consider the C&W embedding vectors with 50 dimensions. Guess which word has the embedding vector most similar (by Euclidean distance) to each of the following vectors:
a) vector(king) - vector(man) + vector(woman)
b) vector(dollars) - vector(dollar) + vector(mouse)
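
To check a guess for question 5, the vector arithmetic and the nearest-neighbour search can be sketched as below. This is only a minimal sketch: the helper names `parse_embeddings` and `nearest_word` are hypothetical, the input is assumed to be in the "word followed by numbers" layout of sample (a) in question 1, and the 2-dimensional vectors for the placeholder words `w1` to `w5` are made up for illustration and are not real C&W embeddings.

```python
import numpy as np


def parse_embeddings(lines):
    """Parse lines of the form 'word x1 x2 ... xn' into a dict of vectors."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        table[parts[0]] = np.array([float(x) for x in parts[1:]])
    return table


def nearest_word(query, table, exclude=()):
    """Return the word whose vector is closest to `query` by Euclidean distance."""
    best_word, best_dist = None, float("inf")
    for word, vec in table.items():
        if word in exclude:
            continue  # leave out the words that were used to build the query
        dist = np.linalg.norm(vec - query)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word


# Made-up 2-dimensional vectors for placeholder words (not real C&W data).
toy_lines = [
    "w1 0.0 0.0",
    "w2 1.0 0.0",
    "w3 0.0 1.0",
    "w4 1.0 1.0",
    "w5 3.0 3.0",
]
toy = parse_embeddings(toy_lines)

# Same shape as the expressions in question 5: vector(x) - vector(y) + vector(z).
query = toy["w2"] - toy["w1"] + toy["w3"]
result = nearest_word(query, toy, exclude={"w1", "w2", "w3"})
print(result)
```

With the real 50-dimensional vectors, the same two functions could be applied to the lines of the embedding file. Excluding the three input words is a design choice made here, since otherwise one of them may turn out to be the nearest neighbour.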

**Hint**: The paper is 11 pages long. You can skip Section 2 and Section 3.2, which are the literature review.

