# Questions

Aron Culotta, Jeffrey Sorensen: Dependency Tree Kernels for Relation Extraction, ACL 2004.

- Given Figure 1, what is the smallest common subtree that includes both t1 (Troops) and t2 (near)?
- Section 5: “Therefore, d(a)=l(a).” When is this true and why? (Assume this holds for the following questions.)
- Let φ_m = {general-pos-tag, entity-type, relation-arguments} (in accordance with the paper). Let φ_s = … (unlike in the paper). Based on Figure 2 and Section 5, compute the following matching functions and similarity functions:
`m(t0,u0)=? m(t1,u1)=? m(t2,u2)=?`

`s(t0,u0)=? s(t1,u1)=? s(t2,u2)=?`

- Let … . Compute the contiguous kernel K_1 for the two trees in Figure 2. Provide the final number and some intermediate counts, so it's clear how you got the number. Optionally, also compute the sparse kernel K_0.
- Let DT be a function that assigns the correct augmented dependency tree to a sentence. Compute (estimate) the contiguous kernel and the bag-of-words kernel for the following sentences:
- K_1(DT(“Peter sleeps”), DT(“Bob runs”))=?
- K_2(DT(“Peter sleeps”), DT(“Bob runs”))=?

- Consider the following pair of sentences:
- “Bob saw US troops that moved towards Baghdad”
- “US troops that moved towards Baghdad were seen by Bob”

You want to check the relation between entities “US” and “Baghdad”. Compute (estimate) and .

# Answers

- Depends on the exact definition of smallest common subtree, but keep in mind you need at least some non-trivial “context”. The definition should be such that contiguous and sparse kernels will effectively be different things. The whole subtree is probably the right answer here.
- d(a) is defined as (the last member of the sequence) minus (the first member) plus 1, while l(a) is the length of the sequence. If the sequence is contiguous (no missing indices), it can be shown (e.g. by induction) that the equation holds, unless some of the indices are repeated. Note that, for example, the sequence (1,1,1) is valid according to the definition of a sequence in the paper.
- Depends on how you treat “N/A” values; by the definition, you should sum over the values that are “the same/compatible” (disregarding the “type” of the feature).
`m(t0,u0)=1 m(t1,u1)=1 m(t2,u2)=0`

`s(t0,u0)=5 s(t1,u1)=3 s(t2,u2)=4`
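
To make the “N/A” ambiguity concrete, here is a minimal Python sketch of the matching and similarity functions. The node feature values below are made up for illustration and are not the nodes of Figure 2; the feature names, the choice of “all features” for the similarity set, and the `skip_na` flag are assumptions of this sketch, not something the paper prescribes.

```python
# Features compared by the matching function (the set given in the question).
PHI_M = ("general_pos", "entity_type", "relation_arg")
# Assumption: the similarity function looks at all features of this toy node.
PHI_S = ("word", "pos", "general_pos", "entity_type", "relation_arg")


def m(t, u, phi_m=PHI_M):
    """Matching function: 1 if both nodes agree on every feature in phi_m, else 0.

    Note: two missing (None) values are treated as equal here.
    """
    return int(all(t.get(f) == u.get(f) for f in phi_m))


def s(t, u, phi_s=PHI_S, skip_na=True):
    """Similarity: number of equal value pairs, ignoring which feature a value came from."""
    t_vals = [t.get(f) for f in phi_s]
    u_vals = [u.get(f) for f in phi_s]
    if skip_na:
        # One possible treatment of "N/A": drop missing values before comparing.
        t_vals = [v for v in t_vals if v is not None]
        u_vals = [v for v in u_vals if v is not None]
    return sum(1 for a in t_vals for b in u_vals if a == b)


# Hypothetical nodes (not taken from the paper's figures).
t = {"word": "troops", "pos": "NNS", "general_pos": "noun",
     "entity_type": "person", "relation_arg": "ARG_A"}
u = {"word": "forces", "pos": "NNS", "general_pos": "noun",
     "entity_type": "person", "relation_arg": "ARG_A"}
t_na = {"word": "near", "pos": "IN", "general_pos": "prep",
        "entity_type": None, "relation_arg": None}
u_na = {"word": "in", "pos": "IN", "general_pos": "prep",
        "entity_type": None, "relation_arg": None}

print(m(t, u), s(t, u))                 # 1 4
print(s(t_na, u_na))                    # 2 (missing values dropped)
print(s(t_na, u_na, skip_na=False))     # 6 (every None/None pair counts)
```

The last two lines show why the treatment of “N/A” matters: because values are compared across feature types, counting missing values as compatible adds one point for every pair of missing values.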

- First this depends on the previous one (the “N/A” values) and second the paper doesn't say how to compute , where A,B are sequences with more than one member. One proposed solution was to use
- When counting K_1 you leave out the part

- If you regard the bag-of-words kernel as the number of matching word forms, then K_2 is zero, whereas K_1 is positive.
- It was argued that we'll probably end up with different relation-args (*troops* being ARG_B in the first sentence, but ARG_A in the second sentence), so there will be no match.
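
The answer about d(a) = l(a) can be checked with a few lines of Python. This is only a sketch: it assumes an index sequence is represented as a tuple of integers, and the helper `is_contiguous` is a name introduced here for illustration.

```python
def d(a):
    """Span of an index sequence: last member - first member + 1."""
    return a[-1] - a[0] + 1


def l(a):
    """Length of an index sequence."""
    return len(a)


def is_contiguous(a):
    """True if each index is exactly one larger than the previous one."""
    return all(nxt - cur == 1 for cur, nxt in zip(a, a[1:]))


for seq in [(2, 3, 4), (1, 3), (1, 1, 1)]:
    print(seq, is_contiguous(seq), d(seq), l(seq))
# (2, 3, 4) True 3 3
# (1, 3) False 3 2
# (1, 1, 1) False 1 3
```

Only the first sequence satisfies d(a) = l(a). A gap makes d(a) larger than l(a), and repeated indices make it smaller.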

# Misc

- There was some discussion about what the features for the bag-of-words kernel are (just the presence of a word in the sentence?)
- Feature selection, mainly the relation-args feature
- “Two level” classification, why it might be a good idea
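
As a reference point for the bag-of-words discussion, here is a sketch of one possible reading: the kernel counts the distinct word forms that two sentences share. Presence-based features, whitespace tokenization, and the function name `bow_kernel` are all assumptions made for this example; the paper may intend richer per-node features.

```python
def bow_kernel(sentence_a, sentence_b):
    """Count distinct word forms present in both sentences (presence features only)."""
    forms_a = set(sentence_a.split())
    forms_b = set(sentence_b.split())
    return len(forms_a & forms_b)


print(bow_kernel("Peter sleeps", "Bob runs"))    # 0
print(bow_kernel("Peter sleeps", "Peter runs"))  # 1
```

Under this reading, “Peter sleeps” and “Bob runs” share no word form, which matches the K_2 = 0 answer above, while a tree kernel can still be positive because it compares node features such as part-of-speech tags.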