Questions

Aron Culotta, Jeffrey Sorensen: Dependency Tree Kernels for Relation Extraction, ACL 2004.

  1. Given Figure 1, what is the smallest common subtree that includes both t1 (Troops) and t2 (near)?
  2. Section 5: “Therefore, <latex>d(a)=l(a)</latex>.” When is this true, and why? (Assume this holds for the following questions.)
  3. Let <latex>\phi_m</latex> = {general-pos-tag, entity-type, relation-arguments} (in accordance with the paper). Let <latex>\phi_s = \phi_m</latex> (unlike in the paper). Based on Figure 2 and Section 5, compute the following values of the matching function m and the similarity function s:
    • m(t0,u0)=? m(t1,u1)=? m(t2,u2)=?
    • s(t0,u0)=? s(t1,u1)=? s(t2,u2)=?
  4. Let <latex>\lambda=0.5</latex>. Compute the contiguous kernel for the two trees in Figure 2: <latex>K_1(T,U)=?</latex>. Provide the final number and some counts along the way, so it is clear how you got the number. Optionally, also compute the sparse kernel <latex>K_0(T,U)=?</latex>.
  5. Let DT be a function that assigns the correct augmented dependency tree to a sentence. Compute (estimate) the contiguous kernel <latex>K_1</latex> and the bag-of-words kernel <latex>K_2</latex> for the following sentences:
    • <latex>K_1</latex>(DT(“Peter sleeps”), DT(“Bob runs”))=?
    • <latex>K_2</latex>(DT(“Peter sleeps”), DT(“Bob runs”))=?
  6. Let's take a pair of sentences:
    • “Bob saw US troops that moved towards Baghdad”
    • “US troops that moved towards Baghdad were seen by Bob”

You want to check the relation between entities “US” and “Baghdad”. Compute (estimate) <latex>K_1</latex> and <latex>K_2</latex>.

Answers

  1. This depends on the exact definition of “smallest common subtree”, but keep in mind that you need at least some non-trivial “context”: the definition should be such that the contiguous and sparse kernels are effectively different things. The whole subtree is probably the right answer here.
  2. <latex>d(a)</latex> is defined as the last member of the sequence minus the first member, plus one: <latex>d(a)=a_{l(a)}-a_1+1</latex>. If the sequence is contiguous (no missing indices), it can be shown (e.g. by induction) that the equation holds, unless some index is repeated. Note that, e.g., the sequence (1,1,1) is valid according to the definition of a sequence <latex>a</latex> in the paper; see the sketch below.
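
  A minimal sketch (plain Python; the helper names are ours, not the paper's) of <latex>l(a)</latex> and <latex>d(a)</latex>, illustrating when <latex>d(a)=l(a)</latex> holds:

    # l(a) and d(a) as defined in Section 5 of the paper.
    def l(a):
        """Length of the index sequence a."""
        return len(a)

    def d(a):
        """Span of a: last index minus first index, plus one."""
        return a[-1] - a[0] + 1

    print(l((2, 3, 4)), d((2, 3, 4)))  # 3 3 -> contiguous: d(a) == l(a)
    print(l((1, 3)),    d((1, 3)))     # 2 3 -> gap at index 2: d(a) > l(a)
    print(l((1, 1, 1)), d((1, 1, 1)))  # 3 1 -> repeated index: d(a) < l(a)
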
  3. This depends on how you treat “N/A” values; by the definition, you sum over all pairs of feature values that are “the same/compatible” (disregarding the “type” of the feature). A sketch of both functions follows the values below.
    • m(t0,u0)=1, m(t1,u1)=1, m(t2,u2)=0
    • s(t0,u0)=5, s(t1,u1)=3, s(t2,u2)=4
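
  A minimal sketch of the matching function m and the similarity function s under the feature sets chosen in the question. The node feature values in the demo are hypothetical stand-ins, not read off Figure 2, and the treatment of missing (“N/A”) values is one of several defensible choices:

    # Matching (m) and similarity (s) functions from Section 4 of the paper.
    # Feature values below are hypothetical, not the actual Figure 2 nodes.
    PHI_M = ("general-pos-tag", "entity-type", "relation-arguments")  # phi_m
    PHI_S = PHI_M                                                     # phi_s = phi_m here

    def m(t, u):
        """1 iff the phi_m features of the two nodes are identical, else 0.
        Two missing (None) values count as matching -- an assumption."""
        return 1 if all(t.get(f) == u.get(f) for f in PHI_M) else 0

    def s(t, u):
        """Count compatible value pairs across all phi_s features of both
        nodes (cross-feature comparison allowed, per the note above);
        missing (None) values never match -- an assumption."""
        return sum(1 for f in PHI_S for g in PHI_S
                   if t.get(f) is not None and t.get(f) == u.get(g))

    t0 = {"general-pos-tag": "verb"}   # hypothetical node features
    u0 = {"general-pos-tag": "verb"}
    print(m(t0, u0), s(t0, u0))        # 1 1
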
  4. First, this depends on the previous answer (the “N/A” values), and second, the paper doesn't say how to compute <latex>K_0(t_i[A],t_j[B])</latex> where A, B are sequences with more than one member. One proposed solution was to use <latex>K_0(t_i[A],t_j[B])=\sum_{s=1}^{l(A)}K_0(t_i[a_s],t_j[b_s])</latex>.
    • <latex>K_0(T,U)=s(t_0,u_0)+\lambda^2 K_0(t_1,u_1)+\lambda^2 K_0(t_1,u_2)+\lambda^2 K_0(t_2,u_1)+\lambda^2 K_0(t_2,u_2)+\lambda^2 K_0(t_3,u_1)+\lambda^2 K_0(t_3,u_2)+\lambda^4 K_0(\{t_1,t_2\},\{u_1,u_2\})+\lambda^4 K_0(\{t_2,t_3\},\{u_1,u_2\})+\lambda^5 K_0(\{t_1,t_3\},\{u_1,u_2\})</latex>
      <latex>=s(t_0,u_0)+\lambda^2 s(t_1,u_1)+\lambda^2 s(t_3,u_2)+\lambda^4 s(t_4,u_3)+\lambda^4 s(t_1,u_1)+\lambda^4 s(t_3,u_2)+\lambda^6 s(t_4,u_3)+\lambda^5 s(t_1,u_1)+\lambda^5 s(t_3,u_2)+\lambda^7 s(t_4,u_3)</latex>
    • When computing <latex>K_1</latex> you leave out the <latex>(\{t_1,t_3\},\{u_1,u_2\})</latex> term, because the index sequence (1,3) is not contiguous (<latex>d(a)\neq l(a)</latex>); see the kernel sketch below.
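
  A minimal sketch of the sparse (<latex>K_0</latex>) and contiguous (<latex>K_1</latex>) kernels, reusing m and s from the sketch above and adopting the per-element summing convention proposed in this answer; the Node layout and all names are ours, not the paper's:

    # Sparse (contiguous=False) and contiguous (contiguous=True) tree kernels
    # from Section 5; child-sequence pairs are weighted by lambda^(d(a)+d(b)).
    from itertools import combinations

    LAMBDA = 0.5

    class Node:
        def __init__(self, feats, children=()):
            self.feats = feats              # feature dict, as for m and s above
            self.children = list(children)

    def subsequences(n, contiguous):
        """Strictly increasing index sequences over children 0..n-1."""
        for k in range(1, n + 1):
            for a in combinations(range(n), k):
                # The contiguous kernel keeps only sequences with d(a) == l(a).
                if not contiguous or a[-1] - a[0] + 1 == len(a):
                    yield a

    def K(t, u, contiguous=False):
        if m(t.feats, u.feats) == 0:        # matching function gates everything
            return 0.0
        total = s(t.feats, u.feats)
        for a in subsequences(len(t.children), contiguous):
            for b in subsequences(len(u.children), contiguous):
                if len(a) != len(b):
                    continue
                decay = (a[-1] - a[0] + 1) + (b[-1] - b[0] + 1)  # d(a) + d(b)
                total += LAMBDA ** decay * sum(
                    K(t.children[i], u.children[j], contiguous)
                    for i, j in zip(a, b))  # per-element sum, as proposed above
        return total

    leaf = Node({"general-pos-tag": "noun"})
    root = Node({"general-pos-tag": "verb"}, [leaf])
    print(K(root, root), K(root, root, contiguous=True))  # 1.25 1.25

  On the Figure 2 trees, this contiguous restriction is exactly what removes the <latex>\lambda^5</latex> term in the expansion above.
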
  5. If you regard the bag-of-words kernel as counting matching word forms, then <latex>K_2</latex> is zero (the two sentences share no word forms), whereas <latex>K_1</latex> is positive (the nodes can still match on features such as the general POS tag). A minimal check of the bag-of-words reading is sketched below.
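
  A one-function sketch of the bag-of-words reading used above, counting shared word forms; whether the paper's <latex>K_2</latex> uses counts or mere presence is exactly the open question noted under Misc:

    # Bag-of-words kernel as a count of shared word forms (one possible reading).
    from collections import Counter

    def k_bow(s1, s2):
        c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
        return sum(min(c1[w], c2[w]) for w in c1)

    print(k_bow("Peter sleeps", "Bob runs"))  # 0 -> no shared word forms
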
  6. It was argued that we will probably end up with different relation-argument feature values (“troops” being ARG_B in the first sentence but ARG_A in the second), and thus there will be no match.

Misc

  1. There was some discussion about what the features of the bag-of-words kernel are (just the presence of a word in the sentence?)
  2. Feature selection, mainly the relation-args feature
  3. “Two-level” classification, and why it might be a good idea