[ Skip to the content ]

Institute of Formal and Applied Linguistics Wiki


[ Back to the navigation ]

Table of Contents

Questions

Aron Culotta, Jeffrey Sorensen: Dependency Tree Kernels for Relation Extraction, ACL 2004.

  1. Given Figure 1, what is the smallest common subtree that includes both t1 (Troops) and t2 (near)?
  2. Section 5: “Therefore, d(a)=l(a).” When is this true and why? (Assume this holds for the following questions.)
  3. Let Graph = {general-pos-tag, entity-type, relation-arguments} (in accordance with the paper). Let Graph (unlike in the paper). Based on Figure 2 and Section 5, compute the following matching functions and similarity functions:
    • m(t0,u0)=? m(t1,u1)=? m(t2,u2)=?
    • s(t0,u0)=? s(t1,u1)=? s(t2,u2)=?
  4. Let Graph. Compute the contiguous kernel for the two trees in Figure 2: Graph. Provide the final number and some counts along the way, so its clear how you got the number. Optionally, compute also the sparse kernel Graph.
  5. Let DT be a function that assigns the correct augmented dependency tree to a sentence. Compute (estimate) contiguous kernel and bag-of-words kernel for the following sentences:
    • Graph(DT(“Peter sleeps”), DT(“Bob runs”))=?
    • Graph(DT(“Peter sleeps”), DT(“Bob runs”))=?
  6. Lets have a pair of sentences:
    • “Bob saw US troops that moved towards Baghdad”
    • “US troops that moved towards Baghdad were seen by Bob”

You want to check the relation between entities “US” and “Baghdad”. Compute (estimate) Graph and Graph.

Answers

  1. Depends on the exact definition of smallest common subtree, but keep in mind you need at least some non-trivial “context”. The definition should be such that contiguous and sparse kernels will effectively be different things. The whole subtree is probably the right answer here.
  2. d(a) is defined as the last member of the sequence - the first member + 1. If the sequence is contiguous (no missing indices) it can be shown (eg. by induction) that the equation holds, unless some of the indices is repeated. Note that e.g. a sequence (1,1,1) is valid according to the definition of sequence in the paper.
  3. Depends on how you treat “N/A” values, by the definition you should sum values that are “the same/compatible” (disregarding the “type” of the feature).
    • m(t0,u0)=1 m(t1,u1)=1 m(t2,u2)=0
    • s(t0,u0)=5 s(t1,u1)=3 s(t2,u2)=4
  4. First this depends on the previous one (the “N/A” values) and second the paper doesn't say how to compute Graph, where A,B are sequences with more than one member. One proposed solution was to use Graph
    • GraphGraphGraphGraphGraphGraph
    • When counting K_1 you leave out the Graph part
  5. When you regard bag-of-words kernel as number of matching forms then K_2 is zero whereas K_1 is positive
  6. It was argued that we'll probably end up with different relation-args (troops being ARG_B in the first sentence, but ARG_A in the second sentence), thus there will be no match

Misc

  1. There was some discussion what are the features for bag-of-words kernel (just presence of a word in sentence?)
  2. Feature selection, mainly the relation-args feature
  3. “Two level” classification, why it might be a good idea

[ Back to the navigation ] [ Back to the content ]