Semantic Taxonomy Induction from Heterogenous Evidence
- Introduction
- Section 2 (A Probabilistic Framework...)
  - 2.1
  - 2.2
  - 2.3
  - 2.4
- Section 3
  - 3.1
  - 3.2
  - 3.3
  - 3.4
- Evaluation
- DISCUSSION

Introduction

- related methods (WordNet – hand-made, CYC)
- hand-made patterns “filled in” automatically by words that satisfy them
- “such NP(y) as NP(x)” ⇒ y is a hypernym of x (reversed in the paper! probably a copy-paste error)
- most methods disregard ambiguity (rose bush)
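
A minimal sketch of how such a hand-made pattern gets “filled in”, to make the direction of the relation concrete. The regex and the function name `extract_hypernym_pairs` are made up for illustration, and a noun phrase is approximated by a single word, which real systems do not do:

```python
import re

# Toy pattern: "such Y as X" => Y is a hypernym of X.
# Assumption: each NP is a single word; real systems match parsed noun phrases.
SUCH_AS = re.compile(r"\bsuch (\w+) as (\w+)")

def extract_hypernym_pairs(text):
    """Return (hyponym, hypernym) pairs found by the toy pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text.lower()):
        hypernym, hyponym = match.group(1), match.group(2)
        pairs.append((hyponym, hypernym))
    return pairs

print(extract_hypernym_pairs("He likes such vehicles as cars."))
```

Note that the pattern alone says nothing about which sense of an ambiguous word is meant, which is the problem the last bullet points at.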

Section 2 (A Probabilistic Framework...)

2.1

- taxonomy = set of relations
- the H notation (H^n_ij)
- pairwise ~ binary? probably… (Sec. 2.1)
- are there different taxonomies for all types of relations? (extra one for verb entailment etc.?), not clear from the paper
- ISA is transitive
- cousins have a least common subsumer
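
A small sketch of the last two bullets, assuming a toy taxonomy in which every word has at most one parent (WordNet itself allows several). The `parent` dictionary and both function names are hypothetical:

```python
def ancestors(word, parent):
    """Ancestors of `word`, nearest first, following single-parent ISA links."""
    chain = []
    while word in parent:
        word = parent[word]
        chain.append(word)
    return chain

def least_common_subsumer(a, b, parent):
    """Return (lcs, m, n): nearest shared ancestor and its distance from a and b."""
    up_a = [a] + ancestors(a, parent)
    up_b = [b] + ancestors(b, parent)
    for m, node in enumerate(up_a):
        if node in up_b:
            return node, m, up_b.index(node)
    return None  # no shared ancestor in this toy taxonomy

# Toy data, not taken from the paper.
parent = {"car": "vehicle", "truck": "vehicle", "vehicle": "artifact"}
print(ancestors("car", parent))                       # transitivity: car ISA artifact
print(least_common_subsumer("car", "truck", parent))  # cousins via "vehicle"
```

Transitivity shows up as the full ancestor chain; two words are cousins when the search for a shared ancestor succeeds, and the two distances say how far each is from it.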

2.2

- looking for the taxonomy that maximizes its likelihood given the evidence
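
One way to write this objective, assuming the evidence for each relation is conditionally independent given the taxonomy. The symbols T (taxonomy), E (all evidence) and E_ij (evidence for the pair i, j) are notation chosen here to match the H notation of 2.1, with the superscript n dropped; this is a sketch and may differ in detail from the paper's own equation:

```latex
\hat{T} = \arg\max_{T} P(E \mid T),
\qquad
P(E \mid T) = \prod_{H_{ij} \in T} P(E_{ij} \mid H_{ij} \in T)
              \prod_{H_{ij} \notin T} P(E_{ij} \mid H_{ij} \notin T)
```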

2.3

- describes the main part of the algorithm
- start with an initial taxonomy, add relations
- definition of the multiplicative change + explanation
- adding a relation involves adding relations implied by transitivity
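
A rough sketch of the last two bullets, as we understood them: adding one link changes the likelihood by a product of per-relation ratios, one for the link itself and one for every relation it implies by transitivity. The odds form `k * p / (1 - p)`, the default `k = 1.0`, the function names and the probabilities are assumptions made for illustration, not values or code from the paper:

```python
def implied_relations(link, taxonomy):
    """New ISA pairs that follow by transitivity when `link` is added.

    `taxonomy` is a transitively closed set of (hyponym, hypernym) pairs.
    """
    i, j = link
    below = {i} | {x for (x, y) in taxonomy if y == i}  # i and everything under it
    above = {j} | {y for (x, y) in taxonomy if x == j}  # j and everything over it
    return {(x, y) for x in below for y in above} - taxonomy

def multiplicative_change(link, taxonomy, prob, k=1.0):
    """Product of per-relation ratios for `link` and the relations it implies."""
    change = 1.0
    for rel in implied_relations(link, taxonomy):
        p = prob[rel]  # classifier probability that the relation holds
        change *= k * p / (1.0 - p)
    return change

# Toy example with made-up probabilities.
taxonomy = {("car", "vehicle")}
prob = {("vehicle", "artifact"): 0.8, ("car", "artifact"): 0.5}
print(multiplicative_change(("vehicle", "artifact"), taxonomy, prob))
```

The point of the example: linking vehicle under artifact also commits the taxonomy to car ISA artifact, so a weakly supported implied relation can pull the total change down.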

2.4

- model adaptation for ambiguity
- adding relations between word senses

Section 3

3.1

- evidence is a vector
- training data from WordNet, trained a classifier using logistic regression
- overfitting: don't want to model too long-distance relations

3.2

- cousins
- clustering – similarity is the cosine similarity (for words within the same cluster, 0 otherwise)
- softmax regression: more than 2 classes, otherwise similar to logistic regression
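
To make the last bullet concrete, a minimal sketch of the softmax function and of how it reduces to the logistic function for two classes. The scores are arbitrary numbers, not model outputs from the paper:

```python
import math

def softmax(scores):
    """Turn one real-valued score per class into probabilities summing to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def logistic(score):
    """Two-class special case: probability of the positive class."""
    return 1.0 / (1.0 + math.exp(-score))

# With two classes and the second score fixed at 0, softmax equals logistic.
print(softmax([2.0, 0.0])[0], logistic(2.0))
```

With more than two classes (e.g. several cousin types) the same function simply gets one score per class.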

3.3

- identify the set of word pairs that can be hyper/hyponyms
- for each proposed hypernym… was explained
- add the one with highest score according to the classifiers

3.4

- sense disambiguation
- works by itself
- the “carrier” example explained

Evaluation

- manual evaluation – uniformly generated samples from the first n links, human judge
- annotators were to classify into 4 classes (4.1)
- all the various evaluation methods discussed

DISCUSSION

- what is MiniPar? a very simple parser (is that the reason it was used?) – maybe it is important how the dependencies are labeled.
- features are the dependency labels, but they are also lexicalized:

    vehicle
   /       \ A
 such       as
             \ B
              car
             /
            a

results in features? maybe “vehicle, car … A → as → B” ?
- definitely no hand-written rules
- trained from WordNet?
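
A sketch of one possible reading of the feature guess above: take the path between the two nouns in the dependency tree and keep the arc labels plus the intermediate words. The edge list copies the tree drawn in these notes ("?" marks arcs the drawing leaves unlabeled); the encoding and the name `path_feature` are our assumption, not the paper's definition, and arc direction is ignored:

```python
from collections import deque

# Edges (head, label, dependent) copied from the tree in the notes.
EDGES = [
    ("vehicle", "?", "such"),
    ("vehicle", "A", "as"),
    ("as", "B", "car"),
    ("car", "?", "a"),
]

def path_feature(source, target, edges):
    """Lexicalized label path between two words, or None if unconnected."""
    graph = {}
    for head, label, dep in edges:
        graph.setdefault(head, []).append((dep, label))
        graph.setdefault(dep, []).append((head, label))
    queue = deque([(source, [])])
    seen = {source}
    while queue:  # breadth-first search finds the shortest path
        word, path = queue.popleft()
        if word == target:
            # Drop the final word: the two nouns are not part of the feature.
            return " -> ".join(path[:-1])
        for nxt, label in graph.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [label, nxt]))
    return None

print(path_feature("vehicle", "car", EDGES))
```

Under this reading the same path string is one feature dimension shared by every noun pair connected that way, so no rule has to be written by hand.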

This was used to enrich WordNet. Can it be used on its own (just with the parser)?
- technically, yes
- but we don't know about the performance

We didn't find out how the probability of links is computed.
- only nouns are considered, maybe computed for each noun-noun pair?

- terminology is difficult for people outside semantics

What does it mean that F-score is improved by 23% over WordNet? Does WordNet have 100% precision but low recall?
- 5000 manually labeled noun pairs
- look at Table 4: WordNet does not have very good precision as measured on those noun pairs
- it's not WordNet as such, but it seems that “the best WordNet classifier” is just built upon WordNet, going as far as k edges (the best was distance 4, i.e. car→motor vehicle→vehicle→machine according to WordNet, but not further)
- allowing further links lowers precision but increases recall, so 4 is the best trade-off for F-measure
- so WordNet is not altogether trustworthy
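
A minimal sketch of what a "WordNet classifier up to k edges" could look like, as we understood it from the discussion: a pair counts as hypernym/hyponym only if the candidate is reached within k ISA links. The chain is the toy example from the bullet above, and the function names are hypothetical:

```python
# Toy hypernym chain taken from the example in the notes.
PARENT = {"car": "motor vehicle", "motor vehicle": "vehicle", "vehicle": "machine"}

def is_hypernym_within(hyponym, candidate, parent, k):
    """True if `candidate` is reached from `hyponym` in at most k ISA edges."""
    word = hyponym
    for _ in range(k):
        if word not in parent:
            return False
        word = parent[word]
        if word == candidate:
            return True
    return False

def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(is_hypernym_within("car", "machine", PARENT, 3))  # reachable in this toy chain
print(is_hypernym_within("car", "machine", PARENT, 2))  # cut off by the smaller k
```

A larger k accepts more pairs, which can only raise recall and tends to lower precision; `f_measure` is the quantity the trade-off is judged by.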
