Semantic Taxonomy Induction from Heterogeneous Evidence
Introduction
- related methods (WordNet – hand-made, CYC)
- hand-made patterns “filled in” automatically by words that satisfy them
- “such NP(y) as NP(x)” ⇒ y is a hypernym of x (reversed in the paper! probably a copy-paste error)
- most methods disregard ambiguity (rose bush)
Section 2 (A Probabilistic Framework...)
2.1
- taxonomy = set of relations
- the H notation (H^n_ij); see the sketch after this list
- pairwise ~ binary? probably… (Sec. 2.1)
- are there different taxonomies for all types of relations (an extra one for verb entailment etc.)? not clear from the paper
- ISA is transitive
- cousins have a least common subsumer
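A sketch of the notation as we read it (the direction of the subscripts is our assumption, not checked against the paper):

  % H^n_ij: sense i is an ancestor (hypernym) of sense j, n edges above it
  H^n_{ij} \equiv i \text{ is the } n\text{-th ancestor of } j

  % ISA transitivity composes the distances:
  H^m_{ik} \wedge H^n_{kj} \Rightarrow H^{m+n}_{ij}

  % (m,n)-cousins: a least common subsumer l sits m levels above i and n above j
  C^{(m,n)}_{ij} \equiv \exists l : H^m_{li} \wedge H^n_{lj}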
2.2
- looking for the taxonomy that maximizes its likelihood given the evidence
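In symbols (our reading of Sec. 2.2):

  \hat{T} = \arg\max_T P(T \mid E) = \arg\max_T P(E \mid T)\, P(T)

  % assuming the evidence for each pair is independent given the taxonomy:
  P(E \mid T) = \prod_{i,j} P(E_{ij} \mid R_{ij})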
2.3
- describes the main part of the algorithm
- start with an initial taxonomy, add relations
- definition of the multiplicative change + explanation (sketched below)
- adding a relation involves adding relations implied by transitivity
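Roughly (our reconstruction, not the paper's exact equation): writing I(R) for the set of new relations implied by adding R under ISA transitivity,

  \Delta_T(R) = \frac{P(E \mid T \cup I(R))}{P(E \mid T)}
              = \prod_{R_{ij} \in I(R)} \frac{P(E_{ij} \mid R_{ij} \in T)}{P(E_{ij} \mid R_{ij} \notin T)}

and the greedy step adds the relation with the largest Δ_T(R).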
2.4
- model adaptation for ambiguity
- adding relations between word senses
Section 3
3.1
- evidence is a vector
- training data from WordNet; trained a classifier using logistic regression (sketched below)
- overfitting: don't want to model too long-distance relations
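A minimal sketch of how we picture that classifier (not the authors' code; the path-feature names and the noun pairs are invented):

  # logistic regression over dependency-path counts; labels come from WordNet
  from sklearn.feature_extraction import DictVectorizer
  from sklearn.linear_model import LogisticRegression

  # each noun pair = counts of lexico-syntactic paths seen between the nouns
  pair_features = [
      {"N:mod:Prep,such_as,Prep:pcomp-n:N": 3, "N:appo:N": 1},  # e.g. (vehicle, car)
      {"N:conj:N": 5},                                          # e.g. (car, truck)
  ]
  labels = [1, 0]  # 1 = hypernym pair according to WordNet

  vec = DictVectorizer()
  X = vec.fit_transform(pair_features)
  clf = LogisticRegression().fit(X, labels)

  # P(hypernym | evidence) for a new pair:
  print(clf.predict_proba(vec.transform([{"N:conj:N": 2}]))[0, 1])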
3.2
- cousins
- clustering – similarity is cosine distance (within the cluster, 0 otherwise)
- softmax regression: more than 2 classes, otherwise similar to logistic regression
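For the record, softmax regression with one weight vector w_k per class k:

  P(y = k \mid x) = \frac{\exp(w_k \cdot x)}{\sum_c \exp(w_c \cdot x)}

With two classes this reduces to ordinary logistic regression.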
3.3
- identify the set of word pairs that can be hyper/hyponyms
- for each proposed hypernym… was explained
- add the one with highest score according to the classifiers
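How we picture the greedy loop (a sketch; delta, implied and the candidate set are hypothetical stand-ins for the paper's machinery):

  def grow_taxonomy(taxonomy, candidates, delta, implied):
      """Greedily add the relation whose multiplicative change in
      evidence likelihood is largest, until none improves it."""
      while candidates:
          # score each candidate together with everything it implies by transitivity
          scored = {r: delta(taxonomy, implied(taxonomy, r)) for r in candidates}
          best = max(scored, key=scored.get)
          if scored[best] <= 1.0:              # change <= 1 means no gain
              break
          taxonomy |= implied(taxonomy, best)  # add the link plus implied links
          candidates.discard(best)
      return taxonomy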
3.4
- sense disambiguation
- works by itself (the disambiguation falls out as a by-product of where the new word attaches)
- the “carrier” example explained
Evaluation
- manual evaluation – samples drawn uniformly from the first n links added, judged by a human
- annotators were to classify into 4 classes (4.1)
- all the various evaluation methods discussed
DISCUSSION
- what is MiniPar? a very simple dependency parser (is that the reason it was used?) – maybe it is important how the dependencies are labeled
- the features are the dependency labels, but they are also lexicalized (toy sketch after this list), e.g. a parse fragment like

      vehicle
        |A
     such as
        |B
       car

  results in features? maybe “vehicle, car … A → as → B”?
- definitely no hand-written rules
- trained from WordNet?
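A toy illustration of what such a lexicalized path feature could look like (our guess, not the authors' encoding):

  def path_feature(edges):
      """Join the (label, word) steps between the two nouns into one feature string."""
      return "->".join(f"{label}:{word}" for label, word in edges)

  # vehicle --A--> "such as" --B--> car
  print(path_feature([("A", "such as"), ("B", "car")]))  # A:such as->B:car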
This was used to enrich WordNet. Can it be used on its own (just with the parser)?
- technically, yes
- but we don't know about the performance
We didn't find out how the probability of links is computed.
- only nouns are considered, maybe computed for each noun-noun pair?
- terminology is difficult for people outside semantics
What does it mean that the F-score is improved by 23% over WordNet? Does WordNet have 100% precision but low recall?
- 5000 manually labeled noun pairs
- look at Table 4: WordNet does not have very good precision as measured on those noun pairs
- it's not WordNet as such; it seems that “the best WordNet classifier” is just built upon WordNet, going as far as k edges (the best was distance 4, i.e. car → motor vehicle → vehicle → machine according to WordNet, but no further)
- allowing further links lowers precision but increases recall, so 4 is the best trade-off for the F-measure
- so WordNet is not altogether trustworthy
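For reference, the trade-off in the last points is the usual F-measure, the harmonic mean of precision P and recall R:

  F_1 = \frac{2PR}{P + R}

Allowing deeper WordNet links raises R but lowers P enough that F_1 peaks at distance 4.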