====== Semantic Taxonomy Induction from Heterogenous Evidence ======

===== Introduction =====

- related methods (WordNet -- hand-made, CYC)
- hand-made patterns "filled in" by words that satisfy them (automatically)
- "such NP(y) as NP(x)" => y is a hypernym of x (reversed in the paper! probably a copy-paste error)
- most methods disregard ambiguity (rose bush)

===== Section 2 (A Probabilistic Framework...) =====

==== 2.1 ====

- taxonomy = set of relations
- the H notation (H^n_ij)
- pairwise ~ binary? probably... (Sec. 2.1) 
- are there different taxonomies for all types of relations? (extra one for verb entailment etc.?), not clear from the paper
- ISA is transitive
- cousins have a least common subsumer
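
A minimal sketch of the three notions above (taxonomy as a set of relations, transitivity of ISA, cousins sharing a least common subsumer). The toy links in ''ISA'' and all function names are made up for illustration and are not taken from the paper.

```python
from itertools import product

# Toy taxonomy: a set of direct (hyponym, hypernym) ISA links (illustrative only).
ISA = {
    ("car", "motor vehicle"),
    ("truck", "motor vehicle"),
    ("motor vehicle", "vehicle"),
    ("bicycle", "vehicle"),
}

def transitive_closure(isa):
    """Add every (a, d) pair implied by a ISA b and b ISA d."""
    closure = set(isa)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def ancestors_with_depth(word, isa):
    """Map each ancestor of `word` (and the word itself, depth 0) to its distance in direct links."""
    dist = {word: 0}
    frontier = [word]
    while frontier:
        nxt = []
        for w in frontier:
            for child, parent in isa:
                if child == w and parent not in dist:
                    dist[parent] = dist[w] + 1
                    nxt.append(parent)
        frontier = nxt
    return dist

def least_common_subsumer(x, y, isa):
    """Return (lcs, m, n): the closest shared ancestor and its distance from x and from y."""
    dx, dy = ancestors_with_depth(x, isa), ancestors_with_depth(y, isa)
    common = set(dx) & set(dy)
    if not common:
        return None
    lcs = min(common, key=lambda a: (dx[a] + dy[a], a))
    return lcs, dx[lcs], dy[lcs]
```

With this toy data, "car" and "truck" share "motor vehicle" at distances (1, 1), while "car" and "bicycle" only meet at "vehicle" at distances (2, 1).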

==== 2.2 ==== 
- looking for the taxonomy that maximizes its likelihood given the evidence
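
Written out, the objective is roughly the following. This is a reconstruction from memory under the assumption that each piece of evidence depends only on its own relation; the symbols R_ij (a relation between words i and j) and E_ij (the evidence for it) are the notation assumed here and should be checked against the paper.

```latex
\hat{T} = \arg\max_{T} P(E \mid T),
\qquad
P(E \mid T) = \prod_{R_{ij} \in T} P(E_{ij} \mid R_{ij} \in T)
              \prod_{R_{ij} \notin T} P(E_{ij} \mid R_{ij} \notin T)
```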

==== 2.3 ====
- describes the main part of the algorithm
- start with an initial taxonomy, add relations
- definition of the multiplicative change + explanation
- adding a relation involves adding relations implied by transitivity
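
A rough sketch of the greedy step described above. It assumes the multiplicative change of a single relation has the form k * p / (1 - p), where p is the classifier's posterior for that relation and k a constant (our reading of the paper, not verified), and that the change of adding a link is the product over all relations it implies. The posteriors, the ''default_p'' fallback and all function names are hypothetical, and cycle checks are left out.

```python
def implied_relations(new_link, taxonomy):
    """Relations added together with new_link = (hyponym, hypernym).

    Assumes `taxonomy` is already transitively closed.
    """
    child, parent = new_link
    below = {child} | {a for (a, b) in taxonomy if b == child}
    above = {parent} | {b for (a, b) in taxonomy if a == parent}
    return {(a, b) for a in below for b in above} - taxonomy

def multiplicative_change(new_link, taxonomy, posterior, k=1.0, default_p=0.01):
    """Product of k * p / (1 - p) over all relations implied by new_link."""
    change = 1.0
    for rel in implied_relations(new_link, taxonomy):
        p = posterior.get(rel, default_p)  # default_p is a made-up fallback
        change *= k * p / (1.0 - p)
    return change

def greedy_induce(taxonomy, candidates, posterior, k=1.0):
    """Keep adding the best candidate while it increases the likelihood."""
    taxonomy = set(taxonomy)
    remaining = set(candidates) - taxonomy
    while remaining:
        best = max(sorted(remaining),
                   key=lambda r: multiplicative_change(r, taxonomy, posterior, k))
        if multiplicative_change(best, taxonomy, posterior, k) <= 1.0:
            break  # no remaining candidate improves the taxonomy
        taxonomy |= implied_relations(best, taxonomy)
        remaining -= taxonomy
    return taxonomy

# Toy input: an initial taxonomy and invented classifier posteriors.
INITIAL = {("car", "vehicle")}
POSTERIOR = {
    ("vehicle", "artifact"): 0.9,
    ("car", "artifact"): 0.8,
    ("car", "fruit"): 0.1,
}
CANDIDATES = [("vehicle", "artifact"), ("car", "fruit")]
```

In the toy input, adding "vehicle ISA artifact" also pulls in "car ISA artifact", so both posteriors enter the product; "car ISA fruit" has a change below 1 and is never added.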

==== 2.4 ====
- model adaptation for ambiguity
- adding relations between word senses

===== Section 3 =====

==== 3.1 ==== 
- evidence is a vector
- training data from WordNet, trained a classifier using logistic regression
- overfitting: we don't want to model relations over too long a distance

==== 3.2 ====
- cousins
- clustering -- similarity is cosine similarity (within the cluster, 0 otherwise)
- softmax regression: more than 2 classes, otherwise similar to logistic regression
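
To make the two points above concrete, here is a small sketch of a cluster-restricted cosine similarity and of softmax. The vectors, the cluster assignment and the function names are invented for illustration; the paper's actual features and clustering are not reproduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_similarity(w1, w2, vectors, cluster_of):
    """Cosine similarity inside a cluster, 0 across clusters."""
    if cluster_of[w1] != cluster_of[w2]:
        return 0.0
    return cosine_similarity(vectors[w1], vectors[w2])

def softmax(scores):
    """Turn one score per class into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic function, for comparison with two-class softmax."""
    return 1.0 / (1.0 + math.exp(-z))

# Invented toy data.
VECTORS = {"car": [1.0, 2.0], "truck": [2.0, 4.0], "rose": [3.0, 0.0]}
CLUSTER_OF = {"car": 0, "truck": 0, "rose": 1}
```

With two classes and scores (z, 0), softmax gives the same probability as the logistic function applied to z, which is the sense in which the two models are "otherwise similar".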

==== 3.3 ====
- identify the set of word pairs that can be hyper/hyponyms
- for each proposed hypernym... was explained
- add the one with highest score according to the classifiers

==== 3.4 ====
- sense disambiguation
- works by itself
- the "carrier" example explained

===== Evaluation =====
- manual evaluation -- uniformly generated samples from the first n links, human judge
- annotators were to classify into 4 classes (4.1)
- all the various evaluation methods discussed

===== Discussion =====

- what is MiniPar? A very simple parser (is that why it was used?) -- maybe it matters how the dependencies are labeled.
- features are the dependency labels, but they are also lexicalized:

<code>
    vehicle
   /       \ A
 such       as
             \ B
              car
             /
            a
</code>

Which features does this result in? Maybe "vehicle, car ... A -> as -> B"?
- definitely no hand-written rules
- trained from WordNet?
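
One way the guessed feature could be read off the tree drawn earlier in this section is sketched below. This only illustrates our guess, not the paper's feature extraction: A and B are the placeholder labels from the drawing, the other edges have no label in the notes (kept as ''None''), and the function names are invented.

```python
from collections import deque

# (head, dependent, label) edges of the example tree.
# None marks edges whose label the notes leave unspecified.
EDGES = [
    ("vehicle", "such", None),
    ("vehicle", "as", "A"),
    ("as", "car", "B"),
    ("car", "a", None),
]

def shortest_path(edges, start, goal):
    """Breadth-first search over the tree, ignoring edge direction."""
    neighbours = {}
    for head, dep, label in edges:
        neighbours.setdefault(head, []).append((dep, label))
        neighbours.setdefault(dep, []).append((head, label))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, label in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(label, nxt)]))
    return None

def path_feature(edges, x, y):
    """Lexicalized path between x and y, with the two nouns themselves left out."""
    path = shortest_path(edges, x, y)
    if path is None:
        return None
    parts = []
    for label, word in path:
        parts.append(str(label))
        if word != y:
            parts.append(word)  # intermediate words stay in the feature
    return " -> ".join(parts)
```

For the noun pair (vehicle, car) this yields the string "A -> as -> B", i.e. labels plus the intermediate word, which is what "lexicalized" would mean under this reading.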

This was used to enrich WordNet. Can it be used on its own (just with the parser)?
- technically, yes
- but we don't know about the performance

We didn't find out how the probability of links is computed.
- only nouns are considered, maybe computed for each noun-noun pair?

- terminology is difficult for people outside semantics

What does it mean that the F-score is improved by 23% over WordNet? Does WordNet have 100% precision but low recall?
- 5000 manually labeled noun pairs
- look at table 4, WordNet does not have very good precision as measured on those noun pairs
- it's not WordNet as such: it seems that "the best WordNet classifier" is just built upon WordNet, going as far as k edges (the best was distance 4, i.e. car->motor vehicle->vehicle->machine according to WordNet, but not further)
- allowing further links lowers precision but increases recall, so 4 is the best trade-off for F-measure
- so WordNet is not altogether trustworthy
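
The precision/recall trade-off for a "WordNet up to k edges" classifier can be illustrated as follows. The gold labels and distances are invented toy values, not the paper's 5000 pairs, and the function names are ours; the point is only the mechanics of the computation.

```python
def precision_recall_f(gold, predicted):
    """Precision, recall and balanced F-measure for boolean labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

def distance_classifier(distances, k):
    """Predict 'hypernym' when the pair is linked within k edges (None = no path)."""
    return [d is not None and d <= k for d in distances]

# Invented toy data: human judgement and hypernym-path length for 8 noun pairs.
GOLD = [True, True, True, False, False, True, False, True]
DISTANCES = [1, 2, 4, 3, 6, None, None, 6]

for k in (2, 4, 6):
    p, r, f = precision_recall_f(GOLD, distance_classifier(DISTANCES, k))
    print(f"k={k}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

On this toy data, precision falls and recall rises as k grows. Which k gives the best F-measure depends entirely on the data, so nothing about the paper's optimum of 4 should be read off these numbers.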