====== Semantic Taxonomy Induction from Heterogenous Evidence ======

===== Introduction =====

  - related work: WordNet (hand-made), CYC
  - hand-written patterns are automatically "filled in" by the words that match them
    - "such NP(y) as NP(x)" => y is a hypernym of x (the direction is reversed in the paper! probably a copy-paste error)
  - most methods disregard ambiguity (rose bush)

===== Section 2 (A Probabilistic Framework...) =====

==== 2.1 ====

  - a taxonomy = a set of relations
  - the H notation (H^n_ij)
  - pairwise ~ binary? probably (Sec. 2.1)
  - is there a separate taxonomy for each type of relation (e.g. an extra one for verb entailment)? Not clear from the paper
  - ISA is transitive
  - cousins have a least common subsumer

==== 2.2 ====

  - we look for the taxonomy that maximizes the likelihood of the evidence, i.e. argmax_T P(E | T)

==== 2.3 ====

  - describes the main part of the algorithm
  - start with an initial taxonomy and add relations to it
  - definition of the multiplicative change + explanation
  - adding a relation also adds all relations implied by transitivity

==== 2.4 ====

  - adapting the model to ambiguity
  - relations are added between word senses

===== Section 3 =====

==== 3.1 ====

  - the evidence is a feature vector
  - training data come from WordNet; a classifier was trained with logistic regression
  - overfitting: we don't want to model relations over too long distances

==== 3.2 ====

  - cousins
  - clustering: similarity is the cosine similarity within a cluster, 0 otherwise
  - softmax regression: more than 2 classes, otherwise similar to logistic regression

==== 3.3 ====

  - identify the set of word pairs that can stand in a hypernym/hyponym relation
  - for each proposed hypernym... was explained
  - add the one with the highest score according to the classifiers

==== 3.4 ====

  - sense disambiguation
  - works by itself
  - the "carrier" example was explained

===== Evaluation =====

  - manual evaluation: uniformly sampled links from the first n added links, judged by a human
  - annotators classified the links into 4 classes (Sec. 4.1)
  - all the various evaluation methods were discussed

===== DISCUSSION =====

  - what is MiniPar? A dependency parser (by Dekang Lin), apparently a very simple one (is that the reason it was used?)
    - maybe it matters how the dependencies are labeled
  - the features are the dependency labels, but they are also lexicalized:
    - [dependency tree sketch: "vehicle" at the root, with "A" and "such as" below it; "such as" -> "B car" -> "a"]
    - what features does this result in? maybe "vehicle, car ... A -> as -> B"?
  - definitely no hand-written rules
  - trained from WordNet? It was used to enrich WordNet. Can it be used on its own (just with the parser)?
    - technically, yes
    - but we don't know about the performance
  - we didn't find out how the probability of links is computed
    - only nouns are considered; maybe it is computed for each noun-noun pair?
  - the terminology is difficult for people outside semantics
  - what does it mean that the F-score is improved by 23% over WordNet? Does WordNet have 100% precision but low recall?
    - 5000 manually labeled noun pairs
    - look at Table 4: WordNet does not have very good precision as measured on those noun pairs
    - it's not WordNet as such; "the best WordNet classifier" seems to be built on top of WordNet, following up to k edges (the best was distance 4, i.e. car->motor vehicle->vehicle->machine according to WordNet, but no further)
    - allowing further links lowers precision but increases recall, so 4 is the best trade-off for F-measure
    - so WordNet is not altogether trustworthy
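To make the direction of the "such NP(y) as NP(x)" pattern from the Introduction concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions: each NP is a single word and a surface regex stands in for the pattern, whereas the paper matched dependency paths from MiniPar. The names `SUCH_AS` and `such_as_pairs` are hypothetical.

```python
import re

# Hypothetical surface-string stand-in for "such NP(y) as NP(x)".
# ASSUMPTION: each NP is a single word (the paper used parser paths instead).
SUCH_AS = re.compile(r"\bsuch\s+(\w+)\s+as\s+(\w+)", re.IGNORECASE)

def such_as_pairs(sentence):
    """Return (hyponym, hypernym) pairs: for 'such y as x', y is a hypernym of x."""
    return [(x, y) for y, x in SUCH_AS.findall(sentence)]

print(such_as_pairs("He repairs such vehicles as cars."))
# [('cars', 'vehicles')]
```

The first captured word (after "such") is the hypernym y, the second (after "as") is the hyponym x, which is the reading we believe is correct.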
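Sec. 2.3 (start from an initial taxonomy, add relations, count the relations implied by transitivity) can be sketched as a greedy loop. This is a minimal sketch resting on our reading of the paper, which is an assumption: the multiplicative change for one new link with classifier probability p is taken to be k * p / (1 - p), and adding a link multiplies this over every link it implies by transitivity. All function names, the constant `k` and `default_p` (the probability assumed for implied links without evidence) are hypothetical, chosen for illustration only.

```python
from math import prod

def ancestors(tax, node):
    """All transitive hypernyms of `node`; `tax` maps a word to its direct hypernyms."""
    seen, stack = set(), list(tax.get(node, ()))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(tax.get(n, ()))
    return seen

def closure(tax):
    """Every (hyponym, hypernym) pair implied by the transitivity of ISA."""
    return {(x, a) for x in tax for a in ancestors(tax, x)}

def implied_links(tax, i, j):
    """Pairs that adding 'i ISA j' would newly force into the taxonomy."""
    lower = {i} | {x for x in tax if i in ancestors(tax, x)}  # i and its hyponyms
    upper = {j} | ancestors(tax, j)                           # j and its hypernyms
    return {(x, y) for x in lower for y in upper} - closure(tax)

def multiplicative_change(links, probs, k, default_p):
    """ASSUMED form: product over the new links of k * p / (1 - p)."""
    def odds(link):
        p = min(max(probs.get(link, default_p), 1e-9), 1 - 1e-9)  # avoid division by 0
        return p / (1 - p)
    return prod(k * odds(link) for link in links)

def greedy_induce(tax, probs, k=1.0, default_p=0.5):
    """Repeatedly add the candidate with the largest change, while that change is > 1."""
    tax = {w: set(hs) for w, hs in tax.items()}  # work on a copy
    while True:
        done = closure(tax)
        best, best_delta = None, 1.0
        for i, j in probs:
            # skip links already present, self-links and links that would create a cycle
            if (i, j) in done or i == j or i in ancestors(tax, j):
                continue
            delta = multiplicative_change(implied_links(tax, i, j), probs, k, default_p)
            if delta > best_delta:
                best, best_delta = (i, j), delta
        if best is None:
            return tax
        tax.setdefault(best[0], set()).add(best[1])

# Hypothetical classifier outputs P(i ISA j | evidence) for candidate pairs.
probs = {("vehicle", "machine"): 0.8, ("car", "machine"): 0.7, ("car", "fruit"): 0.1}
result = greedy_induce({"car": {"vehicle"}}, probs)
print(sorted(closure(result)))
# [('car', 'machine'), ('car', 'vehicle'), ('vehicle', 'machine')]
```

Adding "vehicle ISA machine" wins the first round because it also brings in "car ISA machine", whose own evidence is favourable; "car ISA fruit" is never added because its change stays below 1.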
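Two points from Sec. 3.2 can be checked numerically: the cluster-gated similarity (cosine similarity if both words fall in the same cluster, 0 otherwise, as we understood it) and the claim that softmax regression is logistic regression generalized to more than 2 classes. The sketch below uses hypothetical helper names and plain lists as context vectors; with two classes scored z and 0, softmax gives exactly the logistic sigmoid of z.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_similarity(u, v, cluster_u, cluster_v):
    """Hypothetical feature: cosine similarity within a cluster, 0 otherwise."""
    return cosine(u, v) if cluster_u == cluster_v else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Two classes with scores z and 0: softmax reduces to logistic regression.
z = 1.7
assert abs(softmax([z, 0.0])[0] - sigmoid(z)) < 1e-12
```

In general, a two-class softmax with weight vectors w1 and w2 is a logistic regression with weights w1 - w2, which is why the two models behave alike apart from the number of classes.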
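The discussion of "the best WordNet classifier" (accept a pair if the candidate hypernym is at most k edges up in WordNet) and its precision/recall trade-off can be illustrated on toy data. Everything below is invented for illustration: the taxonomy is a made-up chain, not real WordNet, and the gold labels are not the paper's 5000 labeled pairs (the annotator is simply assumed to reject very abstract hypernyms). On this toy data the best k is 3; it is not meant to reproduce the paper's value of 4.

```python
# Toy taxonomy (child -> single direct hypernym) and invented gold labels;
# neither comes from WordNet or from the paper's labeled noun pairs.
TAXONOMY = {
    "car": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "vehicle": "conveyance",
    "conveyance": "instrumentality",
    "instrumentality": "artifact",
}
GOLD = [  # (hyponym, candidate hypernym, judged correct?)
    ("car", "motor_vehicle", True),
    ("car", "vehicle", True),
    ("car", "conveyance", True),
    ("car", "machine", True),          # correct, but absent from the toy taxonomy
    ("car", "instrumentality", False),  # assumed rejected as too abstract
    ("car", "artifact", False),
]

def within_k(tax, hypo, hyper, k):
    """True if `hyper` is reachable from `hypo` in at most k hypernym edges."""
    node = hypo
    for _ in range(k):
        node = tax.get(node)
        if node is None:
            return False
        if node == hyper:
            return True
    return False

def precision_recall_f1(tax, gold, k):
    """Score the 'within k edges' classifier against the labeled pairs."""
    tp = fp = fn = 0
    for hypo, hyper, label in gold:
        predicted = within_k(tax, hypo, hyper, k)
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

scores = {k: precision_recall_f1(TAXONOMY, GOLD, k) for k in range(1, 6)}
best_k = max(scores, key=lambda k: scores[k][2])
for k, (p, r, f) in scores.items():
    print(f"k={k}  P={p:.2f}  R={r:.2f}  F1={f:.2f}")
print("best k:", best_k)
```

Raising k can only add predictions, so recall never drops while precision falls once the extra pairs are judged wrong; F1 peaks where the two balance. A pair missing from the taxonomy ("car", "machine") caps recall no matter how large k gets.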