====== Predicting Human Brain Activity Associated with the Meanings of Nouns ======
Tom M. Mitchell, Svetlana V. Shinkareva, Andrew Carlson, Kai-Min Chang, Vicente L. Malave, Robert A. Mason, Marcel Adam Just
[[http://www.ccbi.cmu.edu/reprints/Mitchell_Science2008_corpus-prediction.pdf|Predicting Human Brain Activity Associated with the Meanings of Nouns]] 


===== Comments =====
==== Summary ====
  * the authors present a computational model that predicts the functional magnetic resonance imaging (fMRI) neural activation associated with words for which no fMRI data are available
  * fMRI prediction for a word ''w'' is a two-step process:
    - compute a vector of semantic features over a huge corpus
      * 25 features are defined in terms of co-occurrence of ''w'' with forms of 25 manually selected sensory-motor verbs
    - predict neural fMRI activation as a weighted sum of semantic features
      * weights for every voxel (3D pixel) and feature are estimated using multiple regression
  * fMRI data
    * they created 60 representative fMRI images
      * word-picture combinations from 12 semantic categories
      * they measured the brain activation of 9 participants, each of whom was exposed to all 60 word-picture combinations
  * evaluation
    * they carried out "leave-two-out" cross-validation with all 60 examples
      * 58 of them served as training data
      * for the remaining two, fMRI images were predicted and compared with the observed images. A matching between predicted and observed images was determined using cosine similarity. If the predicted image for a word is matched with its own observed image, one point is scored; aggregated over all folds, these points form the accuracy measure
  * experiments and results
    * quantitative measurements
      * matching two unseen words to their observed fMRI images
        * 0.77 averaged over all 9 participants - significantly above the chance level (0.50 for a two-way match)
        * higher activation tends to be predicted in the left hemisphere - this is consistent with the generally held view that the left hemisphere is more responsible for semantic representation than the right one
      * prediction for a word in a new semantic category
        * in the training stage, they excluded all examples from the same semantic category as either of the two tested words
        * 0.70 averaged over all 9 participants - still above the chance level
      * prediction when the two tested words belong to the same category
        * hard to distinguish
        * 0.62 averaged over all participants - slightly above the chance level
      * ability to distinguish among an even more diverse range of words
        * the model was trained on 59 examples and tested on the remaining example plus another 1000 highly frequent words
        * the predicted fMRI images for the 1001 words were ranked by their similarity to the observed fMRI image of the test example
        * 0.72 over 9 participants, measured as the percentile rank of the correct word among the 1001 candidates
    * examination of the learnt basis set of fMRI signatures for the 25 verb-based features
      * 'eat' predicts strong activation in the gustatory cortex, which is involved in the sense of taste; 'push' in a part of the brain involved in the planning of complex coordinated movements; 'run' in a part involved in the perception of biological motion
      * for other verbs, though, such correspondences between function and brain region are not present across all participants
    * suitability of the selected verbs as a basis for features
      * they generated 115 random sets of 25 features, drawn from the 5000 most frequent words in the corpus (excluding the 25 verbs used in the original setting), and trained the system with each of these feature sets
      * the accuracy of fMRI prediction ranged from 0.46 to 0.68 (mean 0.60). Compared with 0.77 in the setting with the 25 manually selected verbs, this suggests that the 25 designed features are distinctive in capturing regularities in the neural activation that encodes the semantic content of words
  * conclusion
    * this work presented a predictive relationship between word co-occurrence statistics and neural activation
    * the high accuracy obtained with the 25 selected features suggests that the neural representation of concrete words is to a large extent grounded in sensory-motor features
    * it shows that semantic features share commonalities across individuals and may also help to predict neural representations across individuals
    * the model captures the semantic rather than the visual aspects of words
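
The two-step prediction and the leave-two-out matching summarized above can be sketched in Python with NumPy. This is only an illustrative sketch on synthetic data, not the authors' code: all function names, the intercept term, the toy data sizes and the exact decision rule (comparing the summed cosine similarity of the correct pairing against the swapped pairing) are our assumptions.

```python
import itertools
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def add_intercept(features):
    """Append a constant column (an assumption, not stated in the summary)."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_weights(features, images):
    """Least-squares regression: one weight per (feature, voxel) pair."""
    weights, *_ = np.linalg.lstsq(add_intercept(features), images, rcond=None)
    return weights


def predict(weights, features):
    """Predicted image = weighted sum of the semantic features."""
    return add_intercept(features) @ weights


def match_correct(pred, obs):
    """True if the correct pairing of two predicted and two observed
    images is more similar than the swapped pairing."""
    right = cosine(pred[0], obs[0]) + cosine(pred[1], obs[1])
    swapped = cosine(pred[0], obs[1]) + cosine(pred[1], obs[0])
    return right > swapped


def leave_two_out_accuracy(features, images):
    """Hold out every pair of words in turn, train on the rest,
    and report the fraction of correctly matched pairs."""
    n = features.shape[0]
    hits = 0
    total = 0
    # For 60 words this loop would run 60 * 59 / 2 = 1770 times.
    for i, j in itertools.combinations(range(n), 2):
        test = [i, j]
        train = [k for k in range(n) if k not in test]
        weights = fit_weights(features[train], images[train])
        hits += match_correct(predict(weights, features[test]), images[test])
        total += 1
    return hits / total


# Toy data (hypothetical sizes): 10 words, 3 features, 20 voxels.
rng = np.random.default_rng(0)
toy_features = rng.random((10, 3))
toy_true_weights = rng.normal(size=(4, 20))
# Noise-free linear images, so the regression can recover them exactly.
toy_images = add_intercept(toy_features) @ toy_true_weights

demo_accuracy = leave_two_out_accuracy(toy_features, toy_images)
print(demo_accuracy)
```

With real data, the rows of the feature matrix would hold the 25 verb co-occurrence features of each noun and the rows of the image matrix the observed voxel activations of one participant.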

===== What do we like about the paper =====
  * the paper addresses the interesting field of neurolinguistics and combines it very neatly with distributional methods of computational linguistics
  * the authors designed a variety of smart experiments to demonstrate the quality of their model. The results are promising.

===== What do we dislike about the paper =====
  * the authors selected 25 sensory-motor verbs as a basis for their co-occurrence features, but they did not sufficiently explain what led them to pick exactly these ones.
  * the gold standard fMRI images are the result of a word-picture pair being presented to participants. We think that showing a picture can, on the one hand, help an individual to imagine the thing; on the other hand, it can strictly limit the variety of concepts that an individual can assign to the word. It is similar to reading a book vs. watching a movie.
  * for how long were the participants exposed to each word-picture pair? If the exposure is too long, an individual can start to think about other things that they consider related to the original concept
  * it would be interesting to carry out similar research on abstract words


Written by Michal Novák