Institute of Formal and Applied Linguistics Wiki

  1. Having described an input image with SIFT descriptors, why do the authors cluster these descriptors (using the K-means clustering algorithm, top of p. 834)?
  2. What kind of things, in Feng and Lapata's work, have topics? What do these topics predict?
  3. They define WI* (eqs. 2, 3, 4, 5, 7, 8) as “the subset of keywords which appropriately describe image I (…)”. However, at the end of this subsection (Image Annotation), they say they take an n-best list of candidates ranked by probability. What is the difference between this n-best list and WI* as defined in the equations above?
  4. In the last paragraph of section 5, they argue for not using (7) for the text illustration task, as opposed to image annotation. How did they obtain the result in image annotation then?
