Institute of Formal and Applied Linguistics Wiki


courses:rg:cross-lingual-link-structure-of-wikipedia [2011/04/04 17:31]
green created
courses:rg:cross-lingual-link-structure-of-wikipedia [2011/04/11 14:47]
popel formatting
====== Gerard de Melo, Gerhard Weikum (2010): Untangling the Cross-Lingual Link Structure of Wikipedia ======
  
===== Comments =====
  * The paper had two main goals:
    - to find mismatched cross-lingual topics and use the authors' algorithm to correct the cross-lingual links, and
    - to extract some cross-language named entity pairs.
  * We discussed the general algorithm, its approach to using distinct sets of categories, and its approximation algorithm for weighted distinctness graph separation.
  * We discussed the use of bots in Wikipedia to populate the links. It was unclear how many of the poor links are bot-generated. It was suggested that the algorithm could usefully be augmented if Wikipedia indicates in the change log which links came from bots (it was not clear whether this is possible).
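
As a rough illustration of the first goal above, the following minimal sketch flags groups of linked articles that look mismatched. It is not the authors' algorithm: it only covers detection, under the assumptions that cross-lingual links are treated as undirected and that two different articles from the same language edition should never end up in the same connected group. The function name ''find_conflicting_components'' and the sample titles are hypothetical. Deciding which links to cut (the weighted separation step) is not shown.

```python
from collections import defaultdict


def find_conflicting_components(links):
    """Return groups of linked articles that contain two articles of one language.

    links: iterable of ((lang, title), (lang, title)) pairs, treated as
    undirected edges (an assumption made for this sketch).
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Merge the two endpoints of every link into one component.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the members of each component.
    components = defaultdict(list)
    for node in list(parent):
        components[find(node)].append(node)

    # A component is suspicious if some language occurs more than once in it.
    conflicts = []
    for members in components.values():
        langs = [lang for lang, _ in members]
        if len(langs) != len(set(langs)):
            conflicts.append(sorted(members))
    return conflicts


# Hypothetical sample data: two English articles reach the same German article.
sample_links = [
    (("en", "Television"), ("de", "Fernsehen")),
    (("de", "Fernsehen"), ("en", "Television set")),
    (("en", "Cat"), ("cs", "Kočka")),
]
conflicts = find_conflicting_components(sample_links)
```

With the sample data, only the group containing both English articles is reported. The consistent English/Czech pair is left alone.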
  
===== What do we dislike about the paper =====
  * It was unclear what had been simplified in Figure 1. The graph is directed, but the figure does not draw the edge directions, which made it hard to tell whether a term can link to multiple languages.
  * The software uses CPLEX, which is commercial. This makes the results of the paper hard to verify.
  * The algorithm also seemed not to work on large datasets, which would be a problem for most of its intended uses.
  
  
