Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.

user:hladka:playcoref [2009/02/26 12:02]
hladka
user:hladka:playcoref [2009/02/26 12:15]
hladka
Line 77: Line 77:
  
 ====== Specification ====== ====== Specification ======
 +
  
  
Line 87: Line 88:
    * What is my partner doing? If (s)he hooks up the same pair of words as I did, the pair starts **???????**. If (s)he links a word I have not linked so far, that word starts **???????**    * What is my partner doing? If (s)he hooks up the same pair of words as I did, the pair starts **???????**. If (s)he links a word I have not linked so far, that word starts **???????**
    * The players can re-hook up any word at any time in the session.       * The players can re-hook up any word at any time in the session.   
-   * To design the game for a particular language the following data and tools are needed (or are welcome):+   * To design the game for a particular language the following data and tools are needed (or, better said, are welcome):
      - corpus of manually annotated coreference      - corpus of manually annotated coreference
      - POS tagger      - POS tagger
      - coreference resolution procedure      - coreference resolution procedure
 +
 +
 +
  
  
Line 98: Line 102:
  
 === Text Selection === === Text Selection ===
-  * CS data ^JM^+  * CS data
      * Anja's data    ## // PDT data that are currently being annotated for the extended coreference //      * Anja's data    ## // PDT data that are currently being annotated for the extended coreference //
-     * more 'user-friendly' texts    ## // texts that are currently in the LGame db// +     * **JM**: It would be nice if the players could choose a domain of the texts to play on (science-fiction, fantasy, thriller, romance, ...), maybe even the author or the very title. The available resources of free electronic books in Czech are scarce but there are plenty of free electronic books in English and other languages, e.g. [[http://www.gutenberg.org/wiki/Main_Page|Project Gutenberg]]. **BH**: It is a very nice idea but I would postpone it till the next versions of the PlayCoref game. However, we have already selected more 'user-friendly' texts into the LGame db - see [[http://ufallab2.ms.mff.cuni.cz/lgame/|this page]]. So we can use them for the PlayCoref game as well.  
-     * **^JM^**: It would be nice if the players could choose a domain of the texts to play on (science-fiction, fantasy, thriller, romance, ...), maybe even the author or the very title. The available resources of free electronic books in Czech are scarce but there are plenty of free electronic books in English and other languages, e.g. [[http://www.gutenberg.org/wiki/Main_Page|Project Gutenberg]]. **^BH^**: It is a very nice idea but I would postpone it till the next versions of the PlayCoref game. However, we have already selected more 'user-friendly' texts into the LGame db - see [[http://ufallab2.ms.mff.cuni.cz/lgame/|this page]]. So we can use them for the PlayCoref game as well. +      * **---JM TO DO---** on Anja's data, compute statistics that are interesting for us, such as
 +sentences/document; arrows_noun_noun-noun_pronoun-pronoun-pronoun/document; ... 
    * **EN**    * **EN**
       * search the data that are available       * search the data that are available
Line 117: Line 123:
    * sentence by sentence    * sentence by sentence
    * supervised selection of documents for a session     * supervised selection of documents for a session 
 +
  
  
Line 122: Line 129:
    * ''pts_of_player_A = w1*(player_A's_output vs. automatic_annotation) + w1*(player_A's_output vs. player_B's_output) + speed_pts''    * ''pts_of_player_A = w1*(player_A's_output vs. automatic_annotation) + w1*(player_A's_output vs. player_B's_output) + speed_pts''
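A minimal sketch of how the scoring formula could be evaluated. Several things here are assumptions, not part of the specification: an annotation is represented as a set of (word, target) index pairs, "vs." is read as Jaccard overlap of two such sets, and the two weights are kept separate (''w_auto'', ''w_partner'') although the formula writes ''w1'' for both terms.

```python
from typing import Set, Tuple

# Hypothetical representation: a link is (index of hooked word, index of its target).
Link = Tuple[int, int]


def agreement(a: Set[Link], b: Set[Link]) -> float:
    """Jaccard overlap of two link sets (an assumed reading of 'vs.')."""
    if not a and not b:
        return 1.0  # two empty annotations agree trivially
    return len(a & b) / len(a | b)


def player_points(player: Set[Link],
                  automatic: Set[Link],
                  partner: Set[Link],
                  w_auto: float = 1.0,
                  w_partner: float = 1.0,
                  speed_pts: float = 0.0) -> float:
    """pts = w_auto*(player vs. automatic) + w_partner*(player vs. partner) + speed_pts."""
    return (w_auto * agreement(player, automatic)
            + w_partner * agreement(player, partner)
            + speed_pts)


# Illustrative data only.
player_a = {(5, 2), (9, 5)}
automatic = {(5, 2), (12, 9)}
player_b = {(5, 2), (9, 5)}
points_a = player_points(player_a, automatic, player_b, speed_pts=0.5)
```

Any other agreement measure (for example precision/recall over links) could be swapped in for ''agreement'' without changing the shape of the formula.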
    
-**^JM^**:+**JM**:
 I think we do want to push the players toward agreement. It is desirable for the annotation to be as complete as possible. When the second player sees that the first player has linked some word, it puts pressure on him to check whether he I think we do want to push the players toward agreement. It is desirable for the annotation to be as complete as possible. When the second player sees that the first player has linked some word, it puts pressure on him to check whether he
-has overlooked it and could link it as well. He is not shown where the word was linked to, so if he finds no target he will not link it, and he will be pleased that the first player made a mistake.+has overlooked it and could link it as well. He is not shown where the word was linked to, so if he finds no target he will not link it, and he will be pleased that the first player made a mistake. I think the function should take **either** the automatic annotation **or** the manual one, depending on what is available. I also now think that we will use manually annotated data only minimally, just to measure how successful annotation through the game is; that need not be part of the game score at all and can be done off-line. We have little manually annotated data, it is already annotated, and it is not entertaining. From this I conclude that manual annotation should not be taken into account for scoring at all, and I would drop the first term from the formula above. 
 + 
 +**BH**: Jirka is right. The score computation must be objective. That is why I modified the formula so that it does not count the player's agreement with the manual annotation.
  
-I think the function should take **either** the automatic annotation **or** the manual one, depending on what is available. I also now think that we will use manually annotated data only minimally, just to measure how successful annotation through the game is; that need not be part of the game score at all and can be done off-line. We have little manually annotated data, it is already annotated, and it is not entertaining. From this I conclude that manual annotation should not be taken into account for scoring at all, and I would drop the first term from the formula above. 
-**^BH^**: Jirka is right. The score computation must be objective. That is why I modified the formula so that it does not count the player's agreement with the manual annotation (if it is available). 
  
  
Line 136: Line 143:
       * player_A_id, player_B_id       * player_A_id, player_B_id
       * document(s)       * document(s)
-      * number of corrections by player_A and by player_B (JM: I do not see the point in this) +      * number of corrections by player_A and by player_B (**JM**: I do not see the point in this) 
-      * corrections by player_A and by player_B (JM: and maybe not in this either)+      * corrections by player_A and by player_B (**JM**: and maybe not in this either) (**BH**: I am interested in how the players behave. The corrections may be a total mess, but we need to see the data, at least from the very first sessions.)
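One possible shape for the per-session record listed above, as a hedged sketch. The class and field names (''SessionRecord'', ''Correction'', ''old_target'', ''new_target'') are hypothetical; only the listed items (player ids, documents, corrections and their counts) come from the notes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Correction:
    """One re-hook action (hypothetical fields): a word moved from one target to another."""
    word: int
    old_target: int
    new_target: int


@dataclass
class SessionRecord:
    player_a_id: str
    player_b_id: str
    documents: List[str]
    corrections_a: List[Correction] = field(default_factory=list)
    corrections_b: List[Correction] = field(default_factory=list)

    @property
    def n_corrections_a(self) -> int:
        # The count is derived from the stored corrections, not stored separately.
        return len(self.corrections_a)

    @property
    def n_corrections_b(self) -> int:
        return len(self.corrections_b)


# Illustrative data only.
record = SessionRecord("player_A", "player_B", ["doc_001"])
record.corrections_a.append(Correction(word=9, old_target=5, new_target=2))
```

Storing the corrections themselves makes the counts redundant as separate fields, so both items can be kept at the cost of one list per player.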
  
 ===== Design ===== ===== Design =====
Line 143: Line 150:
       * session time = elapsed time + remaining time       * session time = elapsed time + remaining time
       * how many sentences my partner has read so far        * how many sentences my partner has read so far 
-      * running pts **???????** (JM: I would be very cautious with this; the user might be tempted to cancel an action if the score decreases; the user might also try to fit the automatic annotation (by trying various arrows and watching if the score goes up or down), which is not what we want)+      * running pts **???????** (**JM**: I would be very cautious with this; the user might be tempted to cancel an action if the score decreases; the user might also try to fit the automatic annotation (by trying various arrows and watching if the score goes up or down), which is not what we want)
    * Format of the text    * Format of the text
-      * JM: nouns and pronouns might be displayed slightly differently so that the user avoids other parts of speech easily; he should not be allowed to use other parts of speech at all+      * **JM**: nouns and pronouns might be displayed slightly differently so that the user avoids other parts of speech easily; he should not be allowed to use other parts of speech at all
    * Visualization of the coreference pairs    * Visualization of the coreference pairs
       * colors       * colors
-      * arrows (JM: to avoid too many arrows on the screen, possibly only if the mouse pointer hovers over a word, arrows that start or end at the word would be displayed)+      * arrows (**JM**: to avoid too many arrows on the screen, possibly only if the mouse pointer hovers over a word, arrows that start or end at the word would be displayed)
       * ...       * ...
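The two **JM** suggestions in this list (only nouns and pronouns can be linked; arrows are shown only for the word under the mouse pointer) can be sketched as below. This assumes PDT-style positional tags whose first character is the part of speech (''N'' noun, ''P'' pronoun) and links stored as (word, target) index pairs; the function names are hypothetical.

```python
from typing import List, Tuple

Link = Tuple[int, int]  # (index of hooked word, index of its target), assumed representation

# Assumption: the first position of the tag encodes the part of speech.
LINKABLE_POS = {"N", "P"}  # nouns and pronouns


def is_linkable(tag: str) -> bool:
    """True if a word with this tag may be used as either end of a link."""
    return tag[:1] in LINKABLE_POS


def arrows_for_hover(links: List[Link], word: int) -> List[Link]:
    """Arrows to draw while the pointer hovers over `word`:
    only those that start or end at that word."""
    return [link for link in links if word in link]


# Illustrative data only.
links = [(5, 2), (9, 5), (12, 9)]
visible = arrows_for_hover(links, 5)
```

With this filter the interface would draw at most the arrows touching one word at a time, which addresses the concern about too many arrows on the screen.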
 +
  
  
Line 157: Line 165:
  
 ===== Tools needed ===== ===== Tools needed =====
-   * tagger ^BH^ ## tool_chain (CAC2.0) +   * tagger ## tool_chain (CAC2.0) 
-   * Linh's coreference resolution procedure ^PS^ ## What type of input data the Linh's procedure works with? ''tool_chain'' is going to be extended by the ''S'' option enabling to run Vasek Klimes' t-parser in a basic version, i.e. just t-tree and functors. See more info [[https://wiki.ufal.ms.mff.cuni.cz/user:hladka:playcoref#automaticke-urcovani-koreference-v-ceskych-datech-prehled]].+   * Linh's coreference resolution procedure **---PS TO DO---** What type of input data the Linh's procedure works with? ''tool_chain'' is going to be extended by the ''S'' option enabling to run Vasek Klimes' t-parser in a basic version, i.e. just t-tree and functors. See more info [[https://wiki.ufal.ms.mff.cuni.cz/user:hladka:playcoref#automaticke-urcovani-koreference-v-ceskych-datech-prehled]].
   * conversion: csts <-> pml m_coref scheme   * conversion: csts <-> pml m_coref scheme
