Differences
This shows you the differences between two versions of the page.
user:hladka:playcoref [2009/02/26 12:14] hladka |
user:hladka:playcoref [2009/03/24 21:55] hladka |
||
---|---|---|---|
Line 21: | Line 21: | ||
* [[http:// | * [[http:// | ||
* Project for annotating extended textual coreference and bridging relations in PDT. (Anja Nedolužko: [[http:// | * Project for annotating extended textual coreference and bridging relations in PDT. (Anja Nedolužko: [[http:// | ||
+ | |||
+ | |||
Line 30: | Line 32: | ||
====== Automatic coreference resolution in Czech data - overview ====== | ====== Automatic coreference resolution in Czech data - overview ====== | ||
* Experiments with Czech so far | * Experiments with Czech so far | ||
+ | - Kučová L., Žabokrtský Z.: Anaphora in Czech: Large Data and Experiments with Automatic Anaphora Resolution. TSD 2005. **Available: | ||
- Nguy Giang Linh: Návrh souboru pravidel pro analýzu anafor v českém jazyce (A set of rules for anaphora resolution in Czech), MFF UK 2006. **Available: | - Nguy Giang Linh: Návrh souboru pravidel pro analýzu anafor v českém jazyce (A set of rules for anaphora resolution in Czech), MFF UK 2006. **Available: | ||
- Nguy Giang Linh; Žabokrtský, | - Nguy Giang Linh; Žabokrtský, | ||
Line 77: | Line 80: | ||
====== Specification ====== | ====== Specification ====== | ||
+ | |||
+ | |||
Line 84: | Line 89: | ||
* A game of two players. Players are paired randomly. Computer as a player: automatic coreference resolution **???????** | * A game of two players. Players are paired randomly. Computer as a player: automatic coreference resolution **???????** | ||
* Session time up to **???????** minutes. | * Session time up to **???????** minutes. | ||
- | * At the beginning of the game, if there is no coreference pair in the first two sentences (as determined by the manual/ | + | * At the beginning of the game, if there is no coreference pair in the first two sentences (as determined by the manual/ |
* What is my partner doing? If (s)he hooks up the same pair of words as I hooked up, then the pair of words starts **??????? | * What is my partner doing? If (s)he hooks up the same pair of words as I hooked up, then the pair of words starts **??????? | ||
* The players can re-hook up any word any time in the session. | * The players can re-hook up any word any time in the session. | ||
- | * To design the game for a particular language the following data and tools are needed (or are welcome): | + | * To design the game for a particular language the following data and tools are needed (or, better said, are welcome): |
- corpus of manually annotated coreference | - corpus of manually annotated coreference | ||
- POS tagger | - POS tagger | ||
- coreference resolution procedure | - coreference resolution procedure | ||
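The hook-up mechanics listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `Session`, the methods `hook` and `agreed_pairs`, and the representation of words as integer token positions are all hypothetical and not part of the specification. Each player keeps at most one link per word, so re-hooking a word simply replaces the earlier link.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical two-player session; all names here are assumptions."""

    # For each player: anaphor token position -> antecedent token position.
    links: dict = field(default_factory=lambda: {"A": {}, "B": {}})
    # How many times each player changed an already existing link.
    corrections: dict = field(default_factory=lambda: {"A": 0, "B": 0})

    def hook(self, player, word, antecedent):
        """Link `word` to `antecedent`; re-hooking replaces the old link."""
        old = self.links[player].get(word)
        if old is not None and old != antecedent:
            self.corrections[player] += 1
        self.links[player][word] = antecedent

    def agreed_pairs(self):
        """Return the pairs that both players hooked up identically."""
        a = set(self.links["A"].items())
        b = set(self.links["B"].items())
        return a & b


session = Session()
session.hook("A", 7, 2)   # player A links word 7 to word 2
session.hook("B", 7, 3)   # player B first picks a different antecedent
session.hook("B", 7, 2)   # ... and then re-hooks to the same one as A
print(session.agreed_pairs())   # {(7, 2)}
print(session.corrections)      # {'A': 0, 'B': 1}
```

What should happen to an agreed pair (the **???????** above) and how it is scored is left open here, since the specification does not settle it yet.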
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
Line 103: | Line 121: | ||
* CS data | * CS data | ||
* Anja's data ## // PDT data that are currently being annotated for the extended coreference // | * Anja's data ## // PDT data that are currently being annotated for the extended coreference // | ||
- | * **JM**: It would be nice if the players could choose a domain of the texts to play on (science-fiction, | + | * **JM**: It would be nice if the players could choose a domain of the texts to play on (science-fiction, |
- | * **---JM TO DO---** find out on Anja's data | + | |
- | sentences/document; arrows_noun_noun-noun_pronoun-pronoun-pronoun/document; | + | ***BH (16/ |
* **EN** | * **EN** | ||
- | * search the data that are available | + | * search the data that are available; **BH (11/3/09)** In the documentation of the data we should have, I found MUC6, but I do not see any data with coreference there. Jirka will find out whether they are somewhere else or how else we can get to them. |
=== Coding === | === Coding === | ||
* utf-8 | * utf-8 | ||
Line 122: | Line 139: | ||
* sentence by sentence | * sentence by sentence | ||
* supervised selection of documents for a session | * supervised selection of documents for a session | ||
+ | |||
+ | |||
+ | |||
===== Scoring ===== | ===== Scoring ===== | ||
- | * '' | + | * '' |
**JM**: | **JM**: | ||
Line 138: | Line 158: | ||
===== Output Data Needed ===== | ===== Output Data Needed ===== | ||
* score list ## // | * score list ## // | ||
- | * documents after the '' | + | * documents after the '' |
+ | (**JM**: I talked with Zdeněk about measuring inter-annotator agreement in coreference annotation, and the upshot was that for measuring agreement on arrows he would simply use F-measure. Its meaning is clear and it is symmetric. Kappa is unsuitable because the probability of chance agreement is fairly low and hard to determine; kappa is better suited to classification tasks (which is why I will use it in Anja's project for agreement on determining the type of coreference, | ||
+ | - kappa measure | ||
+ | - G-theory - see [[http:// | ||
+ | Identifying Sources of Disagreement: | ||
+ | - the Pearson correlation - see (Snow et al., 2008) [[http:// | ||
* session | * session | ||
* player_A_id, | * player_A_id, | ||
* document(s) | * document(s) | ||
* number of corrections by player_A and by player_B (**JM**: I do not see the point in this) | * number of corrections by player_A and by player_B (**JM**: I do not see the point in this) | ||
- | * corrections by player_A and by player_B (**JM**: and maybe not in this either) (**BH**: I am interested in the manner of the players. Maybe the corrections will be a total mess, but we have to see the data at least from the very first sessions. ) | + | * corrections by player_A and by player_B (**JM**: and maybe not in this either) (**BH**: I am interested in the players' behaviour. Maybe the corrections will be a total mess, but we have to see the data at least from the very first sessions. ) |
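The F-measure on arrows mentioned in the JM note above could be computed along these lines. This is a minimal sketch, assuming each arrow is stored as an `(anaphor, antecedent)` pair of token positions and that two arrows agree only when they match exactly; the function name `link_f_measure` and the treatment of two empty annotations are assumptions, not something the notes fix.

```python
def link_f_measure(links_a, links_b):
    """Symmetric F-measure between two players' sets of arrows.

    With one player taken as the reference, precision is |A & B| / |B| and
    recall is |A & B| / |A|; their harmonic mean simplifies to
    2 * |A & B| / (|A| + |B|), which does not depend on who is the reference.
    """
    a, b = set(links_a), set(links_b)
    if not a and not b:
        return 1.0  # assumption: two empty annotations count as full agreement
    return 2 * len(a & b) / (len(a) + len(b))


player_a = {(7, 2), (9, 4)}
player_b = {(7, 2), (10, 4)}
print(link_f_measure(player_a, player_b))  # 0.5
```

Exact matching of arrows is the simplest choice; it does not give credit when two players link a word to different members of the same coreference chain.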
===== Design ===== | ===== Design ===== | ||
Line 156: | Line 182: | ||
* arrows (**JM**: to avoid too many arrows on the screen, possibly only if the mouse pointer hovers over a word, arrows that start or end at the word would be displayed) | * arrows (**JM**: to avoid too many arrows on the screen, possibly only if the mouse pointer hovers over a word, arrows that start or end at the word would be displayed) | ||
* ... | * ... | ||
- | |||
- | |||
Line 164: | Line 188: | ||
===== Tools needed ===== | ===== Tools needed ===== | ||
* tagger ## tool_chain (CAC2.0) | * tagger ## tool_chain (CAC2.0) | ||
- | * Linh's coreference resolution procedure **PS TO DO** What type of input data does Linh's procedure work with? '' | + | * Linh's coreference resolution procedure |
- | * conversion: csts <-> pml m_coref scheme | + | * try out |
+ | * conversion: csts <-> pml m_coref scheme | ||
+ | |||
+ | |||
+ | |||
+ | ===== For whoever will implement the game ===== | ||
+ | |||
+ | |||
+ | |||
+ | ====== ACL - IJCNLP2009 ====== | ||
+ | | ||
+ | * [[http://www.acl-ijcnlp-2009.org/ | ||
+ | | ||
+ | * 23/3/09 - I have partly thought through the outline; please take a look and write your comments directly into the LaTeX source