Differences
This shows you the differences between two versions of the page.
user:hladka:playcoref [2009/02/25 22:24] hladka |
user:hladka:playcoref [2009/02/26 12:02] hladka |
||
---|---|---|---|
Line 6: | Line 6: | ||
We definitely want to have any annotations we decide on (including coreference ones), | We definitely want to have any annotations we decide on (including coreference ones), | ||
- | |||
- | |||
- | |||
- | ====== Specification ====== | ||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | ===== Strategy ===== | ||
- | * **Hook up the words which refer to the same entity.** | ||
- | * A game of two players. Players are paired randomly. Computer as a player: automatic coreference resolution **???????** | ||
- | * Session time up to **???????** minutes. | ||
- | * At the beginning, two first sentences of the document are displayed to each player. The players hook up the nouns and pronouns which refer to the same object independently of each other. If a player hooks up all the related words in the given sentences (s)he keeps in mind then (s)he asks for the next sentence of the document. The session goes on this way until the end of the session time. (// | ||
- | * What my partner is doing? If (s)he hooks up the same pair of words as hooked up then the pair of words starts **??????? | ||
- | * The players can re-hook up any word any time in the session. | ||
- | * To design the game for a particular language the following data and tools are needed (or are welcome): | ||
- | - corpus of manually annotated coreference | ||
- | - POS tagger | ||
- | - coreference resolution procedure | ||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | ===== Input Texts ===== | ||
- | |||
- | === Text Selection === | ||
- | * CS data ^JM^ | ||
- | * Anja's data ## // PDT data that are currently being annotated for the extended coreference // | ||
- | * more ' | ||
- | * **EN** | ||
- | * search the data that are available | ||
- | === Coding === | ||
- | * utf-8 | ||
- | |||
- | === Internal format === | ||
- | * sgml ## //propose dtd file: include the element '' | ||
- | |||
- | === (Pre)processing === | ||
- | * tagging ## //see Tools needed below// | ||
- | * acr by Linh ## // dtto // | ||
- | |||
- | === Text handling === | ||
- | * sentence by sentence | ||
- | * supervised selection of documents for a session | ||
- | |||
- | |||
- | |||
- | |||
- | |||
- | ===== Scoring ===== | ||
- | * '' | ||
- | |||
// w1 should be the highest; w2 should definitely reflect in some way the success rate of the automatic procedure (measured on which data?); w3: if we also show players the words that the opponent marked and I did not, won't we be pushing them into forced agreement? for w3 to be ' | ||
- | |||
- | |||
- | |||
- | ===== Output Data Needed ===== | ||
- | * score list ## // | ||
- | * documents after the '' | ||
- | * session | ||
- | * player_A_id, | ||
- | * document(s) | ||
- | * number of corrections by player_A and by player_B | ||
- | * corrections by player_A and by player_B | ||
- | |||
- | |||
- | ===== Design ===== | ||
- | * What info to be displayed in the session? | ||
- | * session time = elapsed time + remaining time | ||
- | * how many sentences my partner has read so far | ||
- | * running pts **???????** | ||
- | * Visualization of the coreference pairs | ||
- | * colors | ||
- | * arrows | ||
- | * ... | ||
- | |||
- | |||
- | |||
- | |||
- | |||
- | ===== Tools needed ===== | ||
- | * tagger ^BH^ ## tool_chain (CAC2.0) | ||
- | * Linh's coreference resolution procedure ^PS^ ## What type of input data does Linh's procedure work with? '' | ||
- | * conversion: csts <-> pml m_coref scheme | ||
====== Motivating Publications ====== | ====== Motivating Publications ====== | ||
Line 119: | Line 21: | ||
* [[http:// | * [[http:// | ||
* Project for annotating extended textual coreference and bridging relations in PDT. (Anja Nedolužko: [[http:// | * Project for annotating extended textual coreference and bridging relations in PDT. (Anja Nedolužko: [[http:// | ||
+ | |||
+ | |||
+ | |||
Line 124: | Line 29: | ||
====== Automatic Coreference Resolution in Czech Data - Overview ====== | ====== Automatic Coreference Resolution in Czech Data - Overview ====== | ||
+ | * Experiments with Czech so far | ||
+ | - Nguy Giang Linh: Návrh souboru pravidel pro analýzu anafor v českém jazyce (A set of rules for anaphora resolution in Czech), MFF UK 2006. **Available: | ||
+ | - Nguy Giang Linh; Žabokrtský, | ||
+ | * Linh's procedure | ||
+ | |||
- | * Experiments so far | ||
Line 168: | Line 77: | ||
====== Specification ====== | ====== Specification ====== | ||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
Line 184: | Line 84: | ||
* A game of two players. Players are paired randomly. Computer as a player: automatic coreference resolution **???????** | * A game of two players. Players are paired randomly. Computer as a player: automatic coreference resolution **???????** | ||
* Session time up to **???????** minutes. | * Session time up to **???????** minutes. | ||
- | * At the beginning, | + | * At the beginning |
- | * What my partner is doing? If (s)he hooks up the same pair of words as hooked up then the pair of words starts **??????? | + | * What my partner is doing? If (s)he hooks up the same pair of words as I hooked up then the pair of words starts **??????? |
* The players can re-hook up any word any time in the session. | * The players can re-hook up any word any time in the session. | ||
* To design the game for a particular language the following data and tools are needed (or are welcome): | * To design the game for a particular language the following data and tools are needed (or are welcome): | ||
Line 191: | Line 91: | ||
- POS tagger | - POS tagger | ||
- coreference resolution procedure | - coreference resolution procedure | ||
- | |||
- | |||
- | |||
- | |||
- | |||
- | |||
Line 207: | Line 101: | ||
* Anja's data ## // PDT data that are currently being annotated for the extended coreference // | * Anja's data ## // PDT data that are currently being annotated for the extended coreference // | ||
* more ' | * more ' | ||
+ | * **^JM^**: It would be nice if the players could choose a domain of the texts to play on (science-fiction, | ||
* **EN** | * **EN** | ||
* search the data that are available | * search the data that are available | ||
Line 222: | Line 117: | ||
* sentence by sentence | * sentence by sentence | ||
* supervised selection of documents for a session | * supervised selection of documents for a session | ||
- | |||
- | |||
- | |||
===== Scoring ===== | ===== Scoring ===== | ||
- | * '' | + | * '' |
- | // w1 should be the highest; w2 should definitely reflect in some way the success rate of the automatic procedure (measured on which data?); w3: if we also show players the words that the opponent marked and I did not, won't we be pushing them | + | **^JM^**: | ||
+ | I think we do want to push them toward agreement. It is desirable for the annotation to be as complete as possible. When the second player sees that the first player has hooked up some word, it puts pressure on him to check whether he | ||
+ | overlooked it and whether he could hook it up as well. He is not shown where the link points, so if he finds no target, he will not hook it up and will be pleased that the first player made a mistake. | ||
+ | I think the function should take **either** the automatic annotation **or** the manual one, depending on what is available. I also now think that we will use manually annotated data only minimally, solely to measure how successful annotation through the game is; that need not be part of the game score at all and can be done off-line. We have little manually annotated data, it is already annotated, and it is not entertaining. From this I conclude that I would not take manual annotation into account for scoring at all, and I would drop the first term from the formula above. | ||
+ | **^BH^**: Jirka is right. The score computation must be objective. I have therefore modified the formula so that it does not count the player's agreement with the manual annotation (if available). | ||
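The scoring formula itself is cut off in this view, so the following is only a minimal sketch of what the discussion above seems to describe: a weighted sum with the manual-annotation term dropped. It assumes that ''w2'' weights agreement with the automatic coreference procedure and ''w3'' weights agreement with the partner; the names ''make_pair'' and ''session_score'' and the representation of a hooked-up pair as an unordered pair of word positions are hypothetical.

```python
from typing import FrozenSet, Set

# Hypothetical representation: a hooked-up pair is an unordered pair of word positions.
Pair = FrozenSet[int]


def make_pair(a: int, b: int) -> Pair:
    """Build an unordered pair so that (1, 5) and (5, 1) compare equal."""
    if a == b:
        raise ValueError("a word cannot be hooked up with itself")
    return frozenset((a, b))


def session_score(
    player: Set[Pair],
    partner: Set[Pair],
    automatic: Set[Pair],
    w2: float = 1.0,  # assumed weight: agreement with the automatic procedure
    w3: float = 1.0,  # assumed weight: agreement with the partner
) -> float:
    """Weighted count of the player's pairs confirmed by each source.

    No term for manual annotation is included, following the decision
    recorded in the discussion.
    """
    agree_auto = len(player & automatic)
    agree_partner = len(player & partner)
    return w2 * agree_auto + w3 * agree_partner


player = {make_pair(1, 5), make_pair(2, 7)}
partner = {make_pair(5, 1)}
automatic = {make_pair(2, 7), make_pair(3, 4)}
score = session_score(player, partner, automatic, w2=2.0, w3=1.0)
```

Counting raw matches rather than a normalized ratio is a design choice of this sketch only; the actual weights and normalization are still open in the specification.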
===== Output Data Needed ===== | ===== Output Data Needed ===== | ||
* score list ## // | * score list ## // | ||
- | * documents after the '' | + | * documents after the '' |
* session | * session | ||
* player_A_id, | * player_A_id, | ||
* document(s) | * document(s) | ||
- | * number of corrections by player_A and by player_B | + | * number of corrections by player_A and by player_B |
- | * corrections by player_A and by player_B | + | * corrections by player_A and by player_B |
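One possible shape for the per-session output record, as a hedged sketch: the field list above is truncated in this view, so ''player_b_id'' is an assumption, and the ''Correction'' structure (which word was re-hooked, from which target to which) is purely illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Correction:
    """One re-hook: the word at word_index was moved from old_target to new_target."""
    word_index: int
    old_target: int
    new_target: int


@dataclass
class SessionRecord:
    player_a_id: str
    player_b_id: str  # assumption: the original field list is cut off after player_A_id
    documents: List[str]
    corrections: Dict[str, List[Correction]] = field(default_factory=dict)

    def add_correction(self, player_id: str, correction: Correction) -> None:
        """Store a correction under the player who made it."""
        self.corrections.setdefault(player_id, []).append(correction)

    def correction_count(self, player_id: str) -> int:
        """Number of corrections made by the given player (0 if none)."""
        return len(self.corrections.get(player_id, []))


record = SessionRecord(player_a_id="A", player_b_id="B", documents=["doc1"])
record.add_correction("A", Correction(word_index=3, old_target=1, new_target=2))
```

Keeping the corrections themselves and deriving the counts from them avoids storing the two output items separately and letting them drift apart.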
===== Design ===== | ===== Design ===== | ||
Line 248: | Line 143: | ||
* session time = elapsed time + remaining time | * session time = elapsed time + remaining time | ||
* how many sentences my partner has read so far | * how many sentences my partner has read so far | ||
- | * running pts **???????** | + | * running pts **??????? |
+ | * Format of the text | ||
+ | * JM: nouns and pronouns might be displayed slightly differently so that the user avoids other parts of speech easily; he should not be allowed to use other parts of speech at all | ||
* Visualization of the coreference pairs | * Visualization of the coreference pairs | ||
* colors | * colors | ||
- | * arrows | + | * arrows |
* ... | * ... | ||
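The suggestion that players should only be able to hook up nouns and pronouns could be enforced with a simple filter over the tagger output. This is a sketch under assumptions: it presumes each token arrives as a ''(word form, tag)'' pair and that noun and pronoun tags start with ''N'' and ''P'' respectively; the function names are hypothetical.

```python
from typing import List, Tuple

# Assumption: tags whose first character is N (noun) or P (pronoun) are hookable.
HOOKABLE_PREFIXES = ("N", "P")

Token = Tuple[str, str]  # (word form, POS tag)


def hookable_positions(tagged: List[Token]) -> List[int]:
    """Return positions of tokens a player is allowed to hook up."""
    return [
        i for i, (_, tag) in enumerate(tagged)
        if tag.startswith(HOOKABLE_PREFIXES)
    ]


def can_hook(tagged: List[Token], i: int, j: int) -> bool:
    """A pair is valid only if both ends are distinct hookable tokens."""
    allowed = set(hookable_positions(tagged))
    return i != j and i in allowed and j in allowed


sentence = [("Petr", "N"), ("videl", "V"), ("ji", "P")]
positions = hookable_positions(sentence)
```

The same list of positions could also drive the display, for example by rendering only those tokens in a distinct style.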
+ | |||
Line 260: | Line 158: | ||
===== Tools needed ===== | ===== Tools needed ===== | ||
* tagger ^BH^ ## tool_chain (CAC2.0) | * tagger ^BH^ ## tool_chain (CAC2.0) | ||
- | * Linh's coreference resolution procedure ^PS^ ## What type of input data does Linh's procedure work with? '' | + | * Linh's coreference resolution procedure ^PS^ ## What type of input data does Linh's procedure work with? '' | ||
* conversion: csts <-> pml m_coref scheme | * conversion: csts <-> pml m_coref scheme | ||
- | |||
- |