Differences
This shows you the differences between two versions of the page.
user:hladka:playcoref [2009/02/24 11:42] hladka |
user:hladka:playcoref [2009/02/25 22:22] hladka |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ===== Motivation ===== | + | ====== Motivation ====== |
Coreference was annotated on the t-layer of PDT 2.0, | Coreference was annotated on the t-layer of PDT 2.0, | ||
We certainly want to have any annotations we decide on (including coreference annotations), | We certainly want to have any annotations we decide on (including coreference annotations), | ||
+ | |||
Line 10: | Line 12: | ||
- | ===== Strategy ===== | ||
Line 17: | Line 18: | ||
- | ===== Input Texts ===== | ||
- | === Text Selection === | ||
- | * Anja's data ## // PDT data that are currently being annotated for the extended coreference // | ||
- | * more ' | ||
- | === Coding === | ||
- | * utf-8 | ||
- | === Internal format === | ||
- | * sgml ## //propose dtd file: include the element '' | ||
- | * conversion: csts <-> pml m_coref scheme | ||
- | === (Pre)processing | + | ===== Strategy |
+ | * **Hook up the words which refer to the same entity.** | ||
+ | * A game of two players. Players are paired randomly. Computer as a player: automatic coreference resolution **??????? | ||
+ | * Session time up to **???????** minutes. | ||
+ | * At the beginning, the first two sentences of the document are displayed to each player. The players, independently of each other, hook up the nouns and pronouns that refer to the same object. Once a player has hooked up all the related words (s)he can find in the displayed sentences, (s)he asks for the next sentence of the document. The session continues this way until the session time runs out. (// | ||
+ | * What is my partner doing? If (s)he hooks up the same pair of words as I did, the pair of words starts **??????? | ||
+ | * The players can re-hook up any word at any time during the session. | ||
+ | * To design the game for a particular language the following data and tools are needed (or are welcome): | ||
+ | - corpus of manually annotated coreference | ||
+ | - POS tagger | ||
+ | - coreference resolution procedure | ||
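The agreement check between two players described in the strategy above can be sketched as follows. This is only a minimal illustration under stated assumptions: words are identified by their position in the document, a hooked-up pair is an unordered pair of positions, and the names `Link`, `make_link` and `agreed_links` are hypothetical, not part of any existing PlayCoref code.

```python
from typing import FrozenSet, Set

# Assumption: a link is an unordered pair of word positions in the document.
Link = FrozenSet[int]


def make_link(word_a: int, word_b: int) -> Link:
    """Build an unordered link between two different word positions."""
    if word_a == word_b:
        raise ValueError("a link needs two different words")
    return frozenset((word_a, word_b))


def agreed_links(player_a: Set[Link], player_b: Set[Link]) -> Set[Link]:
    """Return the pairs of words that both players hooked up."""
    return player_a & player_b


# Hypothetical session state: word positions are made up for illustration.
player_a = {make_link(3, 17), make_link(5, 22)}
player_b = {make_link(17, 3), make_link(8, 30)}

# Both players linked words 3 and 17, regardless of the order they clicked.
print(sorted(sorted(link) for link in agreed_links(player_a, player_b)))
# prints [[3, 17]]
```

Because links are stored as unordered pairs, a re-hook by either player only needs to replace one element of that player's set before the agreement is recomputed.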
- | ===== Scoring ===== | ||
- | * tagging | ||
- | * t-parser by Linh | ||
Line 42: | Line 41: | ||
- | ===== Motivating publications ===== | ||
- | * GAČR 2009 project proposal // | ||
- | * [[http:// | ||
- | * [[http:// | ||
- | * Barbora Hladká, Kiril Ribarov: //Play the Language: An Alternative Manner of Collecting Annotated data//, 2008, ([[http:// | ||
- | * Luis von Ahn, Laura Dabbish: //Labeling Images with a Computer Game//, 2004, ([[http:// | ||
+ | ===== Input Texts ===== | ||
+ | === Text Selection === | ||
+ | * CS data ^JM^ | ||
+ | * Anja's data ## // PDT data that are currently being annotated for the extended coreference // | ||
+ | * more ' | ||
+ | * **EN** | ||
+ | * search for data that are available | ||
+ | === Coding === | ||
+ | * utf-8 | ||
+ | === Internal format === | ||
+ | * sgml ## //propose dtd file: include the element '' | ||
+ | === (Pre)processing === | ||
+ | * tagging ## //see Tools needed below// | ||
+ | * acr by Linh ## // ditto // | ||
+ | === Text handling === | ||
+ | * sentence by sentence | ||
+ | * supervised selection of documents for a session | ||
Line 59: | Line 69: | ||
+ | ===== Scoring ===== | ||
+ | * '' | ||
+ | |||
+ | // w1 should be the highest; w2 should certainly reflect in some way the success rate of the automatic procedure (success measured on which data?); w3: if we also show players the words that the opponent marked and I did not, won't we push them toward forced agreement? for w3 to be ' | ||
+ | ===== Output Data Needed ===== | ||
+ | * score list ## // | ||
+ | * documents after the '' | ||
+ | * session | ||
+ | * player_A_id, | ||
+ | * document(s) | ||
+ | * number of corrections by player_A and by player_B | ||
+ | * corrections by player_A and by player_B | ||
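One possible shape for the session record listed above, as a minimal sketch only. The field list in the source is cut off after `player_A_id`, so `player_b_id`, the `Correction` triple and the `num_corrections` helper are hypothetical names chosen for illustration, not a confirmed output format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical encoding of one correction (a re-hook):
# (word position, old partner position, new partner position).
Correction = Tuple[int, int, int]


@dataclass
class SessionRecord:
    player_a_id: str
    player_b_id: str  # assumed counterpart of player_A_id
    documents: List[str]
    corrections_a: List[Correction] = field(default_factory=list)
    corrections_b: List[Correction] = field(default_factory=list)

    def num_corrections(self, player: str) -> int:
        """Number of corrections made by player 'A' or player 'B'."""
        if player == "A":
            return len(self.corrections_a)
        if player == "B":
            return len(self.corrections_b)
        raise ValueError("player must be 'A' or 'B'")


# Made-up identifiers and positions, for illustration only.
record = SessionRecord("player-1", "player-2", ["doc-001"])
record.corrections_a.append((5, 22, 17))
print(record.num_corrections("A"), record.num_corrections("B"))
# prints 1 0
```

Storing the corrections themselves and deriving the counts from them keeps the two output items consistent with each other.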
+ | ===== Design ===== | ||
+ | * What info should be displayed during the session? | ||
+ | * session time = elapsed time + remaining time | ||
+ | * how many sentences my partner has read so far | ||
+ | * running pts **???????** | ||
+ | * Visualization of the coreference pairs | ||
+ | * colors | ||
+ | * arrows | ||
+ | * ... | ||
Line 69: | Line 100: | ||
+ | ===== Tools needed ===== | ||
+ | * tagger ^BH^ ## tool_chain (CAC2.0) | ||
+ | * Linh's coreference resolution procedure ^PS^ ## What type of input data does Linh's procedure work with? '' | ||
+ | * conversion: csts <-> pml m_coref scheme | ||
+ | ====== Motivating publications ====== | ||
+ | * GAČR 2009 project proposal // | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * Barbora Hladká, Kiril Ribarov: //Play the Language: An Alternative Manner of Collecting Annotated data//, 2008, ([[http:// | ||
+ | * Luis von Ahn, Laura Dabbish: //Labeling Images with a Computer Game//, 2004, ([[http:// | ||
- | + | ====== Coreference annotation in Czech data | |
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | ===== Coreference annotation in Czech data ===== | + | |
* PDT 2.0 [[http:// | * PDT 2.0 [[http:// | ||
* extended coreference - see the overview [[https:// | * extended coreference - see the overview [[https:// | ||
Line 85: | Line 121: | ||
- | ===== Automatic coreference resolution in Czech data - overview ===== | + | |
+ | ====== =Automatic coreference resolution in Czech data - overview ===== ====== | ||
* Experiments so far | * Experiments so far | ||