Differences
This shows you the differences between two versions of the page.
user:hladka:playcoref [2009/02/24 11:34] hladka |
user:hladka:playcoref [2009/02/25 22:24] hladka |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ===== Motivation ===== | + | ====== Motivation ====== |
Coreference was annotated on the t-layer of PDT 2.0, | Coreference was annotated on the t-layer of PDT 2.0, | ||
Line 5: | Line 6: | ||
We definitely want to have every annotation we decide on (including the coreference ones), | We definitely want to have every annotation we decide on (including the coreference ones), | ||
- | ===== Specification ===== | ||
- | ==== Strategy ==== | ||
- | ===== Texts ===== | ||
- | === Text Selection | + | ====== Specification ====== |
- | === Coding === | ||
- | * utf-8 | ||
- | === Internal format === | ||
- | * sgml ## to propose dtd file: //like csts.dtd and include element '' | ||
- | === (Pre)processing === | ||
- | ===== Scoring ===== | ||
- | * tagging | ||
- | * t-parser by Linh | ||
Line 32: | Line 22: | ||
- | ===== Motivating publications | + | ===== Strategy |
- | | + | * **Hook up the words which refer to the same entity.** |
- | * [[http:// | + | * A two-player game. Players are paired randomly. The computer as a player: automatic coreference resolution **???????** |
- | * [[http:// | + | * Session time up to **???????** minutes. |
- | | + | * At the beginning, the first two sentences of the document are displayed to each player. The players hook up the nouns and pronouns that refer to the same object independently |
- | | + | |
+ | * The players can re-hook up any word at any time during the session. | ||
+ | * To design the game for a particular language, the following data and tools are needed (or are welcome): | ||
+ | - corpus of manually annotated coreference | ||
+ | - POS tagger | ||
+ | - coreference resolution procedure | ||
Line 48: | Line 43: | ||
+ | ===== Input Texts ===== | ||
+ | === Text Selection === | ||
+ | * CS data ^JM^ | ||
+ | * Anja's data ## // PDT data that are currently being annotated for extended coreference // | ||
+ | * more ' | ||
+ | * **EN** | ||
+ | * search for data that are available | ||
+ | === Coding === | ||
+ | * utf-8 | ||
+ | === Internal format === | ||
+ | * sgml ## //propose dtd file: include the element '' | ||
+ | === (Pre)processing === | ||
+ | * tagging ## //see Tools needed below// | ||
+ | * acr by Linh ## // ditto // | ||
+ | === Text handling === | ||
+ | * sentence by sentence | ||
+ | * supervised selection of documents for a session | ||
Line 57: | Line 69: | ||
+ | ===== Scoring ===== | ||
+ | * '' | ||
+ | |||
+ | // w1 should be the highest; w2 should definitely reflect in some way the success rate of the automatic procedure (success rate measured on which data?); w3: if we also show the players the words that the opponent marked and that I did not mark, won't we be pushing them into forced agreement? for w3 to be ' | ||
+ | ===== Output Data Needed ===== | ||
+ | * score list ## // | ||
+ | * documents after the '' | ||
+ | * session | ||
+ | * player_A_id, | ||
+ | * document(s) | ||
+ | * number of corrections by player_A and by player_B | ||
+ | * corrections by player_A and by player_B | ||
+ | ===== Design ===== | ||
+ | * What information should be displayed during the session? | ||
+ | * session time = elapsed time + remaining time | ||
+ | * how many sentences my partner has read so far | ||
+ | * running pts **???????** | ||
+ | * Visualization of the coreference pairs | ||
+ | * colors | ||
+ | * arrows | ||
+ | * ... | ||
Line 67: | Line 100: | ||
+ | ===== Tools needed ===== | ||
+ | * tagger ^BH^ ## tool_chain (CAC2.0) | ||
+ | * Linh's coreference resolution procedure ^PS^ ## What type of input data does Linh's procedure work with? '' | ||
+ | * conversion: csts <-> pml m_coref scheme | ||
- | ===== Coreference annotation in Czech data ===== | + | ====== Motivating publications ====== |
+ | |||
+ | * Project proposal for GAČR 2009 // | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * Barbora Hladká, Kiril Ribarov: //Play the Language: An Alternative Manner of Collecting Annotated data//, 2008, ([[http:// | ||
+ | * Luis von Ahn, Laura Dabbish: //Labeling Images with a Computer Game//, 2004, ([[http:// | ||
+ | |||
+ | |||
+ | ====== Coreference annotation in Czech data | ||
* PDT 2.0 [[http:// | * PDT 2.0 [[http:// | ||
* extended coreference - see the overview [[https:// | * extended coreference - see the overview [[https:// | ||
Line 75: | Line 121: | ||
- | ===== Automatic coreference resolution in Czech data - overview ===== | + | |
+ | |||
+ | ====== Automatic coreference resolution in Czech data - overview ====== | ||
* Experiments to date | * Experiments to date | ||
Line 84: | Line 133: | ||
- | ===== Game design - brainstorming ===== | + | |
+ | ====== Game design - brainstorming ====== | ||
**26/5/08 Anja, Bára:** | **26/5/08 Anja, Bára:** | ||
* Input: texts in surface form, i.e. NOT tectogrammatical trees | * Input: texts in surface form, i.e. NOT tectogrammatical trees