Differences
This shows you the differences between two versions of the page.
user:zeman:treebank-engineering [2011/06/21 17:32] zeman Making and viewing. |
user:zeman:treebank-engineering [2011/07/01 12:08] (current) zeman References. |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Treebank Engineering ====== | ====== Treebank Engineering ====== | ||
- | This page is a place for notes on the project where we experiment with various dependency constructions and their transformations encountered in treebanks. **Feel free to edit and add new stuff!** | + | This page is a place for notes on the project where we experiment with various dependency constructions and their transformations encountered in treebanks. **Feel free to edit and to add new stuff!** |
The project could eventually lead to a journal article. The SVN storage for the article and related materials is at [[http:// | The project could eventually lead to a journal article. The SVN storage for the article and related materials is at [[http:// | ||
- | Current participants: | + | Current participants: |
Our basic strategy is as follows: | Our basic strategy is as follows: | ||
Line 13: | Line 13: | ||
The special CL issue is on parsing “morphologically rich” languages, so we will have to devote some effort to arguing how our observations relate to that group of languages (however vaguely they are defined). | The special CL issue is on parsing “morphologically rich” languages, so we will have to devote some effort to arguing how our observations relate to that group of languages (however vaguely they are defined). | ||
+ | |||
+ | ===== Some Unsorted References ===== | ||
+ | |||
+ | * Dan's old PBML article about inconsistent annotation rules in PDT 1.0 ("How to Decrease Performance of a Statistical Parser" | ||
+ | * All references required by the providers of the respective treebanks. | ||
+ | * Interset (the LREC paper is better?) | ||
===== Data ===== | ===== Data ===== | ||
Line 36: | Line 42: | ||
The purpose of the initial normalization is to make the treebank look as close to PDT as possible. Normalization involves dependency structure, syntactic tags (afuns), and, if possible, morphological tags (using [[interset|DZ Interset]]). The transformations applied during this process are an important source of insight into what various treebanks do differently and what we may want to experiment with later. | The purpose of the initial normalization is to make the treebank look as close to PDT as possible. Normalization involves dependency structure, syntactic tags (afuns), and, if possible, morphological tags (using [[interset|DZ Interset]]). The transformations applied during this process are an important source of insight into what various treebanks do differently and what we may want to experiment with later. | ||
- | Unless specified otherwise, normalization is done using Treex (TectoMT). See '' | + | Unless specified otherwise, normalization is done using Treex ([[internal: |
==== Bulgarian ==== | ==== Bulgarian ==== | ||
Line 46: | Line 52: | ||
There is a [[http:// | There is a [[http:// | ||
- | * Coordination is Mel' | + | * Coordination is Mel' |
+ | * A sentence-initial coordinating conjunction (as in //But he believed that...//) is attached to the verb. The Prague style treats this as coordination with a single member (the clause), so after normalization the conjunction is attached to the root and the verb is attached to the conjunction. | ||
* Preposition governs its noun phrase (so far the same as in PDT). However, rhematizers are attached to the preposition, | * Preposition governs its noun phrase (so far the same as in PDT). However, rhematizers are attached to the preposition, | ||
* Final punctuation is attached to the main verb or other “real ROOT” node (not our artificial empty root). | * Final punctuation is attached to the main verb or other “real ROOT” node (not our artificial empty root). |
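
The two simplest re-attachments described above (the sentence-initial conjunction and the final punctuation) can be illustrated with a small Python sketch. This is not the Treex code used in the project. The ''Node'' class, the coarse tags ''CONJ'' and ''PUNCT'', and both function names are hypothetical and exist only for this illustration. Index 0 is assumed to be the artificial root, and attaching final punctuation to it is assumed to be the PDT-style target.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    form: str
    parent: int  # index of the governing node; 0 is the artificial root
    pos: str     # coarse part of speech (hypothetical tag set)


def raise_initial_conjunction(nodes: List[Node]) -> bool:
    """Re-attach a sentence-initial conjunction in the Prague style.

    nodes[0] is the artificial root, nodes[1:] are tokens in surface order.
    Returns True if the tree was changed.
    """
    if len(nodes) < 3:
        return False
    conj = nodes[1]
    if conj.pos != "CONJ" or conj.parent == 0:
        return False
    # Only a leaf conjunction hanging on the main predicate is handled here.
    has_children = any(n.parent == 1 for n in nodes[2:])
    head = nodes[conj.parent]
    if has_children or head.parent != 0:
        return False
    conj.parent = 0   # the conjunction now hangs on the artificial root
    head.parent = 1   # the former head becomes the single conjunct
    return True


def raise_final_punctuation(nodes: List[Node]) -> bool:
    """Attach sentence-final punctuation to the artificial root."""
    if len(nodes) < 2:
        return False
    last = nodes[-1]
    if last.pos != "PUNCT" or last.parent == 0:
        return False
    last.parent = 0
    return True


def example_sentence() -> List[Node]:
    """'But he believed .' in the source style: everything hangs on the verb."""
    return [
        Node("<root>", 0, "ROOT"),
        Node("But", 3, "CONJ"),
        Node("he", 3, "PRON"),
        Node("believed", 0, "VERB"),
        Node(".", 3, "PUNCT"),
    ]


if __name__ == "__main__":
    tree = example_sentence()
    raise_initial_conjunction(tree)
    raise_final_punctuation(tree)
    for index, node in enumerate(tree):
        print(index, node.form, node.parent)
```

The sketch deliberately refuses to touch a conjunction that already has dependents or whose head is not the main predicate, since those cases would need the full coordination handling rather than this single-member shortcut.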