Differences
This shows you the differences between two versions of the page.
user:zeman:treebanks [2011/11/01 22:10] zeman created |
user:zeman:treebanks [2012/01/28 17:44] zeman Persian (Farsi). |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Treebanks for Various Languages ====== | ====== Treebanks for Various Languages ====== | ||
- | ===== Arabic (ar) ===== | + | * [[user: |
- | + | * [[user: | |
- | Prague Arabic Dependency Treebank | + | * [[user: |
- | + | * [[user: | |
- | ==== Versions ==== | + | * [[user: |
- | + | * [[user: | |
- | * Original PADT 1.0 as distributed by the LDC | + | * [[user: |
- | * CoNLL 2006 | + | * [[user: |
- | * CoNLL 2007 | + | |
- | + | * [[user: | |
- | The CoNLL 2007 version reportedly improves on CoNLL 2006 in the quality of its morphological annotation. | + | * [[user:zeman: |
- | + | * [[user:zeman: | |
- | ==== Obtaining and License ==== | + | * [[user: |
- | + | * [[user: | |
- | The original PADT 1.0 is distributed by the LDC under the catalogue number | + | * [[user:zeman: |
- | + | * [[user:zeman: | |
- | * non-commercial research usage | + | * [[user:zeman: |
- | * no redistribution | + | * [[user: |
- | * cite [[http:// | + | * [[user: |
- | + | * [[user: | |
- | The CoNLL 2006 and 2007 versions are obtainable upon request under similar license terms. Their publication in the LDC together with the other CoNLL treebanks | + | * [[user:zeman: |
- | + | * [[user: | |
- | ==== Domain ==== | + | * [[user: |
- | + | * [[user: | |
- | Newswire text from press agencies | + | * [[user: |
- | + | * [[user: | |
- | ==== Size ==== | + | |
- | + | ||
- | According to their website, the original PADT 1.0 contains 113,500 tokens annotated analytically. The CoNLL 2007 version contains 116,793 tokens in 3,043 sentences, yielding 38.38 tokens per sentence on average | + |
- | + | ||
- | ==== TO DO ==== | + | |
- | + | ||
- | * Documentation of m-tags, s-tags etc., tokenization issues, vocalization, | + | |
- | * Link to website, citation of the data (LDC), citation of the main publication | + | |
- | * Sample sentence, statistics of nonprojectivity etc. | + | |
- | * Known published parsing accuracies, especially CoNLL 2006 and 2007 | + | |
- | * Missing is_member attribute in CoNLL versions | + |