
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


user:zeman:treebanks:fa [2012/01/28 18:45]
zeman created
user:zeman:treebanks:fa [2012/01/28 23:04]
zeman Some more changes.
Line 10: Line 10:
 ==== Obtaining and License ====
  
-The treebank is available for free after completing the [[http://dadegan.ir/en/content/user-agreement-persian-dependency-treebank|license form]]. The license in short:
+The treebank is available for free after completing the [[http://dadegan.ir/en/content/user-agreement-persian-dependency-treebank|license form]]. (You may also contact info(at)dadegan(dot)ir or Mohammad Sadegh Rasooli.) The license in short:
  
   * non-commercial research usage
Line 16: Line 16:
   * citation of publications not specified
  
-PDT was created by members of the [[http://dadegan.ir/|Dadegan Research Group]] (Секция Лингвистично моделиране), Bulgarian Academy of Sciences (Българска академия на науките), Ул. Акад. Г. Бончев, Бл. 25 А, 1113 София, Bulgaria.
+PDT was created by members of the [[http://dadegan.ir/|Dadegan Research Group]] (دادگان, Dādegān), Computer Engineering Department, [[http://www.iust.ac.ir/|Iran University of Science and Technology]] (دانشگاه علم و صنعت ایران), Tehrān (تهران), Iran.
  
 ==== References ====
  
   * Website
-    * http://www.bultreebank.org/indexBTB.html
+    * http://dadegan.ir/en/persiandependencytreebank
   * Data
     * //no separate citation//
   * Principal publications
-    * Kiril Simov, Petya Osenova, Alexander Simov, Milen Kouylekov: //Design and Implementation of the Bulgarian HPSG-based Treebank.// In: Erhard Hinrichs, Kiril Simov (eds.): Journal of Research on Language and Computation, Special Issue, vol. 2, no. 4, pp. 495–522, Kluwer Academic Publishers, ISSN 1570-7075. 2004.
+    * Mohammad Sadegh Rasooli, Amirsaeid Moloodi, Manouchehr Kouhestani, Behrouz Minaei-Bidgoli: [[http://dadegan.ir/sites/default/files/A%20Syntactic%20Valency%20Lexicon%20for%20Persian%20Verbs%20The%20First%20Steps%20towards%20Persian%20Dependency%20Treebank.pdf|A Syntactic Valency Lexicon for Persian Verbs: The First Steps towards Persian Dependency Treebank]]. In: 5th Language & Technology Conference (LTC): Human Language Technologies as a Challenge for Computer Science and Linguistics, pp. 227-231, Poznań, Poland, 2011.
   * Documentation
-    * Kiril Simov, Petya Osenova, Milena Slavcheva: [[http://www.bultreebank.org/TechRep/BTB-TR03.pdf|BTB-TR03: BulTreeBank Morphosyntactic Tagset]]. Technical report, 2004.
-    * Petya Osenova, Kiril Simov: [[http://www.bultreebank.org/TechRep/BTB-TR05.pdf|BTB-TR05: BulTreeBank Stylebook]]. Technical report, 2004.
-    * http://www.bultreebank.org/dpbtb/ provides the list of dependency relation labels (s-tags) with brief description.
+    * //none so far//
  
 ==== Domain ====
  
-Unknown (“A set of Bulgarian sentences marked-up with detailed syntactic information. These sentences are mainly extracted from authentic Bulgarian texts. They are chosen with regards two criteria. First, they cover the variety of syntactic structures of Bulgarian. Second, they show the statistical distribution of these phenomena in real texts.”) At least part of it is probably news (Novinar, Sega, Standart).
+Unknown.
  
 ==== Size ====
  
-The CoNLL 2006 version contains 196,151 tokens in 13221 sentences, yielding 14.84 tokens per sentence on average (CoNLL 2006 data split: 190,217 tokens / 12823 sentences training, 5934 tokens / 398 sentences test).
+Unknown.
  
 ==== Inside ====
Line 48: Line 46:
  
 ==== Sample ====
- 
-The first three sentences of the CoNLL 2006 training data: 
- 
-| 1 | Глава | _ | N | Nc | _ | 0 | ROOT | 0 | ROOT | 
-| 2 | трета | _ | M | Mo | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 1 | mod | 1 | mod | 
-| |||||||||| 
-| 1 | НАРОДНО | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 2 | mod | 2 | mod | 
-| 2 | СЪБРАНИЕ | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | 
-| |||||||||| 
-| 1 | Народното | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 2 | mod | 2 | mod | 
-| 2 | събрание | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 3 | subj | 3 | subj | 
-| 3 | осъществява | _ | V | Vpi | trans=t<nowiki>|</nowiki>mood=i<nowiki>|</nowiki>tense=r<nowiki>|</nowiki>pers=3<nowiki>|</nowiki>num=s | 0 | ROOT | 0 | ROOT | 
-| 4 | законодателната | _ | A | Af | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 5 | mod | 5 | mod | 
-| 5 | власт | _ | N | Nc | _ | 3 | obj | 3 | obj | 
-| 6 | и | _ | C | Cp | _ | 3 | conj | 3 | conj | 
-| 7 | упражнява | _ | V | Vpi | trans=t<nowiki>|</nowiki>mood=i<nowiki>|</nowiki>tense=r<nowiki>|</nowiki>pers=3<nowiki>|</nowiki>num=s | 3 | conjarg | 3 | conjarg | 
-| 8 | парламентарен | _ | A | Am | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 9 | mod | 9 | mod | 
-| 9 | контрол | _ | N | Nc | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 7 | obj | 7 | obj | 
-| 10 | . | _ | Punct | Punct | _ | 3 | punct | 3 | punct | 
- 
-The first three sentences of the CoNLL 2006 test data: 
- 
-| 1 | Единственото | _ | A | An | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=d | 2 | mod | 2 | mod | 
-| 2 | решение | _ | N | Nc | gen=n<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | 
-| |||||||||| 
-| 1 | Ерик | _ | N | Np | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | 
-| 2 | Франк | _ | N | Np | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 1 | mod | 1 | mod | 
-| 3 | Ръсел | _ | H | Hm | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 2 | mod | 2 | mod | 
-| |||||||||| 
-| 1 | Пълен | _ | A | Am | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 2 | mod | 2 | mod | 
-| 2 | мрак | _ | N | Nc | gen=m<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 0 | ROOT | 0 | ROOT | 
-| 3 | и | _ | C | Cp | _ | 2 | conj | 2 | conj | 
-| 4 | пълна | _ | A | Af | gen=f<nowiki>|</nowiki>num=s<nowiki>|</nowiki>def=i | 5 | mod | 5 | mod | 
-| 5 | самота | _ | N | Nc | _ | 2 | conjarg | 2 | conjarg | 
-| 6 | . | _ | Punct | Punct | _ | 2 | punct | 2 | punct | 
  
 ==== Parsing ====
