Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

user:zeman:treebanks:hu [2011/12/13 13:20]
zeman Sample.
user:zeman:treebanks:hu [2011/12/13 13:32]
zeman Inside.
Line 54:
 ==== Inside ====
  
-Both versions (CoNLL 2007 and BDT-II) are in the CoNLL 2006/2007 format.
+The original Szeged Treebank is a phrase-based treebank, distributed in an XML-based, TEI-compliant format. The CoNLL 2007 version is dependency-based (i.e. the head of each phrase was identified) and is distributed in the CoNLL 2006/2007 format.
  
-The syntactic guidelines (structure and labels) are described in Spanish in this [[http://ixa.si.ehu.es/Ixa/Argitalpenak/Barne_txostenak/1068549887/publikoak/guia.pdf|technical report]]. See Appendix 3 for some lists of tags.
+Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. The tagset used in SzTB seems to be the same as or similar to [[http://nl.ijs.si/ME/V4/msd/html/msd-hu.html|Multext-East]]. In the CoNLL version, tags were decomposed into the CPOS column, the POS column and a list of feature-value pairs in the FEAT column.
-
-Multi-word expressions have been collapsed into one token, using underscore as the joining character (e.g. Espainia_Poliziak, iduri_zait).
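
The split of a tag into the CPOS column, the POS column and the FEAT column can be illustrated by reading one token line of a CoNLL 2006/2007 file. The following is only a minimal sketch: it assumes the standard ten tab-separated columns of that format, with features separated by "|" and "_" marking an empty field. The names Token, parse_feats and parse_token, the name=value shape of the features, and the sample line are hypothetical and are not taken from the treebank.

```python
from typing import Dict, NamedTuple


class Token(NamedTuple):
    """One token line of a CoNLL 2006/2007 file (first eight columns)."""
    id: int
    form: str
    lemma: str
    cpostag: str           # coarse-grained part of speech (CPOS column)
    postag: str            # fine-grained part of speech (POS column)
    feats: Dict[str, str]  # decomposed FEAT column
    head: int
    deprel: str


def parse_feats(field: str) -> Dict[str, str]:
    """Split a FEAT field into a dict; "_" means no features.

    Assumption: each feature has the form name=value. A feature
    without "=" is kept with an empty string as its value.
    """
    if field == "_":
        return {}
    feats = {}
    for item in field.split("|"):
        name, sep, value = item.partition("=")
        feats[name] = value if sep else ""
    return feats


def parse_token(line: str) -> Token:
    """Parse one tab-separated token line with ten columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, got {len(cols)}")
    return Token(
        id=int(cols[0]),
        form=cols[1],
        lemma=cols[2],
        cpostag=cols[3],
        postag=cols[4],
        feats=parse_feats(cols[5]),
        head=int(cols[6]),
        deprel=cols[7],
    )


# Invented example line; tag and feature values are illustrative only.
sample = "1\tházak\tház\tN\tNc\tnum=p|case=n\t2\tSUBJ\t_\t_"
token = parse_token(sample)
print(token.cpostag, token.postag, token.feats)
```

Keeping the features in a dictionary makes it easy to compare individual values with the positional Multext-East tags, but the mapping between the two is not shown here.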
  
 ==== Sample ====