
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


user:zeman:treebanks:it [2012/01/03 15:38]
zeman Sample.
user:zeman:treebanks:it [2012/01/03 15:43]
zeman Inside.
Line 42:
 ==== Inside ====
  
-The original Szeged Treebank is a phrase-based treebank and it is distributed in XML-based, TEI-compliant format. The CoNLL 2007 version is dependency-based (i.e. the head of each phrase was identified), distributed in the CoNLL 2006/2007 format.
+The original ISST is a phrase-based treebank. The CoNLL 2007 version is dependency-based (i.e. the head of each phrase was identified), distributed in the CoNLL 2006/2007 format.
  
-Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. The tagset used in SzTB seems to be same or similar to [[http://nl.ijs.si/ME/V4/msd/html/msd-hu.html|Multext-East]]. In the CoNLL version, tags were decomposed into CPOS column, POS column and the list of feature-value pairs in the FEAT column.
+Morphological annotation includes lemmas. Morphosyntactic tags were probably disambiguated manually. In the CoNLL version, tags were decomposed into CPOS column, POS column and the list of feature-value pairs in the FEAT column.
  
-Personal names have been collapsed into one token, using underscore as the joining character (e.g. Torgyán_József).
+Multi-word expressions have been collapsed into one token, using underscore as the joining character (e.g. a_causa_di).
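
The column layout described above can be illustrated with a minimal sketch. It assumes the standard ten tab-separated columns of the CoNLL 2006/2007 format (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with `|` separating features and `_` marking an empty field. The names `Token`, `parse_feats`, `parse_line` and `split_mwe` are hypothetical helpers, and the example line is invented for illustration; it is not taken from the treebank.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    """One token row; only the first eight CoNLL columns are kept."""
    id: int
    form: str
    lemma: str
    cpos: str
    pos: str
    feats: Dict[str, str]
    head: int
    deprel: str


def parse_feats(field: str) -> Dict[str, str]:
    """Split the FEATS column into a dict; '_' means no features."""
    if field == "_":
        return {}
    feats = {}
    for item in field.split("|"):
        # Features may be 'name=value' pairs or bare values.
        name, sep, value = item.partition("=")
        feats[name] = value if sep else ""
    return feats


def parse_line(line: str) -> Token:
    """Parse one tab-separated token line (assumes 10 columns)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 columns, got {len(cols)}")
    return Token(int(cols[0]), cols[1], cols[2], cols[3], cols[4],
                 parse_feats(cols[5]), int(cols[6]), cols[7])


def split_mwe(form: str) -> List[str]:
    """Undo the underscore joining of a collapsed multi-word token."""
    parts = [p for p in form.split("_") if p]
    return parts or [form]


# Invented example line; the tags and features are placeholders.
tok = parse_line("1\ta_causa_di\ta_causa_di\tE\tE\tgen=n|num=s\t2\tprep\t_\t_")
print(tok.cpos, tok.pos, tok.feats)
print(split_mwe(tok.form))
```

Splitting on every underscore is a simplification: a token that legitimately contains an underscore would be split as well, so a real conversion would need a treebank-specific check.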
  
 ==== Sample ====
