Differences
This shows you the differences between two versions of the page.
user:zeman:treebanks:fa [2012/01/29 21:10] zeman Size. |
user:zeman:treebanks:fa [2012/03/10 11:58] zeman Tokenization. |
||
---|---|---|---|
Line 40: | Line 40: | ||
Provided in the [[: | Provided in the [[: | ||
+ | |||
+ | Tokenization is subordinate to the need to display syntactic relations. Some orthographic words have been split into several tokens (e.g. a verb and its object). Elsewhere, a single tree node (token) consists of two orthographic words; these are not joined with the underscore character, i.e. the token contains a space (e.g. the analytical form of subjunctive preterite: " | ||
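Because a token may contain a space, splitting lines on arbitrary whitespace can shift the columns of a multi-word token. The following is a minimal sketch of that pitfall, assuming the data is stored one token per line with tab-separated columns (an assumption, not something this page states). The three-column layout (ID, FORM, HEAD), the function names and the placeholder forms `foo` and `bar baz` are hypothetical and are not taken from the treebank.

```python
# Hypothetical tab-separated rows (ID, FORM, HEAD). The forms are placeholders,
# not real treebank data. The second FORM contains a space inside one token.
SAMPLE = "1\tfoo\t2\n2\tbar baz\t0\n"


def read_tokens(text):
    """Split each non-empty line on tabs only, so spaces inside a token survive."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        token_id, form, head = line.split("\t")
        rows.append((int(token_id), form, int(head)))
    return rows


def naive_column_counts(text):
    """Count columns when splitting on any whitespace (breaks multi-word tokens)."""
    return [len(line.split()) for line in text.splitlines() if line.strip()]


tokens = read_tokens(SAMPLE)
```

Here `read_tokens` keeps `bar baz` as a single form, while `naive_column_counts` reports an extra column for that row, which is the kind of misalignment a whitespace-based reader would run into.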
==== Sample ==== | ==== Sample ==== |