Differences
This shows you the differences between two versions of the page.
padt:start [2011/05/27 00:34] smrz |
padt:start [2013/05/30 12:34] zeman Sentence segmentation. |
||
---|---|---|---|
Line 3: | Line 3: | ||
http:// | http:// | ||
- | ===== Overview | + | ===== Setup ===== |
+ | |||
+ | Install [[http:// | ||
+ | |||
+ | The SVN repository of the PADT project is https:// | ||
+ | |||
+ | The project' | ||
+ | |||
+ | Try opening a PADT file to check whether your setup is complete: run TrEd and open the following files, which should automatically set their editing contexts and stylesheets to PADT:: | ||
+ | |||
+ | <code bash> | ||
+ | tred / | ||
+ | </ | ||
+ | |||
+ | To improve the display of the various scripts and tree types, you can use the following settings (or similar) in TrEd's configuration file: | ||
+ | |||
+ | < | ||
+ | Font = " | ||
+ | |||
+ | NodeXSkip = 30; | ||
+ | NodeYSkip = 10; | ||
+ | </ | ||
===== Locations ===== | ===== Locations ===== | ||
Line 23: | Line 44: | ||
data/ | data/ | ||
- | The project' | + | The project' |
There is also the ' | There is also the ' | ||
Line 32: | Line 52: | ||
===== Agenda ===== | ===== Agenda ===== | ||
- | ===== References ===== | + | * Write a Treex block to read the PADT 2.0 data. An XML schema is needed for this. |
+ | * What is the current status of sentence segmentation? Will the Unit element be used in some way? The current trees still correspond to paragraphs, with an average of 38 tokens per tree. The treebank contains 874 files (documents), | ||
+ | Focus on paragraphs/ | ||
+ | |||
+ | <code bash> | ||
+ | btred -QTe '@w = $this-> | ||
+ | </ | ||
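Once the trees are re-segmented, the average of 38 tokens per tree quoted above will need recomputing. The shell function below is a minimal sketch of that count and is not part of the PADT, TrEd or Treex tools. It assumes a hypothetical plain-text export with one token per line and a blank line between trees (the PADT data would first have to be exported to this layout), and the name `avg_tokens_per_tree` is made up for illustration.

```bash
# Hypothetical helper (not part of TrEd/btred or Treex): prints the number of
# trees, the number of tokens and the average tokens per tree for input in a
# one-token-per-line layout, where blank lines separate trees.
# Reads the files given as arguments, or standard input if there are none.
avg_tokens_per_tree() {
  awk '
    NF > 0  { tokens++; in_tree = 1; next }  # non-blank line: one token
    in_tree { trees++; in_tree = 0 }         # first blank line after tokens closes a tree
    END {
      if (in_tree) trees++                   # last tree may lack a trailing blank line
      if (trees > 0)
        printf "%d trees, %d tokens, %.1f tokens/tree\n", trees, tokens, tokens / trees
      else
        print "no trees found"
    }
  ' "$@"
}
```

Usage, with a hypothetical file name: `avg_tokens_per_tree padt-export.txt`. Runs of several blank lines are treated as a single tree boundary, so the count does not depend on how many empty lines the export writes.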
+ | |||
+ | |||
+ | Focus on nodes in PADT-Syntax that do not have a valid '' | ||
+ | |||
+ | <code bash> | ||
+ | btred -QTNe 'print ThisAddress() . " | ||
+ | </ | ||
+ | |||
+ | |||
+ | Some other tasks have been partially solved in PADT but need to be refreshed and completed: | ||
+ | |||
+ | * Retrain the CRF++ model for tagging selected morphological categories and apply it to prune the remaining morphological ambiguities.
+ | * Refresh and improve the code and rules for converting PATB phrase-structure trees into PADT-style dependency trees.
+ | * Update PADT:: | ||
+ | * Update PADT:: | ||
+ | * Improve documentation. | ||
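For the CRF++ item, the retraining step could look roughly like the script below. It is only a sketch: the file names, the feature template, and the assumed training layout (one token per line, whitespace-separated columns, gold tag in the last column, blank line between sentences) are hypothetical choices, not the settings of the existing PADT model. Only the `crf_learn TEMPLATE TRAIN MODEL` and `crf_test -m MODEL FILE` invocations are standard CRF++ usage; pruning the morphological analyses against the predicted tags is not shown.

```bash
# Sketch only: all file names and the feature template are hypothetical.
# Data files: one token per line, whitespace-separated columns, gold tag in
# the last column, blank line between sentences (the CRF++ input format).
set -eu

TEMPLATE=${TEMPLATE:-padt-crf.template}
TRAIN=${TRAIN:-padt-train.crf}
TEST=${TEST:-padt-test.crf}
MODEL=${MODEL:-padt-crf.model}

# CRF++ feature template: %x[row,col] is column col of the token at relative offset row.
cat > "$TEMPLATE" <<'EOF'
# Unigram features over the word form (column 0) in a +/-2 token window
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-1,0]/%x[0,0]
# Bigram feature: transitions between adjacent output tags
B
EOF

if command -v crf_learn >/dev/null 2>&1 && command -v crf_test >/dev/null 2>&1 \
   && [ -f "$TRAIN" ] && [ -f "$TEST" ]; then
  # -f 2: keep only features seen at least twice; -c: larger values fit the training data more closely
  crf_learn -f 2 -c 1.0 "$TEMPLATE" "$TRAIN" "$MODEL"
  # crf_test prints the input columns with the predicted tag appended as a new last column
  crf_test -m "$MODEL" "$TEST" > "$TEST.predicted"
else
  echo "CRF++ or the data files not found; only wrote $TEMPLATE" >&2
fi
```

The script falls back to writing just the template when CRF++ or the data are missing, so the feature set can be reviewed before any training run.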
+ | |||
+ | ===== References ===== |