Institute of Formal and Applied Linguistics Wiki


Differences

This shows you the differences between two versions of the page.

padt:start [2011/06/30 23:59]
smrz
padt:start [2013/05/30 12:34]
zeman Sentence breaking.
Line 2: Line 2:
  
 http://ufal.mff.cuni.cz/padt/online/
- 
-===== Overview ===== 
  
 ===== Setup =====
Line 9: Line 7:
 Install [[http://ufal.mff.cuni.cz/~pajas/tred/|TrEd]] including the [[http://ufal.mff.cuni.cz/~pajas/tred/extensions/padt/documentation/|padt]] and [[http://ufal.mff.cuni.cz/~pajas/tred/extensions/elixir/documentation/|elixir]] extensions from the default TrEd repository http://ufal.mff.cuni.cz/~pajas/tred/extensions/.
  
-The SVN repository of the PADT project is https://svn.ms.mff.cuni.cz/svn/padt/. A working copy is accessible at /net/projects/ace/data/arabic/PADT/ on the UFAL network.
+The SVN repository of the PADT project is https://svn.ms.mff.cuni.cz/svn/padt/ (see also [[https://svn.ms.mff.cuni.cz/trac/padt|Trac]]). A working copy is accessible at ''/net/projects/padt'' on the ÚFAL network.
  
 The project's data are stored in the main subdirectory ''data'', which is split further into ''Prague'', ''Penn'', and ''ElixirFM'', explained below.
Line 46: Line 44:
 data/Prague/XIN/
  
-The project's contributors are ''smrz'', ''bielicky'', and ''zabokrtsky'', the rest of ''ufal'' have just the read rights.
+The project's contributors are ''smrz'', ''bielicky'', ''zabokrtsky'' and ''zeman'', the rest of ''ufal'' have just the read rights.
  
 There is also the 'tools' directory which contains some useful scripts.
  
 The code base for the PADT project, i.e. for annotation, display, and processing of the data, is the TrEd's ''padt'' extension, and its ''elixir'' extension that is a dependency for ''padt''.
 +
 ===== Agenda =====
 +
 +  * Write a block to read the PADT 2.0 data in Treex. An XML schema is needed.
 +  * How do things stand now with sentence breaking? Will the Unit element be used in some way? The current trees still correspond to paragraphs, with an average of 38 tokens per tree. The treebank contains 874 files (documents), 7664 trees (paragraphs), and 289910 tokens (non-root nodes). A token is a smaller unit than a word, and it is possible to trace which tokens together formed one word (this concerns the second tokenization within morphological analysis; detaching punctuation from words is a different matter).
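
The average quoted in the agenda item can be rechecked from the three counts it gives. The following is only a small arithmetic sketch in Python; the variable and function names are illustrative and are not part of the PADT tools or of Treex.

```python
# Counts quoted in the agenda item; nothing here reads the actual treebank.
DOCUMENTS = 874    # files
TREES = 7664       # paragraph-level trees
TOKENS = 289910    # non-root nodes


def per_unit(total, units):
    """Return the average number of `total` items per one of `units`."""
    return total / units


tokens_per_tree = per_unit(TOKENS, TREES)
trees_per_document = per_unit(TREES, DOCUMENTS)
tokens_per_document = per_unit(TOKENS, DOCUMENTS)

print(f"tokens per tree:     {tokens_per_tree:.2f}")
print(f"trees per document:  {trees_per_document:.2f}")
print(f"tokens per document: {tokens_per_document:.2f}")
```

Rounded to a whole number, the tokens-per-tree figure agrees with the 38 stated above. Because the trees are paragraphs, this is an average paragraph length, not a sentence length.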
  
 Focus on paragraphs/sentences that miss PADT-Morpho annotation, esp. non-annotated headlines:
Line 67: Line 69:
  
  
-There are some other task that have been partially solved, but need to be refreshed and completed:
+There are some other tasks that have been partially solved in PADT, but need to be refreshed and completed:
  
-* Retrain the CRF++ model for tagging selected morphological categories and apply it to prune remaining morphological ambiguities.
-* Refresh and improve the code and rules for converting PATB phrase syntax trees into dependency trees a la PADT.
-* Update PADT::Syntax annotation context (level synchronization, non-conflicting bindings).
-* Update PADT::Deeper annotation context (level synchronization, working schemas, modern stylesheets, non-conflicting bindings).
-* Improve documentation.
+  * Retrain the CRF++ model for tagging selected morphological categories and apply it to prune remaining morphological ambiguities.
+  * Refresh and improve the code and rules for converting PATB phrase syntax trees into dependency trees a la PADT.
+  * Update PADT::Syntax annotation context (level synchronization, non-conflicting bindings).
+  * Update PADT::Deeper annotation context (level synchronization, working schemas, modern stylesheets, non-conflicting bindings).
+  * Improve documentation.
  
 ===== References =====