Prague Arabic Dependency Treebank
Overview
Setup
Install TrEd, including the padt and elixir extensions, from the default TrEd extension repository http://ufal.mff.cuni.cz/~pajas/tred/extensions/.
The SVN repository of the PADT project is https://svn.ms.mff.cuni.cz/svn/padt/. A working copy is accessible at /net/projects/ace/data/arabic/PADT/ on the UFAL network.
The project's data are stored in the main subdirectory data, which is split further into Prague, Penn, and ElixirFM, explained below.
Try opening a PADT file to check whether your setup is complete. Run TrEd on the following pair of files (the shell expands {morpho,syntax} into two file names). They should automatically set their editing contexts and stylesheets to PADT::Morpho and PADT::Syntax, respectively:
tred /net/projects/ace/data/arabic/PADT/data/Prague/AEP/UMH_ARB_20040407.0001.{morpho,syntax}.pml
Locations
The SVN repository of the PADT project is https://svn.ms.mff.cuni.cz/svn/padt/. The main subdirectory data is split into ElixirFM, Prague, and Penn, with the following subdirectories:
data/ElixirFM/
data/Penn/1v3/
data/Penn/2v2/
data/Penn/3v2/
data/Penn/4v1/
data/Prague/AEP/
data/Prague/ASB/
data/Prague/EAT/
data/Prague/HYT/
data/Prague/NHR/
data/Prague/XIN/
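As a minimal sketch of how this layout is addressed, the Python snippet below collects the PML files under a data directory using the same glob patterns as the btred commands in the Agenda (three-character subdirectory names such as 1v3 or AEP). It is an illustration only, not part of the padt tools; the function name find_pml_files and the pattern keys are hypothetical.

```python
import glob
import os
import sys

# Hypothetical pattern table mirroring the btred globs in the Agenda:
# "???" matches exactly three characters (1v3, 2v2, AEP, XIN, ...).
PATTERNS = {
    "penn_morpho": os.path.join("Penn", "???", "*.morpho*.pml"),
    "prague_syntax": os.path.join("Prague", "???", "*.syntax*.pml"),
}


def find_pml_files(data_dir):
    """Return sorted PML paths (relative to data_dir), keyed by pattern name."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = glob.glob(os.path.join(data_dir, pattern))
        # Report paths relative to data_dir, as they appear in the btred commands.
        found[name] = sorted(os.path.relpath(p, data_dir) for p in matches)
    return found


if __name__ == "__main__":
    # Usage: python find_pml.py /net/projects/ace/data/arabic/PADT/data
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, files in find_pml_files(data_dir).items():
        print(f"{name}: {len(files)} file(s)")
```

Matching the Penn and Prague trees by pattern rather than by listing 1v3, AEP, and so on keeps the sketch working if further three-character subdirectories are added.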
The project's contributors are smrz, bielicky, and zabokrtsky; the rest of ufal have read-only access.
There is also a tools directory containing some useful scripts.
The code base for the PADT project, i.e. for annotation, display, and processing of the data, is the padt extension of TrEd, together with the elixir extension on which padt depends.
Agenda
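Focus on paragraphs/sentences that lack PADT-Morpho annotation, especially non-annotated headlines. The following command, run from the data directory, prints the address of every tree in which fewer than 90% of the root's children have child nodes of their own: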
btred -QTe '@w = $this->children(); @n = grep { $_->children() } @w; print ThisAddress() . "\n" if @n < 0.9 * @w' Penn/???/*.morpho*.pml
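The selection criterion of the btred one-liner above (@n < 0.9 * @w) can be sketched in Python outside TrEd. This is a minimal sketch over a hypothetical in-memory model, not the PML format itself: each paragraph id maps to its list of word nodes, and each word node is simply the list of its children (an empty list meaning the word has no children yet). The function name under_annotated is illustrative only.

```python
from typing import Dict, List

# Hypothetical model: a paragraph is a list of word nodes, and a word node
# is the list of its children (e.g. its analyses); [] means "no children".
Paragraph = List[List[str]]


def under_annotated(paragraphs: Dict[str, Paragraph], threshold: float = 0.9) -> List[str]:
    """Return ids of paragraphs where fewer than threshold * len(words)
    words have at least one child, mirroring `@n < 0.9 * @w` in btred."""
    flagged = []
    for pid, words in paragraphs.items():
        with_children = sum(1 for w in words if w)  # counterpart of @n
        if with_children < threshold * len(words):  # len(words) is @w
            flagged.append(pid)
    return flagged


# Usage: only the second paragraph has fewer than 90% of words with children.
example = {"p1": [["a"], ["b"]], "p2": [["a"], [], []]}
print(under_annotated(example))
```

As in the Perl version, an empty paragraph is never flagged, because 0 < 0.9 * 0 is false.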
Focus on nodes in PADT-Syntax that do not have a valid afun annotation, i.e. whose afun is still the placeholder ???:
btred -QTNe 'print ThisAddress() . "\n" if exists $this->{"afun"} and $this->{"afun"} eq "???"' Prague/???/*.syntax*.pml
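Where TrEd is not at hand, a rough equivalent of this check can be sketched with Python's standard XML parser. This sketch rests on assumptions that are not confirmed here: that each syntax node is an XML element carrying an id attribute, and that its afun is serialized as an <afun> child element, as PML commonly stores structure members. The element names in the usage sample are made up, and the function name nodes_with_placeholder_afun is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip a namespace prefix: '{http://...}afun' -> 'afun'."""
    return tag.rsplit("}", 1)[-1]


def nodes_with_placeholder_afun(xml_text: str, placeholder: str = "???") -> List[str]:
    """Return the id attributes of elements whose <afun> child holds placeholder.

    Assumption (hypothetical): nodes carry an 'id' attribute and store afun
    as a child element; namespaces are ignored via local_name().
    """
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():  # every element in document order
        for child in elem:
            if local_name(child.tag) == "afun" and (child.text or "").strip() == placeholder:
                hits.append(elem.get("id", "<no id>"))
    return hits


# Usage with a made-up fragment; real PADT files may be structured differently.
sample = """<adata xmlns="http://ufal.mff.cuni.cz/pdt/pml/">
  <LM id="n1"><afun>Pred</afun></LM>
  <LM id="n2"><afun>???</afun></LM>
</adata>"""
print(nodes_with_placeholder_afun(sample))
```

Matching on the local element name keeps the sketch independent of the namespace declared in the file, at the cost of also matching any unrelated element that happens to be called afun.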