Differences
This shows you the differences between two versions of the page.
| pml-haters [2007/05/02 04:43] bojar | | pml-haters [2007/06/05 06:35] (current) bojar: formatting only | |
|---|---|---|---|
| Line 3: | Line 3: | ||
| (I'll try to write this page in English. I think it could be useful to an international audience too, if PML takes off.) | (I'll try to write this page in English. I think it could be useful to an international audience too, if PML takes off.) | ||
| - | Inspired by [[http:// | + | Inspired by [[http:// |
| Links to additional tools are at the bottom of the page. | Links to additional tools are at the bottom of the page. | ||
| + | |||
| + | I strongly recommend [[http:// | ||
| ===== In Spite of some Common Assumptions... ===== | ===== In Spite of some Common Assumptions... ===== | ||
| Line 16: | Line 18: | ||
| ===== Validation ===== | ===== Validation ===== | ||
| - | Given a PML file, how do I validate it? I always forget... Please provide me with the one-liner to do the validation. | + | Given a PML file, how do I validate it? |
| + | For most purposes, a libxml2 (DOM) based validator | ||
| + | < | ||
| + | For huge files, use< | ||
| + | Both scripts have decent user documentation. Look inside the scripts if you are interested in the implementation details. | ||
| ===== XSH Won't Work: Blame XML Namespaces ===== | ===== XSH Won't Work: Blame XML Namespaces ===== | ||
| Line 39: | Line 45: | ||
| The reason why '' | The reason why '' | ||
| - | < | + | < |
| $f/>cd pml:tdata | $f/>cd pml:tdata | ||
| $f/ | $f/ | ||
| </ | </ | ||
| + | |||
| + | Hint: add the regns command to your ~/.xsh2rc. | ||
| You will have to write the '' | You will have to write the '' | ||
| - | Most probably you'll still face problems when accessing attributes of XML elements, because a default namespace declaration applies to unprefixed element names but not to unprefixed attribute names. | + | Most probably you'll still face problems when accessing attributes of XML elements, because a default namespace declaration applies to unprefixed element names but not to unprefixed attribute names. |
| - | + | ||
| - | + | ||
| - | + | ||
| - | + | ||
| - | + | ||
| - | + | ||
| ===== Number of Sentences ===== | ===== Number of Sentences ===== | ||
| Line 63: | Line 64: | ||
| This XPath would quickly give you the number of sentences: | This XPath would quickly give you the number of sentences: | ||
| - | < | + | < |
| </ | </ | ||
| Line 72: | Line 73: | ||
| < | < | ||
| | xsh -I - -C "regns pml http:// | | xsh -I - -C "regns pml http:// | ||
| + | </ | ||
| + | |||
| + | or just the following, if you have the regns command in your ~/.xsh2rc: | ||
| + | |||
| + | < | ||
| + | | xsh -I - -C " | ||
| </ | </ | ||
| Line 87: | Line 94: | ||
| </ | </ | ||
| + | Here is a one-liner in Perl that does not load the whole file into memory: | ||
| + | |||
| + | < | ||
| + | </ | ||
| ===== Restricting a Suite of PML Files to Contain only a Specific Sentence ===== | ===== Restricting a Suite of PML Files to Contain only a Specific Sentence ===== | ||
| Line 92: | Line 103: | ||
| Let's assume there is a bug in a script (a bug? impossible!) that handles a suite of files (file-w.xml, | Let's assume there is a bug in a script (a bug? impossible!) that handles a suite of files (file-w.xml, | ||
| - | How do I create a suite of files with just the problematic sentence 345, i.e. files test-w.xml, test-m.xml, test-a.xml and test-t.xml, all properly referenced? | + | How do I create a suite of files with just the problematic sentence 345, i.e. files test-w.xml, test-m.xml, test-a.xml and test-t.xml, all properly referenced? |
| + | < | ||
| + | </ | ||
| + | |||
| + | Creating such a suite is problematic because sentence 345 may contain links to previous sentences (from the t-layer to the a-layer for elided words, and within the t-layer for coreference). The script mentioned above does not take this issue into account. | ||
| ===== Links to Useful Tools ===== | ===== Links to Useful Tools ===== | ||
| Line 100: | Line 115: | ||
| [[http:// | [[http:// | ||
| - | |||

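A small aid for the XSH namespace problem on this page: the URI that goes after ''regns pml'' is the default namespace declared on the root element of the PML file. The shell function below is a hypothetical helper (''pml_namespace'' is not part of XSH or of any PML tool). It simply greps for the first unprefixed ''xmlns="..."'' attribute, assumes the declaration uses double quotes, and is a convenience sketch rather than a substitute for a real XML parser.

```shell
# Hypothetical helper: print the default namespace URI of a PML file, i.e. the
# value of the first unprefixed xmlns="..." attribute (double quotes assumed).
pml_namespace() {
  # -m 1 makes grep stop after the first line containing a match;
  # head keeps only the first declaration if that line has several.
  # Prefixed declarations such as xmlns:x="..." do not match the pattern.
  grep -o -m 1 'xmlns="[^"]*"' "$1" | head -n 1 |
    sed -e 's/^xmlns="//' -e 's/"$//'
}

# Usage: pml_namespace file-a.xml
```

The printed URI is the value to register with ''regns pml'' in the XSH commands above, for instance in your ''~/.xsh2rc''.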
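For the Number of Sentences question, here is a rough streaming alternative to the XPath and Perl approaches. It assumes a PDT-style m-layer file in which every sentence is an unprefixed ''<s ...>'' element; that element name is an assumption, so adjust the pattern for other layers. The hypothetical ''count_sentences'' function counts start tags with ''grep -o''. grep reads the file line by line, so memory use stays small as long as the file has ordinary line breaks. Being regex-based, it will miscount tags inside comments or CDATA sections, and it misses an ''<s'' that is followed directly by a line break.

```shell
# Hypothetical helper (not part of any PML toolkit): count sentence start tags
# in a PML m-layer file, assuming sentences are <s> elements without a prefix.
count_sentences() {
  # grep -o prints every match on its own line, so several <s> tags on one
  # line are all counted; </s> end tags and names like <sent> do not match.
  grep -o '<s[[:space:]>/]' "$1" | wc -l | tr -d '[:space:]'
}

# Usage: count_sentences file-m.xml
```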