Differences

This shows you the differences between two versions of the page.

--- courses:rg:2012:segments [2012/12/29 15:22]
bilek
+++ courses:rg:2012:segments [2013/01/03 22:38] (current)
popel
@@ Line 1: / Line 1: @@
-===Introduction, Motivation, Segments===
+=====Introduction, Motivation, Segments=====
 We introduced the basic idea of Czech sentence segmentation and the Czech sentence boundaries. We showed the segmentation chart on an example.
-===Experiments with Automatic Identification of Segmentation Charts===
+=====Experiments with Automatic Identification of Segmentation Charts=====
-==How to Obtain Segments from Syntactic Tree?==
+====How to Obtain Segments from Syntactic Tree?====
 We are unsure of the exact definition of Edge and Path between the segments in this part.
@@ Line 21: / Line 21: @@
 The most important question, though, is why we do all this, given that the data in the PDT trees are more thorough than the segments we want to create. So, what exactly is the reason?
-) it is not because there are the training data
+) To prepare training data?
-) it is not because we use it as a testing data, because it has only 70% accuracy
+> Probably not, because they don't use any machine learning approach.
-) It can be to show that it is difficult to create from the analytical tree, too
+) To prepare testing data?
-) We can use it as a "oracle experiment" - how far can we go with plaintext?
+> No, because they already have some manually annotated sentences. Moreover, the described approach (using PDT gold a-trees as input) has only 70% accuracy.
+) As an "oracle experiment" - using gold a-trees is an upper bound for using plaintext only.
+> Probably not. There are better algorithms (with higher precision than 70%) exploiting gold a-trees.
+) To show some difficult cases in creating segmentation charts (even when gold a-trees are available).
+> Maybe.
 ) It can be just to fill up the space :)
+> ?
-==How to Obtain Segments from Plain Text?==
+====How to Obtain Segments from Plain Text?====
 At the beginning we talked about the basic set of rules for subordination. Some of them could be improved; for example, the rule for quotes used for highlighting.
@@ Line 36: / Line 41: @@
-===Evaluation and Analysis of the Results===
+=====Evaluation and Analysis of the Results=====
-==Evaluation of Rules for Syntactic Trees==
+====Evaluation of Rules for Syntactic Trees====
 Is 57% enough? 73% sounds like a more important number, but it is still not enough.
@@ Line 49: / Line 54: @@
 "Kočka, která honila myš zemřela" ("The cat that chased the mouse died") --> this is not correct Czech (the comma after "myš" is missing); the TectoMT parser could live with that, but segmentation according to this article couldn't.
-==Evaluation of Rules for Plain Text==
+====Evaluation of Rules for Plain Text====
 The question: why are these results better than in the first experiment?
@@ Line 58: / Line 63: @@
 What does "ambiguity 1.32" mean? It is another, confusing name for the number of paths per sentence.
-===Conclusion===
+=====Conclusion=====
 Nice idea: we can do some quick but reliable preprocessing. However, the authors don't show how much it helps the parsers (if it does at all). The precision is not even reported.
@@ Line 64: / Line 69: @@
 It is slightly light on information, and it is strange that it continues a paper from 2001.
-===Questions===
+=====Questions=====
 With indirect speech, we can get a step of size 2.
 řekl, že když se budeš modlit, tak se ti přání splní ("he said that if you pray, your wish will come true")


Institute of Formal and Applied Linguistics Wiki
