Differences

This shows you the differences between two versions of the page.

--- courses:rg:2012:distributed-perceptron [2012/12/16 23:02]
machacek
+++ courses:rg:2012:distributed-perceptron [2012/12/16 23:44] (current)
machacek
@@ Line 10: / Line 10: @@
   * You can use any structured input x (not just vector, sentence for example) and any structured output y (not just binary value, parse tree for example)
   * You need to have fuction f(x,y) which returns feature representation of candidate input-output pair
-  * Using the Theorem 1, you can bound the number of mistakes made during the training (the computational time is therefore also bounded)
+  * Using the Theorem 1, you can bound the number of mistakes made during the training
+    * The computational time is therefore also bounded.
+    * This holds only for linearly separable sets.
+  * Other remarks and discussed issues
+    * The perceptron training algorithm does not always return the same weights (unlike maximal margin). It depends on order of training data.
+    * How the inference is done in the difficult tasks like parsing? Iterating all possible y? Approximation?
 ==== 4 Distributed Structured Perceptron ====
@@ Line 17: / Line 22: @@
 === 4.1 Parameter Mixing ===
+  * First attempt to map-reduce algorithm
+  * Divide the training data into shards
+  * In MAP step, train a perceptron on each shard
+  * In REDUCE step, average the trained weight vectors
 === 4.2 Iterative Parameter Mixing ===

Institute of Formal and Applied Linguistics Wiki