====== Distributed Training Strategies for the Structured Perceptron - RG report ======

===== Presentation =====

==== 3 Structured Perceptron ====
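
A minimal sketch of the structured perceptron update, assuming a task-specific feature map ''features(x, y)'' and an argmax decoder ''decode(x, w)'' (both hypothetical placeholders, not from the paper):

<code python>
# Sketch of the structured perceptron: for each training pair (x, y),
# decode the best-scoring structure under the current weights and, on a
# mistake, move the weights toward the gold features and away from the
# predicted ones.  `features` and `decode` are hypothetical placeholders.

def train_perceptron(data, features, decode, epochs=10):
    w = {}                                  # sparse weight vector
    for _ in range(epochs):
        for x, y in data:
            y_hat = decode(x, w)            # y_hat = argmax_y  w . f(x, y)
            if y_hat != y:
                for feat, val in features(x, y).items():
                    w[feat] = w.get(feat, 0.0) + val
                for feat, val in features(x, y_hat).items():
                    w[feat] = w.get(feat, 0.0) - val
    return w
</code>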

==== 4 Distributed Structured Perceptron ====

  * Motivation: There is no straightforward way to parallelize the standard perceptron algorithm, since each update depends on the weights left by all previous updates.

=== 4.1 Parameter Mixing ===
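
A sketch of one-shot parameter mixing, assuming uniform mixing coefficients (1/S per shard) and reusing ''train_perceptron'' from the sketch above:

<code python>
# Sketch of parameter mixing: train one perceptron per shard completely
# independently (the parallel map step), then mix the weight vectors
# once at the end.  Uniform mixing coefficients are assumed here.

def mix(weight_dicts, coeffs):
    mixed = {}
    for w, mu in zip(weight_dicts, coeffs):
        for feat, val in w.items():
            mixed[feat] = mixed.get(feat, 0.0) + mu * val
    return mixed

def parameter_mixing(shards, features, decode, epochs=10):
    ws = [train_perceptron(s, features, decode, epochs) for s in shards]
    return mix(ws, [1.0 / len(shards)] * len(shards))
</code>

The paper gives a counterexample showing that such one-shot mixing need not return a separating weight vector even on separable data, which motivates the iterative variant below.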

=== 4.2 Iterative Parameter Mixing ===
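
A sketch of iterative parameter mixing under the same assumptions, reusing ''mix'' from the sketch above; ''train_epoch'' is a single-epoch variant of the perceptron loop:

<code python>
# Sketch of iterative parameter mixing: run ONE epoch per shard in
# parallel, mix the resulting weights, broadcast the mixture back, and
# repeat.  Uniform mixing coefficients are assumed.

def train_epoch(data, w, features, decode):
    for x, y in data:
        y_hat = decode(x, w)
        if y_hat != y:
            for feat, val in features(x, y).items():
                w[feat] = w.get(feat, 0.0) + val
            for feat, val in features(x, y_hat).items():
                w[feat] = w.get(feat, 0.0) - val
    return w

def iterative_parameter_mixing(shards, features, decode, epochs=10):
    w = {}
    mu = [1.0 / len(shards)] * len(shards)
    for _ in range(epochs):
        # in a MapReduce setting this loop is the parallel map step
        ws = [train_epoch(s, dict(w), features, decode) for s in shards]
        w = mix(ws, mu)                      # the reduce step
    return w
</code>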

==== 5 Experiments ====

===== Questions =====
==== Question 3 ====
In Figure 4, why do you think that the F-measure for the Regular Perceptron (first column) learned by the Serial (All Data) algorithm is worse than that learned by the Parallel (Iterative Parameter Mix) algorithm?

**Answer:**

  * Iterative parameter mixing is a form of parameter averaging, which has the same effect as the averaged perceptron (see the sketch below).
  * The F-measures for Serial (All Data) and Parallel (Iterative Parameter Mix) are very similar in the second column, because there both methods are already averaged.
  * There is also a bagging-like effect: each shard trains on a different subset of the data, and mixing the resulting weights reduces variance.
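
A sketch of why the two behave alike, with ''S'' shards, ''T'' updates, and uniform mixing coefficients assumed:

<code latex>
% Averaged perceptron: return the average of the weight vector over all T updates
\bar{w} = \frac{1}{T} \sum_{t=1}^{T} w_t

% Iterative parameter mixing: after each epoch, mix the S per-shard weights
% with coefficients \mu_i (uniform here, \mu_i = 1/S)
w = \sum_{i=1}^{S} \mu_i w_i, \qquad \mu_i \ge 0, \; \sum_{i=1}^{S} \mu_i = 1
</code>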
==== Question 4 ====
N = argmax_N f(N, T, F, ...)
f = ?

**Answer:**

We did not settle on a particular formula.
  * It also depends on the convergence criteria.
  * With no time limit, the serial algorithm would consume the least energy.
  * With a time limit, we should use the fewest shards that still meet it (see the sketch below).
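
A purely illustrative sketch of the last point; ''estimate_time'' is a hypothetical cost model (e.g. fitted on pilot runs), not anything from the paper:

<code python>
# Purely illustrative: choose the smallest number of shards whose
# estimated wall-clock training time fits the given budget.  More shards
# finish an epoch faster but may need more epochs to converge, so the
# cost model must account for both.

def fewest_shards(estimate_time, time_budget, max_shards=64):
    for n in range(1, max_shards + 1):
        if estimate_time(n) <= time_budget:
            return n
    return None  # no shard count meets the budget
</code>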