====== Distributed Training Strategies for the Structured Perceptron - RG report ======

===== Presentation =====

==== 3 Structured Perceptron ====

  * In the unstructured perceptron, you try to separate two sets of points with a hyperplane. See Question 1 for the algorithm. In the training phase, you iterate over the training data and adjust the hyperplane every time you make a mistake. [[http://
  * The structured (or multiclass) perceptron is a generalization of the unstructured perceptron. See Figure 1 in the paper for the algorithm.
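The mistake-driven training loop described above can be sketched as follows; this is a minimal illustration of the unstructured (binary) perceptron, with illustrative names and +1/-1 labels assumed:

```python
def train_perceptron(data, epochs=10):
    """data: list of (feature_vector, label) pairs with labels +1/-1."""
    dim = len(data[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in data:
            # score the example with the current hyperplane
            score = sum(wi * xi for wi, xi in zip(w, x))
            pred = 1 if score > 0 else -1
            # adjust the hyperplane only when a mistake is made
            if pred != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w
```

On linearly separable data the loop stops updating once every example is scored on the correct side of the hyperplane.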

==== 4 Distributed Structured Perceptron ====

  * Motivation: There is no straightforward way to make the standard perceptron algorithm parallel.

=== 4.1 Parameter Mixing ===

=== 4.2 Iterative Parameter Mixing ===

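A sketch of the iterative parameter mixing strategy named above, under simplifying assumptions: the data is split into shards, each shard runs one perceptron epoch starting from the shared weights, and the shard weights are then averaged (uniform mixing coefficients are assumed here; a binary perceptron stands in for the structured one for brevity):

```python
def one_epoch(w, shard):
    # one pass of mistake-driven perceptron updates over a single shard
    w = list(w)
    for x, y in shard:
        score = sum(wi * xi for wi, xi in zip(w, x))
        if (1 if score > 0 else -1) != y:
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def iterative_parameter_mixing(shards, dim, epochs=10):
    w = [0.0] * dim
    for _ in range(epochs):
        # in a real system the per-shard epochs run in parallel
        shard_ws = [one_epoch(w, shard) for shard in shards]
        # mix: uniform average of the shard weight vectors
        w = [sum(ws[i] for ws in shard_ws) / len(shards) for i in range(dim)]
    return w
```

Plain (non-iterative) parameter mixing would instead train each shard to completion independently and average only once at the end.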
==== 5 Experiments ====

===== Questions =====
w = [0, 0.6]
==== Question 2 ====
N = argmax_N f(N, T, F, ...)
f = ?

**Answer:**

We did not settle on a particular formula.
  * The choice also depends on the convergence criteria.
  * With no time limit, the serial algorithm would consume the least energy.
  * With a time limit, we should use the fewest shards that still meet it.
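Although no formula for f was agreed on, the last bullet suggests one purely hypothetical illustration: assume per-epoch wall-clock time scales as the serial epoch time divided by the shard count N, and that the number of epochs needed for convergence at each N is known or estimated; then pick the smallest N whose estimated training time fits the deadline. All names and the cost model here are assumptions, not the group's conclusion:

```python
def smallest_feasible_shards(deadline, serial_epoch_time, epochs_for, max_shards=64):
    """Smallest shard count whose estimated training time meets the deadline.

    epochs_for(n) -> estimated epochs to converge with n shards (assumed given).
    """
    for n in range(1, max_shards + 1):
        # hypothetical cost model: per-epoch time shrinks linearly with n
        est_time = epochs_for(n) * serial_epoch_time / n
        if est_time <= deadline:
            return n
    return max_shards
```

Under this model, fewer shards means less duplicated work (and energy), so the minimal feasible N is the least-energy choice that still meets the time limit.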