===== Maximum Entropy Markov Models - Questions =====
  
**1. Explain (roughly) how the new formula for <latex>\alpha_{t+1}(s)</latex> is derived (i.e. formula 1 in the paper).**
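
For reference, here is our transcription of formula 1 (the MEMM forward recursion); double-check it against the paper:

<latex>\alpha_{t+1}(s) = \sum_{s' \in S} \alpha_t(s')\, P_{s'}(s|o_{t+1})</latex>

Compare it with the HMM forward recursion <latex>\alpha_{t+1}(s) = \sum_{s' \in S} \alpha_t(s')\, P(s|s')\, P(o_{t+1}|s)</latex>, where the transition and emission probabilities are separate factors.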
  
**2. Section 2.1 states "we will split P(s|s',o) into |S| separately trained transition functions". What are the advantages and disadvantages of this approach?**
  
**3. Let S = {V, N} (verb and non-verb).
Training data = he/N can/V can/V a/N can/N
//Observation features// are:
b1 = current word is “he”
b2 = current word is “can”
b3 = current word is “a” and next word is “can”
When implementing MEMM you need to define s_0, i.e. the previous state before the first token. It may be a special NULL, but for simplicity let’s define it as N.
a) What are the states (s) and observations (o) for this training data?**

**b) Equation (2) defines features f_a based on //observation features// b. How many such f_a features do we have?**

**c) Equation (3) defines constraints. How many such constraints do we have?**

**d) List all the constraints involving feature b2, i.e. substitute (whenever possible) concrete numbers into Equation (3).**

**e) In step 3 of the GIS algorithm you need to compute <latex>P_{s'}^{(j)}(s|o)</latex>. Compute <latex>P_N^{(0)}(N|can)</latex> and <latex>P_N^{(0)}(V|can)</latex>.**
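
A minimal sketch (our own illustration, not code from the paper) of the exponential form that step 3 evaluates, <latex>P_{s'}(s|o) = \frac{1}{Z(o,s')} \exp\left(\sum_a \lambda_a f_a(o,s)\right)</latex>, with whatever current weights <latex>\lambda_a^{(j)}</latex> you have:

<code python>
import math

def p_maxent(lambdas, feature_values):
    """Exponential-form conditional P_{s'}(s|o) for one fixed (o, s').

    lambdas:        {feature index a: current weight lambda_a}
    feature_values: {state s: {feature index a: f_a(o, s)}}
    Returns {state s: probability}.
    """
    scores = {
        s: math.exp(sum(lambdas.get(a, 0.0) * f for a, f in feats.items()))
        for s, feats in feature_values.items()
    }
    z = sum(scores.values())  # normalization constant Z(o, s')
    return {s: score / z for s, score in scores.items()}
</code>

Plug in the initial weights <latex>\lambda_a^{(0)}</latex> that the paper's GIS description prescribes to get the numbers asked for above; the function itself only evaluates the exponential form.
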
**Hint**: You might be confused about the m_s' variable (and t_1, …, <latex>t_{m_{s'}}</latex>) in Equation (3).
For a given s', t_1, …, <latex>t_{m_{s'}}</latex> are the time stamps where the previous state (with time stamp t_i - 1) is s'. For example, in our training data:
for s'=N, t_1=1 (because s_0=N), t_2=2 (because s_1=N) and t_3=5 (because s_4=N), i.e. m_s'=3;
for s'=V, t_1=3 (because s_2=V), t_2=4 (because s_3=V), i.e. m_s'=2.
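
If the indexing is still unclear, this tiny sketch (our own illustration, with s_0 fixed to N as agreed above) recovers exactly these time stamps from the training data:

<code python>
# Toy training data: he/N can/V can/V a/N can/N, with the artificial start state s_0 = N.
states = ["N", "N", "V", "V", "N", "N"]          # s_0 .. s_5
words  = [None, "he", "can", "can", "a", "can"]  # o_1 .. o_5 (index 0 unused)

# For every previous state s', collect the time stamps t such that s_{t-1} = s'.
timestamps = {}
for t in range(1, len(states)):
    timestamps.setdefault(states[t - 1], []).append(t)

print(timestamps)  # {'N': [1, 2, 5], 'V': [3, 4]}, i.e. m_N = 3 and m_V = 2
</code>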
  
