===== Maximum Entropy Markov Models - Questions =====
  
**1. Explain (roughly) how the new formula for <latex>\alpha_{t+1}(s)</latex> is derived (i.e. formula 1 in the paper).**
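
For reference, here is our transcription of formula 1 (the MEMM forward recursion); double-check it against the paper:

<latex>\alpha_{t+1}(s) = \sum_{s' \in S} \alpha_t(s')\, P_{s'}(s|o_{t+1})</latex>

Compare it with the HMM forward recursion <latex>\alpha_{t+1}(s) = \sum_{s' \in S} \alpha_t(s')\, P(s|s')\, P(o_{t+1}|s)</latex>, where the transition and emission probabilities are separate factors.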
  
**2. Section 2.1 states "we will split P(s|s',o) into |S| separately trained transition functions". What are the advantages and disadvantages of this approach?**
  
**3. Let S = {V, N} (verb and non-verb).
Training data = he/N can/V can/V a/N can/N
//Observation features// are:
b1 = current word is “he”
b2 = current word is “can”
b3 = current word is “a” and next word is “can”
When implementing MEMM you need to define s_0, i.e. the previous state before the first token. It may be a special NULL, but for simplicity let’s define it as N.
a) What are the states (s) and observations (o) for this training data?**

**b) Equation (2) defines features f_a based on //observation features// b. How many such f_a features do we have?**

**c) Equation (3) defines constraints. How many such constraints do we have?**

**d) List all the constraints involving feature b2, i.e. substitute (whenever possible) concrete numbers into Equation (3).**

**e) In step 3 of the GIS algorithm you need to compute <latex>P_{s'}^{(j)}(s|o)</latex>. Compute <latex>P_N^{(0)}(N|can)</latex> and <latex>P_N^{(0)}(V|can)</latex>.**
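
A minimal sketch (our own illustration, not code from the paper) of the exponential form that step 3 evaluates, <latex>P_{s'}(s|o) = \frac{1}{Z(o,s')} \exp\left(\sum_a \lambda_a f_a(o,s)\right)</latex>, with whatever current weights <latex>\lambda_a^{(j)}</latex> you have:

<code python>
import math

def p_maxent(lambdas, feature_values):
    """Exponential-form conditional P_{s'}(s|o) for one fixed (o, s').

    lambdas:        {feature index a: current weight lambda_a}
    feature_values: {state s: {feature index a: f_a(o, s)}}
    Returns {state s: probability}.
    """
    scores = {
        s: math.exp(sum(lambdas.get(a, 0.0) * f for a, f in feats.items()))
        for s, feats in feature_values.items()
    }
    z = sum(scores.values())  # normalization constant Z(o, s')
    return {s: score / z for s, score in scores.items()}
</code>

Plug in the initial weights <latex>\lambda_a^{(0)}</latex> that the paper's GIS description prescribes to get the numbers asked for above; the function itself only evaluates the exponential form.
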
**Hint**: You might be confused about the m_s' variable (and t_1, …, <latex>t_{m_{s'}}</latex>) in Equation (3).
For a given s', t_1, …, <latex>t_{m_{s'}}</latex> are the time stamps where the previous state (with time stamp t_i - 1) is s'. For example, in our training data:
for s'=N, t_1=1 (because s_0=N), t_2=2 (because s_1=N) and t_3=5 (because s_4=N), i.e. m_s'=3;
for s'=V, t_1=3 (because s_2=V), t_2=4 (because s_3=V), i.e. m_s'=2.
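
If the indexing is still unclear, this tiny sketch (our own illustration, with s_0 fixed to N as agreed above) recovers exactly these time stamps from the training data:

<code python>
# Toy training data: he/N can/V can/V a/N can/N, with the artificial start state s_0 = N.
states = ["N", "N", "V", "V", "N", "N"]          # s_0 .. s_5
words  = [None, "he", "can", "can", "a", "can"]  # o_1 .. o_5 (index 0 unused)

# For every previous state s', collect the time stamps t such that s_{t-1} = s'.
timestamps = {}
for t in range(1, len(states)):
    timestamps.setdefault(states[t - 1], []).append(t)

print(timestamps)  # {'N': [1, 2, 5], 'V': [3, 4]}, i.e. m_N = 3 and m_V = 2
</code>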
  
