
Institute of Formal and Applied Linguistics Wiki



Differences

This shows you the differences between two versions of the page.


Next revision
Previous revision
courses:rg:2013:memm [2013/03/18 20:27]
vandemos created
courses:rg:2013:memm [2014/10/12 15:04] (current)
popel
Line 1: Line 1:
-===== Maximum Entropy Markov Models =====+===== Maximum Entropy Markov Models - Questions =====
  
-**1.** Explain (roughly) how the new formula for α_{t+1}(s) is derived (i.e. formula 1 in the paper).+1. Explain (roughly) how the new formula for α_{t+1}(s) is derived (i.e. formula 1 in the paper).
  
-**2.** Section 2.1 states "we will split P(s|s',o) into |S| separately trained transition functions". What are the advantages and disadvantages of this approach?+2. Section 2.1 states "we will split P(s|s',o) into |S| separately trained transition functions". What are the advantages and disadvantages of this approach?
  
-**3.** Let S= {V,N} (verb and non-verb)+3. Let S= {V,N} (verb and non-verb)
 Training data = he/N can/V can/V a/N can/N
-Observation features are:+//Observation features// are:
 b1 = current word is “he”
 b2 = current word is “can”
Line 13: Line 13:
 When implementing MEMM you need to define s_0, i.e. the previous state before the first token. It may be a special NULL, but for simplicity let’s define it as N.
 a) What are the states (s) and observations (o) for this training data?
-b) Equation (2) defines features fa based on “observation features” b. How many features do we have?
 +b) Equation (2) defines features f_a based on //observation features// b. How many such f_a features do we have? 
 c) Equation (3) defines constraints. How many such constraints do we have?
 +
 d) List all the constraints involving feature b2, i.e. substitute (whenever possible) concrete numbers into Equation (3).
-e) In step 3 of the GIS algorithm you need to compute P_s’(j)(s|o). Compute P_N(0)(N|can) and P_N(0)(V|can). 
  
-**Hint** : You might be confused about the m_s' variable (and  t_1, …, tm_s') in Equation (3). +e) In step 3 of the GIS algorithm you need to compute <latex>P_{s'}^{(j)}(s|o)</latex>. Compute <latex>P_N^{(0)}(N|can)</latex> and <latex>P_N^{(0)}(V|can)</latex>.
-For a given s', t_1, …, t_ms' are the time stamps where the previous state (with time stamp ti - 1) is s'. For example, in our training data:+ 
 +**Hint**: You might be confused about the m_s' variable (and t_1, …, <latex>t_{m_{s'}}</latex>) in Equation (3).
 +For a given s', t_1, …, <latex>t_{m_{s'}}</latex> are the time stamps t_i at which the previous state (the state at time t_i - 1) is s'. For example, in our training data:
 for s'=N, t_1=1 (because s_0=N), t_2=2 (because s_1=N) and t_3=5 (because s_4=N), i.e. m_s'=3;
 for s'=V, t_1=3 (because s_2=V), t_2=4 (because s_3=V), i.e. m_s'=2.
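
The bookkeeping in the hint can be checked mechanically. The following is a minimal Python sketch, not part of the paper: the function name `timestamps_by_previous_state` and the variable names are hypothetical, and it assumes s_0 = N as defined above. It groups each time stamp t by the state at time t - 1.

```python
from collections import defaultdict


def timestamps_by_previous_state(states, initial_state):
    """Group 1-based time stamps t by the state at time t - 1.

    `states` is the tag sequence s_1..s_n; `initial_state` is s_0.
    (Hypothetical helper for illustration only.)
    """
    stamps = defaultdict(list)
    previous = initial_state
    for t, state in enumerate(states, start=1):
        stamps[previous].append(t)  # t belongs to the list of its previous state
        previous = state
    return dict(stamps)


# Training data from question 3: he/N can/V can/V a/N can/N, with s_0 = N.
tagged = [("he", "N"), ("can", "V"), ("can", "V"), ("a", "N"), ("can", "N")]
states = [tag for _, tag in tagged]
stamps = timestamps_by_previous_state(states, initial_state="N")

for prev, ts in sorted(stamps.items()):
    print(f"s'={prev}: t={ts}, m_s'={len(ts)}")
```

Under these assumptions the sketch reproduces the lists in the hint: time stamps 1, 2, 5 for s'=N and 3, 4 for s'=V.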
  
