Differences
This shows you the differences between two versions of the page.
courses:rg:2013:memm [2013/03/18 20:27] vandemos created |
courses:rg:2013:memm [2014/10/12 15:04] (current) popel |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ===== Maximum Entropy Markov Models ===== | + | ===== Maximum Entropy Markov Models |
- | **1.** Explain (roughly) how the new formula for α_t+1(s) is derived (i.e. formula 1 in the paper). | + | 1. Explain (roughly) how the new formula for α_t+1(s) is derived (i.e. formula 1 in the paper). |
- | **2.** Section 2.1 states "we will split P(s|s', | + | 2. Section 2.1 states "we will split P(s|s', |
- | **3.** Let S= {V,N} (verb and non-verb) | + | 3. Let S= {V,N} (verb and non-verb) |
Training data = he/N can/V can/V a/N can/N | Training data = he/N can/V can/V a/N can/N | ||
- | Observation features are: | + | //Observation features// are: |
b1 = current word is “he” | b1 = current word is “he” | ||
b2 = current word is “can” | b2 = current word is “can” | ||
Line 13: | Line 13: | ||
When implementing an MEMM, you need to define s_0, i.e. the previous state before the first token. It may be a special NULL, but for simplicity let’s define it as N. | When implementing an MEMM, you need to define s_0, i.e. the previous state before the first token. It may be a special NULL, but for simplicity let’s define it as N. | ||
a) What are the states (s) and observations (o) for this training data? | a) What are the states (s) and observations (o) for this training data? | ||
- | b) Equation (2) defines features | + | |
+ | b) Equation (2) defines features | ||
c) Equation (3) defines constraints. How many such constraints do we have? | c) Equation (3) defines constraints. How many such constraints do we have? | ||
+ | |||
d) List all the constraints involving feature b2, i.e. substitute (whenever possible) concrete numbers into Equation (3). | d) List all the constraints involving feature b2, i.e. substitute (whenever possible) concrete numbers into Equation (3). | ||
- | e) In step 3 of the GIS algorithm you need to compute P_s’(j)(s|o). Compute P_N(0)(N|can) and P_N(0)(V|can). | ||
- | **Hint** : You might be confused about the m_s' variable (and t_1, …, tm_s') in Equation (3). | + | e) In step 3 of the GIS algorithm you need to compute P_s’(j)(s|o). Compute P_N(0)(N|can) and P_N(0)(V|can). |
- | For a given s', t_1, …, t_ms' are the time stamps where the previous state (with time stamp ti - 1) is s'. For example, in our training data: | + | |
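+ | **Hint** : You might be confused about the m_s' variable (and t_1, …, t_m_s') in Equation (3). | ||
+ | For a given s', t_1, …, t_m_s' are the time stamps where the previous state (with time stamp t_i - 1) is s'. For example, in our training data: | ||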
for s'=N, t1=1 (because s0=N), t2=2 (because s1=N) and t3=5 (because s4=N), i.e. m_s' | for s'=N, t1=1 (because s0=N), t2=2 (because s1=N) and t3=5 (because s4=N), i.e. m_s' | ||
for s'=V, t1=3 (because s2=V), t2=4 (because s3=V), i.e. m_s' | for s'=V, t1=3 (because s2=V), t2=4 (because s3=V), i.e. m_s' | ||
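
The grouping described in the hint can be checked mechanically. The following is a minimal sketch, not part of the original question sheet: the function name `time_stamps_by_previous_state` and the variable names are hypothetical, and it assumes 1-based time stamps with s_0 = N, as the exercise defines.

```python
from collections import defaultdict

# Training data from the exercise as (word, state) pairs.
training = [("he", "N"), ("can", "V"), ("can", "V"), ("a", "N"), ("can", "N")]
s0 = "N"  # previous state before the first token, as chosen in the exercise


def time_stamps_by_previous_state(tagged, initial_state):
    """Map each previous state s' to the 1-based time stamps t
    for which the state at time t-1 equals s' (hypothetical helper)."""
    stamps = defaultdict(list)
    prev = initial_state
    for t, (_word, state) in enumerate(tagged, start=1):
        stamps[prev].append(t)  # token t is handled by the model for s' = prev
        prev = state
    return dict(stamps)


stamps = time_stamps_by_previous_state(training, s0)
for prev_state, ts in sorted(stamps.items()):
    # len(ts) is the number of time stamps collected for this s'
    print(f"s'={prev_state}: t={ts}, count={len(ts)}")
```

Each list holds the time stamps t_1, t_2, … for one previous state s', so the sums in Equation (3) for that s' run over exactly these positions of the training data.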