## L0 0AinU 0v n dVjj

ijes2

The log-likelihood function can be written as follows:

IW = E (Vii ' log(^ii) + Vi2 ■ log(0i2) + ■ ■ ■ + Vi.i-1 ■ log(0i,z-i) + Viz ■ log(0«)) •

The log-likelihood function for the Markov chain model is obtained by restricting this function to the set ©i of positive l x l matrices whose row sums are all equal to one. Therefore, 1(0) is the sum over all i € £ of the expressions z-i ii ■ log(0ii) + Vi2 ■ log(0i2) + ■ ■ ■ + Vi,z-i ■ log(0i,z-i) + Viz ■ log(1 - ^ (1.52)

These expressions have disjoint sets of unknowns for different values of the index i € £. To maximize 1(0) over ©i, it hence suffices to maximize the concave function (1.52) over the (l — 1)-dimensional simplex consisting of all non-negative vectors (0ii, 0i2,..., 0i,z-i) of coordinate sum at most one. By equating the partial derivatives of (1.52) to zero, we see that the unique critical point has coordinates 0^ = vij/(vn + + • • • + vu) as desired. □

We next introduce the fully observed Markov model that underlies the hidden Markov model considered in Subsection 1.4.3. We fix the sequence length n and we consider a first alphabet £ with l letters and a second alphabet £' with l' letters. The observable states in this model are pairs (a, t) € £n x (£')n of words of length n. A sequence of N observations in this model is summarized in a matrix u € Nz"x(zwhere U(0-T) is the number of times the pair (a, t) was observed. Hence, in this model, m = (l ■ l')n.

The fully observed Markov model is parameterized by a pair of matrices (0, 0') where 0 is an l x l matrix and 0' is an l x l' matrix. The matrix 0 encodes a Markov chain as before: the entry 0j represents the probability of transitioning from state i € £ to j € £. The matrix 0' encodes the interplay between the two alphabets: the entry 0j represents the probability of out-putting symbol j € £' when the Markov chain is in state i € £. As before in the Markov chain model, we restrict ourselves to positive matrices whose rows sum to one. To be precise, ©i now denotes the set of pairs of matrices (0,0') € R>XoZ x RXf whose row sums are equal to one. Hence d = l(l +1' — 2).

The fully observed Markov model is the restriction to ©i of the toric model