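$$
\begin{aligned}
\ell_{\mathrm{obs}}(\theta^*) - \ell_{\mathrm{obs}}(\theta)
&= \sum_{i=1}^m u_i \cdot \bigl[\log f_i(\theta^*) - \log f_i(\theta)\bigr] \\
&= \sum_{i=1}^m \sum_{j=1}^n u_{ij} \cdot \bigl[\log f_{ij}(\theta^*) - \log f_{ij}(\theta)\bigr]
 + \sum_{i=1}^m u_i \cdot \Bigl( \log\frac{f_i(\theta^*)}{f_i(\theta)} - \sum_{j=1}^n \frac{u_{ij}}{u_i} \cdot \log\frac{f_{ij}(\theta^*)}{f_{ij}(\theta)} \Bigr).
\end{aligned}
\tag{1.37}
$$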

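The double-sum in the middle equals $\ell_{\mathrm{hid}}(\theta^*) - \ell_{\mathrm{hid}}(\theta)$. This difference is non-negative because the parameter vector $\theta^*$ was chosen so as to maximize the log-likelihood function for the hidden model with data $(u_{ij})$. We next show that the last sum is non-negative as well. The parenthesized expression equals

$$
\log\frac{f_i(\theta^*)}{f_i(\theta)} - \sum_{j=1}^n \frac{u_{ij}}{u_i} \cdot \log\frac{f_{ij}(\theta^*)}{f_{ij}(\theta)}
\;=\;
\log\frac{f_i(\theta^*)}{f_i(\theta)} + \sum_{j=1}^n \frac{f_{ij}(\theta)}{f_i(\theta)} \cdot \log\frac{f_{ij}(\theta)}{f_{ij}(\theta^*)}.
$$

We rewrite this expression as follows:

$$
\sum_{j=1}^n \frac{f_{ij}(\theta)}{f_i(\theta)} \cdot \log\frac{f_i(\theta^*)}{f_i(\theta)}
+ \sum_{j=1}^n \frac{f_{ij}(\theta)}{f_i(\theta)} \cdot \log\frac{f_{ij}(\theta)}{f_{ij}(\theta^*)}
\;=\;
\sum_{j=1}^n \frac{f_{ij}(\theta)}{f_i(\theta)} \cdot \log\Bigl( \frac{f_i(\theta^*)}{f_{ij}(\theta^*)} \cdot \frac{f_{ij}(\theta)}{f_i(\theta)} \Bigr).
\tag{1.38}
$$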

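This last expression is non-negative. This can be seen as follows. Consider the non-negative quantities

$$
\pi_j = \frac{f_{ij}(\theta)}{f_i(\theta)} \quad\text{and}\quad \sigma_j = \frac{f_{ij}(\theta^*)}{f_i(\theta^*)} \qquad \text{for } j = 1, 2, \ldots, n.
$$

We have $\pi_1 + \cdots + \pi_n = \sigma_1 + \cdots + \sigma_n = 1$, so the vectors $\pi$ and $\sigma$ can be regarded as probability distributions on the set $[n]$. The expression (1.38) equals the Kullback-Leibler distance between these two probability distributions:

$$
H(\pi \,\|\, \sigma) \;=\; \sum_{j=1}^n (-\pi_j) \cdot \log\frac{\sigma_j}{\pi_j} \;\geq\; \sum_{j=1}^n (-\pi_j) \cdot \Bigl( \frac{\sigma_j}{\pi_j} - 1 \Bigr) \;=\; 0.
\tag{1.39}
$$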

The inequality follows from (1.36).
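The argument above can be checked numerically for a single fixed row $i$. The following is a minimal sketch, not part of the original proof: the numbers in `row_theta` and `row_theta_star` are hypothetical stand-ins for the values $f_{ij}(\theta)$ and $f_{ij}(\theta^*)$, assumed strictly positive, and the function names are chosen here for illustration. It evaluates the right-hand side of (1.38) directly and compares it with the Kullback-Leibler distance between the normalized rows.

```python
import math

def kl_divergence(pi, sigma):
    """Kullback-Leibler distance H(pi || sigma) = sum_j pi_j * log(pi_j / sigma_j).

    Assumes sigma_j > 0 wherever pi_j > 0; terms with pi_j == 0 contribute 0.
    """
    return sum(p * math.log(p / s) for p, s in zip(pi, sigma) if p > 0)

def expression_1_38(row_theta, row_theta_star):
    """Right-hand side of (1.38) for one fixed row index i.

    row_theta[j]      stands for f_ij(theta)
    row_theta_star[j] stands for f_ij(theta*)
    All entries are assumed strictly positive.
    """
    f_i = sum(row_theta)            # f_i(theta)  = sum_j f_ij(theta)
    f_i_star = sum(row_theta_star)  # f_i(theta*) = sum_j f_ij(theta*)
    return sum(
        (f / f_i) * math.log((f_i_star / f_star) * (f / f_i))
        for f, f_star in zip(row_theta, row_theta_star)
    )

# Hypothetical values for one row of the hidden model at theta and theta*.
row_theta = [0.10, 0.25, 0.05]
row_theta_star = [0.20, 0.10, 0.15]

# Normalizing each row gives the probability distributions pi and sigma on [n].
pi = [f / sum(row_theta) for f in row_theta]
sigma = [f / sum(row_theta_star) for f in row_theta_star]

value = expression_1_38(row_theta, row_theta_star)
distance = kl_divergence(pi, sigma)

print("expression (1.38):", value)
print("H(pi || sigma):   ", distance)
```

In this sketch the two printed numbers should agree up to floating-point rounding and be non-negative, and `kl_divergence(pi, pi)` should return zero, matching the equality case $\pi = \sigma$.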

If $\ell_{\mathrm{obs}}(\theta) = \ell_{\mathrm{obs}}(\theta^*)$, then the two terms in (1.37) are both zero. Since equality holds in (1.39) if and only if the two probability distributions coincide, $\pi = \sigma$, it follows that