## Dispersion parameter

As in GLMMs, variance at the residual level can be increased (or decreased) by using a dispersion parameter. The residual variance is multiplied by the dispersion parameter, so that

We suggest that it is usually beneficial to fit a dispersion parameter in random effects models as in GLMMs; however, this is not possible in some software packages.

### 4.2.3 Alternative specification for random effects models

In Section 3.3.5 we described how random effects shrinkage could cause non-convergence or variance parameter bias in GLMMs, particularly when there were uniform random effects categories present. These problems can occur for the same reason in mixed categorical models, and we again suggest that they can sometimes be avoided by reparameterising random effects models as covariance pattern models. As in the GLMM, a general form for a random effects model reparameterised as a covariance pattern model can be written

$$
y = \mu + B^{1/2}(Z\beta + e), \qquad g(\mu) = X\alpha, \qquad Z\beta + e \sim N(0,\; ZGZ' + P),
$$

where $Z$ defines the random effects levels and $G$ is a matrix of variance parameters corresponding to the random effects. The random effects are now not included in the linear part of the model but are assumed to be incorporated into the error term. As with GLMMs, although we have not had the opportunity to explore fully such models, we believe they may have the potential to avoid the problems with bias that can occur in random effects models.
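As a rough illustration of the covariance term ZGZ' + P above, the following sketch builds it for a toy data set. The layout (four observations in two centres, one random effect per centre), the variance component value 0.5 and the identity matrix used for P are all assumptions chosen only to make the structure visible; they are not taken from any example in this book.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def marginal_covariance(Z, G, P):
    """Return Z G Z' + P, the covariance of the combined term Z*beta + e."""
    zgzt = matmul(matmul(Z, G), transpose(Z))
    n = len(P)
    return [[zgzt[i][j] + P[i][j] for j in range(n)] for i in range(n)]


# Hypothetical layout: observations 1-2 in centre A, observations 3-4 in centre B.
Z = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 1]]

sigma2_centre = 0.5                       # assumed centre variance component
G = [[sigma2_centre, 0.0],
     [0.0, sigma2_centre]]

# Assumed residual matrix: identity (no extra residual correlation).
P = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

V = marginal_covariance(Z, G, P)
for row in V:
    print(row)
```

In this toy case observations in the same centre share covariance 0.5 and observations in different centres have covariance 0, which is the compound symmetry pattern that the random effects model implies once the random effects are moved into the error term.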

### 4.2.4 Likelihood and quasi-likelihood functions

The model we have defined, based on binary observations, is now in the form of a GLMM, and a quasi-likelihood function can be defined in the same way as described in Section 3.2.2. A general form for the log quasi-likelihood for a GLMM which may contain random effects, coefficients and/or covariance patterns is again

$$
\log\{QL(\alpha, \gamma; y)\} = \log\{QL(\alpha, \gamma_R; y \mid \beta)\} - \tfrac{1}{2}\log|G| - \tfrac{1}{2}\beta' G^{-1}\beta + K,
$$

where

This function will correspond to a true log likelihood function whenever the residuals from the original observations are uncorrelated (i.e. no $\gamma_R$ parameters are included), since $QL(\alpha, \gamma_R; y \mid \beta)$ will then follow a multinomial distribution.
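The way the two random effects terms enter the log quasi-likelihood can be sketched numerically. The function below is a minimal illustration only: it assumes that G is diagonal (independent random effects, so that log|G| is a sum of log variances and the quadratic form is a sum of scaled squares), it takes the conditional log quasi-likelihood as a number supplied by the caller, and it omits the constant K. The function name and the example values are hypothetical.

```python
import math


def log_quasi_likelihood(conditional_log_ql, beta, g_diag):
    """Combine a conditional log quasi-likelihood with the random effects terms.

    conditional_log_ql : value of log QL given the random effects
    beta               : list of random effect values
    g_diag             : diagonal of G (assumed diagonal in this sketch)
    The additive constant K is omitted.
    """
    log_det_g = sum(math.log(v) for v in g_diag)               # log|G|
    quad_form = sum(b * b / v for b, v in zip(beta, g_diag))   # beta' G^-1 beta
    return conditional_log_ql - 0.5 * log_det_g - 0.5 * quad_form


# Hypothetical values: two random effects, each with unit variance.
value = log_quasi_likelihood(-10.0, [1.0, -1.0], [1.0, 1.0])
print(value)
```

With these inputs log|G| is zero and the quadratic form is 2, so the random effects terms lower the conditional value by 1. Larger random effect values are penalised more heavily, which is the source of the shrinkage referred to in Section 3.3.5.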

### 4.2.5 Model fitting methods

Now that the model is in the form of a GLMM, it can be fitted using the approaches suggested in Section 3.2.3. However, it is now necessary to accommodate the multinomial within-observation covariances and this adds a further degree of complexity to the computation. Several published examples have used generalised estimating equations to fit covariance pattern models (e.g. Lipsitz et al., 1994; Liang et al., 1992; Kenward et al., 1994). Lipsitz et al. provide a SAS macro and for this reason we have used their approach to analyse some of the examples in this book. The pseudo-likelihood approach can be used and random effects models are available in the experimental SAS procedure PROC GLIMMIX. However, the procedure is not at present adapted to fit covariance pattern models. Hedeker and Gibbons (1994) and Goldstein (2003) have also both suggested approaches for fitting random effects (and coefficients) models. Hedeker and Gibbons (1994) have made available Fortran-based software to implement their method (see Section 9.1). Goldstein's method can be implemented with a macro for use with the package MLwiN (see Section 9.1).

Alternatively, the Bayesian approach (see Sections 2.3 and 3.2.3) can be used for analysing random effects and coefficients models. For this approach it is not necessary formally to redefine the observations in the extended binary form. A method such as the Gibbs sampler (available in the package BUGS, see Section 9.1) can be used to simulate the posterior distribution from the categorical mixed model defined as follows:

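$$
y_i \sim \text{multinomial}(\mu_{i1}, \mu_{i2}, \ldots, \mu_{ic}), \qquad
g(\mu^{[c]}_{ij}) = I_j + x_i\alpha + z_i\beta, \qquad
\beta \sim N(0, G),
$$

where

- $y_i = (y_{i1}, y_{i2}, \ldots, y_{ic})$,
- $\mu_{ij}$ = probability observation $i$ is in category $j$,
- $\mu^{[c]}_{ij}$ = probability observation $i$ is in category $j$ or lower $= \sum_{k=1}^{j} \mu_{ik}$,
- $I_j$ = intercept term for category $j$,
- $x_i$ = the $i$th row of fixed effects design matrix $X$,
- $z_i$ = the $i$th row of random effects design matrix $Z$,
- $G$ = covariance matrix.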
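The link between the cumulative probabilities in the model and the multinomial category probabilities can be sketched as follows. This is only an illustration under stated assumptions: the link function g is taken to be the logit (the model above leaves g general), the intercepts are assumed to increase with the category index, the cumulative probability for the last category is fixed at 1, and the single number `eta` stands for the combined fixed and random effects contribution for one observation. All names and values are hypothetical.

```python
import math


def inverse_logit(x):
    """Inverse of the logit link, assumed here for g."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(intercepts, eta):
    """Convert cumulative logits into multinomial category probabilities.

    intercepts : increasing intercepts I_1 < ... < I_(c-1), one per cut-point
    eta        : fixed plus random effects contribution for one observation
    Returns the c category probabilities for that observation.
    """
    # Cumulative probabilities for categories 1..c-1, then 1.0 for category c.
    cumulative = [inverse_logit(i_j + eta) for i_j in intercepts]
    cumulative.append(1.0)

    # Each category probability is a difference of adjacent cumulative values.
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


# Hypothetical three-category example with symmetric intercepts.
intercepts = [-1.0, 1.0]
eta = 0.0
probs = category_probabilities(intercepts, eta)
print(probs)
```

In a Gibbs sampling implementation, probabilities of this kind would be recomputed for each observation at every iteration from the current draws of the intercepts, fixed effects and random effects.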

Non-informative prior distributions can again be used for all parameters: for example, normal distributions with very large variances for fixed effects and inverse gamma distributions with very small parameters for variance components.