Mixed Models For Unordered Categorical Data

So far in this chapter we have only considered models for ordered categorical data. Although less frequent, unordered categorical variables are sometimes encountered in medicine. Blood group and colour are examples, since there is no natural ordering of their categories. A mixed model for these types of data can be defined in a very similar way to the ordinal mixed model. Again, the data can be re-expressed in extended binary form so that they are in the form of a GLMM. The main difference from the ordinal mixed model is that the multinomial probabilities, $\mu$, are now linked to the model parameters using 'generalised logits' rather than the logits of the cumulative probabilities used for ordinal data. The generalised logits are the logs of the ratios of the probability of being in each category to that of being in the last category, i.e. $\log(\mu/\mu_L)$, where $\mu_L$ is a vector containing the multinomial probabilities of each observation being in the last category. The model can be specified by

$$y = \mu + e, \qquad \log(\mu/\mu_L) = X\alpha + Z\beta, \qquad \beta \sim N(0, G), \qquad \operatorname{var}(e) = R.$$

If there were four categories we could write

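$$\mu_4 = (\mu_{14}, \mu_{14}, \mu_{14}, \mu_{24}, \mu_{24}, \mu_{24}, \ldots)',$$

and the vector of generalised logits as

$$\log(\mu/\mu_4) = \bigl(\log(\mu_{11}/\mu_{14}), \log(\mu_{12}/\mu_{14}), \log(\mu_{13}/\mu_{14}), \log(\mu_{21}/\mu_{24}), \log(\mu_{22}/\mu_{24}), \log(\mu_{23}/\mu_{24}), \ldots\bigr)'.$$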
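As a rough numerical illustration of the generalised logits, the sketch below computes log(mu_j/mu_L) for each non-reference category and then inverts the transformation. The probabilities and the function names `generalised_logits` and `inverse_generalised_logits` are hypothetical and chosen only for this example.

```python
import numpy as np

def generalised_logits(probs):
    """Return log(mu_j / mu_L) for j = 1..L-1, one row per observation."""
    probs = np.asarray(probs, dtype=float)
    # Divide each non-reference category by the last (reference) category.
    return np.log(probs[:, :-1] / probs[:, -1:])

def inverse_generalised_logits(eta):
    """Recover the L category probabilities from the L-1 generalised logits."""
    eta = np.asarray(eta, dtype=float)
    # The reference category has a generalised logit of zero by definition.
    full = np.hstack([eta, np.zeros((eta.shape[0], 1))])
    # Subtract the row maximum before exponentiating for numerical stability.
    expd = np.exp(full - full.max(axis=1, keepdims=True))
    return expd / expd.sum(axis=1, keepdims=True)

# Hypothetical probabilities: two observations, four categories (rows sum to 1).
mu = np.array([[0.10, 0.20, 0.30, 0.40],
               [0.25, 0.25, 0.25, 0.25]])

eta = generalised_logits(mu)                # shape (2, 3): three logits per observation
mu_back = inverse_generalised_logits(eta)   # reproduces mu up to rounding error
```

Each observation contributes three logits rather than four, because the reference category carries no information beyond the constraint that the probabilities sum to one.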

The choice of the last category for the denominator is arbitrary. Any of the categories can, in fact, be used and sometimes convergence will be more likely if the largest category is chosen. $\alpha$ and $\beta$ are again vectors containing the fixed and random effects. However, a separate parameter is now needed for each category (except the last) because the proportional odds assumption used for ordinal data does not hold. We illustrate this model using the following hypothetical dataset which contains the first five observations from a repeated measures trial in which y is an unordered categorical variable.

| Patient | Visit | Treatment | y |
|---------|-------|-----------|---|
| 1       | 1     | A         | 2 |
| 1       | 2     | A         | 1 |
| 1       | 3     | A         | 4 |
| 2       | 1     | B         | 3 |
| 2       | 2     | B         | 1 |
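The extended binary form of these five observations can be sketched as follows. This is a minimal illustration that assumes one 0/1 indicator per observation for each non-reference category (categories 1 to 3, with category 4 as the reference), stacked observation by observation to match the ordering of the generalised logits. The variable names are hypothetical.

```python
import numpy as np

# Responses y for the five observations in the hypothetical dataset.
y = np.array([2, 1, 4, 3, 1])
n_categories = 4  # assumption: category 4 is the reference (last) category

# One indicator per observation for each non-reference category 1..3.
categories = np.arange(1, n_categories)
binary = (y[:, None] == categories[None, :]).astype(int)

# Stack the indicators observation by observation into one long vector.
y_extended = binary.ravel()

print(binary)
print(y_extended)
```

An observation in the reference category (the third row here, with y = 4) is represented by three zeros.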

In a simple model ignoring the effect of visits and fitting treatments as fixed and patients as random, we could write

$$\alpha = (I_1, I_2, I_3, T_{A,1}, T_{A,2}, T_{A,3}, T_{B,1}, T_{B,2}, T_{B,3})', \qquad \beta = (P_{1,1}, P_{1,2}, P_{1,3}, P_{2,1}, P_{2,2}, P_{2,3})',$$

where

$I_j$ = intercept for the $j$th category,
$T_{k,j}$ = effect for treatment $k$, category $j$,
$P_{i,j}$ = effect for patient $i$, category $j$.

Each treatment and patient effect now has a separate parameter corresponding to each category of the data (except the last). The X and Z design matrices also have extra columns corresponding to the extra parameters and have the form

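$$
X = \begin{pmatrix}
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix},
\qquad
Z = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix},
$$

where the columns of $X$ correspond to $I_1, I_2, I_3, T_{A,1}, T_{A,2}, T_{A,3}, T_{B,1}, T_{B,2}, T_{B,3}$ and the columns of $Z$ to $P_{1,1}, P_{1,2}, P_{1,3}, P_{2,1}, P_{2,2}, P_{2,3}$. Each observation contributes three consecutive rows, one for each of categories 1 to 3.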
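The block structure of X and Z can be reproduced with Kronecker products. The sketch below is one way to build them for the five observations, not necessarily how any particular package constructs them. The helper `indicator` and the variable names are hypothetical.

```python
import numpy as np

# Treatment and patient for each of the five observations.
treatment = np.array(["A", "A", "A", "B", "B"])
patient = np.array([1, 1, 1, 2, 2])
n_logits = 3  # four categories, with the last used as the reference

def indicator(values):
    """Return a 0/1 matrix with one column per distinct level (sorted)."""
    levels = np.unique(values)
    return (values[:, None] == levels[None, :]).astype(int)

I3 = np.eye(n_logits, dtype=int)
ones = np.ones((len(patient), 1), dtype=int)

# Each observation-level column expands into one column per category,
# and each observation expands into three rows (categories 1 to 3).
X = np.hstack([np.kron(ones, I3),                   # I_1, I_2, I_3
               np.kron(indicator(treatment), I3)])  # T_A,1..T_A,3, T_B,1..T_B,3
Z = np.kron(indicator(patient), I3)                 # P_1,1..P_1,3, P_2,1..P_2,3

print(X.shape, Z.shape)
```

Written this way, X is over-parameterised: each intercept column equals the sum of the two treatment columns for the same category, so a constraint on the treatment effects would be needed to obtain unique estimates.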