## Analysis Of Categorical Data

Apart from binary data, response variables that are purely categorical, without an underlying scale, are extremely rare. We will therefore consider only data on ordinal scales in this section. Outcomes graded as none, mild, moderate or severe, for example, arise in a wide variety of contexts.

To illustrate the techniques, we will again take an example from Jones and Kenward (1989) and delete five observations from the second treatment period. The example is a placebo-controlled trial of a treatment for primary dysmenorrhoea. Thirty patients entered the trial, and in each treatment period the amount of relief obtained was recorded as none or minimal (1), moderate (2) or complete (3). The data as we analyse them are summarised in Table 7.13.

Taking a fixed effects approach, a test of significance is most readily obtained using methods based on the analysis of an appropriate contingency table. A simple but inefficient way of producing such a table would be to categorise the change in the outcome variable from the first treatment period to the second as 'worse', 'no change' or 'better', and to tabulate this variable against treatment sequence. The significance of the treatment effect could then be determined from this 3 x 2 contingency table, as in Prescott's test. However, this configuration ignores the information that observations of 'none' and 'complete' in the two treatment periods represent a larger difference than 'none' and 'moderate', or 'moderate' and 'complete'. If we arbitrarily assign the numbers 1, 2 and 3 to the outcome categories, we can instead generate a 5 x 2 contingency table based on the change scores (period 1 score minus period 2 score, ranging from -2 to +2). The 'obvious' approach is then to apply a permutation t test (test for trend) to this table to assess the significance of the treatment effect. Applied to the change scores in Table 7.13, this gives p = 0.005.
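The exact permutation test on the change scores can be sketched in Python using only the standard library. This is a minimal sketch, assuming that the test conditions on the pooled count in each change-score category and that "at least as extreme" is judged by the absolute difference in mean change score between sequences; the function name `exact_score_test` is our own, and the software behind the p-value above may differ in detail. Because the 25 complete pairs are tied within five categories, the permutation distribution can be enumerated by category counts rather than by listing all C(25, 11) = 4,457,400 subsets. Since the pooled sum of squares is fixed under permutation, ordering by the mean difference is equivalent to ordering by the t statistic, so this is the permutation t test (test for trend) described above.

```python
from fractions import Fraction
from math import comb

# Change-score counts from Table 7.13 (complete pairs only),
# in category order -2, -1, 0, +1, +2.
SCORES = [-2, -1, 0, 1, 2]
AB = [5, 5, 3, 1, 0]
BA = [0, 3, 3, 1, 4]


def exact_score_test(scores, group1, group2):
    """Exact two-sided permutation test for a difference in mean score.

    Conditions on the pooled count in each score category and enumerates
    how many group-2 members fall in each category; each allocation is
    weighted by the number of subsets of patients it represents.
    """
    totals = [a + b for a, b in zip(group1, group2)]
    n = sum(totals)   # all patients with both periods observed
    k = sum(group2)   # size of group 2
    grand = sum(s * t for s, t in zip(scores, totals))
    # With n, k and the grand total fixed, the difference in means is a
    # linear function of the group-2 score sum S, so "at least as extreme"
    # can be judged by |n*S - k*grand| using exact integer arithmetic.
    observed = abs(n * sum(s * c for s, c in zip(scores, group2)) - k * grand)

    # ways[(m, S)] = number of subsets of m patients with score sum S,
    # built up one score category at a time.
    ways = {(0, 0): 1}
    for s, t in zip(scores, totals):
        updated = {}
        for (m, total), w in ways.items():
            for j in range(min(t, k - m) + 1):
                key = (m + j, total + j * s)
                updated[key] = updated.get(key, 0) + w * comb(t, j)
        ways = updated

    extreme = sum(w for (m, total), w in ways.items()
                  if m == k and abs(n * total - k * grand) >= observed)
    return Fraction(extreme, comb(n, k))


p = exact_score_test(SCORES, AB, BA)
print(f"two-sided exact p = {float(p):.4f}")
```

We would expect the printed value to be close to the p = 0.005 quoted above. Swapping the roles of AB and BA gives exactly the same p-value, as it should for a two-sided test.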

Table 7.13 Outcome pairs (period 1, period 2) and change scores, by sequence ('.' denotes an observation deleted from period 2)

| Sequence | (1,1) | (1,2) | (1,3) | (1,.) | (2,1) | (2,2) | (2,3) | (2,.) | (3,1) | (3,.) | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AB | 2 | 3 | 5 | 1 | 1 | 1 | 2 | 1 | 0 | 0 | 16 |
| BA | 3 | 2 | 0 | 0 | 1 | 0 | 1 | 1 | 4 | 2 | 14 |
| Total | 5 | 5 | 5 | 1 | 2 | 1 | 3 | 2 | 4 | 2 | 30 |

Change score

| Sequence | -2 | -1 | 0 | +1 | +2 | Total |
|---|---|---|---|---|---|---|
| AB | 5 | 5 | 3 | 1 | 0 | 14 |
| BA | 0 | 3 | 3 | 1 | 4 | 11 |
| Total | 5 | 8 | 6 | 2 | 4 | 25 |

In applying this test it should be appreciated that the scores of -2, -1, 0, +1 and +2 are arbitrary. They should not be taken to imply that the difference between 'none' and 'complete' is twice as large as the difference between 'none' and 'moderate', nor that the difference between 'none' and 'moderate' is the same as the difference between 'moderate' and 'complete'. For this reason, some statisticians may wish to replace the change scores with ranks and apply an (exact) Wilcoxon rank sum test. The choice will rarely make any practical difference, but it is clearly good practice to make this choice prior to analysis, rather than reporting the more favourable result! Note that the situation becomes more complicated when the outcome variable has more than three categories. Analysis could still be based on the change scores, but there would be an implicit strong assumption about the meaning of the intervals between the categories. Without such strong assumptions, many of the categories of change would be indistinguishable from each other and a simplified 5 x 2 contingency table would result. Such an example is presented by Senn (2002).
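The rank-based alternative can be sketched in the same way: replacing each change-score category by its midrank (the average rank of the tied observations in that category) and rerunning an exact permutation test gives an exact Wilcoxon rank sum test that allows for ties. This is again a minimal sketch under our own assumptions; the helper names `doubled_midranks` and `exact_score_test` are hypothetical. Midranks are doubled so that all arithmetic stays in integers, which does not alter the test because every candidate statistic is scaled by the same factor.

```python
from fractions import Fraction
from math import comb

# Change-score counts from Table 7.13, categories ordered -2, -1, 0, +1, +2.
AB = [5, 5, 3, 1, 0]
BA = [0, 3, 3, 1, 4]


def doubled_midranks(totals):
    """Twice the midrank of each ordered category (keeps values integral)."""
    ranks, below = [], 0
    for t in totals:
        # ranks below+1 .. below+t share the midrank (2*below + t + 1) / 2
        ranks.append(2 * below + t + 1)
        below += t
    return ranks


def exact_score_test(scores, group1, group2):
    """Exact two-sided permutation test for a difference in mean score,
    enumerated by category counts (each allocation weighted by the number
    of patient subsets it represents)."""
    totals = [a + b for a, b in zip(group1, group2)]
    n, k = sum(totals), sum(group2)
    grand = sum(s * t for s, t in zip(scores, totals))
    observed = abs(n * sum(s * c for s, c in zip(scores, group2)) - k * grand)

    ways = {(0, 0): 1}  # (members chosen, score sum) -> number of subsets
    for s, t in zip(scores, totals):
        updated = {}
        for (m, total), w in ways.items():
            for j in range(min(t, k - m) + 1):
                key = (m + j, total + j * s)
                updated[key] = updated.get(key, 0) + w * comb(t, j)
        ways = updated

    extreme = sum(w for (m, total), w in ways.items()
                  if m == k and abs(n * total - k * grand) >= observed)
    return Fraction(extreme, comb(n, k))


totals = [a + b for a, b in zip(AB, BA)]
rank_scores = doubled_midranks(totals)
p_rank = exact_score_test(rank_scores, AB, BA)
print(f"midranks: {[r / 2 for r in rank_scores]}")  # [3.0, 9.5, 16.5, 20.5, 23.5]
print(f"two-sided exact Wilcoxon p = {float(p_rank):.4f}")
```

Comparing this value with the one from the raw change scores is a quick way to see how little the choice of scores matters here; the point made above still holds that the choice should be fixed before looking at either result.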

The mixed models approach, with random patient effects and fixed period and treatment effects, can now be carried out as an ordinal (cumulative logit) logistic regression using PROC GLIMMIX. The patient variance component is, surprisingly for a cross-over trial, estimated to be negative and is therefore set to zero. The coefficient for the treatment effect, on the logistic scale, is 1.92 with a standard error of 0.57 (p = 0.003). On exponentiation, the estimate of the odds ratio is 6.8, with 95% confidence limits of 2.1 and 22.1. The interpretation of the odds ratio in this situation is that the estimated odds of being in a more favourable outcome category are 6.8 times higher when treated with the analgesic than with placebo, whether 'favourable' is defined as complete relief or as moderate/complete relief.
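The odds ratio and confidence limits quoted above follow directly from the coefficient and standard error reported in the SAS output for this model. The sketch below assumes, as the 22 denominator degrees of freedom in that output suggest, that the limits use a t quantile on 22 degrees of freedom rather than a normal quantile; the value 2.074 is taken from standard t tables and hard-coded to avoid extra dependencies. It also checks that, for a single-degree-of-freedom effect, the Type III F value is the square of estimate/standard error. The function names are our own.

```python
import math

T_975_DF22 = 2.074  # 97.5th percentile of t on 22 df (standard tables)


def odds_ratio_with_ci(estimate, se, t_quantile=T_975_DF22):
    """Exponentiate a log-odds estimate and its t-based confidence limits."""
    half_width = t_quantile * se
    return (math.exp(estimate),
            math.exp(estimate - half_width),
            math.exp(estimate + half_width))


def one_df_f_value(estimate, se):
    """Wald-type F for a 1-df effect: the square of the t statistic."""
    return (estimate / se) ** 2


# Estimates and standard errors from the Solutions for Fixed Effects table.
for label, est, se in [("treat", 1.9162, 0.5679), ("period", 0.4195, 0.5359)]:
    or_, lo, hi = odds_ratio_with_ci(est, se)
    f = one_df_f_value(est, se)
    print(f"{label}: OR {or_:.2f} (95% CI {lo:.3f} to {hi:.2f}), F = {f:.2f}")
```

Rounded, the treatment line reproduces the 6.8 (2.1 to 22.1) quoted above and F = 11.39, and the period line reproduces the 1.52 (0.501 to 4.62) and F = 0.61 shown in the output.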

SAS code and output

```sas
PROC GLIMMIX;
  CLASS patient period treat;
  MODEL outc = period treat / DIST=MULT LINK=CLOGIT SOLUTION OR;
  RANDOM patient;
RUN;
```

Note that, in this example, if the option DDFM=KR is used in the model statement, the denominator degrees of freedom erroneously appear as 1.

```
Number of Observations Read      60
Number of Observations Used      55

Response Profile

 Ordered        Total
   Value    Frequency
       1           26
       2           15
       3           14

The GLIMMIX procedure is modeling the probabilities of levels of outc
having lower Ordered Values in the Response Profile table.

G-side Cov. Parameters            1
Columns in X                      6
Columns in Z                     30
Subjects (Blocks in V)            1
Max Obs per Subject              55

Optimization Information

Optimization Technique      Dual Quasi-Newton
Fixed Effects               Profiled
Starting From               Data
```

```
Iteration History

                                    Objective                    Max
Iteration  Restarts  Subiterations   Function        Change     Gradient
        0         0              1  371.01856686  2.00000000    2.187345
        1         0              0  386.34874615  0.07614525    1.541061
        2         0              0  386.2767642   0.00860819    1.571373
        3         0              0  386.30470584  0.00062015    1.571598
        4         0              0  386.30608754  0.00005055    1.571655
        5         0              0  386.30626243  0.00000363    1.571655
        6         0              0  386.30627135  0.00000029    1.571655
        7         0              0  386.30627234  0.00000002    1.571655
        8         0              0  386.3062724   0.00000000    1.571655

Convergence criterion (PCONV=1.11022E-8) satisfied.
Estimated G matrix is not positive definite.
NOTE: The covariance matrix is the null matrix.

Fit Statistics
-2 Res Log Pseudo-Likelihood        386.31

Covariance Parameter Estimates
                          Standard
Cov Parm     Estimate        Error
patient             0            .
```

```
Solutions for Fixed Effects

                                              Standard
Effect       outc   period   treat   Estimate     Error   DF
Intercept    0                        -1.3774    0.5041   29
Intercept    1                        0.09043    0.4577   29
period              1                  0.4195    0.5359   22
period              2                       0
treat                                  1.9162    0.5679   22
treat                                       0

Odds Ratio Estimates

                                                            95% Confidence
Effect   period   treat   _period   _treat   Estimate   DF       Limits
period   1                2                      1.52   22   0.501     4.62

Type III Tests of Fixed Effects

            Num     Den
Effect       DF      DF   F Value   Pr > F
period        1      22      0.61   0.4421
treat         1      22     11.39   0.0027
```
