## Generalised Linear Models

### 3.1.1 Introduction

GLMs can be used to fit fixed effects models to certain types of non-normal data: those with a distribution from the exponential family. Consider the following example. We wish to conduct a clinical trial to investigate the effect of a new treatment for epilepsy. A suitable variable for assessing efficacy is the number of seizures that occur during a predetermined period of time, so the response variable is a count. Such variables are often found to follow a Poisson distribution. This is a member of the exponential family, and GLMs or GLMMs can be considered, depending on the details of how the trial is designed. Such an example is considered in Section 6.4.

As a second example, consider the analysis of a particular adverse event in a clinical trial. In some situations, a simple analysis based on a contingency table will be sufficient. If, however, there are baseline effects to take into account or the trial design is more complicated, a GLM may be preferred. In the multi-centre trial that we revisit regularly in this book, the occurrence of cold feet was such an adverse event, and it could be reported at any of the follow-up visits. As a binary outcome, it follows a binomial distribution, which is also from the exponential family, and in Section 3.4 we will show how GLMMs can be applied to these data.

As with the models we have met for normally distributed data, these models use a linear combination of variables to 'predict' the response. For normally distributed data the fixed effects model is $\mathbf{y} = \mathbf{X}\boldsymbol{\alpha} + \mathbf{e}$. That is, the response is determined by the linear component, $\mathbf{X}\boldsymbol{\alpha}$, which gives the expected response, denoted by $\boldsymbol{\mu}$, and by a randomly determined error term, $\mathbf{e}$. In a somewhat convoluted way we could write the model as

$$\mathbf{y} = \boldsymbol{\mu} + \mathbf{e}, \qquad \boldsymbol{\mu} = \mathbf{X}\boldsymbol{\alpha}.$$

The GLM is easily specified from this artificial-looking model by allowing $\boldsymbol{\mu}$ and $\mathbf{X}\boldsymbol{\alpha}$ to be related through a 'link function', $g$, so that

$$g(\boldsymbol{\mu}) = \mathbf{X}\boldsymbol{\alpha}.$$
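To make the relationship $g(\boldsymbol{\mu}) = \mathbf{X}\boldsymbol{\alpha}$ concrete, the following Python sketch pairs three common link functions with their inverses and computes the expected response for a single row of a design matrix. The dictionary `LINKS`, the helper functions and the numerical values of `x_row` and `alpha` are hypothetical illustrations chosen for this sketch, not taken from the text.

```python
import math

# Each link g maps the mean mu to the linear-predictor scale, eta = g(mu);
# the inverse link maps eta back to mu. (Hypothetical helper structure.)
LINKS = {
    # Identity link: the normal-data model is the special case g(mu) = mu.
    "identity": (lambda mu: mu, lambda eta: eta),
    # Log link: maps a positive mean (e.g. an expected count) to the real line.
    "log": (math.log, math.exp),
    # Logit link: maps a probability in (0, 1) to the real line.
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

def linear_predictor(x_row, alpha):
    """Compute eta = x'alpha for one row of the design matrix X."""
    return sum(x * a for x, a in zip(x_row, alpha))

def mean_from_linear_predictor(x_row, alpha, link):
    """Return mu = g^{-1}(x'alpha), the expected response under link g."""
    _, inverse = LINKS[link]
    return inverse(linear_predictor(x_row, alpha))

# Hypothetical row of X (intercept plus one covariate) and parameter vector alpha.
x_row = [1.0, 2.0]
alpha = [0.5, -0.3]

for name, (g, _) in LINKS.items():
    mu = mean_from_linear_predictor(x_row, alpha, name)
    # Applying g to mu recovers the linear predictor x'alpha (up to rounding).
    print(name, round(mu, 4), round(g(mu), 4))
```

Whichever link is used, applying $g$ to the returned mean gives back the same linear predictor; for the identity link the mean is the linear predictor itself, as in the normal model.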

Thus, normal models are a special case of GLMs in which the link function is the identity function. In general, the link function is not the identity function but takes a form suitable for the distribution of the data.
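For the epilepsy example, where the response is a seizure count, a Poisson distribution with a log link is a natural non-identity choice. The Python sketch below fits such a GLM by iteratively reweighted least squares (IRLS), a standard fitting algorithm for GLMs, using only the standard library. The seizure counts, the treatment coding and the function names are hypothetical; the code assumes that the first column of `X` is an intercept and that the counts have a positive mean.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]   # augmented matrix [A | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_poisson_log_link(X, y, n_iter=25):
    """Fit a Poisson GLM with log link, log(mu) = X alpha, by IRLS.

    Assumes the first column of X is an intercept; starts from log(mean(y)).
    Uses a fixed number of iterations for simplicity (no convergence test).
    """
    p = len(X[0])
    alpha = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(n_iter):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for row, yi in zip(X, y):
            eta = sum(x * a for x, a in zip(row, alpha))   # linear predictor
            mu = math.exp(eta)                             # inverse of the log link
            w = mu                                         # IRLS weight for Poisson with log link
            z = eta + (yi - mu) / mu                       # working response
            for j in range(p):
                xtwz[j] += row[j] * w * z
                for k in range(p):
                    xtwx[j][k] += row[j] * w * row[k]
        alpha = solve(xtwx, xtwz)   # weighted least squares step: (X'WX) alpha = X'Wz
    return alpha

# Hypothetical seizure counts: four control patients, then four on the new treatment.
y = [4, 6, 5, 7, 3, 2, 4, 3]
X = [[1, 0] for _ in range(4)] + [[1, 1] for _ in range(4)]   # intercept, treatment indicator
alpha_hat = fit_poisson_log_link(X, y)
print(alpha_hat)
```

Because the only covariate here is a treatment indicator, the fitted means should reproduce the group means (5.5 and 3.0), so the intercept estimate should be close to $\log(5.5)$ and the treatment effect close to $\log(3.0/5.5)$, a log rate ratio.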

A less mathematical way to become familiar with the GLM is to think of the link function as a means of mapping the mean of the response from its scale of observation to the whole real line, $(-\infty, +\infty)$. For example, binomial probabilities lie in the range 0 to 1, and the logit link function, $\log\{\mu/(1-\mu)\}$, maps the interval $(0, 1)$ onto the real line. This is necessary because fitting a linear model directly to the binomial probability could lead to estimated probabilities that are negative or greater than one. Use of the link function allows the model parameters to enter the model linearly, just as in the models we have described for normal data. This often gives the GLM an advantage over the contingency table methods (e.g. chi-squared tests) that are sometimes used to analyse binary data, because these methods cannot incorporate several fixed effects simultaneously.
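The point about the range of a binomial probability can be seen numerically. In the Python sketch below, a linear predictor built from several hypothetical fixed effects (an intercept, a treatment indicator and a baseline covariate) is unbounded, so using it directly as a probability, which amounts to an identity link, can give values below 0 or above 1. Passing the same linear predictor through the inverse of the logit link always gives a value strictly between 0 and 1. The parameter values and design rows are invented for illustration only.

```python
import math

def inverse_logit(eta):
    """Map a linear predictor on the real line back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fixed effects: intercept, treatment indicator, baseline covariate.
alpha = [-1.0, 0.8, 0.05]

# Hypothetical design rows: (intercept, treatment, baseline value).
rows = [[1, 0, 10], [1, 1, 40], [1, 0, -30], [1, 1, 80]]

for row in rows:
    eta = sum(x * a for x, a in zip(row, alpha))   # linear predictor x'alpha
    p_identity = eta                  # identity link: eta used directly as a probability
    p_logit = inverse_logit(eta)      # logit link: eta mapped back into (0, 1)
    print(f"eta={eta:6.2f}  identity: {p_identity:6.2f}  logit: {p_logit:.3f}")
```

With the logit link the fixed effects still enter linearly, on the log-odds scale, which is what allows several of them to be included at once.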

Here we give only a brief introduction to GLMs; more detail can be found in McCullagh and Nelder (1989). Before defining the GLM, we give basic details of the binomial and Poisson distributions for readers who are not completely familiar with them, and we specify the general form of distributions from the exponential family. This general form will be needed to specify a particular form of link function known as the 'canonical' link.