## Normal data

In Section 5.2 the variance of the difference between a pair of treatments in a balanced dataset was given as

$$\operatorname{var}(\hat{t}_i - \hat{t}_j) = 2\left(\frac{\sigma^2}{rc} + \frac{\sigma^2_{ct}}{c}\right),$$

where $r$ = number of patients per treatment per centre (replicates), $c$ = number of centres, $t_i$ = the $i$th treatment effect, $\sigma^2$ = residual variance and $\sigma^2_{ct}$ = centre-treatment variance component.
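As a quick numerical check of this expression, the variance and standard error can be computed directly. The sketch below is illustrative only: the function names and the example values are assumptions and are not taken from the text.

```python
import math


def var_treatment_difference(sigma2, sigma2_ct, r, c):
    """Variance of a pairwise treatment difference in a balanced design.

    sigma2    : residual (between-patient) variance
    sigma2_ct : centre-treatment variance component
    r         : patients per treatment per centre (replicates)
    c         : number of centres
    """
    return 2.0 * (sigma2 / (r * c) + sigma2_ct / c)


def se_treatment_difference(sigma2, sigma2_ct, r, c):
    """Standard error: the square root of the variance above."""
    return math.sqrt(var_treatment_difference(sigma2, sigma2_ct, r, c))


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    print(se_treatment_difference(sigma2=100.0, sigma2_ct=10.0, r=10, c=15))
```

Increasing `r` shrinks only the first term, whereas increasing `c` shrinks both terms, which is why adding patients within centres cannot compensate for too few centres.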

Estimates for the number of centres (c) and number of patients per treatment per centre (r) can be obtained from the usual sample size estimation equation:

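$$\Delta = (t_{\mathrm{DF},1-\alpha/2} + t_{\mathrm{DF},\beta}) \times \operatorname{SE}(\hat{t}_i - \hat{t}_j),$$

where $\alpha$ = significance level, $\beta$ = power, $\Delta$ = difference to be detected, $t$ = number of treatments and $\mathrm{DF} = (c - 1) \times (t - 1)$, the centre-treatment DF. Here $t_{\mathrm{DF},p}$ denotes the $p$ quantile of the $t$ distribution with DF degrees of freedom, not the number of treatments.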

One difficulty is that estimates of both the residual (between-patient) and centre-treatment variance components are required. Unless multi-centre data are available from a previous study, it is likely that only an estimate of the between-patient residual variance will be available. However, it may still be preferable to use the above formulae with a guessed value for the centre-treatment variance component, rather than assuming it is zero.

In this section we consider the situation where an equal number of patients will be used per treatment per centre. Some inflation to the calculated sample sizes will be appropriate when there will be varying numbers per centre, but these calculations will provide a reasonable first 'ballpark' estimate. In that case, $r$ is taken to be the average number of replicates, $\sum r_i / c$. There are three ways in which a sample size can be calculated:

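1. Number of centres ($c$) specified. This approach would be applicable if a decision had been made to use a specific number of centres. After substitution of the formula for $\operatorname{SE}(\hat{t}_i - \hat{t}_j)$ in the sample size estimation equation, with some reorganisation we find that the number of patients per replicate (i.e. per treatment per centre) required is given by

$$r = \frac{2\sigma^2 (t_{\mathrm{DF},1-\alpha/2} + t_{\mathrm{DF},\beta})^2}{c\Delta^2 - 2\sigma^2_{ct}(t_{\mathrm{DF},1-\alpha/2} + t_{\mathrm{DF},\beta})^2}.$$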

Therefore, $t \times r \times c$ patients are required in total. If this formula gives a negative value for $r$, then it is not possible to detect the specified difference with the required power unless more centres are used. Either $c$ should be increased or, alternatively, the power could be decreased or $\Delta$ increased.
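The calculation for a fixed number of centres can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function names are hypothetical, and `t_quantile` uses a two-term Cornish-Fisher series as a standard-library stand-in for an exact $t$ quantile function, so results are approximate when DF is small.

```python
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student-t quantile (Cornish-Fisher series in 1/df).

    An assumption made to avoid third-party dependencies; an exact
    quantile function from a statistics library could be used instead.
    """
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2


def replicates_for_fixed_centres(delta, sigma2, sigma2_ct, c, t,
                                 alpha=0.05, power=0.8):
    """Patients per treatment per centre (r) for a given number of centres.

    Rearranges delta = q * SE, with SE^2 = 2*(sigma2/(r*c) + sigma2_ct/c)
    and q the sum of the two t quantiles, to solve for r.
    Returns None when the denominator is not positive, i.e. when the
    difference cannot be detected with this many centres.
    """
    if c < 2 or t < 2:
        raise ValueError("need at least 2 centres and 2 treatments")
    df = (c - 1) * (t - 1)
    q = t_quantile(1 - alpha / 2, df) + t_quantile(power, df)
    denom = c * delta**2 - 2 * sigma2_ct * q**2
    if denom <= 0:
        return None
    return 2 * sigma2 * q**2 / denom


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    r = replicates_for_fixed_centres(delta=5.0, sigma2=100.0,
                                     sigma2_ct=10.0, c=10, t=2)
    print(r)
```

In practice the returned value would be rounded up to a whole number of patients. A `None` result corresponds to the negative-$r$ case described above.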

2. Number of patients per centre ($t \times r$) specified. This approach might be appropriate if the duration of the trial is limited and there is only time to recruit a specified number of patients per centre. The number of centres required is given by

$$c = \frac{2 (t_{\mathrm{DF},1-\alpha/2} + t_{\mathrm{DF},\beta})^2 (\sigma^2/r + \sigma^2_{ct})}{\Delta^2}.$$

Obviously, $\mathrm{DF} = (c - 1) \times (t - 1)$ will not be known in advance. $z$-values from the normal distribution can be used instead of values from the $t$ distribution to obtain an initial estimate of $c$. A more accurate value can then be calculated by using the DF obtained for this value of $c$ in the above formula, and re-estimating $c$. This can be repeated until convergence is obtained, but changes are usually minimal after the first iteration.
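The iteration just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function names are hypothetical, $c$ is rounded up to a whole number at each step, and `t_quantile` is a two-term Cornish-Fisher approximation used in place of an exact $t$ quantile function.

```python
import math
from statistics import NormalDist


def t_quantile(p, df):
    """Approximate Student-t quantile (Cornish-Fisher series in 1/df)."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / df + g2 / df**2


def centres_for_fixed_replicates(delta, sigma2, sigma2_ct, r, t,
                                 alpha=0.05, power=0.8, max_iter=20):
    """Number of centres (c) when r patients per treatment per centre is fixed.

    Uses c = 2 * q^2 * (sigma2/r + sigma2_ct) / delta^2, where q is the
    sum of the two quantiles: z-values first, then t-values with
    DF = (c - 1)(t - 1) recomputed until c stops changing.
    """
    scale = 2 * (sigma2 / r + sigma2_ct) / delta**2

    def centres(q):
        return max(2, math.ceil(scale * q**2))

    nd = NormalDist()
    c = centres(nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power))  # z-based start
    prev = c
    for _ in range(max_iter):
        df = (c - 1) * (t - 1)
        c_next = centres(t_quantile(1 - alpha / 2, df) + t_quantile(power, df))
        if c_next == c:
            return c
        prev, c = c, c_next
    # If rounding makes the iterates alternate, take the larger (conservative).
    return max(prev, c)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(centres_for_fixed_replicates(delta=5.0, sigma2=100.0,
                                       sigma2_ct=10.0, r=10, t=2))
```

Because the $t$ quantiles exceed the corresponding $z$-values, the $z$-based start is a lower bound, and the refinement can only raise it.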

3. Neither number of centres nor average patients per centre specified. In this situation an optimal sample size can only be calculated by specifying the relative cost of sampling centres compared with sampling patients. The cost of sampling centres will depend on the type of centre being used. For example, a centre in an international study would be extremely expensive, but centres would be much cheaper in a study using local practitioners. The cost of sampling patients relates to the amount to be paid to the investigator per patient plus the cost of monitoring, validating and processing each patient's data. If we denote the relative cost by $g$, then the total cost is proportional to $c \times r \times t + c \times g$. This is minimised when

$$r = \sqrt{\frac{g\sigma^2}{t\sigma^2_{ct}}}.$$

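$c$ is then obtained by substituting $r$ into the formula given earlier:

$$c = \frac{2 (t_{\mathrm{DF},1-\alpha/2} + t_{\mathrm{DF},\beta})^2 (\sigma^2/r + \sigma^2_{ct})}{\Delta^2}.$$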

Sometimes the values calculated might appear impracticable. For example, if the relative cost, $g$, of sampling a centre were set to be not much higher than that of sampling a patient (i.e. $g$ close to one), then the estimated number of centres would likely be very high. In this situation, $g$ has clearly been set too low and should be increased.
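The cost-based approach can be illustrated with a short sketch. It assumes that the optimal $r$ is the value minimising $c(rt + g)$ subject to the sample size equation with the quantiles treated as fixed, which gives $r = \sqrt{g\sigma^2 / (t\sigma^2_{ct})}$. The function name is hypothetical, $r$ is rounded up to a whole patient, and $z$-values are used for $c$, so the result is an initial estimate only.

```python
import math
from statistics import NormalDist


def optimal_design(delta, sigma2, sigma2_ct, t, g, alpha=0.05, power=0.8):
    """Cost-minimising r, and the matching z-based estimate of c.

    g is the cost of sampling a centre relative to sampling a patient.
    Returns (r_opt, r_int, c, relative_cost), where relative_cost is
    c * (r_int * t + g), to which the total cost is proportional.
    """
    if sigma2_ct <= 0:
        raise ValueError("sigma2_ct must be positive for an optimum to exist")
    r_opt = math.sqrt(g * sigma2 / (t * sigma2_ct))
    r_int = max(1, math.ceil(r_opt))  # whole patients per treatment per centre
    nd = NormalDist()
    q = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    c = max(2, math.ceil(2 * q**2 * (sigma2 / r_int + sigma2_ct) / delta**2))
    return r_opt, r_int, c, c * (r_int * t + g)


if __name__ == "__main__":
    # Hypothetical inputs: compare an expensive centre with a cheap one.
    for g in (20, 1):
        print(g, optimal_design(delta=5.0, sigma2=100.0,
                                sigma2_ct=10.0, t=2, g=g))
```

With these hypothetical inputs, lowering `g` from 20 to 1 reduces the patients per centre and increases the number of centres, which is the behaviour described above when $g$ is set too low.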