# The Squared Correlation Coefficient R²

The "quality" of a simple or multiple linear regression can be assessed in a number of ways. The most common is to calculate the squared correlation coefficient, or $R^2$ value (also written $r^2$). This takes a value between zero and one and indicates the proportion of the variation in the dependent variable that is explained by the regression equation. Suppose $y_{calc,i}$ are the values obtained by feeding the relevant independent variables into the regression equation and $y_i$ are the corresponding experimental observations. The following quantities can then be calculated:

Explained Sum of Squares, $\mathrm{ESS} = \sum_i \left(y_{calc,i} - \bar{y}\right)^2$ (4.13)

Residual Sum of Squares, $\mathrm{RSS} = \sum_i \left(y_i - y_{calc,i}\right)^2$ (4.14)

where $\bar{y}$ is the mean of the observed values.

Together these make up the Total Sum of Squares, $\mathrm{TSS} = \sum_i \left(y_i - \bar{y}\right)^2 = \mathrm{ESS} + \mathrm{RSS}$.

Thus, $R^2$ is given by the following relationships:

$$R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}}$$
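As a concrete sketch, these sums of squares can be computed directly with NumPy; the `x` and `y` arrays below are illustrative made-up data, not values from the text. Because the fitted values come from an ordinary least-squares fit with an intercept, the identity ESS + RSS = TSS holds (to floating-point precision).

```python
import numpy as np

# Illustrative data (assumed, not from the text)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.9, 6.2, 7.8, 10.1, 11.7])

# Fit a simple least-squares line and compute the fitted values y_calc,i
slope, intercept = np.polyfit(x, y, 1)
y_calc = slope * x + intercept

y_mean = y.mean()
ess = np.sum((y_calc - y_mean) ** 2)  # Explained Sum of Squares (4.13)
rss = np.sum((y - y_calc) ** 2)       # Residual Sum of Squares (4.14)
tss = np.sum((y - y_mean) ** 2)       # Total Sum of Squares

r2 = ess / tss                        # equivalently 1 - rss / tss
print(r2)
```

For a least-squares fit the two forms ESS/TSS and 1 − RSS/TSS give the same number; for predictions from any other source, only the 1 − RSS/TSS form is generally meaningful.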

An $R^2$ of zero corresponds to a situation in which none of the variation in the observations is explained by variation in the independent variables, whereas a value of one corresponds to a perfect explanation. The $R^2$ statistic is very useful, but taken in isolation it can be misleading. By way of illustration, Figure 4-2 shows five data sets for which the value of $R^2$ is approximately the same (0.7). With one exception (the data set shown top left), the single "best fit" straight line obtained from regression is clearly inappropriate, owing to the presence of outliers or some other trend within the data.