Another approach to estimating the incubation period distribution uses population AIDS incidence data and estimates of the HIV infection rates in the population (Bacchetti and Moss, 1989; Bacchetti, 1990). The expected cumulative AIDS incidence up to calendar time t, A(t), is related to the infection rate at calendar time s, g(s), and the incubation period distribution, F, by the convolution equation

$$A(t) = \int_0^t g(s)\,F(t - s)\,ds. \tag{4.7}$$
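The convolution relating A(t), g(s), and F can be sketched numerically on a monthly grid. The following is a minimal illustration, not the authors' computation: the function name `cumulative_aids_incidence`, the bell-shaped infection curve, and the Weibull-shaped incubation distribution are all assumptions chosen only to make the example run.

```python
import math

def cumulative_aids_incidence(g, F):
    """Discrete analogue of A(t) = integral of g(s) F(t - s) ds.

    g[s]: expected number of new infections in month s.
    F[d]: probability that the incubation period is at most d months, F[0] = 0.
    Returns A with A[t] = sum over s <= t of g[s] * F[t - s].
    """
    if len(F) < len(g):
        raise ValueError("F must cover at least as many months as g")
    return [sum(g[s] * F[t - s] for s in range(t + 1)) for t in range(len(g))]

# Hypothetical inputs, for illustration only.
months = range(120)
g = [200.0 * math.exp(-0.5 * ((k - 40) / 12.0) ** 2) for k in months]  # infections peak at month 40
F = [1.0 - math.exp(-((k / 120.0) ** 2.5)) for k in months]            # Weibull-shaped CDF
A = cumulative_aids_incidence(g, F)
```

Because F is nondecreasing and g is nonnegative, the resulting A is nondecreasing and can never exceed the cumulative number of infections.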

The basic idea is to use data on A(t) and an estimate of g(s) to glean information about F. These methods are closely related to the back-calculation methodology described in Chapter 8, which uses data on A(t) and an estimate of F to estimate historical infection rates g(s).

The usefulness of this method depends upon the availability of accurate information on the infection rates in the population. This approach has been used successfully in the population of homosexual men in San Francisco, where there is detailed information from epidemiologic surveys on the historical infection rates.

The statistical framework is as follows. Let $y_i$ represent the number of AIDS cases diagnosed in the calendar interval $[t_{i-1}, t_i)$, $i = 1, \ldots, m$. Suppose N, the cumulative number of infections that occurred before $t_m$, is known, and the probability density of infection times of the N individuals, g*, is known. Then $y = (y_1, \ldots, y_m)$ has a multinomial distribution with sample size N and cell probabilities

$$p_i = \int_0^{t_i} g^*(s)\,[F(t_i - s) - F(t_{i-1} - s)]\,ds,$$

where the incubation distribution F(t) is defined to be 0 for t ≤ 0. In this formulation, g* and N are assumed known, and a parametric model for F is postulated. Then, apart from an additive constant, the log-likelihood function for the observed AIDS incidence data $(y_1, \ldots, y_m)$ is

$$ll = \sum_{i=1}^{m} y_i \log p_i + \Bigl(N - \sum_{i=1}^{m} y_i\Bigr) \log\Bigl(1 - \sum_{i=1}^{m} p_i\Bigr). \tag{4.8}$$

The log-likelihood function is maximized over the parameters of F.
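The likelihood calculation can be sketched in a few lines. This is a hedged illustration under stated assumptions, not the procedure used in the cited papers: the Weibull form for F, the unit-length (yearly) intervals, the values of `N` and `g_star`, and the coarse grid search are all hypothetical. Expected counts are used in place of observed counts so that the example is deterministic.

```python
import math

def weibull_cdf(t, scale, shape):
    # Hypothetical parametric model for F; F(t) = 0 for t <= 0.
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))

def cell_probabilities(g_star, scale, shape):
    # Discrete version of p_i with unit-length intervals (t_i = i);
    # infections in interval s are treated as occurring at time s.
    m = len(g_star)
    return [
        sum(gs * (weibull_cdf(i - s, scale, shape) - weibull_cdf(i - 1 - s, scale, shape))
            for s, gs in enumerate(g_star))
        for i in range(1, m + 1)
    ]

def log_likelihood(y, N, p):
    # Multinomial log-likelihood, up to an additive constant: diagnosed cases
    # contribute y_i log p_i; the N - sum(y) undiagnosed contribute the last term.
    diagnosed = sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi > 0)
    undiagnosed = (N - sum(y)) * math.log(1.0 - sum(p))
    return diagnosed + undiagnosed

# Hypothetical inputs, for illustration only.
N = 10_000
g_star = [0.02, 0.08, 0.20, 0.30, 0.20, 0.10, 0.05, 0.03, 0.01, 0.01]
true_scale, true_shape = 10.0, 2.5
p_true = cell_probabilities(g_star, true_scale, true_shape)
y = [N * pi for pi in p_true]  # expected counts standing in for observed data

# Maximize the log-likelihood over a coarse grid of Weibull parameters.
grid = [(scale, shape) for scale in (8.0, 10.0, 12.0) for shape in (2.0, 2.5, 3.0)]
best = max(grid, key=lambda th: log_likelihood(y, N, cell_probabilities(g_star, *th)))
```

In practice a numerical optimizer would replace the grid search; the grid is used here only to keep the sketch free of external dependencies.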

Bacchetti (1990) applied these ideas to estimate F from data on AIDS incidence and infection rates in gay men in San Francisco. Using data from three cohorts of gay men in San Francisco, he first estimated g*(s), the probability density of dates of seroconversion among all those infected before 1989. Semiparametric methods with a penalty function like that in equation (4.6) were used to estimate g*(s) as a discrete probability mass function on each month from January 1978 to December 1988. In order to estimate F from the likelihood (4.8), it was necessary to estimate the total number of infections in gay men in San Francisco before 1989. This was done by rescaling the estimate of 8760 AIDS-free seropositive gay men obtained from a population-based probability sample in the San Francisco Men's Health Study (Winkelstein, Lyman, Padian, et al., 1987). Because the estimate of 8760 pertained to those recruited through September 1984 in an area of San Francisco that had contributed 45.5% of all AIDS cases reported, the estimate was rescaled to 8760/0.455 = 19,253. Addition of some patients with AIDS who were not included in the original survey estimate and a further rescaling based on g*(s) to account for infections occurring between October 1984 and December 1988 yielded an estimate of N = 22,030. Uncertainties in N are important, because N varies inversely with F, as is seen by setting g(s) = Ng*(s) in equation (4.7).
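The inverse relationship between the cumulative number of infections N and F can be written out as a short worked calculation. It only substitutes g(s) = Ng*(s) into the convolution equation (4.7); the factor c is a hypothetical misspecification of N introduced for illustration.

$$
A(t) = \int_0^t N g^*(s)\,F(t - s)\,ds = N \int_0^t g^*(s)\,F(t - s)\,ds .
$$

With A(t) and g* held fixed, replacing N by cN is matched by replacing F by F/c:

$$
A(t) = cN \int_0^t g^*(s)\,\frac{F(t - s)}{c}\,ds .
$$

For example, overstating N by 10% (c = 1.1) would lower the fitted F by about 9%, since 1/1.1 ≈ 0.909. The first rescaling step in the text is the simple ratio 8760/0.455 ≈ 19,253.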

The estimates of both g*(s) and N are based on the assumption that the studied cohorts and the population-based sample in one part of San Francisco are representative of the epidemic among all gay men in San Francisco.

To estimate F, a semiparametric model was used that included a separate discrete time hazard for each month. The hazard estimates were smoothed using a penalized likelihood like that in equation (4.6). The hazard of AIDS is negligible for the first several years after seroconversion and then rises sharply (Figure 4.7). There is considerable uncertainty about the shape of the hazard after 7 years, as reflected in the sensitivity of the estimated hazard to the degree of smoothing used.
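The link between a monthly discrete-time hazard and F, together with one possible roughness penalty, can be sketched as follows. This is an illustration under assumptions: equation (4.6) is not reproduced in this section, so the squared-second-difference penalty below is a common stand-in rather than the penalty actually used, and the hazard values are hypothetical.

```python
def hazard_to_cdf(hazard):
    """Convert monthly discrete hazards h_1, h_2, ... into F.

    F[k] = 1 - product over j <= k of (1 - h_j), with F[0] = 0.
    """
    F = [0.0]
    survival = 1.0
    for h in hazard:
        survival *= 1.0 - h
        F.append(1.0 - survival)
    return F

def roughness_penalty(hazard):
    # Assumed penalty: sum of squared second differences of the hazard.
    # A smoothed fit would maximize log-likelihood minus lambda times this value.
    return sum((hazard[k - 1] - 2.0 * hazard[k] + hazard[k + 1]) ** 2
               for k in range(1, len(hazard) - 1))

# Hypothetical hazard: zero for the first 24 months, then rising linearly.
hazard = [0.0] * 24 + [0.0002 * k for k in range(1, 61)]
F = hazard_to_cdf(hazard)
penalty = roughness_penalty(hazard)
```

A larger smoothing parameter puts more weight on the penalty and pulls the fitted hazard toward a straight line, which is why the estimated shape in the poorly determined region depends on the degree of smoothing.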

Two other sources of uncertainty, apart from the choice of the tuning parameter (degree of smoothing), are important. First, an assessment of the effects of random variability in estimates of N, g*(s), and the incubation times themselves yields wide confidence intervals on the estimated hazard after 7 years (figure 6 in Bacchetti, 1990). Continued increases in the hazard after 7 years and slight decreases in the hazard after 7 years both fall within these confidence intervals. Second, the estimate of g*(s) indicates that the HIV infection rate peaked in late 1981 (see Figure 1.5). Because clinical trials of zidovudine were ongoing in 1986 and because zidovudine and other treatments were introduced in 1987, only 5 or 6 years after the estimated peak infection rates, it is possible that the leveling of the hazard beyond year 7 (Figure 4.7) reflects the effect of treatment (Bacchetti, 1990).

Figure 4.7 Hazard functions of progression to AIDS based on deconvolution of AIDS incidence data in San Francisco with four choices of the smoothing parameter. (Source: Bacchetti, 1990. Reprinted with permission from the Journal of the American Statistical Association. Copyright 1990 by the American Statistical Association.)

Estimates of the incubation distribution based on equation (4.7) with g(s) assumed known are more precise than analogous back-calculated estimates of the infection curve g(s) with F assumed known (Chapter 8). This is because the sharp peak in the infection curve for San Francisco (Figure 1.5) reduces uncertainty about when infections occurred. In contrast, the incubation distribution, F, is diffuse, making it more difficult to extract information about g(s) by deconvolving equation (4.7).
