Disaggregation of incidence

HIV-positive TB incidence

In this report, TB incidence is disaggregated by HIV-infection status at country level. TB incidence was disaggregated by HIV and CD4 status using the Spectrum software\cite{Stover2012}. WHO estimates of TB incidence were used as inputs to the Spectrum HIV model. The model was fitted to WHO estimates of TB incidence, and then used to produce estimates of TB incidence among people living with HIV disaggregated by CD4 category\cite{Pretorius2014}. A regression method was used to estimate the relative risk (RR) for TB incidence according to the CD4 categories used by Spectrum for national HIV projections\cite{J2010}. Spectrum data were based on the national projections prepared for the UNAIDS Report on the global AIDS epidemic 2013. The model can also be used to estimate TB mortality among HIV-positive people, the resource requirements associated with recently updated guidance on ART and the impact of ART expansion.

A flexible and relatively simple way of modelling TB incidence (or any time-dependent function) is to represent it as k time-dependent m’th order spline functions

\[\begin{aligned} I(x) = \sum_{i=1 .. k} \beta_i B^m i(x)\end{aligned}\]

where \(\beta_i\) is the i’th spline coefficient and \(B^m i(x)\) represents the evaluation of the i-th basis function at time (year) \(x\). The order of each basis function is m and cubic splines are used, i.e. \(m=3\). The equation simply states that any time-dependent function, such as incidence, can be represented as a linear combination of cubic-spline basis functions.

The values of the cubic-spline coefficients \(\beta\) were determined by an optimization routine that minimizes the least squares error between incidence data (\(I_{obs}\)) and the estimated incidence curve \(I(x)\)

\[\begin{aligned} I(x) = \sum_{x = 1990 .. 2012} | I(x) - I_{obs} (x)|^2 + \lambda \beta^T S \beta\end{aligned}\]

Here \(|I - I_{obs}|^2\) is the sum of squared errors in estimated incidence and \(S\) is a difference penalty matrix applied directly to the parameters \(\beta\) to control the level of variation between adjacent coefficients of the cubic-spline, and thus control (through a choice of \(\lambda\)) the smoothness of the time-dependent case incidence curve. Another important purpose of the use of the smoothness penalty matrix \(S\) is to regularize (by creating smoothness dependencies between adjacent parameters) the ill-conditioned inverse problem (more unknown parameters than the data can resolve) that would tend to over fit the data when left ill-conditioned.

The cubic-spline method was then used to fit an indicator (incidence or notification) to a set of bootstrapped data, obtained by sampling from the normal error distribution with zero mean and a standard deviation of the residuals of the spline regression. This bootstrap method produced a sample of projected cubic-spline curves that inherits the temporal biases and systematic errors of the data. Confidence intervals based on the bootstrapped data, namely \(2.5^{th}\) and \(97.5^{th}\) percentile of projected cubic splines, were typically narrow in the years where the model had data to utilize, and ‘spread out’ after that, according to a Gaussian process with a linearly increasing variance.

The disaggregation of TB incidence by CD4 category among people living with HIV was based on the idea that an increase in the relative risk for TB incidence is a function of CD4 decline. Williams et al captured this idea in a model for the relationship between the RR for TB and CD4 decline.\cite{20974976} They suggested a 42% (+/- 17%) increase in RR for TB for each unit of \(100\mu l\) CD4 decline.

The Spectrum-TB model’s disaggregation method is based on the Williams et al. model. The model first estimates incidence among people living with HIV, and then calculates the ‘risk of TB’ \(F=I^- / P^-\), where \(I^-\) is TB incidence among people living with HIV and \(P^-\) is the number of people living with HIV who are susceptible to TB.

An assumption is made that the risk of TB infection among people living with HIV with CD4 count \(>\) 500 μL is proportional to F (it was assumed that it was higher by a factor of 2.5).\cite{15609223} For each \(100\mu l\) CD4 decline in the remaining categories (350-499, 250-349, 200-249, 100-199, 50-99 CD4 cells/μL, and CD4 count less than 50 cells μL), the risk of infection is represented as

\[\begin{aligned} F(c<500) = F(c>500).p(1).p(2)dc\end{aligned}\]

where \(p(1)\) is a parameter that is used to recognize that people living with HIV and who have high CD4 counts could be at higher risk of TB infection relative to those who are HIV-negative, and \(p(2)\) controls the exponential increase in RR that occurs with CD4 decline. \(dc\) is the number of \(100\mu l\) CD4 decline associated with the midpoint of each CD4 category relative to 500: \(dc= (3.0, 4.4, 8.6, 12.9, 19.2, 28.6, 37.3)\) for the six CD4 categories.

A reduction in RR is applied for those who have been on ART for more than one year.

To match total TB incidence and estimates of the number of HIV-positive TB cases from HIV testing data where available, it was assumed that p(1)=2.5 and p(2) was fitted accordingly.

In the RR-approach, the ‘biological meaning’ that should be attached to the parameters and the interpretation of these parameters as regression coefficients need careful consideration. Both parameters can be fitted or both can be fixed. Varying at least p(2) captures the variation among countries that is expected due to variation in the baseline (HIV-negative) CD4 count, and it strikes a balance between the biological and regression mechanisms.

The RR model approach to estimation of TB incidence was used for people on ART. Although an estimate of TB incidence among people on ART could be obtained from surveillance data reported to WHO (such that it is arguably not necessary to use the RR model), limitations of the ART data (in particular that some countries appear to report cumulative totals of people on ART) meant that the RR approach needed to be used.

Hazard ratios (HR) of 0.35 were assumed for all CD4 at ART initiation categories. Suthar et al have reported HRs of 0.16, 0.35 and 0.43 for those on ART with CD4 count \(<\) 200, 200-350 and \(>\) 350,\cite{22911011} and these values could in principle be used. However, Spectrum tracks only CD4 at initiation, thus limiting the use of CD4-specific HRs for people on ART.

It was further assumed that the HR of 0.35 applies only to patients on ART for more than six months. Spectrum’s ART-mortality estimates, derived mostly from ART cohorts in Sub-Saharan Africa, suggest that mortality remains very high in the first six months of ART. Since TB is a leading contributor to mortality among HIV-positive people, it was judged that the HR for patients on ART for less than 6 months is likely to remain high; therefore, a reduction factor due to ART was not applied for this subset of patients.

A simple least squares approach was used to fit the model to total TB incidence, and to all available estimates of TB incidence among people living with HIV. These estimates of TB incidence among people living with HIV were obtained by three sampling methods: population surveys of the prevalence of HIV among TB cases (least biased, but scarce due to logistical constraints), sentinel HIV data (biases include more testing of people with advanced HIV-related disease) and routine HIV testing of reported TB patients (variable coverage). To increase the influence of survey data, replicas of the survey data were included in the likelihood function. In other words, for years for which data from HIV testing were available, identical copies of the HIV-test data were added to the likelihood function. The estimate of total TB incidence was based on much more data, evenly spread out in the estimation period 1990–2015.

Model testing showed that using two replicates of the HIV survey data (i.e. duplicating the survey data) and two replicates of the routine testing data with coverage greater than 90%, was the best approach to disaggregating TB incidence: the fit passed close to the survey or high-coverage routine testing data points that were available. For each of a) HIV sentinel and b) routine testing with coverage between 50–90%, data were not used.

As with the main indicators, confidence intervals are based on a bootstrap method. For this method all data sources are sampled from assumed underlying distributions: total incidence data is sampled from normally bootstrap result set for total incidence, survey and sentinel HIV data are sampled from beta distributions and routine HIV testing among reported cases are sampled from a normal distribution, using the variance of the sample number of the number who tested HIV positive.

For countries with no data, a range for p(2) was estimated from countries with survey or testing data, which suggest that \(p(2) = 1.96 [1.8-2.1]\). The RR-model was then fitted to total TB incidence only. There is no satisfactory way to verify results for TB incidence among people living with HIV when no HIV-testing data are available. However, comparison of the global estimate for TB incidence among people living with HIV produced by Spectrum and estimates based on a different method using HIV prevalence instead of CD4 distributions and using HIV-test data in a different way suggests that the RR-model works reasonably well. The comparative method to disaggregate TB incidence by HIV is derived as follows, where the \(I\) and \(N\) denote incident cases and the total population, respectively, superscripts + and - denote HIV status, \(\vartheta\) is the prevalence of HIV among new TB cases, \(h\) is the prevalence of HIV in the general population and \(\rho\) is the incidence rate ratio (HIV-positive over HIV-negative).

\[\begin{align} \rho &= \frac{I^{\textrm{+}}/N^{\textrm{+}}}{I^{\textrm{-}}/N^{\textrm{-}}} > 1 \\ \rho \frac{I^{\textrm{-}}}{I^{\textrm{+}}} &= \frac{N^{\textrm{-}}}{N^{\textrm{+}}} \\ \rho \frac{I - I^{\textrm{+}}}{I^{\textrm{+}}} &= \frac{N - N^{\textrm{+}}}{N^{\textrm{+}}} \\ \frac{I^{\textrm{+}}}{I} &= \frac{\rho \frac{N^{\textrm{+}}}{N}}{1 + (\rho - 1)\frac{N^{\textrm{+}}}{N}} = \vartheta \\ \vartheta &= \frac{h \rho}{1 + h(\rho - 1)} \label{eqn:rho} \end{align}\]

The TB incidence rate ratio \(\rho\) is estimated by fitting the following linear model with a slope constrained to 1

\[\begin{aligned} \log(\hat{\rho}) = \log \left(\frac{\vartheta}{1-\vartheta}\right) - \log \left(\frac{h}{1-h}\right), (\vartheta, h) \in {]0,1[} \label{eqn:irr}\end{aligned}\]

Provider-initiated testing and counselling with at least 50% HIV testing coverage is the most widely available source of information on the prevalence of HIV in TB patients (Table 8). However, this source of data is affected by biases, particularly when coverage is closer to 50% than to 100%. In all countries with repeat data from testing, the relationship between the prevalence of HIV in TB patients and the coverage of HIV testing was examined graphically. In some countries, the prevalence of HIV in TB patients was found to decrease with increasing HIV testing coverage while in others it increased with increasing HIV testing coverage; in most countries, the prevalence of HIV followed highly inconsistent patterns (with repeat changes in direction) as HIV testing coverage increased. Therefore, it was not possible to adjust for the effect of incomplete coverage of HIV testing on estimates of the prevalence of HIV among TB patients. The assumption was thus made that TB patients with an HIV test result were statistically representative of all TB cases. As coverage of HIV testing continues to increase globally, biases will decrease.