Accurate estimates of incidence rates of infectious diseases are important for monitoring trends and for designing and evaluating disease prevention and control programs. Traditionally, incidence has been estimated using cohort studies, which are costly, slow, and vulnerable to selection biases in both recruitment and attrition. Cross-sectional incidence estimation is an alternative approach that can avoid some of these problems. This approach involves collecting blood samples from a single representative cross-sectional survey of a target population, and analyzing the samples using multi-biomarker assay algorithms (MAAs) to detect recent infections. Under some assumptions about the dynamics of the epidemic, incidence is estimated from the prevalence of MAA-positive individuals, where MAA positive refers to a state defined by levels of biomarkers that is associated with recent acquisition of infection. A training data set is required to define and evaluate characteristics of the MAA positive state. In order to achieve accurate estimates, cross-sectional incidence estimation analyses should be tailored to the population of scientific interest and to the data-generating process. This dissertation develops approaches for three challenges encountered in cross-sectional incidence estimation: analyzing incomplete or missing biomarker data; calibrating the cross-sectional estimation procedure for a specific target population; and accounting for interval-censored infection dates in longitudinal biomarker data.
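To make the prevalence-based idea concrete, the following is a minimal sketch of one widely used simplified estimator: the number of MAA-positive individuals divided by the product of the number of uninfected individuals and the mean window period. It is an illustration under strong assumptions (a steady-state epidemic and no misclassification of long-standing infections), not necessarily the estimator developed in this dissertation. The function name, parameter names, and example counts are hypothetical.

```python
def cross_sectional_incidence(n_maa_positive: int,
                              n_uninfected: int,
                              mean_window_years: float) -> float:
    """Simplified cross-sectional incidence estimate (per person-year).

    Assumption (not from the source text): incidence is approximated as
    n_maa_positive / (n_uninfected * mean_window_years), where the mean
    window period is the average time spent in the MAA-positive state.
    """
    if n_uninfected <= 0:
        raise ValueError("n_uninfected must be positive")
    if mean_window_years <= 0:
        raise ValueError("mean_window_years must be positive")
    return n_maa_positive / (n_uninfected * mean_window_years)


# Hypothetical survey: 20 MAA-positive and 1000 uninfected individuals,
# with a mean window period of half a year.
estimate = cross_sectional_incidence(20, 1000, 0.5)
print(f"{estimate:.3f} infections per person-year")
```

Because the mean window period appears in the denominator, any bias in its estimate from the training data carries through directly to the incidence estimate.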
The training data sets are used to operationally define the MAA positive state and to estimate the probabilities of being in the MAA positive state as a function of duration of time since acquisition of infection. The training data sets include longitudinal biomarker measurements on a sample of individuals. We first consider the challenge of missing biomarker data in the training data sets. We examine two naïve approaches, one using all samples that can be classified by the MAA and another using all samples with complete biomarker data, and we show that each of these approaches can lead to biased estimators of the mean window period. We propose a conditional approach for handling the missing data. We show that this method performs well in simulation studies. We then consider missing data in the context of cross-sectional surveys of biomarker prevalence. Again, we show that naïve approaches produce biased estimates, and we propose a conditional approach that performs well in simulation studies. We apply these methods to a training data set of biomarkers in HIV Subtype C infections collected from over two thousand individuals from multiple countries.
The target population refers to the population in which we wish to estimate incidence of infection. In order for a training data set to be useful for model calibration, any systematic differences between the training data set and the target population must be addressed. We consider a scenario in which there is one covariate whose distribution differs between the training data set and the target population, and we propose a range of methods for correcting such a difference. Using simulation studies, we examine the performance characteristics of these methods under a range of analysis conditions and determine their sensitivity to model misspecifications.
Since infection status is usually only tested periodically in longitudinal studies, infection dates and durations of infection are typically interval-censored in MAA calibration data sets. We present a joint model of infection dates and subsequent biomarker values and an estimation procedure for this model, and we compare this approach with naïve methods assuming a uniform or symmetric distribution over the censoring intervals for the infection dates. We show that the joint modelling approach performs well in many situations compared to midpoint imputation and uniform imputation.
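The two naïve comparison methods named above can be sketched as follows. This is only an illustration of midpoint and uniform imputation for an infection date known to lie between a last negative and a first positive test; it does not represent the joint model. All function names and the example times are hypothetical.

```python
import random


def midpoint_impute(last_negative: float, first_positive: float) -> float:
    """Impute the infection time as the midpoint of the censoring interval."""
    return (last_negative + first_positive) / 2.0


def uniform_impute(last_negative: float, first_positive: float,
                   rng: random.Random) -> float:
    """Impute the infection time as a uniform draw from the censoring interval."""
    return rng.uniform(last_negative, first_positive)


def duration_since_infection(sample_time: float, infection_time: float) -> float:
    """Time elapsed between the imputed infection time and a biomarker sample."""
    return sample_time - infection_time


# Hypothetical participant: last negative test at day 0, first positive
# test at day 90, biomarker sample drawn at day 120.
rng = random.Random(2024)  # seeded only so the sketch is reproducible
mid = midpoint_impute(0.0, 90.0)
uni = uniform_impute(0.0, 90.0, rng)
print(duration_since_infection(120.0, mid))
print(duration_since_infection(120.0, uni))
```

Both methods fix the distribution of the infection date within the interval in advance, without using the subsequent biomarker values, which is the information a joint model can exploit.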
The methods presented in this dissertation were developed for the purpose of calibrating and performing HIV incidence estimation using cross-sectional surveys of biomarker prevalence. However, the cross-sectional survey-based approach to incidence estimation is applicable to infectious diseases other than HIV. This approach may be especially useful when it is crucial to rapidly detect changes in infection incidence to inform public health policies, including epidemic control programs. We hope that the methods presented in this dissertation will encourage the use of the cross-sectional approach to incidence estimation in a variety of contexts and will help address the inevitable real-world complications in the data collection process.