A Genome-To-Genome Analysis of Associations Between Human Genetic Variation, HIV-1 Sequence Diversity, and Viral Control
elife.elifesciences.org
István Bartha1,2,3,4, Jonathan M Carlson5†, Chanson J Brumme6†,
Paul J McLaren1,2,4†, Zabrina L Brumme6,7, Mina John8, David W Haas9,
Javier Martinez-Picado10,11, Judith Dalmau10, Cecilio López-Galíndez12,
Concepción Casado12, Andri Rauch13, Huldrych F Günthard14, Enos Bernasconi15,
Pietro Vernazza16, Thomas Klimkait17, Sabine Yerly18, Stephen J O’Brien19,
Jennifer Listgarten5, Nico Pfeifer5‡, Christoph Lippert5, Nicolo Fusi5,
Zoltán Kutalik4,20, Todd M Allen21, Viktor Müller3, P Richard Harrigan6,22,
David Heckerman5, Amalio Telenti2*, Jacques Fellay1,2,4*, for the HIV
Genome-to-Genome Study and the Swiss HIV Cohort Study
1School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; 2Institute of Microbiology, University Hospital and University of Lausanne, Lausanne, Switzerland; 3Research Group of Theoretical Biology and Evolutionary Ecology, Eötvös Loránd University and the Hungarian Academy of Sciences, Budapest, Hungary; 4Swiss Institute of Bioinformatics, Lausanne, Switzerland; 5eScience Group, Microsoft Research, Los Angeles, United States; 6BC Centre for Excellence in HIV/AIDS, Vancouver, Canada; 7Faculty of Health Sciences, Simon Fraser University, Burnaby, Canada; 8Institute of Immunology and Infectious Diseases, Murdoch University, Murdoch, Australia; 9Vanderbilt University Medical Center, Nashville, United States; 10AIDS Research Institute IrsiCaixa, Institut d’Investigació en Ciències de la Salut Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona, Spain; 11Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain; 12Centro Nacional de Microbiología, Instituto de Salud Carlos III, Madrid, Spain; 13Clinic of Infectious Diseases, University of Bern & Inselspital, Bern, Switzerland; 14Division of Infectious Diseases and Hospital Epidemiology, University Hospital and University of Zürich, Zürich, Switzerland; 15Division of Infectious Diseases, Regional Hospital of Lugano, Lugano, Switzerland; 16Division of Infectious Diseases and Hospital Epidemiology, Cantonal Hospital, St. Gallen, Switzerland; 17Department of Biomedicine, University of Basel, Basel, Switzerland; 18Laboratory of Virology, Geneva University Hospitals, Geneva, Switzerland; 19Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia; 20Institute of Social and Preventive Medicine, University Hospital and University of Lausanne, Lausanne, Switzerland; 21Ragon Institute of MGH, MIT, and Harvard, Massachusetts General Hospital, Boston, United States; 22Faculty of Medicine, University of British Columbia, Vancouver, Canada
*For correspondence: Amalio.Telenti@chuv.ch (AT); jacques.fellay@epfl.ch (JF)
†These authors contributed equally to this work
‡Present address: Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany
Competing interests: The authors declare that no competing interests exist.
Funding: See page 13
Received: 01 July 2013
Accepted: 26 September 2013
Published: 29 October 2013
Reviewing editor: Gil McVean, Oxford University, United Kingdom
Copyright Bartha et al. This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Abstract HIV-1 sequence diversity is affected by selection pressures arising from host genomic
factors. Using paired human and viral data from 1071 individuals, we ran >3000 genome-wide scans,
testing for associations between host DNA polymorphisms, HIV-1 sequence variation and plasma
viral load (VL), while considering human and viral population structure. We observed significant
human SNP associations to a total of 48 HIV-1 amino acid variants (p<2.4 × 10⁻¹²). All associated
SNPs mapped to the HLA class I region. Clinical relevance of host and pathogen variation was
assessed using VL results. We identified two critical advantages to the use of viral variation for
identifying host factors: (1) association signals are much stronger for HIV-1 sequence variants than
VL, reflecting the ‘intermediate phenotype’ nature of viral variation; (2) association testing can be
run without any clinical data. The proposed genome-to-genome approach highlights sites of
genomic conflict and is a strategy generally applicable to studies of host–pathogen interaction.
DOI: 10.7554/eLife.01123.001
eLife digest Developing treatments or vaccines for HIV is challenging because the genetic
makeup of the virus is constantly changing in an effort to outwit the human immune system.
Moreover, the immune system is highly variable as a result of the long-standing co-evolution of
humans and microbes. Each individual will try to oppose the invading virus in a unique way, forcing
the virus to acquire specific mutations that can be interpreted as the genetic signature of this
one-against-one battle.
To explore the influence of co-evolution on HIV, Bartha et al. took samples of both human and
viral genomes from 1071 individuals infected with HIV, the AIDS virus, and used genotyping and
sequencing technology to obtain a comprehensive description of the genetic variation in both.
Computational techniques were then used to search for links between variants in the human DNA
sequences and variants in the viral sequences.
The most common type of genetic variation found in the human genome is a single nucleotide
polymorphism, or SNP for short: a SNP is produced when a single nucleotide – an A, C, G or T – is
replaced by a different nucleotide. Bartha et al. found that SNPs within the human DNA sequences in
their study were linked to variations in 48 amino acids in HIV. Moreover, all these SNPs were found within
a group of genes known as the HLA (human leukocyte antigen) system, which encodes for proteins
that play a vital role in the immune response. This work identified the areas of the human genome
that put pressure on the AIDS virus, and the regions of the virus that serve to escape human control.
The approach developed by Bartha et al. allows the interactions between a microbe and a human
host to be studied by looking at the genome of the microbe and the genome of the infected
person. It also differentiates host-induced mutations that limit the capacity of the virus to do harm
from those that are tolerated by the pathogen. A similar strategy could be used to study other
infectious diseases.
DOI: 10.7554/eLife.01123.002
Introduction
Through multiple rounds of selection and escape, host and pathogen genomes are imprinted with
signatures of co-evolution that are governed by Darwinian forces. On the host side, well-characterized
anti-retroviral restriction factors, such as TRIM5α, APOBEC3G and BST2, harbor strong signals of
selection in primate genomes, clear examples of retroviral pressure (Ortiz et al., 2009). On the virus
side, obvious signs of selection are observable in the HIV-1 genome: escape mutations and reversions
have been described in epitopes restricted by human leukocyte antigen (HLA) class I molecules and
targeted by cytotoxic T lymphocyte (CTL) responses (Goulder et al., 2001; Kawashima et al., 2009).
Sequence polymorphisms have also been reported recently in regions targeted by killer immunoglobulin-
like receptors (KIR), suggesting evasion from immune pressure by natural killer (NK) cells (Alter et al.,
2011). Evidence for the remodeling of retroviral genomes by host genetic pressure also comes from
simian immunodeficiency virus (SIV) infection studies in rhesus macaques, where escape from restric-
tive TRIM5α alleles has been observed in the viral capsid upon cross-species transmission of SIVsm
(Kirmaier et al., 2010). In contrast, human alleles of TRIM5α do not result in escape mutations, likely
because of adaptation of the pathogen to the host (Rahm et al., 2013). Sequence adaptation is also
a known feature of cross-species transmission. For example, a methionine in the matrix protein (Gag-30)
in SIVcpzPtt changed to arginine in lineages leading to HIV-1 and reverted to methionine when HIV-1 was
passaged through chimpanzees (Wain et al., 2007).
To date, combined analyses of human and HIV-1 genetic data have addressed the association of
HLA and KIR genes with variants in the retroviral genome (Moore et al., 2002; Brumme et al., 2007;
Bhattacharya et al., 2007; Kawashima et al., 2009; Alter et al., 2011; Carlson et al., 2012; Wright
et al., 2012). Additionally, genome-wide association studies (GWAS) performed in the host have
focused on various HIV-related clinical phenotypes (Fellay et al., 2007; Fellay et al., 2009; Pereyra
et al., 2010). In parallel, large amounts of HIV-1 sequence data have been generated for phylogenetic
studies, which shed new light on viral transmission and evolution (Kouyos et al., 2010; Alizon et al.,
2010; Von Wyl et al., 2011), or allow clinically driven analyses of viral genes targeted by antiretroviral
drugs (resistance testing) (Von Wyl et al., 2009).
Building on the unprecedented possibility to acquire and combine paired human and viral genomic
information from the same infected individuals, we employ an innovative strategy for global
genome-to-genome host–pathogen analysis. By simultaneously testing for associations between genome-wide human
variation, HIV-1 sequence diversity, and plasma viral load (VL), our approach allows the mapping of all sites
of host–pathogen genomic interaction, the correction for both host and viral population stratification,
and the assessment of the respective impact of human and HIV-1 variation on a clinical outcome (Figure 1).
Results
Study participants, host genotypes, and HIV-1 sequence variation
Full-length HIV-1 genome sequence and human genome-wide SNP data were obtained from seven
studies or institutions on a total of 1071 antiretroviral naive patients of Western European ancestry,
infected with HIV-1 subtype B. The homogeneity of the study population was confirmed by principal
component analysis of the genotype matrix: together, the first five principal components explained 1%
of total genotypic variation. After quality control of the human genotype data, imputation and filtering,
∼7 million SNPs were available for association testing. The full-length HIV-1 sequence is approximately
9.5 kb long, corresponding to over 3000 encoded amino acids. Not all sequences were complete; on
average, viral residues were covered in 85% of the study population (range: 75% in Tat to 95% in
Gag). Due to its hypervariable nature, the portion of the HIV-1 envelope gene that encodes the gp120
protein was not sequenced in most study samples and was therefore excluded. Overall 1126 residues
of the HIV-1 proteome were found to be variable in at least 10 samples, for a total of 3381 different
viral amino acids that could be represented by 3007 distinct binary variables.
Host VL GWAS
We first performed a classical GWAS of host determinants of HIV-1 VL (Figure 2A, Study A)
using data from 698 patients (65% of the study population) for whom a VL phenotype could be
reliably estimated. The top associations were observed in the HLA class I region on chromosome 6
and were highly consistent with results observed previously (Fellay et al., 2009; Pereyra
et al., 2010). The strongest associated SNP, rs9267454 (p = 1.5 × 10⁻⁸), is in partial linkage
disequilibrium (LD) with HLA-B*57:01 (r² = 0.47, D′ = 0.92), HLA-B*14:01 (r² = 0.12, D′ = 1.0),
HLA-B*27:05 (r² = 0.01, D′ = 0.99), and the HLA-C -35 rs9264942 SNP (r² = 0.07, D′ = 0.77), and thus
reflects these well-known associations with HIV-1 control. These results confirm the quality of the
study population for the purpose of genome analysis of determinants of HIV-1-related outcomes.
Figure 1. A triangle of association testing. The following association analyses were performed: [Study
A] human SNPs vs plasma viral load (1 GWAS); [Study B] human SNPs vs variable HIV-1 amino acids (3007
GWAS); and [Study C] variable HIV-1 amino acids vs plasma viral load (1 proteome-wide association study).
DOI: 10.7554/eLife.01123.003
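The two LD measures quoted above behave differently, which is why a SNP can show D′ close to 1 with an HLA allele while r² remains low. The sketch below computes both from the standard definitions; the function name `ld_stats` and the frequencies in the example are hypothetical and are not taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele a and allele b.
    p_a, p_b: frequencies of allele a and allele b.
    """
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two allele indicators.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2

# Hypothetical case: a SNP allele (4%) found only on haplotypes
# carrying a more frequent HLA allele (10%).
d_prime, r2 = ld_stats(p_ab=0.04, p_a=0.04, p_b=0.10)
```

In this made-up case the SNP allele never occurs without the HLA allele, so D′ is at its maximum, but r² is well below 1 because many HLA-carrying haplotypes lack the SNP allele.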
Figure 2. Results of the genome-wide association analyses. (A) Associations between human SNPs and HIV-1 plasma viral load. The dotted line shows the
Bonferroni-corrected significance threshold (p-value < 7.25 × 10⁻⁹). (B) Associations between human SNPs and HIV-1 amino acid variants, with 3007 GWAS
collapsed in a single Manhattan plot. The dotted line shows the Bonferroni-corrected significance threshold (p-value < 2.4 × 10⁻¹²). (C) Schematic representation
of the HLA class I genes and of the SNPs associated with HIV-1 amino acid variants in the region. (D) Same association results as in panel B, projected on the
HIV-1 proteome. Only the strongest association is shown for each amino acid. Significant associations are indicated by a blue dot. The gp120 part of the HIV-1
proteome was not tested. The colored bar below the plot area shows the positions of the optimally defined CD8+ T cell epitopes. An interactive version of this
figure can be found at http://g2g.labtelenti.org (which is also available to download from Zenodo, http://dx.doi.org/10.5281/zenodo.7138).
DOI: 10.7554/eLife.01123.004
Genome-to-genome analyses
3007 genome-wide analyses of associations between human SNPs and HIV-1 amino acid variants were
performed in the full sample of 1071 individuals (Figure 2B, Study B) using logistic regression cor-
rected for viral phylogeny (Carlson et al., 2008; Carlson et al., 2012). Highly significant associations
were observed between SNPs in the major histocompatibility complex (MHC) region and multiple
amino acids throughout the HIV-1 proteome (except in Vpu, Rev and the RNaseH subunit of RT)
(Figure 2C), with Gag and Nef having a significantly higher density of associated variable sites than the
rest of the proteome (Gag: 6.8% vs 2.6%, p = 0.001; Nef: 11% vs 2.6%, p = 1.2 × 10⁻⁵; binomial tests).
Using Bonferroni correction for multiple testing (threshold p = 2.4 × 10⁻¹²), significant human SNP
associations were observed with 48 viral amino acids (Figure 2 and Table 1). None of these 48 amino
acids mapped to known sites of major antiretroviral drug resistance mutations (Hirsch et al., 2008).
The strongest association found was between rs72845950 and Nef position 135 (p = 2.7 × 10⁻⁶⁶).
Associations were much stronger between human SNPs and HIV-1 amino acids than with VL. For
example, the SNP rs2395029, a proxy for HLA-B*57:01 (r² = 0.93), has a p-value of 1.21 × 10⁻⁶ for
association with VL, while it reaches a p-value of 4 × 10⁻⁵⁹ for association with amino acid variation
in Gag at position 242 (a well-known position of escape from HLA-B*57:01). No significant signals
were identified outside the MHC. A link to the complete set of association results can be found at
http://g2g.labtelenti.org (which is also available to download from Zenodo, http://dx.doi.org/10.5281/
zenodo.7139). These results demonstrate the feasibility and improved power of performing association
testing using viral genetic variation as outcome, independent of clinical phenotype.
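A single SNP-versus-amino-acid test can be illustrated with an ordinary logistic regression of the binary viral variable on the SNP genotype. The sketch below is deliberately simplified: it omits the correction for viral phylogeny used in the study, includes no covariates, and fits the model with a hand-written Newton-Raphson loop. The function names and the toy counts are assumptions made for the example.

```python
import math

def fit_logistic(x, y, iters=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    x: SNP genotype per sample (e.g., 0/1 carrier status or 0/1/2 dosage).
    y: 0/1 presence of the viral amino acid.
    Returns (b0, b1, standard error of b1).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient, intercept
            g1 += (yi - p) * xi       # gradient, slope
            h00 += w                  # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01   # zero if x does not vary
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, math.sqrt(h00 / det)

def wald_p_value(beta, se):
    """Two-sided p-value from a normal approximation to beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))

# Toy data: 100 non-carriers (10 with the variant), 50 carriers (30 with it).
x = [0] * 100 + [1] * 50
y = [0] * 90 + [1] * 10 + [0] * 20 + [1] * 30
b0, b1, se = fit_logistic(x, y)
p = wald_p_value(b1, se)
```

For a binary genotype, `exp(b1)` equals the odds ratio of the corresponding 2 × 2 table, which gives a quick check on the fit.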
Table 1
gene | HIV position | SNP | CTL epitope (codons) | Tagging HLA (D′/r²) | SNP vs aa (p) | SNP vs VL (p) | aa vs VL (p)
GAG | 12 | chr6:31285512 | – | B*49:01 (1.00/1.00) | 2.20E-13 | 6.70E-01 | 5.60E-01
GAG | 26 | rs12524487 | – | B*15:01 (1.00/0.82) | 6.10E-19 | 2.10E-01 | 1.40E-01
GAG | 28 | rs1655912 | RLRPGGKKK (20–28) | A*03:01 (1.00/0.81) | 2.70E-55 | 5.60E-01 | 2.00E-02
GAG | 79 | chr6:31267544 | LYNTVATL (78–85) | C*14:02 (1.00/0.96) | 2.40E-12 | 3.50E-01 | 2.80E-01
GAG | 147 | rs1055821 | – | C*06:02 (0.95/0.71) | 3.10E-17 | 3.30E-07 | 2.90E-05
GAG | 242 | rs73392116 | TSTLQEQIGW (240–249) | B*57:01 (1.00/0.98) | 2.40E-62 | 1.90E-06 | 1.70E-05
GAG | 248 | rs41557213 | TSTLQEQIGW (240–249) | B*57:01 (1.00/0.97) | 4.80E-15 | 2.00E-06 | 5.30E-03
NEF | 81 | rs9295987 | RPMTYKAAL (77–85); – | B*07:02 (1.00/0.01); C*04:01 (0.90/0.63) | 4.80E-36 | 2.50E-01 | 9.50E-02
NEF | 83 | rs34768512 | –; – | B*15:01 (1.00/0.47); C*03:04 (0.96/0.54) | 2.20E-17 | 2.80E-01 | 1.50E-02
NEF | 85 | rs2395475 | RPMTYKAAL (77–85); –; – | B*07:02 (1.00/0.29); B*08:01 (1.00/0.22); C*07:02 (0.97/0.30) | 1.90E-24 | 8.10E-01 | 1.30E-03
Significant associations (p < 2.4 × 10⁻¹²) were observed for 48 HIV-1 amino acid variants. The table shows the major amino acid variants present at each specific HIV-1 position, the strongest
associated SNP and its linked HLA class I allele(s), if applicable. The column ‘CTL Epitope (codons)’ lists published, optimally described CTL epitopes (available at http://www.hiv.lanl.gov/
content/immunology/tables/optimal_ctl_summary.html and in [Carlson et al., 2012]) restricted by the tagged HLA class I allele(s) specified, and their positions within the protein. Where
multiple overlapping epitopes restricted by the same HLA class I allele have been described, only one is shown. Associations where no relevant CTL epitope has been described are
indicated with a dash. The last three columns give association p-values for comparisons between human SNPs and viral amino acids, human SNPs and plasma VL and viral amino acids and
plasma VL, respectively. For tests involving viral amino acids accommodating more than 1 alternate allele, the smallest association p-value observed at that position is reported.
DOI: 10.7554/eLife.01123.005
Research article Genomics and evolutionary biology | Microbiology and infectious disease
Figure 3. Association of HIV-1 amino acid variants with plasma viral load. (A) Changes in VL (slope coefficients from the univariate regression model and
standard error, log10 copies/ml) for the 48 HIV-1 amino acids that are associated with host SNPs in the genome-to-genome analysis. (B) rs2395029, a
marker of HLA-B*57:01, is associated with a 0.38 log10 copies/ml lower VL (black bar) in comparison to the population mean. Gray bars represent changes
in VL for amino acid variants associated with rs2395029 (p<0.001). In case of multiallelic positions, the change in VL is shown for all minor amino acids
combined vs the major amino acid (e.g., GAG147 not I).
DOI: 10.7554/eLife.01123.006
HLA-B*57:01) in the genome-to-genome analysis (selection cutoff: p<0.001). The marker rs2395029
was associated with a 0.38 log decrease in viral RNA copies/ml. The univariate effect on VL for each of
the 23 viral amino acids targeted by this allele ranged from –0.16 to +0.12 (Figure 3B). These results
suggest that the genome-to-genome approach can be linked to clinical/laboratory phenotypes,
allowing for detailed understanding of the distribution and relative contribution of sites of host–
pathogen interaction to disease outcome.
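The per-variant changes in VL shown in Figure 3 are slope coefficients with standard errors. A minimal sketch of such a univariate estimate is given below; it uses ordinary least squares with a single binary predictor and, unlike the models described in the Methods, includes no covariates. The function name `slope_with_se` and the toy values are assumptions.

```python
import math

def slope_with_se(x, y):
    """Ordinary least squares slope of y on x, with its standard error.

    x: 0/1 presence of a viral amino acid variant per patient.
    y: VL phenotype per patient (log10 copies/ml).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se

# Toy data: three patients without and three with the variant.
has_variant = [0, 0, 0, 1, 1, 1]
log_vl = [4.0, 4.2, 4.4, 3.8, 4.0, 4.2]
slope, se = slope_with_se(has_variant, log_vl)
```

With a binary predictor, the slope is simply the difference in mean VL between carriers and non-carriers of the variant.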
Discussion
HIV-1 host genomic studies performed so far have focused on clinically defined outcomes (resistance
to infection, clinical presentation, disease progression or death) or on pathogen-related laboratory
results (such as CD4+ T cell counts and VL set point). While useful, these phenotypes have significant
drawbacks. First, consistency of phenotypic determination can be hard to achieve, and such inconsist-
ency can adversely affect power in large-scale genetic studies performed across multiple centers
(Evangelou et al., 2011). Second, a relatively long follow-up in the absence of antiretroviral treatment
is necessary to obtain informative data about the natural history of infection. However, international
guidelines now propose an early start of antiretroviral therapy in most HIV-1 infected individuals
(Thompson et al., 2012), making the collection of large numbers of long-term untreated patients not
only unrealistic but also ethically questionable.
To overcome these limitations, we developed a novel approach for host genetic studies of infec-
tious diseases, built on the unprecedented possibility to obtain paired genome-wide information from
hosts and pathogens. We combined human polymorphism and HIV-1 sequence diversity in the same
analytical framework to search for sites of human-virus genomic conflict, effectively using variation in
HIV-1 amino acids as an ‘intermediate phenotype’ for association studies. Intermediate phenotypes
have recently been shown to be useful in uncovering association signals that are not detectable using
more complex clinical endpoints: illustrative examples include metabolomic biomarkers in cardiovascular
research (Suhre et al., 2011), serum IgE concentration in the study of asthma (Moffatt et al., 2010),
or neuroimaging-based phenotypes in psychiatry genetics (Rasetti and Weinberger, 2011). Variation
in the pathogen sequence is an as-yet-untapped intermediate phenotype, specific by nature to
genomic research in infectious diseases. Importantly, it depends on sequencing the pathogen,
which could prove in many cases easier and more standardized than obtaining detailed clinical
phenotypes.
Our approach allowed the mapping of host genetic pressure on the HIV-1 genome. The strongest
association signals genome-wide were observed between human SNPs tagging HLA class I alleles
and viral mutations in their corresponding CTL epitopes. Additional association signals were observed
outside of optimally defined CTL epitopes, which could indicate novel epitopes, or represent
secondary (compensatory) mutations. In a single experiment, these results recapitulate extensive
epidemiological and immunogenetic research and represent a proof-of-concept that biologically
meaningful association signals are identifiable using a hypothesis-free strategy. Indeed, host factors
leading to viral adaptation can be uncovered by searching for associated imprints in the viral genome.
Of note, the International HIV Controllers Study demonstrated the importance of specific amino acid
positions in the HLA-B binding groove on a clinical outcome (elite control) (Pereyra et al., 2010). We
here extend this observation to the HLA-A and C grooves, emphasizing the similarity in mechanism
of host pressure on the viral proteome that is not necessarily translated into observable clinical
outcomes.
We found a higher density of amino acid positions under selection in Gag and Nef compared with
the rest of the HIV proteome. This is consistent with earlier findings that indicate the importance of
Gag p24-specific CTL responses in slower progression to AIDS (Borghans et al., 2007; Brennan et al.,
2012) or controller status (Dyer et al., 2008). Moreover, this further demonstrates that mapping host
pressure on the pathogen proteome can reveal biologically relevant effects.
Analyses were performed using samples from clinically well-characterized patients, most of them
with repeated and reliable HIV-1 VL measurements in the absence of antiretroviral therapy. We were
thus able to compare the results of GWAS assessing human genetic determinants of mean VL,
a standard clinical correlate of HIV-1 control, and genome-to-genome GWAS on amino acid variants
in the viral proteome. The use of HIV-1 variation as outcome resulted in a considerable gain in power
to detect host factors: the lowest p-values were observed for SNPs mapping to the HLA class I
region in both approaches, but associations were much stronger with HIV-1 amino acid variation
than for HIV-1 VL (2.7 × 10⁻⁶⁶ vs 1 × 10⁻⁸), even when accounting for the increased number of multiple
tests.
In addition to identifying sites of interaction between the host and the pathogen, the study design
allowed the scoring of biological consequences of such interaction, by assessing associations between
host-driven escape at viral sites and an in vivo phenotype (VL). For example, we decomposed the
effect of rs2395029 (a marker of HLA-B*57:01) on VL into the effects of the multiple viral amino acid
variants that are associated with that SNP. While some HIV-1 amino acid changes individually associate
with a decrease in VL, the compound image that emerges is one of a multiplicity of modest effects
distributed across many residues. Correlations between host-associated variants and VL are difficult
to interpret, because they may reflect fitness costs or compensation, the existence of strong (Iversen
et al., 2006; Carlson et al., 2012) or novel (Almeida et al., 2011) immune responses, or the indirect
impact of specific HLA class I alleles. Nevertheless, the observation that the majority of host-associated
HIV-1 mutations do not correlate with any detectable change in VL confirms HIV’s remarkable capacity
to adapt to and compensate for immune pressure, often without measurable fitness cost.
A significant confounder in both human and viral genomic analyses is the existence of population
stratification, where shared ancestry between infected individuals, stratification by ethnic groups, non-
random distribution of HIV-1 subtypes, or clusters of viral transmission can all have an influence on the
population frequencies of specific mutations, and thus create spurious associations if not carefully
controlled for. Previous studies usually controlled for viral population substructure but were limited in
the control of human population stratification (Moore et al., 2002; Bhattacharya et al., 2007). Our
approach offers the opportunity to correct for both factors, thanks to the availability of extensive host
and viral genomic information.
The present sample size provided approximately 80% power to detect a common human variant
(minor allele frequency of 10%) with an odds ratio of 4.2 in the genome-to-genome analysis (Study B)
and a viral amino acid explaining approximately 4% of the variation in plasma viral load (Study C) at
the respective significance thresholds (Purcell et al., 2003). Consistent with most studies performed
in HIV-1 host genetics over the past few years (reviewed in Telenti and Johnson (2012)), we did not
identify previously unknown host genetic loci involved in host-viral interaction and HIV-1 restriction.
The proposed approach can only detect polymorphic host factors that leave an imprint on the virus,
which may exclude mediators of immunopathogenesis or genes involved in the establishment of
tolerance (Medzhitov et al., 2012). An additional limitation is the incomplete nature of genomic
information available both on the host side (common genotypes from GWAS) and on the viral side
(near full-length consensus sequence; gp120 was not included in the analyses). Finally, the multiple
hypothesis burden of a genome-to-genome scan is extremely high. It is conceivable that larger studies,
or studies that focus on a subgroup of predefined host genes, would have power to detect novel
associations. A comprehensive, but computationally challenging, description of host–pathogen genomic
interactions would require human genome sequencing, coupled with deep sequencing of intra-host
retroviral subpopulations.
In summary, we used a genome-to-genome, hypothesis-free approach to identify associations
between host polymorphisms and HIV-1 genomic variation. This strategy allows a global assessment
of host–pathogen interactions at the genome level and reveals sites of genomic conflict. Comparable
approaches are immediately applicable to explore other important infectious diseases, as long as poly-
morphic host factors exert sufficient selective pressure to trigger escape mutations in the pathogen.
The observation that pathogen sequence variation, used as an intermediate phenotype, is more
powerful than clinical and laboratory outcomes to identify some host factors allows smaller-scale
studies and encourages analyses of less prevalent infectious diseases. Researchers involved in pathogen
genome studies and host genetic studies should strongly consider the gathering of paired host–pathogen
data.
Participants
Study participants are treatment-naïve individuals followed in one of the following cohorts or
institutions: the Swiss HIV Cohort Study (SHCS, www.shcs.ch, [Schoeni-Affolter et al., 2010]); the HAART
Observational Medical Evaluation and Research (HOMER) study in Vancouver, Canada
(www.cfenet.ubc.ca/our-work/initiatives/homer); the AIDS Clinical Trials Group (ACTG) Network in the USA
(actgnetwork.org); the International HIV Controllers Study in Boston, USA (IHCS, www.hivcontrollers.org);
the Western Australian HIV Cohort Study, Perth, Australia; the AIDS Research Institute IrsiCaixa in
Badalona, Spain; and the Instituto de Salud Carlos III in Madrid, Spain. To reduce noise due to host and
viral diversity, we only included individuals of recent Western European ancestry (confirmed by clustering
with HapMap CEU individuals in principal component analysis of the genotype data [Price et al.,
2006]), and infected with HIV-1 subtype B (as assessed by the REGA Subtyping Tool [De Oliveira
et al., 2005]). Plasma VL determinations in the absence of antiretroviral therapy were available from
patients from the SHCS and the HOMER study. The VL phenotype was defined as the average of
the log10-transformed numbers of HIV-1 RNA copies per ml of plasma, excluding measurements
obtained in the first 6 months after seroconversion and during advanced immunosuppression (i.e.,
with <100 CD4+ T cells per µl of blood). Consequently, 698 study participants were eligible for VL
analysis.
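The VL phenotype definition above can be sketched as a small function. The record layout (months since seroconversion, HIV-1 RNA copies per ml, CD4+ T cell count), the function name `vl_phenotype`, and the handling of the exact 6-month boundary are assumptions made for illustration; the CD4+ threshold is expressed in the same units as in the text.

```python
import math

def vl_phenotype(measurements, min_months=6, min_cd4=100):
    """Mean log10 viral load over eligible untreated measurements.

    measurements: list of (months_since_seroconversion, rna_copies_per_ml, cd4_count).
    Measurements from the first `min_months` months, or taken when the CD4+ count
    is below `min_cd4`, are excluded. Returns None if nothing is eligible.
    """
    logs = [
        math.log10(copies)
        for months, copies, cd4 in measurements
        if months >= min_months and cd4 >= min_cd4 and copies > 0
    ]
    return sum(logs) / len(logs) if logs else None

# Hypothetical patient: the first value is too early and the last falls
# in advanced immunosuppression, so only the middle two are averaged.
patient = [(2, 1e6, 500), (8, 1e4, 450), (20, 1e5, 300), (60, 1e6, 50)]
phenotype = vl_phenotype(patient)
```

Averaging on the log10 scale keeps a single very high measurement from dominating the phenotype.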
We performed three series of analyses (Figure 1): [A] human SNPs vs VL; [B] human SNPs vs HIV-1
amino acids; and [C] HIV-1 amino acids vs VL. To test for association between human SNPs and
HIV-1 amino acids, we used phylogenetically corrected logistic regression (Carlson et al., 2008;
Carlson et al., 2012). For association testing between polymorphic amino acids in human HLA
genes and HIV sequence variation, we used standard logistic regression (for a binary HLA amino
acid) or a multivariate omnibus test (when more than one alternate allele was present) including
sex, cohort, and the coordinates of the first two principal component axes as covariates. We used
linear regression models in PLINK to test for association between human SNPs and VL, and between
HIV-1 amino acids and VL (Purcell et al., 2007), including sex, cohort, and the coordinates of the
first two principal component axes as covariates (Price et al., 2006). An additive genetic model was
used for all analyses involving human SNPs. Significance was assessed using Bonferroni correction
(significance thresholds of 7.25 × 10⁻⁹, 2.4 × 10⁻¹², and 1.6 × 10⁻⁵ for analyses A, B, and C, respectively,
Figure 1).
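The three significance thresholds follow from dividing a family-wise error rate by the number of tests in each analysis. The sketch below approximately reproduces them under two assumptions that are not stated in the text: a family-wise error rate of 0.05, and about 6.9 million tested SNPs (a value chosen here to be consistent with the reported threshold for analysis A and with the figure of ∼7 million SNPs). The number of viral binary variables, 3007, is taken from the Results.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    return alpha / n_tests

ALPHA = 0.05               # assumed family-wise error rate
N_SNPS = 6_900_000         # assumed number of tested SNPs (text: ~7 million)
N_AA_VARIABLES = 3007      # binary HIV-1 amino acid variables (from the Results)

threshold_a = bonferroni_threshold(ALPHA, N_SNPS)                   # SNPs vs VL
threshold_b = bonferroni_threshold(ALPHA, N_SNPS * N_AA_VARIABLES)  # SNPs vs amino acids
threshold_c = bonferroni_threshold(ALPHA, N_AA_VARIABLES)           # amino acids vs VL
```

Analysis B multiplies the two test counts because every SNP is tested against every viral variable, which is why its threshold is roughly three orders of magnitude stricter than that of a single GWAS.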
Acknowledgements
We would like to thank all the patients participating in these genetic studies, the many study nurses,
physicians, data managers and laboratories involved in all the cohorts; Tanja Stadler and Sebastian
Bonhoeffer (at ETH Zürich, Switzerland) and Samuel Alizon (at MIGEVEC, Montpellier, France) for
helpful discussions; and Jennifer Troyer (at the Laboratory for Genomic Diversity, NCI) for her work on
the HOMER genotyping data.
Additional information
Funding
Funder | Grant reference number | Author
Swiss National Science Foundation | 33CS30_134277/Swiss HIV Cohort Study, 31003A_132863/1, PP00P3_133703/1 | Amalio Telenti, Jacques Fellay
Santos Suarez Foundation, Lausanne | | Amalio Telenti, Jacques Fellay
Hungarian Academy of Sciences | Bolyai János Research Fellowship | Viktor Müller
Michael Smith Foundation for Health Research | | Zabrina L Brumme
Canadian Institutes of Health Research | | Zabrina L Brumme
Sciex-NMS Program | 10.267 | István Bartha
Spanish Ministry of Science and Innovation | SAF 2007-61036, 2010-17226 and 2010-18917 | Cecilio López-Galíndez, Concepción Casado
Fundacion para la investigacion y prevencion del SIDA en Espana | 36558/06, 36641/07, 36779/08, 360766/09 | Cecilio López-Galíndez, Concepción Casado
RETIC de Investigacion en SIDA | RD06/006/0036 | Cecilio López-Galíndez, Concepción Casado
National Institute of Allergy and Infectious Diseases (NIAID) | P01-AI074415 | Todd M Allen
Bill and Melinda Gates Foundation | | Todd M Allen
SNF Professorship | PP00P3_133703/1 | Jacques Fellay
The funders had no role in study design, data collection and interpretation, or the
decision to submit the work for publication.
Author contributions
IB, JMC, CJB, JL, NP, CL, NF, ZK, VM, Analysis and interpretation of data, Drafting or revising the
article; PJM, AT, JF, Conception and design, Analysis and interpretation of data, Drafting or revising
the article; ZLB, Acquisition of data, Analysis and interpretation of data, Drafting or revising the
article, Contributed unpublished essential data or reagents; MJ, Acquisition of data, Drafting or revising
the article, Contributed unpublished essential data or reagents; DWH, JM-P, JD, CL-G, CC, AR, HFG,
EB, PV, TK, SY, SJO’B, PRH, Acquisition of data, Drafting or revising the article; TMA, Acquisition of
data, Analysis and interpretation of data, Drafting or revising the article; DH, Analysis and
interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents.
Ethics
Human subjects: Participating centers provided local Institutional Review Board approval for genetic
analysis. Study participants provided informed consent for genetic testing, with the exception of
a subset where a procedure approved by the relevant Research Ethics Board allowed the use of
anonymized historical specimens in the absence of a specific informed consent.
Additional files
Major dataset
References
Alizon S, von Wyl V, Stadler T, Kouyos RD, Yerly S, Hirschel B, Böni J, et al. 2010. Phylogenetic approach reveals
that virus genotype largely determines HIV set-point viral load. PLOS Pathogens 6:e1001123.
doi: 10.1371/journal.ppat.1001123.
Almeida CA, Bronke C, Roberts SG, McKinnon E, Keane NM, Chopra A, Kadie C, et al. 2011. Translation
of HLA-HIV associations to the cellular level: HIV adapts to inflate CD8 T cell responses against Nef and
HLA-adapted variant epitopes. J Immunol 187:2502–13. doi: 10.4049/jimmunol.1100691.
Alter G, Heckerman D, Schneidewind A, Fadda L, Kadie CM, Carlson JM, Oniangue-Ndza C, et al. 2011. HIV-1
adaptation to NK-cell-mediated immune pressure. Nature 476:96–100. doi: 10.1038/nature10237.
Bhattacharya T, Daniels M, Heckerman D, Foley B, Frahm N, Kadie C, Carlson J, et al. 2007. Founder effects in
the assessment of HIV polymorphisms and HLA allele associations. Science 315:1583–6.
doi: 10.1126/science.1131528.
Borghans JA, Mølgaard A, de Boer RJ, Keşmir C. 2007. HLA alleles associated with slow progression to AIDS
truly prefer to present HIV-1 p24. PLOS ONE 2:e920. doi: 10.1371/journal.pone.0000920.
Brennan CA, Ibarrondo FJ, Sugar CA, Hausner MA, Shih R, Ng HL, Detels R, et al. 2012. Early
HLA-B*57-restricted CD8+ T lymphocyte responses predict HIV-1 disease progression. J Virol 86:10505–16.
doi: 10.1128/JVI.00102-12.
Brumme ZL, Brumme CJ, Heckerman D, Korber BT, Daniels M, Carlson J, Kadie C, et al. 2007. Evidence of
differential HLA class I-mediated viral evolution in functional and accessory/regulatory genes of HIV-1. PLOS
Pathogens 3:e94. doi: 10.1371/journal.ppat.0030094.
Carlson JM, Brumme CJ, Martin E, Listgarten J, Brockman MA, Le AQ, Chui CK, et al. 2012. Correlates of
protective cellular immunity revealed by analysis of population-level immune escape pathways in HIV-1. J Virol
86:13202–16. doi: 10.1128/JVI.01998-12.
Carlson JM, Brumme ZL, Rousseau CM, Brumme CJ, Matthews P, Kadie C, Mullins JI, et al. 2008. Phylogenetic
dependency networks: inferring patterns of CTL escape and codon covariation in HIV-1 Gag.
PLOS Comput Biol 4:e1000225. doi: 10.1371/journal.pcbi.1000225.
Carlson JM, Listgarten J, Pfeifer N, Tan V, Kadie C, Walker BD, Ndung’u T, et al. 2012. Widespread impact of
HLA restriction on immune control and escape pathways of HIV-1. J Virol 86:5230–43.
doi: 10.1128/JVI.06728-11.
Telenti A, Johnson WE. 2012. Host genes important to HIV replication and evolution. Cold Spring Harbor
Perspectives in Medicine 2:a007203. doi: 10.1101/cshperspect.a007203.
Thompson MA, Aberg JA, Hoy JF, Telenti A, Benson C, Cahn P, Eron JJ, et al. 2012. Antiretroviral treatment of
adult HIV infection: 2012 recommendations of the International Antiviral Society-USA panel. J Am Med Assoc
308:387–402. doi: 10.1001/jama.2012.7961.
Von Wyl V, Kouyos RD, Yerly S, Böni J, Shah C, Bürgisser P, Klimkait T, et al. 2011. The role of migration and
domestic transmission in the spread of HIV-1 non-B subtypes in Switzerland. J Infect Dis 204:1095–103.
doi: 10.1093/infdis/jir491.
Von Wyl V, Yerly S, Bürgisser P, Klimkait T, Battegay M, Bernasconi E, Cavassini M, et al. 2009. Long-term trends
of HIV type 1 drug resistance prevalence among antiretroviral treatment-experienced patients in Switzerland.
Clin Infect Dis 48:979–87. doi: 10.1086/597352.
Wain LV, Bailes E, Bibollet-Ruche F, Decker JM, Keele BF, Van Heuverswyn F, Li Y, et al. 2007. Adaptation of HIV-1
to its human host. Mol Biol Evol 24:1853–60. doi: 10.1093/molbev/msm110.
Wright JK, Brumme ZL, Julg B, van der Stok M, Mncube Z, Gao X, Carlson JM, et al. 2012. Lack of association
between HLA class II alleles and in vitro replication capacities of recombinant viruses encoding HIV-1 subtype C
Gag-protease from chronically infected individuals. J Virol 86:1273–6. doi: 10.1128/JVI.06533-11.