Scholarly Works (37 results)

Article
Peer Reviewed

An evolutionary compass for detecting signals of polygenic selection and mutational bias

UCLA Previously Published Works (2019)

Selection and mutation shape the genetic variation underlying human traits, but the specific evolutionary mechanisms driving complex trait variation are largely unknown. We developed a statistical method that uses polarized genome-wide association study (GWAS) summary statistics from a single population to detect signals of mutational bias and selection. We found evidence for nonneutral signals on variation underlying several traits (body mass index [BMI], schizophrenia, Crohn's disease, educational attainment, and height). We then used simulations that incorporate simultaneous negative and positive selection to show that these signals are consistent with mutational bias and shifts in the fitness-phenotype relationship, but not stabilizing selection or mutational bias alone. We additionally replicate two of our top three signals (BMI and educational attainment) in an external cohort, and show that population stratification may have confounded GWAS summary statistics for height in the GIANT cohort. Our results provide a flexible and powerful framework for evolutionary analysis of complex phenotypes in humans and other species, and offer insights into the evolutionary mechanisms driving variation in human polygenic traits.

Article
Peer Reviewed

Transcriptome-wide association study of breast cancer risk by estrogen-receptor status

California Breast Cancer Research Program Funded Publications (2020)

Article
Peer Reviewed

Leveraging local ancestry to detect gene-gene interactions in genome-wide data

UCLA Previously Published Works (2015)

Background

Although genome-wide association studies have successfully identified thousands of variants associated with complex traits, these variants explain only a small fraction of the heritability of each trait. Gene-gene interactions have been proposed as a source that could explain a significant percentage of the missing heritability. However, detecting gene-gene interactions has proven to be very difficult due to computational and statistical challenges. The vast number of possible interactions that can be tested induces very stringent multiple-hypothesis corrections that limit the power of detection. These issues have been mostly highlighted for the identification of pairwise effects and are even more challenging when addressing higher-order interaction effects. In this work we explore the use of local ancestry in recently admixed individuals to find signals of gene-gene interaction on human traits and diseases.

Results

We introduce statistical methods that leverage the correlation between local ancestry and the hidden unknown causal variants to find distant gene-gene interactions. We show that the power of this test increases with the number of causal variants per locus and the degree of differentiation of these variants between the ancestral populations. Overall, our simulations confirm that local ancestry can be used to detect gene-gene interactions, solving the computational bottleneck. When compared to a single nucleotide polymorphism (SNP)-based interaction screening of the same sample size, the power of our test was lower in all settings we considered. However, accounting for the dramatic increase in sample size that can be achieved when genotyping only a set of ancestry informative markers instead of the whole genome, we observe a substantial gain in power in several scenarios.

Conclusion

Local ancestry-based interaction tests offer a new path to the detection of gene-gene interaction effects. They would be particularly useful in scenarios where multiple differentiated variants at the interacting loci act in a synergistic manner.

Article
Peer Reviewed

Methods for fine-mapping with chromatin and expression data

UCLA Previously Published Works (2018)

Recent studies have identified thousands of regions in the genome associated with chromatin modifications, which may in turn be affecting gene expression. Existing works have used heuristic methods to investigate the relationships between genome, epigenome, and gene expression, but, to our knowledge, none have explicitly modeled the chain of causality whereby genetic variants impact chromatin, which impacts gene expression. In this work we introduce a new hierarchical fine-mapping framework that integrates information across all three levels of data to better identify the causal variant and chromatin mark that are concordantly influencing gene expression. In simulations we show that our method is more accurate than existing approaches at identifying the causal mark influencing expression. We analyze empirical genetic, chromatin, and gene expression data from 65 African-ancestry and 47 European-ancestry individuals and show that many of the paths prioritized by our method are consistent with the proposed causal model and often lie in likely functional regions.

Article
Peer Reviewed

Probabilistic fine-mapping of transcriptome-wide association studies

UCLA Previously Published Works (2019)

Transcriptome-wide association studies using predicted expression have identified thousands of genes whose locally regulated expression is associated with complex traits and diseases. In this work, we show that linkage disequilibrium induces significant gene-trait associations at non-causal genes as a function of the expression quantitative trait loci weights used in expression prediction. We introduce a probabilistic framework that models correlation among transcriptome-wide association study signals to assign a probability for every gene in the risk region to explain the observed association signal. Importantly, our approach remains accurate when expression data for causal genes are not available in the causal tissue by leveraging expression prediction from other tissues. Our approach yields credible sets of genes containing the causal gene at a nominal confidence level (for example, 90%) that can be used to prioritize genes for functional assays. We illustrate our approach by using an integrative analysis of lipid traits, where our approach prioritizes genes with strong evidence for causality.
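The credible sets mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each gene in a risk region already has a posterior probability of being causal, and that these probabilities can be normalized and treated as a distribution over a single causal gene. That normalization, the function name `credible_set`, and the gene labels are illustrative assumptions, not the procedure or API of the published method.

```python
def credible_set(posteriors, level=0.90):
    """Return the smallest gene set whose normalized posterior mass reaches `level`.

    `posteriors` maps a gene label to its posterior probability of being causal.
    Assumption: the values are normalized here so that they sum to one.
    """
    if not 0.0 < level <= 1.0:
        raise ValueError("level must be in (0, 1]")
    total = sum(posteriors.values())
    if total <= 0.0:
        raise ValueError("posterior probabilities must have positive total mass")

    # Rank genes from most to least probable, then add them until the
    # cumulative probability reaches the requested level.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, mass = [], 0.0
    for gene, prob in ranked:
        chosen.append(gene)
        mass += prob / total
        if mass >= level - 1e-12:  # small tolerance for floating-point sums
            break
    return chosen, mass


# Hypothetical posterior probabilities for four genes in one risk region.
example = {"GENE_A": 0.70, "GENE_B": 0.15, "GENE_C": 0.10, "GENE_D": 0.05}
genes, mass = credible_set(example, level=0.90)
print(genes)  # ['GENE_A', 'GENE_B', 'GENE_C']
```

A smaller set at the same level indicates a more sharply resolved region, which is why such sets are useful for prioritizing genes for functional follow-up.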

Article
Peer Reviewed

Multi-ancestry fine-mapping improves precision to identify causal genes in transcriptome-wide association studies.

UCLA Previously Published Works (2022)

Transcriptome-wide association studies (TWASs) are a powerful approach to identify genes whose expression is associated with complex disease risk. However, non-causal genes can exhibit association signals due to confounding by linkage disequilibrium (LD) patterns and eQTL pleiotropy at genomic risk regions, which necessitates fine-mapping of TWAS signals. Here, we present MA-FOCUS, a multi-ancestry framework for the improved identification of genes underlying traits of interest. We demonstrate that by leveraging differences in ancestry-specific patterns of LD and eQTL signals, MA-FOCUS consistently outperforms single-ancestry fine-mapping approaches with equivalent total sample sizes across multiple metrics. We perform TWASs for 15 blood traits using genome-wide summary statistics (average n_EA = 511k, n_AA = 13k) and lymphoblastoid cell line eQTL data from cohorts of primarily European and African continental ancestries. We recapitulate evidence demonstrating shared genetic architectures for eQTL and blood traits between the two ancestry groups and observe that gene-level effects correlate 20% more strongly across ancestries than SNP-level effects. Lastly, we perform fine-mapping using MA-FOCUS and find evidence that genes at TWAS risk regions are more likely to be shared across ancestries than they are to be ancestry specific. Using multiple lines of evidence to validate our findings, we find that gene sets produced by MA-FOCUS are more enriched in hematopoietic categories than alternative approaches (p = 2.36 × 10^-15). Our work demonstrates that including and appropriately accounting for genetic diversity can drive more profound insights into the genetic architecture of complex traits.

Article
Peer Reviewed

Genotype error due to low-coverage sequencing induces uncertainty in polygenic scoring.

UCLA Previously Published Works (2023)

Polygenic scores (PGSs) have emerged as a standard approach to predict phenotypes from genotype data in a wide array of applications from socio-genomics to personalized medicine. Traditional PGSs assume genotype data to be error-free, ignoring possible errors and uncertainties introduced from genotyping, sequencing, and/or imputation. In this work, we investigate the effects of genotyping error due to low coverage sequencing on PGS estimation. We leverage SNP array and low-coverage whole-genome sequencing data (lcWGS, median coverage 0.04×) of 802 individuals from the Dana-Farber PROFILE cohort to show that PGS error correlates with sequencing depth (p = 1.2 × 10^-7). We develop a probabilistic approach that incorporates genotype error in PGS estimation to produce well-calibrated PGS credible intervals and show that the probabilistic approach increases classification accuracy by up to 6% as compared to traditional PGSs that ignore genotyping error. Finally, we use simulations to explore the combined effect of genotyping and effect size errors and their implication on PGS-based risk-stratification. Our results illustrate the importance of considering genotyping error as a source of PGS error especially for cohorts with varying genotyping technologies and/or low-coverage sequencing.
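One simple way to see how genotype uncertainty can propagate into a polygenic score is sketched below. It assumes that each variant comes with genotype probabilities for 0, 1, or 2 copies of the effect allele, that variants are independent, that effect sizes are known exactly, and that a normal approximation is adequate for the interval. These are simplifying assumptions for illustration, and the function name `pgs_with_uncertainty` is hypothetical. This is not the probabilistic model developed in the paper.

```python
import math


def pgs_with_uncertainty(genotype_probs, betas, z=1.96):
    """Return the expected PGS and an approximate interval around it.

    `genotype_probs` holds one (p0, p1, p2) tuple per variant, giving the
    probability of carrying 0, 1, or 2 effect alleles. `betas` holds the
    matching effect sizes. Assumption: variants are independent, so the
    per-variant variances add.
    """
    if len(genotype_probs) != len(betas):
        raise ValueError("need one effect size per variant")

    mean, variance = 0.0, 0.0
    for (p0, p1, p2), beta in zip(genotype_probs, betas):
        if abs(p0 + p1 + p2 - 1.0) > 1e-6:
            raise ValueError("genotype probabilities must sum to 1")
        dosage = p1 + 2.0 * p2                       # E[g]
        dosage_var = p1 + 4.0 * p2 - dosage ** 2     # E[g^2] - E[g]^2
        mean += beta * dosage
        variance += beta ** 2 * max(dosage_var, 0.0)

    half_width = z * math.sqrt(variance)
    return mean, (mean - half_width, mean + half_width)


# A confidently called variant contributes no uncertainty; an ambiguous one does.
score, interval = pgs_with_uncertainty(
    [(0.0, 1.0, 0.0), (0.25, 0.5, 0.25)], betas=[0.5, 1.0]
)
print(score, interval)
```

With error-free genotype calls every per-variant variance is zero and the interval collapses to the traditional point estimate, which is the case that conventional PGSs implicitly assume.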

Article
Peer Reviewed

Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits

UCLA Previously Published Works (2017)

Although genome-wide association studies (GWASs) have identified thousands of risk loci for many complex traits and diseases, the causal variants and genes at these loci remain largely unknown. Here, we introduce a method for estimating the local genetic correlation between gene expression and a complex trait and utilize it to estimate the genetic correlation due to predicted expression between pairs of traits. We integrated gene expression measurements from 45 expression panels with summary GWAS data to perform 30 multi-tissue transcriptome-wide association studies (TWASs). We identified 1,196 genes whose expression is associated with these traits; of these, 168 reside more than 0.5 Mb away from any previously reported GWAS significant variant. We then used our approach to find 43 pairs of traits with significant genetic correlation at the level of predicted expression; of these, eight were not found through genetic correlation at the SNP level. Finally, we used bi-directional regression to find evidence that BMI causally influences triglyceride levels and that triglyceride levels causally influence low-density lipoprotein. Together, our results provide insight into the role of gene expression in the susceptibility of complex traits and diseases.

Article
Peer Reviewed

Genetic Mechanisms Leading to Sex Differences Across Common Diseases and Anthropometric Traits

UC San Francisco Previously Published Works (2017)

Common diseases often show sex differences in prevalence, onset, symptomology, treatment, or prognosis. Although studies have been performed to evaluate sex differences at specific SNP associations, this work aims to comprehensively survey a number of complex heritable diseases and anthropometric traits. Potential genetically encoded sex differences we investigated include differential genetic liability thresholds or distributions, gene-sex interaction at autosomal loci, major contribution of the X-chromosome, or gene-environment interactions reflected in genes responsive to androgens or estrogens. Finally, we tested the overlap between sex-differential association with anthropometric traits and disease risk. We utilized complementary approaches of assessing GWAS association enrichment and SNP-based heritability estimation to explore explicit sex differences, as well as enrichment in sex-implicated functional categories. We do not find consistent increased genetic load in the lower-prevalence sex, or a disproportionate role for the X-chromosome in disease risk, despite sex-heterogeneity on the X for several traits. We find that all anthropometric traits show less than complete correlation between the genetic contribution to males and females, and find a convincing example of autosome-wide genome-sex interaction in multiple sclerosis (P = 1 × 10^-9). We also find some evidence for hormone-responsive gene enrichment, and striking evidence of the contribution of sex-differential anthropometric associations to common disease risk, implying that general mechanisms of sexual dimorphism determining secondary sex characteristics have shared effects on disease risk.

Article
Peer Reviewed

Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.

UCLA Previously Published Works (2014)

Motivation

Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov model (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available.

Results

In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses.

Availability and Implementation

A software package is publicly available at http://bogdan.bioinformatics.ucla.edu/software/.

Contact

bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu

Supplementary Information

Supplementary materials are available at Bioinformatics online.
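The idea of Gaussian imputation from summary statistics, as described in the abstract above, can be sketched as a conditional expectation. Under the null, association z-scores in a region are approximately multivariate normal with covariance equal to the LD (correlation) matrix, so the z-score at an untyped SNP can be predicted from the typed SNPs. The sketch below is a generic version of that calculation in pure Python. The ridge term added to the diagonal is one assumed way of allowing for a noisy LD estimate from a small reference panel, and both its value and the function names are illustrative rather than taken from the published software.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("LD matrix is singular; try a larger ridge term")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def impute_z(ld_typed, ld_untyped_typed, z_typed, ridge=0.1):
    """Impute the z-score at one untyped SNP from the typed SNPs.

    ld_typed:         LD correlation matrix among typed SNPs (list of lists).
    ld_untyped_typed: LD correlations between the untyped SNP and each typed SNP.
    z_typed:          observed z-scores at the typed SNPs.
    ridge:            assumed regularization added to the diagonal of ld_typed.
    """
    n = len(z_typed)
    regularized = [
        [ld_typed[r][c] + (ridge if r == c else 0.0) for c in range(n)]
        for r in range(n)
    ]
    # Weights w solve (Sigma_tt + ridge * I) w = Sigma_ti.
    weights = solve(regularized, ld_untyped_typed)
    z_imputed = sum(w * z for w, z in zip(weights, z_typed))
    # Variance of the imputed statistic under the null: w' Sigma_tt w.
    variance = sum(
        weights[r] * ld_typed[r][c] * weights[c] for r in range(n) for c in range(n)
    )
    return z_imputed, variance


# One typed SNP with z = 5.0 in LD r = 0.8 with the untyped SNP, no ridge.
print(impute_z([[1.0]], [0.8], [5.0], ridge=0.0))
```

The returned variance is below 1 whenever the typed SNPs tag the untyped SNP imperfectly, so an imputed statistic is usually rescaled or reported together with this variance before being compared with directly observed z-scores.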
