
Article
Peer Reviewed

BioMake: a GNU make-compatible utility for declarative workflow management.

Environ Genomics & Systems Bio (2017)

Motivation

The Unix 'make' program is widely used in bioinformatics pipelines, but suffers from problems that limit its application to large analysis datasets. These include reliance on file modification times to determine whether a target is stale, lack of support for parallel execution on clusters, and restricted flexibility to extend the underlying logic program.

Results

We present BioMake, a make-like utility that is compatible with most features of GNU Make and adds support for popular cluster-based job-queue engines, MD5 signatures as an alternative to timestamps, and logic programming extensions in Prolog.
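The contrast between timestamp-based and signature-based staleness checks can be sketched in a few lines of Python. This is a generic illustration of the idea, not BioMake's implementation (BioMake is written in Prolog). The function names and the `recorded` dictionary of digests saved at the last successful build are hypothetical.

```python
import hashlib
import os


def md5_of(path: str) -> str:
    """Return the hex MD5 digest of a file's contents, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_stale_by_mtime(target: str, deps: list[str]) -> bool:
    """Classic make rule: rebuild if the target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    t = os.path.getmtime(target)
    return any(os.path.getmtime(d) > t for d in deps)


def is_stale_by_md5(target: str, deps: list[str], recorded: dict[str, str]) -> bool:
    """Signature rule (hypothetical bookkeeping): rebuild if the target is missing
    or any dependency's current MD5 differs from the digest recorded earlier."""
    if not os.path.exists(target):
        return True
    return any(recorded.get(d) != md5_of(d) for d in deps)
```

Touching a dependency without changing its contents makes `is_stale_by_mtime` return `True` but leaves `is_stale_by_md5` at `False`. This is the case where signatures avoid needless reruns, for example when files are copied or restored and their modification times change.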

Availability and implementation

BioMake is available for MacOSX and Linux systems from https://github.com/evoldoers/biomake under the BSD3 license. The only dependency is SWI-Prolog (version 7), available from http://www.swi-prolog.org/.

Contact

ihholmes+biomake@gmail.com or cmungall+biomake@gmail.com.

Supplementary information

Feature table comparing BioMake to similar tools. Supplementary data are available at Bioinformatics online.


Article
Peer Reviewed

The genomic landscape of Neanderthal ancestry in present-day humans

UCLA Previously Published Works (2014)

Genomic studies have shown that Neanderthals interbred with modern humans, and that non-Africans today are the products of this mixture. The antiquity of Neanderthal gene flow into modern humans means that genomic regions that derive from Neanderthals in any one human today are usually less than a hundred kilobases in size. However, Neanderthal haplotypes are also distinctive enough that several studies have been able to detect Neanderthal ancestry at specific loci. We systematically infer Neanderthal haplotypes in the genomes of 1,004 present-day humans. Regions that harbour a high frequency of Neanderthal alleles are enriched for genes affecting keratin filaments, suggesting that Neanderthal alleles may have helped modern humans to adapt to non-African environments. We identify multiple Neanderthal-derived alleles that confer risk for disease, suggesting that Neanderthal alleles continue to shape human biology. An unexpected finding is that regions with reduced Neanderthal ancestry are enriched in genes, implying selection to remove genetic material derived from Neanderthals. Genes that are more highly expressed in testes than in any other tissue are especially reduced in Neanderthal ancestry, and there is an approximately fivefold reduction of Neanderthal ancestry on the X chromosome, which is known from studies of diverse species to be especially dense in male hybrid sterility genes. These results suggest that part of the explanation for genomic regions of reduced Neanderthal ancestry is Neanderthal alleles that caused decreased fertility in males when moved to a modern human genetic background.


Article
Peer Reviewed

ndexr: an R package to interface with the network data exchange.

UC San Diego Previously Published Works (2018)

Motivation

Seamless exchange of biological network data enables bioinformatic algorithms to integrate networks as prior knowledge input as well as to document resulting network output. However, the interoperability between pathway databases and various methods and platforms for analysis is currently lacking. The Network Data Exchange (NDEx) is an open-source data commons that facilitates the user-centered sharing and publication of networks of many types and formats.

Results

Here, we present a software package that allows users to programmatically connect to and interface with NDEx servers from within R. The network repository can be searched and networks can be retrieved and converted into igraph-compatible objects. These networks can be modified and extended within R and uploaded back to the NDEx servers.

Availability and implementation

ndexr is a free and open-source R package, available via GitHub (https://github.com/frankkramer-lab/ndexr) and Bioconductor (http://bioconductor.org/packages/ndexr/).

Contact

florian.auer@med.uni-goettingen.de.

Supplementary information

Supplementary data are available at Bioinformatics online.


Article
Peer Reviewed

CONICS integrates scRNA-seq with DNA sequencing to map gene expression to tumor sub-clones

UC San Francisco Previously Published Works (2018)

Motivation

Single-cell RNA-sequencing (scRNA-seq) has enabled studies of tissue composition at unprecedented resolution. However, the application of scRNA-seq to clinical cancer samples has been limited, partly due to a lack of scRNA-seq algorithms that integrate genomic mutation data.

Results

To address this, we present CONICS: COpy-Number analysis In single-Cell RNA-Sequencing. CONICS is a software tool for mapping gene expression from scRNA-seq to tumor clones and phylogenies, with routines enabling: the quantitation of copy-number alterations in scRNA-seq, robust separation of neoplastic cells from tumor-infiltrating stroma, inter-clone differential-expression analysis and intra-clone co-expression analysis.

Availability and implementation

CONICS is written in Python and R, and is available from https://github.com/diazlab/CONICS.

Supplementary information

Supplementary data are available at Bioinformatics online.

Article
Peer Reviewed

Accelerating open modification spectral library searching on tensor core in high-dimensional space

UC San Diego Previously Published Works (2023)

Motivation

Driven by technological advances, the throughput and cost of mass spectrometry (MS) proteomics experiments have improved by orders of magnitude in recent decades. Spectral library searching is a common approach to annotating experimental mass spectra by matching them against large libraries of reference spectra corresponding to known peptides. An important disadvantage, however, is that only peptides included in the spectral library can be found, whereas novel peptides, such as those with unexpected post-translational modifications (PTMs), will remain unknown. Open modification searching (OMS) is an increasingly popular approach to annotate modified peptides based on partial matches against their unmodified counterparts. Unfortunately, this leads to very large search spaces and excessive runtimes, which is especially problematic considering the continuously increasing sizes of MS proteomics datasets.

Results

We propose an OMS algorithm, called HOMS-TC, that fully exploits parallelism in the entire pipeline of spectral library searching. We designed a new highly parallel encoding method based on the principle of hyperdimensional computing to encode mass spectral data into hypervectors while minimizing information loss. This process can be easily parallelized since each dimension is calculated independently. HOMS-TC processes the two stages of the existing cascade search in parallel and selects the most similar spectra while considering PTMs. We accelerate HOMS-TC on NVIDIA's tensor core units, which are emerging and readily available in recent graphics processing units (GPUs). Our evaluation shows that HOMS-TC is 31× faster on average than alternative search engines and provides comparable accuracy to competing search tools.
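The abstract notes that each hypervector dimension is computed independently. As a toy illustration, the Python sketch below uses ID-level encoding, a standard hyperdimensional-computing scheme: each m/z bin gets a random bipolar "ID" hypervector, each quantized intensity gets a correlated "level" hypervector, and a spectrum is the binarized sum of their elementwise products. It is an assumption that this resembles HOMS-TC's encoder; the dimensionality, bin width, level count and function names are all hypothetical, and the sketch does not model the cascade search or tensor-core kernels.

```python
import random

D = 2048  # hypervector dimensionality (illustrative; not HOMS-TC's setting)


def random_bipolar(rng: random.Random) -> list[int]:
    """Random +1/-1 hypervector; independent draws are nearly orthogonal."""
    return [rng.choice((-1, 1)) for _ in range(D)]


def make_level_hvs(n_levels: int, rng: random.Random) -> list[list[int]]:
    """Correlated level hypervectors for quantized intensities (n_levels >= 2):
    each level flips a fresh block of components, so neighbouring levels stay
    similar and the first and last levels differ in about D/2 positions."""
    current = random_bipolar(rng)
    order = list(range(D))
    rng.shuffle(order)
    step = D // (2 * (n_levels - 1))
    levels = []
    for lvl in range(n_levels):
        levels.append(current[:])
        for i in order[lvl * step:(lvl + 1) * step]:
            current[i] = -current[i]
    return levels


def encode(peaks, id_hvs, level_hvs, mz_bin, max_intensity):
    """Bind each peak's m/z-bin ID vector with its intensity-level vector
    (elementwise product), bundle by summing, then binarize to +1/-1."""
    acc = [0] * D
    n_lv = len(level_hvs)
    for mz, intensity in peaks:
        b = int(mz / mz_bin)
        if not 0 <= b < len(id_hvs):
            continue  # peak outside the binned m/z range
        lv = min(int(intensity / max_intensity * (n_lv - 1)), n_lv - 1)
        idv, lvv = id_hvs[b], level_hvs[lv]
        # Dimension i uses only component i of the inputs, so every
        # dimension could be computed in parallel.
        for i in range(D):
            acc[i] += idv[i] * lvv[i]
    return [1 if a >= 0 else -1 for a in acc]


def similarity(a, b) -> float:
    """Cosine similarity of two bipolar hypervectors (both have norm sqrt(D))."""
    return sum(x * y for x, y in zip(a, b)) / D


rng = random.Random(0)
id_hvs = [random_bipolar(rng) for _ in range(100)]  # 100 bins of 20 m/z units
levels = make_level_hvs(8, rng)
query = [(175.1, 40.0), (304.2, 100.0), (433.2, 65.0), (562.3, 20.0)]
match = [(175.1, 42.0), (304.2, 98.0), (433.2, 63.0), (562.3, 22.0)]
other = [(250.0, 80.0), (610.5, 30.0), (880.4, 90.0), (1020.7, 50.0)]
```

Spectra with the same peaks and slightly different intensities encode to highly similar hypervectors, while spectra with unrelated peaks land near zero similarity.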

Availability and implementation

HOMS-TC is freely available under the Apache 2.0 license as an open-source software project at https://github.com/tycheyoung/homs-tc.


Article
Peer Reviewed

A fast data-driven method for genotype imputation, phasing and local ancestry inference: MendelImpute.jl

UCLA Previously Published Works (2021)

Motivation

Current methods for genotype imputation and phasing exploit the volume of data in haplotype reference panels and rely on hidden Markov models (HMMs). Existing programs all have essentially the same imputation accuracy, are computationally intensive and generally require prephasing the typed markers.

Results

We introduce a novel data-mining method for genotype imputation and phasing that substitutes highly efficient linear algebra routines for HMM calculations. This strategy, embodied in our Julia program MendelImpute.jl, avoids explicit assumptions about recombination and population structure while delivering similar prediction accuracy, better memory usage and an order of magnitude or better run-times compared to the fastest competing method. MendelImpute operates on both dosage data and unphased genotype data and simultaneously imputes missing genotypes and phase at both the typed and untyped SNPs (single nucleotide polymorphisms). Finally, MendelImpute naturally extends to global and local ancestry estimation and lends itself to new strategies for data compression and hence faster data transport and sharing.
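The abstract says MendelImpute substitutes linear algebra for HMM calculations. One way to picture this is to pick the pair of reference haplotypes whose sum best matches a genotype vector in least squares, scoring every pair from precomputed inner products. The Python sketch below is a simplified illustration under that assumption, not MendelImpute.jl's code (which is written in Julia). It omits windowing, dosage handling and ancestry inference, and its function names are hypothetical.

```python
from itertools import combinations_with_replacement


def best_haplotype_pair(x, H):
    """Find reference haplotypes (i, j) whose sum best matches genotype x.

    x: alt-allele counts (0/1/2) per marker, None where the genotype is missing.
    H: reference haplotypes, each a 0/1 list over the same markers.
    Returns (i, j, err) minimising ||x - h_i - h_j||^2 over observed markers.
    """
    obs = [k for k, g in enumerate(x) if g is not None]
    # Expanding the squared error gives
    #   ||x||^2 - 2 x.h_i - 2 x.h_j + ||h_i||^2 + ||h_j||^2 + 2 h_i.h_j,
    # so every pair is scored from precomputed inner products
    # (matrix products in a vectorized implementation).
    xx = sum(x[k] ** 2 for k in obs)
    xh = [sum(x[k] * h[k] for k in obs) for h in H]
    hh = [[sum(a[k] * b[k] for k in obs) for b in H] for a in H]
    best = None
    for i, j in combinations_with_replacement(range(len(H)), 2):
        err = xx - 2 * (xh[i] + xh[j]) + hh[i][i] + hh[j][j] + 2 * hh[i][j]
        if best is None or err < best[2]:
            best = (i, j, err)
    return best


def impute(x, H):
    """Fill missing genotypes from the best pair; the pair itself gives a phasing."""
    i, j, _ = best_haplotype_pair(x, H)
    filled = [H[i][k] + H[j][k] if g is None else g for k, g in enumerate(x)]
    return filled, (H[i], H[j])


H = [[0, 0, 1, 1, 0], [1, 0, 0, 1, 1], [0, 1, 1, 0, 0], [1, 1, 0, 0, 1]]
x = [1, 0, 1, 2, None]  # last genotype untyped
```

Here `impute(x, H)` selects haplotypes 0 and 1, fills the untyped marker with 1, and returns those two haplotypes as the phased pair.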

Availability and implementation

Software, documentation and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelImpute.jl.

Supplementary information

Supplementary data are available at Bioinformatics online.


Article
Peer Reviewed

Analysis of Human Accelerated DNA Regions Using Archaic Hominin Genomes

UC San Francisco Previously Published Works (2012)

Several previous comparisons of the human genome with other primate and vertebrate genomes identified genomic regions that are highly conserved in vertebrate evolution but fast-evolving on the human lineage. These human accelerated regions (HARs) may be regions of past adaptive evolution in humans. Alternatively, they may be the result of non-adaptive processes, such as biased gene conversion. We captured and sequenced DNA from a collection of previously published HARs using DNA from an Iberian Neandertal. Combining these new data with shotgun sequence from the Neandertal and Denisova draft genomes, we determine at least one archaic hominin allele for 84% of all positions within HARs. We find that 8% of HAR substitutions are not observed in the archaic hominins and are thus recent in the sense that the derived allele had not come to fixation in the common ancestor of modern humans and archaic hominins. Further, we find that recent substitutions in HARs tend to have come to fixation faster than substitutions elsewhere in the genome and that substitutions in HARs tend to cluster in time, consistent with an episodic rather than a clock-like process underlying HAR evolution. Our catalog of sequence changes in HARs will help prioritize them for functional studies of genomic elements potentially responsible for modern human adaptations.


Article
Peer Reviewed

Neandertal Introgression Sheds Light on Modern Human Endocranial Globularity.

UC Irvine Previously Published Works (2019)

One of the features that distinguishes modern humans from our extinct relatives and ancestors is a globular shape of the braincase [1-4]. As the endocranium closely mirrors the outer shape of the brain, these differences might reflect altered neural architecture [4, 5]. However, in the absence of fossil brain tissue, the underlying neuroanatomical changes as well as their genetic bases remain elusive. To better understand the biological foundations of modern human endocranial shape, we turn to our closest extinct relatives: the Neandertals. Interbreeding between modern humans and Neandertals has resulted in introgressed fragments of Neandertal DNA in the genomes of present-day non-Africans [6, 7]. Based on shape analyses of fossil skull endocasts, we derive a measure of endocranial globularity from structural MRI scans of thousands of modern humans and study the effects of introgressed fragments of Neandertal DNA on this phenotype. We find that Neandertal alleles on chromosomes 1 and 18 are associated with reduced endocranial globularity. These alleles influence expression of two nearby genes, UBR4 and PHLPP1, which are involved in neurogenesis and myelination, respectively. Our findings show how integration of fossil skull data with archaic genomics and neuroimaging can suggest developmental mechanisms that may contribute to the unique modern human endocranial shape.


Article
Peer Reviewed

Genomic basis for skin phenotype and cold adaptation in the extinct Steller’s sea cow

UC Santa Cruz Previously Published Works (2022)

Steller's sea cow, an extinct sirenian and one of the largest Quaternary mammals, was described by Georg Steller in 1741 and eradicated by humans within 27 years. Here, we complement Steller's descriptions with paleogenomic data from 12 individuals. We identified convergent evolution between Steller's sea cow and cetaceans but not extant sirenians, suggesting a role of several genes in adaptation to cold aquatic (or marine) environments. Among these are inactivations of lipoxygenase genes, which in humans and mouse models cause ichthyosis, a skin disease characterized by a thick, hyperkeratotic epidermis that recapitulates Steller's sea cows' reportedly bark-like skin. We also found that Steller's sea cows' abundance was continuously declining for tens of thousands of years before their description, implying that environmental changes also contributed to their extinction.
