High-throughput, massively parallel DNA sequencing provides a powerful technology to study the human genome and to identify variations in DNA that cause disease. Sequencing the protein-coding region of the genome (`whole-exome sequencing') is a cost-effective method to search the part of the genome that is most likely to harbor disease-related mutations.
We developed software methods to process sequencing data and to annotate variants with data on genes, function, conservation, expression, diseases, pathways, and protein structure. We applied whole-exome sequencing to search for the molecular basis of disease in three projects: 1) a cohort of patients with congenital diarrheal disorders (CDDs); 2) a cohort of patients with congenital chronic intestinal pseudo-obstruction (CIPO) or the related disease, megacystis-microcolon-intestinal hypoperistalsis syndrome (MMIH); and 3) four siblings with infantile pontocerebellar hypoplasia and spinal motor neuron degeneration.
We sequenced 45 probands from diverse ethnic backgrounds who were diagnosed with a variety of CDDs of probable but unknown genetic cause. Patients had been diagnosed with generalized malabsorptive diarrhea, selective nutrient malabsorption, secretory diarrhea, and infantile IBD. In 27 cases (60%), we found homozygous or compound heterozygous mutations, 25 of them novel, in genes known to be associated with CDDs. The genes implicated were ADAM17, DGAT1, EPCAM, IL10RA, MALT1, MYO5B, NEUROG3, PCSK1, SI, SKIV2L, SLC26A3, and SLC5A.
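The recessive model behind "homozygous or compound heterozygous" can be illustrated with a minimal sketch. It is not the software developed in this work: the `Variant` record, the function name, and the gene and locus labels are hypothetical, and the sketch assumes that variants have already been filtered for rarity and predicted impact. A gene is flagged if the proband carries one homozygous variant in it, or heterozygous variants at two or more distinct sites; the second case is only a candidate, because confirming that the two variants lie on different parental chromosomes requires phasing or parental data.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    gene: str       # gene symbol the variant is annotated to
    position: str   # hypothetical locus label, e.g. "chr1:100"
    genotype: str   # "het" or "hom" (homozygous for the alternate allele)


def recessive_candidate_genes(variants: Iterable[Variant]) -> dict[str, str]:
    """Map each gene compatible with recessive inheritance to the reason it was flagged."""
    by_gene = defaultdict(list)
    for v in variants:
        by_gene[v.gene].append(v)

    candidates = {}
    for gene, calls in by_gene.items():
        if any(v.genotype == "hom" for v in calls):
            candidates[gene] = "homozygous"
            continue
        # Two or more distinct heterozygous sites: possible compound heterozygote.
        het_sites = {v.position for v in calls if v.genotype == "het"}
        if len(het_sites) >= 2:
            candidates[gene] = "candidate compound heterozygous"
    return candidates


if __name__ == "__main__":
    calls = [
        Variant("GENE_A", "chr1:100", "hom"),
        Variant("GENE_B", "chr2:200", "het"),
        Variant("GENE_B", "chr2:300", "het"),
        Variant("GENE_C", "chr3:400", "het"),  # single het: not flagged
    ]
    print(recessive_candidate_genes(calls))
```

In this toy input, `GENE_C` is not reported because a single heterozygous variant does not fit a recessive model.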
Using whole-exome sequencing in a cohort of 20 patients with congenital CIPO or MMIH, we identified a subset of 10 cases with potentially damaging, de novo, dominant-acting mutations at highly conserved loci in the ACTG2 gene, which encodes actin, gamma-enteric smooth muscle precursor, a protein essential to the functioning of muscle cells in the intestinal wall.
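As an illustration of what calling a variant de novo involves, the following sketch keeps heterozygous proband variants for which both parents have a confident reference call. It assumes that parental genotypes are available, which is not stated here, and the function name, genotype labels, and variant keys are all hypothetical rather than taken from the software used in this work.

```python
def de_novo_candidates(proband: dict[str, str],
                       mother: dict[str, str],
                       father: dict[str, str]) -> list[str]:
    """Return variant keys that are heterozygous in the proband and reference in both parents.

    Each argument maps a variant key (e.g. "chr2:200:A>G") to a genotype label:
    "ref", "het", or "hom". A key missing from a parent means "no call", so the
    variant is NOT reported: absence of evidence is not treated as a reference call.
    """
    candidates = []
    for key, genotype in proband.items():
        if genotype != "het":
            continue
        if mother.get(key) == "ref" and father.get(key) == "ref":
            candidates.append(key)
    return sorted(candidates)


if __name__ == "__main__":
    proband = {"chr2:200:A>G": "het", "chr5:10:C>T": "het", "chr7:77:G>A": "het"}
    mother = {"chr2:200:A>G": "ref", "chr5:10:C>T": "het"}  # no call at chr7:77
    father = {"chr2:200:A>G": "ref", "chr5:10:C>T": "ref", "chr7:77:G>A": "ref"}
    print(de_novo_candidates(proband, mother, father))
```

Requiring an explicit reference call in each parent, rather than the mere absence of the variant, is a deliberate choice in this sketch: poorly covered parental sites would otherwise produce false de novo calls.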
By exome sequencing, we discovered rare recessive mutations in EXOSC3 (encoding exosome component 3) that were responsible for pontocerebellar hypoplasia and spinal motor neuron degeneration in the four probands, and identified identical and additional novel mutations in a large percentage of other children with the same disorder.
In conclusion, we demonstrated that whole-exome sequencing is an effective approach for identifying causal mutations that may escape detection with standard practice, which involves a complex diagnostic workup and targeted gene sequencing.