methylGSA is an R package for DNA methylation data length bias adjustment in gene set testing. Now it has been extended with EPICv2 array.
The package tutorial with EPICv2 array can be found here.
The methylGSA paper can be found here.
The original methylGSA GitHub page can be found here.
The original Bioconductor package can be found here.
The original Bioconductor package vignette with 450k array can be found here.
To install and load methylGSA
if (!requireNamespace("devtools", quietly = TRUE))
install.packages("devtools")
devtools::install_github("Leesure001/methylGSA")
library(methylGSA)
Depending on the DNA methylation array type, other packages may be needed before running the analysis.
If analyzing 450K array, IlluminaHumanMethylation450kanno.ilmn12.hg19 needs to be loaded.
library(IlluminaHumanMethylation450kanno.ilmn12.hg19)
If analyzing EPIC array, IlluminaHumanMethylationEPICanno.ilm10b4.hg19 needs to be loaded.
library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
If analyzing EPICv2 array, IlluminaHumanMethylationEPICv2anno.20a1.hg38 needs to be loaded.
library(IlluminaHumanMethylationEPICv2anno.20a1.hg38)
If analyzing user-supplied mapping between CpG ID and gene name, no additional package is needed.
data(cpgtoy_v2)
head(cpg.pval_v2, 20)
Please note that the p-values presented here in cpg.pval_v2 is for illustration purpose only. They are used to show how to utilize the functions in methylGSA. It does not represent p-values from any real data analysis. The actual p-values in differential methylation analysis may be quite different from the p-values in cpg.pval_v2 in terms of the magnitude.
Run methylglm:
res1 = methylglm(cpg.pval = cpg.pval_v2, array.type = "EPIC_V2", minsize = 200,
maxsize = 500, GS.type = "KEGG")
head(res1, 15)
Result is a data frame ranked by p-values of gene sets. The meaning of each column is given below.
glm_res = data.frame(
"Column" = c("ID",
"Description (N/A for user supplied gene set)",
"Size", "pvalue", "padj"),
"Explanation" = c("Gene set ID", "Gene set description",
"Number of genes in gene set",
"p-value in logistic regression", "Adjusted p-value")
)
knitr::kable(glm_res)
Bioconductor package org.Hs.eg.db can be used to pull out the genes corresponding a specific pathway. For instance, one of the pathways in the methylglm output above is "Neuroactive ligand-receptor interaction" with KEGG ID 04080. The following code can be used to obtain its genes.
library(org.Hs.eg.db)
genes_04080 = select(org.Hs.eg.db, "04080", "SYMBOL", keytype = "PATH")
head(genes_04080)
The following code can be used to get the genes in all the pathways in methylglm output.
# include all the IDs as the 2nd argument in select function
genes_all_pathway = select(org.Hs.eg.db, as.character(res1$ID),
"SYMBOL", keytype = "PATH")
head(genes_all_pathway)
To apply ORA approach, we use argument method = "ORA" (default) in methylRRA
res2 = methylRRA(cpg.pval = cpg.pval_v2, array.type = "EPIC_V2", method = "ORA",
minsize = 200, maxsize = 210)
head(res2, 15)
The meaning of each column in the output is given below.
ora_res = data.frame(
"Column" = c("ID",
"Description (N/A for user supplied gene set)", "Count",
"overlap", "Size", "pvalue", "padj"),
"Explanation" = c("Gene set ID", "Gene set description",
"Number of significant genes in the gene set",
"Names of significant genes in the gene set",
"Number of genes in gene set",
"p-value in ORA", "Adjusted p-value")
)
knitr::kable(ora_res)
To apply GSEAPreranked approach, we use argument method = "GSEA" in methylRRA
res3 = methylRRA(cpg.pval = cpg.pval_v2, array.type = "EPIC_V2", method = "GSEA",
minsize = 200, maxsize = 210)
head(res3, 10)
The meaning of each column in the output is given below.
gsea_res = data.frame(
"Column" = c("ID",
"Description (N/A for user supplied gene set)",
"Size", "enrichmentScore", "NES",
"pvalue", "padj"),
"Explanation" = c("Gene set ID", "Gene set description",
"Number of genes in gene set",
"Enrichment score (see [3] for details)",
"Normalized enrichment score (see [3] for details)",
"p-value in GSEA", "Adjusted p-value")
)
knitr::kable(gsea_res)
res4 = methylgometh(cpg.pval = cpg.pval_v2, array.type = "EPIC_V2", sig.cut = 0.001,
minsize = 200, maxsize = 210)
head(res4, 15)
An example of user supplied gene sets is given below. The gene ID type is gene symbol.
data(GSlisttoy_v2)
## to make the display compact, only a proportion of each gene set is shown
head(lapply(GS.list_v2, function(x) x[1:30]), 3)
Below is an example of running methylglm with parallel option
library(BiocParallel)
res_p = methylglm(cpg.pval = cpg.pval_v2, array.type = "EPIC_V2", minsize = 200,
maxsize = 500, GS.type = "KEGG", parallel = TRUE)
methylglm and methylRRA support user supplied CpG ID to gene mapping. The mapping is expected to be a matrix, or a data frame or a list. For a matrix or data frame, 1st column should be CpG ID and 2nd column should be gene name. For a list, entry names should be gene names and elements correpond to CpG IDs. Below an example of user supplied CpG to gene mapping.
data(CpG2Genetoy_v2)
head(CpG2Gene_v2)
To use user supplied mapping in methylglm or methylRRA, first preprocess the mapping by prepareAnnot function
FullAnnot = prepareAnnot(CpG2Gene_v2)
Test the gene sets using "ORA" in methylRRA, use FullAnnot argument to provide the preprocessed CpG ID to gene mapping
GS.list_v2 = GS.list_v2[1:10]
res5 = methylRRA(cpg.pval = cpg.pval_v2, array.type = "EPIC_V2", FullAnnot = FullAnnot, method = "ORA",
GS.list = GS.list_v2, GS.idtype = "SYMBOL",
minsize = 100, maxsize = 300)
head(res5, 10)
Below is another example testing Reactome pathways using methylglm.
res6 = methylglm(cpg.pval = cpg.pval_v2, array.type = "EPIC_V2",
GS.type = "Reactome", minsize = 100, maxsize = 110)
head(res6, 10)
Below is an example of using barplot to visualize the result of methylglm.
barplot(res1, num = 8, colorby = "pvalue")
If you use methylGSA or the extended EPIC v2 functionality provided in this GitHub version, please cite the original methylGSA paper and also cite this GitHub repository:
Ren, X., & Kuan, P. F. (2019). methylGSA: a Bioconductor package and Shiny app for DNA methylation data length bias adjustment in gene set testing. Bioinformatics, 35(11), 1958-1959.
Ren, X., & Li, S., & Kuan, P. F. (2025). methylGSA EPIC v2 Extension. GitHub. https://github.com/Leesure001/methylGSA
@article{ren2019methylgsa,
title={methylGSA: a Bioconductor package and Shiny app for DNA methylation data length bias adjustment in gene set testing},
author={Ren, Xu and Kuan, Pei Fen},
journal={Bioinformatics},
volume={35},
number={11},
pages={1958--1959},
year={2019},
publisher={Oxford University Press}
}
@misc{methylGSA_epicv2,
author = {Ren, Xu and Li, Shuo and Kuan, Pei Fen},
title = {methylGSA EPIC v2 Extension},
year = {2025},
howpublished = {\url{https://github.com/Leesure001/methylGSA}},
note = {Accessed: 2025-12-13}
}