The bridge between the NCBI Gene Expression Omnibus (GEO) and Bioconductor.
GEOquery downloads and parses data from the NCBI Gene Expression Omnibus — a public repository of high-throughput functional genomics data — into Bioconductor objects, so you can go from a GEO accession to an analysis-ready object in one call.
- Series, Samples, Platforms, DataSets. Parse any GEO entity (`GSE`, `GSM`, `GPL`, `GDS`) from either the compact Series Matrix or the full SOFT format.
- Modern object model. GSE Series Matrix records return `SummarizedExperiment` objects by default (or `ExpressionSet` via `returnType = "ExpressionSet"`).
- RNA-seq. Retrieve NCBI's uniformly-computed RNA-seq quantifications with `getRNASeqData()`.
- Single-cell. Inventory, group, and load single-cell supplementary data (10x Matrix Market, 10x HDF5, AnnData `.h5ad`, Seurat `.rds`) into `SingleCellExperiment` (or `Seurat`) objects.
- Supplementary files. List and download any attached files with `getGEOSuppFiles()`.
- Search. Query GEO programmatically with `searchGEO()`.
- Robust downloads. Streaming downloads with retries, an optional persistent `BiocFileCache` cache, and typed error conditions for `tryCatch()`.
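The error-handling point above can be illustrated with a small wrapper. This is a minimal sketch, not part of GEOquery's API: `safe_get_geo()` and its `fetch` argument are hypothetical names, and because the package's specific condition classes are not listed here, the handler catches the generic `error` class and simply reports whichever classes the condition carries.

```r
# Hypothetical helper: fetch a GEO accession and return NULL on failure
# instead of aborting the calling script.
# `fetch` defaults to GEOquery::getGEO; it is exposed as an argument only so
# the wrapper can be exercised without network access.
safe_get_geo <- function(accession, fetch = GEOquery::getGEO) {
  tryCatch(
    fetch(accession),
    error = function(e) {
      # class(e) lists the condition's classes, most specific first, which
      # shows what a narrower handler could match on.
      message(
        "Could not fetch ", accession, " [",
        paste(class(e), collapse = ", "), "]: ",
        conditionMessage(e)
      )
      NULL
    }
  )
}

# Usage (requires network access):
# gse <- safe_get_geo("GSE2553")
# if (!is.null(gse)) se <- gse[[1]]
```

Returning `NULL` keeps a batch loop over many accessions running after a single failure. Once the specific condition classes are known, a handler named after one of them can replace the generic `error` handler.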
# from Bioconductor (recommended)
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("GEOquery")
# or the development version from GitHub
BiocManager::install("seandavi/GEOquery")

library(GEOquery)
# A GSE via the fast Series Matrix path -> a list of SummarizedExperiment,
# one per platform.
gse <- getGEO("GSE2553")
se <- gse[[1]]
assay(se) # expression matrix
colData(se) # sample metadata
rowData(se) # feature annotation
# Other entity types parse to GEOquery's S4 classes:
getGEO("GSM11805") # a sample
getGEO("GPL96") # a platform
getGEO("GDS507") # a curated dataset
# See what supplementary files a study has, without downloading:
getGEOSuppFiles("GSE63137", fetch_files = FALSE)

The package vignette is a quick start; the in-depth narrative articles cover the rationale and the downstream workflows:
- Getting started
- Understanding GEO data formats
- RNA-seq quantifications
- Single-cell data from GEO
- From GEO to downstream analysis
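As the quick-start comments note, `getGEO()` on a GSE returns a list with one element per platform, so a study measured on several platforms needs a loop rather than `gse[[1]]` alone. The sketch below tabulates the dimensions of each element. `summarize_platforms()` is a hypothetical helper, not a GEOquery function, and it relies only on `dim()`, which is defined for both `SummarizedExperiment` and `ExpressionSet` objects.

```r
# Hypothetical helper: one row per list element, giving feature and sample
# counts. Works on any list whose elements have a two-dimensional dim().
summarize_platforms <- function(gse_list) {
  # Collect (rows, columns) for every element as a 2 x n integer matrix.
  dims <- vapply(gse_list, function(x) as.integer(dim(x)), integer(2))
  labels <- names(gse_list)
  if (is.null(labels)) {
    labels <- as.character(seq_along(gse_list))
  }
  data.frame(
    element = labels,
    features = unname(dims[1, ]),
    samples = unname(dims[2, ]),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Usage (requires network access):
# summarize_platforms(getGEO("GSE2553"))
```

Checking this table first makes it obvious when a series spans more than one platform and the elements need to be analysed separately.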
Bioconductor landing pages: release · devel
- Usage questions: the Bioconductor support site, tagged `geoquery`.
- Bugs and feature requests: the issue tracker. Please include a GEO accession and `sessionInfo()`.
Contributions are welcome as pull requests or issues. See CONTRIBUTING.md for the development workflow, and follow the Bioconductor coding standards where possible.
If you use GEOquery, please cite:
Davis S, Meltzer PS. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics. 2007;23(14):1846–1847.
citation("GEOquery")