GEOquery

The bridge between the NCBI Gene Expression Omnibus (GEO) and Bioconductor.

License: MIT

GEOquery downloads and parses data from the NCBI Gene Expression Omnibus — a public repository of high-throughput functional genomics data — into Bioconductor objects, so you can go from a GEO accession to an analysis-ready object in one call.

Capabilities

  • Series, Samples, Platforms, DataSets. Parse any GEO entity (GSE, GSM, GPL, GDS) from either the compact Series Matrix or the full SOFT format.
  • Modern object model. GSE Series Matrix records return SummarizedExperiment objects by default (or ExpressionSet via returnType = "ExpressionSet").
  • RNA-seq. Retrieve NCBI's uniformly computed RNA-seq quantifications with getRNASeqData().
  • Single-cell. Inventory, group, and load single-cell supplementary data (10x Matrix Market, 10x HDF5, AnnData .h5ad, Seurat .rds) into SingleCellExperiment (or Seurat) objects.
  • Supplementary files. List and download any attached files with getGEOSuppFiles().
  • Search. Query GEO programmatically with searchGEO().
  • Robust downloads. Streaming downloads with retries, an optional persistent BiocFileCache cache, and typed error conditions for tryCatch().
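
Error conditions like those mentioned in the last item can be handled with base R's tryCatch(). The following is a minimal sketch using only base R. try_fetch is a hypothetical helper, not part of GEOquery, and the specific condition classes GEOquery signals are not assumed here, so the handler simply reports whichever classes it receives.

```r
# Hypothetical helper (not part of GEOquery): call `fetch` with the given
# arguments and return NULL instead of stopping if it signals an error.
try_fetch <- function(fetch, ...) {
  tryCatch(
    fetch(...),
    error = function(e) {
      # class(e) lists the condition's classes, most specific first, which
      # shows which class a more targeted handler could match on.
      message(
        "fetch failed [", paste(class(e), collapse = ", "), "]: ",
        conditionMessage(e)
      )
      NULL
    }
  )
}

# Intended use with GEOquery (needs network access, so left commented out):
# gse <- try_fetch(GEOquery::getGEO, "GSE2553")
```

Once the class names are known, a handler named after a specific class can replace the generic error handler so that only that kind of failure is caught.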

Installation

# from Bioconductor (recommended)
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("GEOquery")

# or the development version from GitHub (requires the remotes package)
BiocManager::install("seandavi/GEOquery")

Quick start

library(GEOquery)
library(SummarizedExperiment)  # provides assay(), colData(), rowData()

# A GSE via the fast Series Matrix path -> a list of SummarizedExperiment,
# one per platform.
gse <- getGEO("GSE2553")
se <- gse[[1]]
assay(se)      # expression matrix
colData(se)    # sample metadata
rowData(se)    # feature annotation

# Other entity types parse to GEOquery's S4 classes:
getGEO("GSM11805")   # a sample
getGEO("GPL96")      # a platform
getGEO("GDS507")     # a curated dataset

# See what supplementary files a study has, without downloading:
getGEOSuppFiles("GSE63137", fetch_files = FALSE)
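
Once a listing is in hand, it can be narrowed before anything is downloaded. This is a minimal sketch that assumes the listing is a data frame with a file-name column called fname. Both that column name and the helper filter_supp_files are assumptions for illustration, and the mock rows are invented stand-ins rather than real GEO files.

```r
# Hypothetical helper (not part of GEOquery): keep the rows of a
# supplementary-file listing whose file name matches a regular expression.
filter_supp_files <- function(listing, pattern, name_col = "fname") {
  stopifnot(is.data.frame(listing), name_col %in% names(listing))
  listing[grepl(pattern, listing[[name_col]]), , drop = FALSE]
}

# Mock listing standing in for the result of
# getGEOSuppFiles("GSE63137", fetch_files = FALSE).
# File names and URLs here are invented for the example.
listing <- data.frame(
  fname = c("example_counts.txt.gz", "example_RAW.tar"),
  url = c("https://example.org/a", "https://example.org/b"),
  stringsAsFactors = FALSE
)

# Keep only gzip-compressed text files.
selected <- filter_supp_files(listing, "\\.txt\\.gz$")
print(selected$fname)
```

Check names() on the real listing first and pass the actual column through name_col if it differs.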

Documentation

The package vignette is a quick start; the in-depth narrative articles cover the rationale and the downstream workflows.

Bioconductor landing pages: release · devel

Getting help

Contributing

Contributions are welcome as pull requests or issues. See CONTRIBUTING.md for the development workflow, and follow the Bioconductor coding standards where possible.

Citation

If you use GEOquery, please cite:

Davis S, Meltzer PS. GEOquery: a bridge between the Gene Expression Omnibus (GEO) and BioConductor. Bioinformatics. 2007;23(14):1846–1847.

citation("GEOquery")
