kacparas/meta_geofetch

meta-geofetch

meta-geofetch is a semi-automatic bioinformatics pipeline for fetching metadata from the Gene Expression Omnibus (GEO) database. It is particularly useful when planning a complex analysis that draws on many samples, especially if:

  • Samples are retrieved using a variety of keywords
  • Data needs to be inspected or filtered before samples are selected

Usage

python meta_geofetch.py
Enter GEO search query: [your_GEO_query]
Enter organism name (for filtering, optional): [latin organism name]
Enter assay type (for filtering, optional): [e.g. ChIP-seq]
Enter target string in sample_name (for filtering and renaming the metadata directory)

This generates a list of GSE accessions from the given GEO search query and writes it to metadata.txt.
metadata.txt is then processed to retrieve the metadata of the files associated with each listed GSE. The metadata can then be filtered using any filters provided at the prompts and is concatenated into a single file.
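
The filtering and concatenation step can be pictured with the minimal sketch below. It is not taken from meta_geofetch.py: the function names `filter_rows` and `concatenate` are hypothetical, and the column names `sample_organism_ch1` and `sample_library_strategy` are assumptions about how the per-GSE metadata tables are laid out. Only `sample_name` appears in the prompts above.

```python
def filter_rows(rows, organism=None, assay=None, target=None):
    """Keep rows matching every filter that was provided (case-insensitive).

    Column names other than sample_name are assumptions for illustration.
    """
    kept = []
    for row in rows:
        if organism and row.get("sample_organism_ch1", "").lower() != organism.lower():
            continue
        if assay and row.get("sample_library_strategy", "").lower() != assay.lower():
            continue
        if target and target.lower() not in row.get("sample_name", "").lower():
            continue
        kept.append(row)
    return kept


def concatenate(tables):
    """Merge several per-GSE tables (lists of dicts) into one table.

    The result uses the union of all columns, in first-seen order;
    cells missing from a table are filled with an empty string.
    """
    columns = []
    for table in tables:
        for row in table:
            for key in row:
                if key not in columns:
                    columns.append(key)
    merged = [
        {col: row.get(col, "") for col in columns}
        for table in tables
        for row in table
    ]
    return columns, merged
```

Leaving a filter as `None` skips it, which mirrors the optional prompts: an empty answer means no filtering on that field.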

Requirements

conda env create -f environment.yml
conda activate meta_geofetch

Improved usage

The original geofetch package was designed to process a whole list of GSEs retrieved with its Finder function in a single run, so one faulty GSE could crash the entire run. meta-geofetch works around this limitation by calling geofetch sequentially, once per GSE, from an integrated bash script.
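
The one-call-per-GSE idea can be sketched as follows. The repository does this with a bash script; the Python version below is only an illustration of the same failure-isolation pattern. The helper names `geofetch_command` and `fetch_each` are hypothetical, and the geofetch flags shown (`-i`, `--just-metadata`, `-m`) are an assumed invocation that should be checked against the installed geofetch version.

```python
import subprocess
import sys


def geofetch_command(gse, metadata_root="metadata"):
    # Assumed invocation: fetch metadata only, for a single accession.
    return ["geofetch", "-i", gse, "--just-metadata", "-m", metadata_root]


def fetch_each(accessions, build_command=geofetch_command):
    """Run one command per accession so a faulty GSE cannot stop the rest.

    Returns two lists: accessions that succeeded and accessions that failed.
    """
    succeeded, failed = [], []
    for gse in accessions:
        try:
            result = subprocess.run(build_command(gse), check=False)
            ok = result.returncode == 0
        except OSError:
            # The executable could not be started at all.
            ok = False
        if ok:
            succeeded.append(gse)
        else:
            print(f"skipping {gse}: command failed", file=sys.stderr)
            failed.append(gse)
    return succeeded, failed
```

Because each accession runs in its own process, a non-zero exit code is recorded and the loop moves on to the next GSE. The failed accessions are returned so they can be inspected or retried later.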

Citations

https://doi.org/10.1093/bioinformatics/btad069

@article{10.1093/bioinformatics/btad069,
    author = {Khoroshevskyi, Oleksandr and LeRoy, Nathan and Reuter, Vincent P and Sheffield, Nathan C},
    title = "{GEOfetch: a command-line tool for downloading data and standardized metadata from GEO and SRA}",
    journal = {Bioinformatics},
    volume = {39},
    number = {3},
    pages = {btad069},
    year = {2023},
    month = {03},
    abstract = "{The Gene Expression Omnibus has become an important source of biological data for secondary analysis. However, there is no simple, programmatic way to download data and metadata from Gene Expression Omnibus (GEO) in a standardized annotation format. To address this, we present GEOfetch—a command-line tool that downloads and organizes data and metadata from GEO and SRA. GEOfetch formats the downloaded metadata as a Portable Encapsulated Project, providing universal format for the reanalysis of public data. GEOfetch is available on Bioconda and the Python Package Index (PyPI).}",
    issn = {1367-4811},
    doi = {10.1093/bioinformatics/btad069},
    url = {https://doi.org/10.1093/bioinformatics/btad069},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/39/3/btad069/49407404/btad069.pdf},
}
