Skip to content

tiadams/pmc-downloader

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PubMedCentral (PMC) automatic PDF/XML Downloader

Bulk-Downloads full-text PDFs (or any other sources) from PubmedCentral OA articles

How to use

After inputting your desired document identifiers into the file ids.txt, execute the main.sh script. It will do the following:

  1. Download the document index file from PMC
  2. For each id check wether it is available; if it is, download the sources from PMCs ftp server
  3. Extract the archives
  4. Move and rename the pdfs (or any other source type if you want to adapt the copy_rename.sh scripts suffix constant, e.g. to nxml)
  5. Remove the downloaded archive files

Your files will be located in the pdf folder, named as PMC${ID}.${SUFFIX}

Depending on the number of requested ids, this process may take a while and may require several GBs of disk space

About

Bash scripts to bulk download full-text PDFs (or any other soruces) from PubmedCentral OA articles

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages