A simple Python program that manipulates the data pulled from Elsevier Pure to make it easier to upload to Researchfish.

trulylostheaven/PureToResearchfish

PureToResearchfish

Please note that this may not work for your institution, so review what the code does before using it. PureToResearchfish is a Python program designed to streamline the manipulation of data extracted from Elsevier Pure, making it easier to upload to Researchfish. The final Excel sheet may require some additional manipulation, but this should save several hours of work.

Requirements

  • Python
  • Excel
  • pandas
  • numpy
  • fuzzywuzzy
  • Excel file containing data from Pure
  • Excel file from Researchfish containing Funder Reference IDs

Features

PuretoResearchfish.py

  1. Delete Duplicates: Removes duplicate rows from the dataset.
  2. Filter by "Funder Project Reference":
    • Removes rows with blank "Funder Project Reference" fields.
    • Splits "specific" rows with more than one "Funder Project Reference" ID.
  3. Filter by DOIs and Additional Source IDs:
    • Keeps rows with a DOI OR where Additional Source IDs start with "PubMed:".
    • Removes the "PubMed:" prefix from Additional Source IDs.
  4. Clear Additional Source IDs if DOIs are Present: Clears the "Additional Source IDs" field if a DOI is present.
  5. Filter by "Funder Project Reference": Removes rows whose "Funder Project Reference" contains only dates or "via institution".
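
Steps 2 and 5 above can be sketched in pandas roughly as follows. This is a minimal illustration rather than the repository's actual code: the ";" separator, the date pattern, the exact "via institution" text, and the function names `split_references` and `drop_uninformative` are all assumptions made for the example.

```python
import pandas as pd

REF = "Funder Project Reference"
# Assumed shape of a "date only" value, e.g. 01/02/2020 or 2020-01-02.
DATE_ONLY = r"\d{1,4}[/.-]\d{1,2}[/.-]\d{1,4}"


def split_references(df: pd.DataFrame, sep: str = ";") -> pd.DataFrame:
    """Give each funder reference its own row (the separator is an assumption)."""
    out = df.copy()
    out[REF] = out[REF].str.split(sep)
    out = out.explode(REF)  # one row per list element, other columns repeated
    out[REF] = out[REF].str.strip()
    return out.reset_index(drop=True)


def drop_uninformative(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose reference is only a date or the text 'via institution'."""
    ref = df[REF].fillna("").str.strip()
    is_date = ref.str.fullmatch(DATE_ONLY, na=False)
    is_via = ref.str.lower().eq("via institution")
    return df[~(is_date | is_via)].reset_index(drop=True)
```

Calling `drop_uninformative(split_references(df))` on a frame loaded with `pd.read_excel` would apply both steps in order. Check the separator and the patterns against your own Pure export before relying on them.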

PureAddFunder

  1. Match Name to Funder Organisation: Adds the Funder Organisation by matching it to Name.
  2. Add Funder ID: Adds the Funder ID that corresponds to the Funder Organisation.
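
The matching idea can be illustrated with a short sketch. The repository lists fuzzywuzzy as a requirement; to keep this example free of third-party packages it uses the standard library's `difflib` instead, which is a stand-in and not the author's method. The function name `match_funder`, the cutoff value, and the sample funder IDs are made up for illustration.

```python
import difflib


def match_funder(name, funders, cutoff=0.6):
    """Return (organisation, funder_id) for the closest match, or None.

    `funders` maps Funder Organisation -> Funder ID. The cutoff is an
    arbitrary similarity threshold between 0 and 1.
    """
    # Compare case-insensitively, but return the original spelling.
    lookup = {org.lower(): org for org in funders}
    hits = difflib.get_close_matches(name.lower(), list(lookup), n=1, cutoff=cutoff)
    if not hits:
        return None
    org = lookup[hits[0]]
    return org, funders[org]


# Hypothetical lookup table; real IDs come from the Researchfish Excel file.
funders = {"Medical Research Council": "MRC-001", "Wellcome Trust": "WT-002"}
print(match_funder("Medical Research Council (MRC)", funders))
```

With fuzzywuzzy, `process.extractOne` fills a similar role. Whichever library is used, fuzzy matches are worth spot-checking by hand, since a low threshold can pair a name with the wrong funder.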

Usage

  1. Clone the repository:
    git clone https://github.com/trulylostheaven/PureToResearchfish.git
    cd PureToResearchfish
  2. Install the required Python packages:
    pip install pandas numpy
    pip install fuzzywuzzy
  3. Run the program:
    python run_program.py
  4. Run PuretoResearchfish.py (this will run both programs in order).
  5. Select input_file.xlsx, comparison_xlsx, and comparison_sheet.

How it Works

  1. Delete Duplicates: The program uses Excel to remove duplicate rows from the dataset. This step runs twice, once at the beginning and once at the end.
  2. Filter by "Funder Project Reference": Rows with blank "Funder Project Reference" fields are removed using pandas.
  3. Filter by DOIs and Additional Source IDs: The program keeps only rows that have a DOI (Digital Object Identifier) or whose Additional Source IDs start with "PubMed:".
  4. Clear Additional Source IDs if DOIs are Present: If a row has a DOI, the program clears that row's "Additional Source IDs" field.
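
The four steps above can be sketched end to end in pandas. This is a simplified illustration under stated assumptions, not the repository's implementation: it uses `drop_duplicates` in place of the Excel-based de-duplication, assumes the DOI column is named "DOI", and the function name `clean` is hypothetical.

```python
import pandas as pd

REF = "Funder Project Reference"
DOI = "DOI"  # assumed column name
SRC = "Additional Source IDs"


def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Step 1: remove duplicate rows (pandas stand-in for the Excel step).
    out = df.drop_duplicates().copy()

    # Normalise the three columns so blanks are empty strings.
    for col in (REF, DOI, SRC):
        out[col] = out[col].fillna("").astype(str).str.strip()

    # Step 2: drop rows with a blank funder reference.
    out = out[out[REF] != ""]

    # Step 3: keep rows with a DOI or a PubMed source ID, then strip the prefix.
    has_doi = out[DOI] != ""
    is_pubmed = out[SRC].str.startswith("PubMed:")
    out = out[has_doi | is_pubmed].copy()
    out[SRC] = out[SRC].str.replace("PubMed:", "", n=1, regex=False).str.strip()

    # Step 4: a DOI takes precedence, so clear the source ID where one exists.
    out.loc[out[DOI] != "", SRC] = ""

    # De-duplicate once more at the end, as described in step 1.
    return out.drop_duplicates().reset_index(drop=True)
```

A typical call would be `clean(pd.read_excel("input_file.xlsx"))`, followed by `to_excel` to write the result. Reading `.xlsx` files with pandas usually requires an Excel engine such as openpyxl to be installed.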
