This script processes a list of UUIDs from an Excel file, fetches research outputs from the Pure API, extracts and cleans DOI values, and sends them back to the API. It ensures that DOIs are correctly formatted by removing unwanted prefixes and validating the structure.
- Fetches research output data using UUIDs.
- Extracts and corrects DOIs from the `electronicVersions` field.
- Ensures DOI validity using regex validation.
- Logs all actions (corrections, updates, errors) to `doi_correction.log`.
- Includes a progress bar for better visibility.
- Implements a rate-limiting delay to prevent excessive API calls.
- Requires Python 3.7+ and the libraries installed by `pip install requests pandas tqdm openpyxl`.
- Needs an API key with read and write access to the Pure API.
- Reads an Excel file containing UUIDs in a column named `UUID`. Such a file can easily be generated with the reporting module in Pure.
- Set the API base URL in `API_BASE_URL`.
- Provide a valid API key in `API_KEY`.
- Make sure the Excel file path in `pd.read_excel("mylist.xlsx")` is correct.
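A minimal sketch of this configuration, assuming pandas is installed. The URL and key below are hypothetical placeholders. `load_uuids` is a hypothetical helper, not a function from the script. It drops empty cells, strips stray whitespace, and removes duplicate UUIDs so that no record is processed twice.

```python
import pandas as pd

# Hypothetical placeholders: replace with your Pure instance's API URL and
# a key that has read and write access.
API_BASE_URL = "https://pure.example.org/ws/api"
API_KEY = "replace-with-your-api-key"


def load_uuids(df):
    """Return the non-empty values of the 'UUID' column as stripped strings,
    keeping only the first occurrence of any duplicate."""
    uuids = df["UUID"].dropna().astype(str).str.strip()
    uuids = uuids[uuids != ""]          # skip cells that were only whitespace
    return list(dict.fromkeys(uuids))   # de-duplicate, preserving order
```

With the Excel file from the configuration step, this would be called as `uuids = load_uuids(pd.read_excel("mylist.xlsx"))`. Reading `.xlsx` files needs `openpyxl`, which is why it appears in the requirements.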
- Run the script with `python doicorrector.py`. It displays a progress bar while processing.
- Check `doi_correction.log` for detailed logs.
- Reads the list of UUIDs from the Excel file.
- Queries the Pure API for research output metadata.
- Extracts and cleans DOIs from `electronicVersions`.
- Validates the DOI format before updating.
- Sends updated data back to the API (only if changes were made).
- Waits 1 second between API requests to prevent throttling.
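The workflow above can be sketched as follows. This is not the script's actual code, and the function names are hypothetical. Three details are assumptions that you should check against the API documentation for your Pure version:

- the endpoint paths (`/research-outputs/{uuid}` for both GET and PUT);
- the `api-key` header;
- that each entry in `electronicVersions` keeps its DOI under a `doi` key.

The HTTP session and the DOI-cleaning function are passed in as parameters, so the loop can be tried without a live server.

```python
import logging
import time

logger = logging.getLogger("doi_correction")


def fetch_research_output(session, base_url, api_key, uuid):
    """GET one research output as a dict (assumed endpoint and header)."""
    response = session.get(
        f"{base_url}/research-outputs/{uuid}",
        headers={"api-key": api_key, "Accept": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def update_research_output(session, base_url, api_key, uuid, record):
    """PUT the modified record back (assumed endpoint and header)."""
    response = session.put(
        f"{base_url}/research-outputs/{uuid}",
        headers={"api-key": api_key, "Content-Type": "application/json"},
        json=record,
        timeout=30,
    )
    response.raise_for_status()


def correct_dois(record, clean, uuid):
    """Clean each electronic version's 'doi' in place; return True if any changed."""
    changed = False
    for version in record.get("electronicVersions", []):
        raw = version.get("doi")
        if not raw:
            continue  # this electronic version carries no DOI
        cleaned = clean(raw)  # clean() returns None for an invalid DOI
        if cleaned is not None and cleaned != raw:
            logger.info("Corrected DOI for UUID %s: %s → %s", uuid, raw, cleaned)
            version["doi"] = cleaned
            changed = True
    return changed


def process(uuids, session, base_url, api_key, clean, delay=1.0):
    """Fetch, correct and (only when something changed) update each record."""
    for uuid in uuids:
        try:
            record = fetch_research_output(session, base_url, api_key, uuid)
        except Exception as exc:  # broad on purpose: log and move on
            logger.error("Failed to fetch UUID %s: %s", uuid, exc)
        else:
            if correct_dois(record, clean, uuid):
                try:
                    update_research_output(session, base_url, api_key, uuid, record)
                    logger.info("Successfully updated UUID %s", uuid)
                except Exception as exc:
                    logger.error("Failed to update UUID %s: %s", uuid, exc)
        time.sleep(delay)  # fixed pause between records
```

With the real dependencies, a call might look like `process(tqdm(uuids), requests.Session(), API_BASE_URL, API_KEY, clean)`. Here `clean` is any function that returns the cleaned DOI, or `None` for an invalid one. Wrapping `uuids` in `tqdm` provides the progress bar. A failure for one UUID is logged and the loop continues with the next UUID.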
- Removes the prefixes `doi:`, `DOI:`, `https://doi.org/`, and `DOI`.
- Ensures the DOI matches the standard format `10.xxxx/...`.
- Skips invalid DOI formats and logs a warning.
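The cleaning rules above can be sketched as a small function. The regular expression is an assumption, and the script's own pattern may differ. It requires a `10.` prefix, a registrant code of 4 to 9 digits, a slash, and a non-empty suffix without whitespace. The prefixes are checked in the listed order, so `DOI:` is tried before `DOI` and the colon is removed along with it.

```python
import logging
import re

logger = logging.getLogger("doi_correction")

# Prefixes from the rules above, in the listed order; the first match wins.
PREFIXES = ("doi:", "DOI:", "https://doi.org/", "DOI")

# Assumed pattern: "10.", 4-9 digit registrant code, "/", non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def clean_doi(raw):
    """Return the DOI without its prefix, or None if it is not a valid DOI."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):].strip()  # also drops a space after "DOI"
            break
    if not DOI_PATTERN.match(doi):
        logger.warning("Invalid DOI format detected: %s", raw)
        return None
    return doi
```

For example, `clean_doi("doi:10.4203/ccp.76.56")` returns `"10.4203/ccp.76.56"`. `clean_doi("10.xxx/invalid_doi")` logs a warning and returns `None`.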
2025-03-11 14:32:01 - INFO - Corrected DOI for UUID b17aad70-9c2d-11db-8ed6-000ea68e967b: doi:10.4203/ccp.76.56 → 10.4203/ccp.76.56
2025-03-11 14:32:02 - INFO - Successfully updated UUID b17aad70-9c2d-11db-8ed6-000ea68e967b
2025-03-11 14:32:05 - WARNING - Invalid DOI format detected: 10.xxx/invalid_doi
2025-03-11 14:32:07 - ERROR - Failed to fetch UUID abc123: 404 Not Found
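The log lines above follow a `timestamp - LEVEL - message` layout. Below is a sketch of one way to produce that layout with the standard `logging` module. `setup_logging` and the logger name `doi_correction` are hypothetical; only the file name comes from this README.

```python
import logging


def setup_logging(path="doi_correction.log"):
    """Write INFO and higher messages from the 'doi_correction' logger to a
    file, using the timestamp layout of the sample log."""
    logger = logging.getLogger("doi_correction")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, encoding="utf-8")  # utf-8 for the arrow
    handler.setFormatter(
        logging.Formatter(
            "%(asctime)s - %(levelname)s - %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )
    )
    logger.addHandler(handler)
    return logger
```

Call `setup_logging()` once at start-up. Each call adds another file handler, so a second call would write every message twice.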
- To change the delay, modify `time.sleep(1)` if the API allows faster or slower requests.
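A fixed `time.sleep(1)` adds a full second on top of however long each request takes. If you would rather enforce a minimum interval between requests, a throttle like the hypothetical one below is an alternative technique; the script does not do this. It sleeps only for the part of the interval that the previous request has not already used up.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between calls to wait()."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None  # time of the previous wait(), if any

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)  # sleep only the unused part
        self._last = time.monotonic()
```

Create the throttle once, for example with `throttle = Throttle(1.0)`. Then call `throttle.wait()` before each API request instead of `time.sleep(1)`.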
This script is released under the MIT License. Contributions are welcome!