Sarah Montoya SunnkerLocket89

Idaho4 Exhibits Parser

This repository provides a command-line helper that automates downloading and organising the public exhibits listed in the Idaho4_exhibits_with_full_metadata.xlsx spreadsheet. The script reads the spreadsheet, downloads the referenced PDF files, and optionally extracts the first N pages of each document into a dedicated folder.

Installation

The parser now works out of the box using only the Python standard library. Optional third-party packages improve performance and unlock extras:

openpyxl – faster workbook loading.
requests – robust HTTP downloads.
tqdm – rich progress bars.
PyMuPDF or PyPDF2 – PDF page extraction.

Install them individually or via the provided requirements.txt file when available:

pip install -r requirements.txt
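To check which of the optional packages are present in the current environment, a small standalone probe can help. This is a minimal sketch, not part of the parser itself: the function name detect_optional_packages and the mapping below are illustrative assumptions. The only non-obvious detail is that PyMuPDF is imported under the module name fitz.

```python
import importlib.util

# Pip package name -> importable module name.
# Assumption for illustration: this mapping is not taken from the parser.
OPTIONAL_MODULES = {
    "openpyxl": "openpyxl",
    "requests": "requests",
    "tqdm": "tqdm",
    "PyMuPDF": "fitz",  # PyMuPDF is imported as "fitz"
    "PyPDF2": "PyPDF2",
}


def detect_optional_packages(modules=OPTIONAL_MODULES):
    """Return {package_name: bool} showing which packages can be imported."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }


if __name__ == "__main__":
    for name, available in detect_optional_packages().items():
        status = "available" if available else "missing"
        print(f"{name}: {status}")
```

Because find_spec only locates a module without importing it, the probe is cheap and has no side effects.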

Usage

python run_idaho4_parser.py \
  --in-file Idaho4_exhibits_with_full_metadata.xlsx \
  --sheet Exhibits_With_Metadata \
  --workers 6 \
  --extract-pages 4

By default the script stores the downloaded PDFs in idaho4_output/downloads and writes a JSON manifest plus a CSV summary to idaho4_output. Downloaded files are prefixed with the zero-padded Excel row number to guarantee unique filenames while keeping the on-disk order aligned with the worksheet. The manifest records whether each row succeeded, was skipped (for example because it did not contain a URL), or failed, and includes the corresponding Excel row number for quick cross-referencing. Re-run the command with --resume to continue from where a previous session stopped without re-downloading files.
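The naming and resume behaviour described above could look roughly like the following. This is a sketch under stated assumptions: the helper names build_filename and needs_download, the five-digit padding width, the underscore separator, and the "non-empty file means already downloaded" rule are all hypothetical and may differ from what the script actually does.

```python
from pathlib import Path


def build_filename(row_number: int, exhibit_id: str, width: int = 5) -> str:
    """Prefix the exhibit identifier with a zero-padded Excel row number.

    The padding width and the sanitising rule are assumptions.
    """
    # Replace anything that is not alphanumeric, "-" or "_" so the
    # identifier is safe to use as part of a filename.
    safe_id = "".join(
        c if c.isalnum() or c in "-_" else "_" for c in exhibit_id
    )
    return f"{row_number:0{width}d}_{safe_id}.pdf"


def needs_download(path: Path) -> bool:
    """Resume check (assumed rule): skip files that exist and are non-empty."""
    return not (path.exists() and path.stat().st_size > 0)


if __name__ == "__main__":
    name = build_filename(42, "Exhibit A/1")
    target = Path("idaho4_output/downloads") / name
    print(name, "download" if needs_download(target) else "skip")
```

Zero-padding means a plain alphabetical listing of the downloads folder follows worksheet order, which is the property the row-number prefix is meant to preserve.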

Common flags

--url-column – Set the spreadsheet column that contains the PDF URL. When omitted, the script attempts to infer a sensible column automatically.
--id-column – Configure the column that uniquely identifies each exhibit. This identifier is used to name the downloaded files.
--out-dir – Choose a different destination directory for all generated artefacts.
--manifest / --csv – Override the default manifest output paths.
--verbose – Enable verbose logging for troubleshooting.

Run python run_idaho4_parser.py --help to see the full list of supported flags.
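The README does not say how the URL column is inferred when --url-column is omitted. One plausible heuristic, shown here purely as an illustration, is to pick the column with the most values that look like http(s) links. The function infer_url_column is hypothetical and is not the script's actual logic.

```python
def infer_url_column(headers, rows):
    """Return the header whose column holds the most http(s)-looking values.

    Returns None when no column contains any URL-like value.
    This heuristic is an assumption, not the parser's implementation.
    """
    counts = {header: 0 for header in headers}
    for row in rows:
        for header, value in zip(headers, row):
            if isinstance(value, str) and value.strip().lower().startswith(
                ("http://", "https://")
            ):
                counts[header] += 1
    best = max(headers, key=lambda h: counts[h], default=None)
    if best is None or counts[best] == 0:
        return None
    return best


if __name__ == "__main__":
    headers = ["Exhibit", "Title", "Link"]
    rows = [
        ["1", "First exhibit", "https://example.com/a.pdf"],
        ["2", "Second exhibit", None],
    ]
    print(infer_url_column(headers, rows))
```

Passing --url-column explicitly remains the safer choice when a worksheet has more than one column containing links.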
