RPM (Records, Promotion, Music) was a Canadian music publication. In 1964, it began publishing weekly national charts of what Canadians were listening to, based on radio airplay and, from 1964 to 1988, on record sales.
RPM Dataset (RPMD for short) is a project to parse and analyze the RPM charts, which BACLAC (Bibliothèque et Archives Canada - Library and Archives Canada) makes available as PDFs, and to republish them in an easy-to-digest format.
Available to download from archive.org: https://archive.org/download/rpm_dataset
rpm_charts_pages.zip (10,534 pages, 840 MB): A zip file containing the individual chart pages as PDFs.
- Directory structure is e.g. 1965/1965-02-15 - v1no18 - Top Singles - id7810.pdf
- The "Top Singles" charts have had their metadata corrected and normalized for errors such as misprints in the physical magazine and BACLAC metadata entries where month and day were swapped. They have also been de-duplicated where an issue was accidentally digitized twice.
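If you want to index the extracted pages by date or chart name, the file names can be split into fields. The sketch below is only a starting point: the pattern is inferred from the single example path above, so other file names in the archive may not match it. `ChartPage` and `parse_page_name` are hypothetical names, not part of the dataset.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class ChartPage(NamedTuple):
    issue_date: date
    volume: int
    number: str
    chart: str
    page_id: str


# Assumed pattern, inferred from the one example path in this README:
#   <year>/<YYYY-MM-DD> - v<volume>no<number> - <chart name> - id<id>.pdf
PAGE_NAME = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) - "
    r"v(?P<volume>\d+)no(?P<number>\w+) - "
    r"(?P<chart>.+) - "
    r"id(?P<page_id>\d+)\.pdf$"
)


def parse_page_name(path: str) -> Optional[ChartPage]:
    """Return the fields encoded in a page path, or None if it does not match."""
    match = PAGE_NAME.search(path)
    if match is None:
        return None
    return ChartPage(
        issue_date=date.fromisoformat(match["date"]),
        volume=int(match["volume"]),
        number=match["number"],
        chart=match["chart"],
        page_id=match["page_id"],
    )
```

Returning `None` for unmatched names, rather than raising, makes it easy to list the files that do not follow the assumed pattern.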
rpm_charts_pages_metadata.csv (10,638 rows, 2 MB): A CSV file containing an index of all the pages. Also includes rows for pages that were removed for de-duplication. These can be excluded by filtering out rows where the "Notes" column begins with "Duplicate of". Includes the same corrections as rpm_charts_pages.zip.
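The duplicate filter described above can be sketched with the standard library alone. Only the "Notes" column name and the "Duplicate of" prefix come from this README. The helper names and the UTF-8 encoding are assumptions.

```python
import csv
from typing import Dict, Iterable, Iterator, List


def without_duplicates(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only rows whose "Notes" value does not begin with "Duplicate of"."""
    for row in rows:
        notes = row.get("Notes") or ""  # treat a missing or empty note as ""
        if not notes.startswith("Duplicate of"):
            yield row


def load_pages(path: str) -> List[Dict[str, str]]:
    """Read the metadata CSV and drop the de-duplicated pages (UTF-8 assumed)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(without_duplicates(csv.DictReader(f)))
```

For example, `load_pages("rpm_charts_pages_metadata.csv")` would return one dictionary per remaining page, keyed by the CSV's header row.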
rpm_charts_pages_metadata_raw.csv (10,637 rows, 2 MB): A CSV file containing the chart page metadata exactly as it came from BACLAC, without any of the above corrections and without the "Notes" column. Useful if you want to do a diff to see the corrections made.
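One way to do that diff is to join the two files on an identifier column and report the fields that differ. This is a rough sketch: the README does not list the column names, so the `key` argument is something you would need to fill in after looking at the header row, and the code assumes that key is unique within each file. `diff_metadata` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, str]


def diff_metadata(
    raw_rows: Iterable[Row], corrected_rows: Iterable[Row], key: str
) -> Dict[str, Dict[str, Tuple[str, str]]]:
    """Map each key to {column: (raw value, corrected value)} for changed fields."""
    raw_by_key = {row[key]: row for row in raw_rows}
    changes: Dict[str, Dict[str, Tuple[str, str]]] = {}
    for row in corrected_rows:
        before = raw_by_key.get(row[key])
        if before is None:
            continue  # row has no counterpart in the raw file
        # Compare only columns present in both files; "Notes" exists only
        # in the corrected file, so it is skipped automatically.
        changed = {
            col: (before[col], row[col])
            for col in before
            if col in row and before[col] != row[col]
        }
        if changed:
            changes[row[key]] = changed
    return changes
```

Both arguments can be `csv.DictReader` objects opened on the raw and corrected files.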
- ✅ Phase 1: Data dump of the chart PDFs and PDF metadata
- Crawled the digital archives of BACLAC and manually corrected/normalized errors in printing or digitization.
- [...] Phase 2: OCR and/or LLM parsing of the PDFs into CSV format
- In progress: rpm_charts_songs.csv: A CSV file containing the song data extracted from the charts, including title, artist, and chart position.
- Phase 3: Entity recognition of artists (and possibly songs) to Wikidata entities