cdrini/rpmd

RPM Dataset

RPM (Records, Promotion, Music) was a Canadian music publication. In 1964, it began publishing weekly national charts of what Canadians were listening to, based on radio airplay and, from 1964 to 1988, record sales.

RPM Dataset (RPMD for short) is a project to parse and analyze the RPM charts, which BACLAC (Bibliothèque et Archives Canada - Library and Archives Canada) makes available as PDFs, and to publish the results in an easy-to-digest format.

Available data

Available to download from archive.org: https://archive.org/download/rpm_dataset

  • rpm_charts_pages.zip (10,534 pages, 840 MB): A zip file containing the individual chart pages as PDFs.
    • Directory structure is e.g. 1965/1965-02-15 - v1no18 - Top Singles - id7810.pdf
    • The "Top Singles" charts have had their metadata corrected and normalized to fix errors such as misprints in the physical magazine and BACLAC metadata entries where the month and day were swapped. Issues that were accidentally digitized twice have been de-duplicated.
  • rpm_charts_pages_metadata.csv (10,638 rows, 2 MB): A CSV file containing an index of all the pages. Also includes rows for pages that were removed for de-duplication. These can be excluded by filtering out rows where the "Notes" column begins with "Duplicate of". Includes the same corrections as rpm_charts_pages.zip.
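
The path layout and the "Duplicate of" filter described above can be sketched in Python as follows. This is a minimal sketch, not part of the project: the regular expression is inferred from the single example path shown above (reading "v1no18" as volume 1, number 18 is an assumption), and the helper names parse_chart_path, load_metadata and drop_duplicates are hypothetical. Only the "Notes" column name comes from this README.

```python
import csv
import re

# Pattern inferred from the one example path in this README; real files may deviate.
# Reading "v1no18" as volume/number is an assumption.
CHART_PATH_RE = re.compile(
    r"^(?P<year>\d{4})/"
    r"(?P<date>\d{4}-\d{2}-\d{2}) - "
    r"v(?P<volume>\d+)no(?P<number>\d+) - "
    r"(?P<chart>.+) - "
    r"id(?P<id>\d+)\.pdf$"
)


def parse_chart_path(path):
    """Split a chart page path into its parts, or return None if it does not match."""
    match = CHART_PATH_RE.match(path)
    return match.groupdict() if match else None


def load_metadata(csv_file):
    """Read metadata rows from an open CSV file (or any iterable of lines) as dicts."""
    return list(csv.DictReader(csv_file))


def drop_duplicates(rows):
    """Keep only rows whose "Notes" value does not begin with "Duplicate of"."""
    return [
        row for row in rows
        if not (row.get("Notes") or "").startswith("Duplicate of")
    ]
```

To use it on the real index, open rpm_charts_pages_metadata.csv with `newline=""` (the file encoding is not stated here, so UTF-8 would be a guess) and pass the file object to `load_metadata`, then pass the result to `drop_duplicates`.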

Other data

  • rpm_charts_pages_metadata_raw.csv (10,637 rows, 2 MB): A CSV file containing the chart page metadata as it came from BACLAC, without any of the above corrections or the "Notes" column. Useful if you want to diff it against rpm_charts_pages_metadata.csv to see the corrections made.
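
One way to do such a diff is sketched below in Python. It assumes that, apart from "Notes", the raw and corrected files share the same columns in the same order, which this README does not confirm. The function names load_rows and diff_metadata are hypothetical. Rows are compared as whole tuples, so no key column needs to be known.

```python
import csv


def load_rows(csv_file, ignore=("Notes",)):
    """Read a CSV into (columns, set of row tuples), skipping columns in `ignore`."""
    reader = csv.DictReader(csv_file)
    columns = [c for c in (reader.fieldnames or []) if c not in ignore]
    # Missing cells are read as None; normalize them to "" so rows stay sortable.
    rows = {tuple(row[c] or "" for c in columns) for row in reader}
    return columns, rows


def diff_metadata(raw_file, corrected_file):
    """Return (only_in_raw, only_in_corrected) as sorted lists of row tuples."""
    raw_columns, raw_rows = load_rows(raw_file)
    corrected_columns, corrected_rows = load_rows(corrected_file)
    # Assumption: both files have identical columns once "Notes" is ignored.
    if raw_columns != corrected_columns:
        raise ValueError("The two files do not share the same columns")
    return sorted(raw_rows - corrected_rows), sorted(corrected_rows - raw_rows)
```

Rows that appear only in the raw file are the pre-correction values, and rows that appear only in the corrected file are their replacements. Rows kept in the corrected file but flagged "Duplicate of" are not reported, since the "Notes" column is ignored.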

Planning

  • ✅ Phase 1: Data dump of the chart PDFs and PDF metadata
    • Crawled the digital archives of BACLAC and manually corrected/normalized errors in printing or digitization.
  • [...] Phase 2: OCR and/or LLM parsing of the PDFs into CSV format
    • In progress: rpm_charts_songs.csv: A CSV file containing the song data extracted from the charts, including title, artist, and chart position.
  • Phase 3: Entity recognition of artists (and possibly songs) to Wikidata entities
