This repository is a mess of code which:
- Takes in the PDF file published by the Australian Department of Health
- Takes in national second dose data from the WA Health vaccination dashboard
- Converts it into machine-readable statistics (JSON and CSV files)
- Publishes data files via GitHub Actions and GitHub Pages
Looking for COVID-19 Case and Test Data? That data is in a separate repository: https://github.com/jxeeno/aust-govt-covid19-stats
Due to changes in the way vaccination data is reported throughout this year, some of the data may not be comparable. This section tries to summarise many of the data issues and reporting changes. It's lengthy -- you have been warned!
Prior to 18 May 2021, statistics about second dose (or number of fully vaccinated people) were not available.
From 18 May 2021 to 1 July 2021, approximate second dose data by state of administration (data values in key format `APPROX_<State>_SECOND_DOSE_TOTAL` in `all.csv`) is derived from the WA Health Vaccination Dashboard, which is updated weekly, usually at the start of the week (the exact day varies).
The percentages are extracted from the WA Health dashboard and multiplied against the ABS 16 and over population data as noted in the WA Health interpretation guide.
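As a rough sketch of that calculation: the helper name and the numbers below are illustrative assumptions, not real dashboard or ABS values, and the rounding is my own choice rather than something the source confirms.

```javascript
// Hypothetical helper: convert a dashboard percentage (0-100 scale assumed)
// into an approximate count using a 16-and-over population figure.
function approxSecondDoseTotal(pctFullyVaccinated, population16Plus) {
  return Math.round((pctFullyVaccinated / 100) * population16Plus);
}

// Illustrative inputs only.
const examplePct = 12.5;
const examplePopulation16Plus = 2000000;

console.log(approxSecondDoseTotal(examplePct, examplePopulation16Plus)); // 250000
```

Because the published percentage is rounded, a count derived this way is only approximate.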
State-level second dose totals are not published daily and therefore do not necessarily line up with daily dose totals. This means that deducting the approximate second dose total from total doses does not produce an accurate value for first doses.
The `APPROX_<State>_SECOND_DOSE_TOTAL` field in `all.csv` is deprecated and should no longer be used. From 17 August 2021, these fields will no longer be populated.
From 30 June 2021, the Department of Health started publishing breakdowns of doses by age group and first/second doses. This data is derived from the Australian Immunisation Register (AIR) and may not correspond directly with the headline figures, which include self-reported figures.
From 1 July 2021, the Department of Health started publishing breakdowns of doses by age group, first/second doses and by state of administration. This data is derived from the Australian Immunisation Register (AIR) and may not correspond directly with the headline figures, which include self-reported figures.
This data is published separately as `air.csv` and `air.json` files.
From 28 July 2021, the Department of Health started publishing breakdowns of doses by age group, first/second doses and by state of residence. This data is derived from the Australian Immunisation Register (AIR) and may not correspond directly with the headline figures, which include self-reported figures.
The switch from reporting state of administration to state of residence resulted in some states reporting a decrease in total number of doses. ACT and NT both reported drops.
Additional data points (doses by state of residence, by age group) were also published and are made available separately as `air_residence.csv` and `air_residence.json` files.
From 15 August 2021, all data reported by the Department of Health is obtained from the Australian Immunisation Register (AIR). Previously, some statistics were based on self-reported figures from state-run clinics.
This reporting change resulted in negative dose counts for some states in the `all.csv` file. This drop is due to the lag between a dose being administered and the record being entered into AIR.
This change means that the statistics in the `all.csv` file prior to 15 Aug 2021 are not directly comparable to the data published on or after 15 Aug 2021.
All data up to this point is based on date of reporting. Department of Health has also begun reporting doses on the date of administration. This data is not available in this repository yet.
Vaccination rates for Aboriginal and Torres Strait Islander peoples (First Nations people) are now included in the dataset. This data was uploaded on 8 September, dating back to 16 August 2021.
Department of Health updates this data weekly and has included it in the daily vaccination data pack since 16 August 2021.
These appear as `FIRST_NATIONS_<STATE|TERRITORY|AUS>_<FIRST|SECOND>_DOSE_TOTAL` in `all.csv` and `all.json`.
Department of Health no longer publishes the first and second dose and visit count breakdowns for aged and disability care.
This means the following fields are now deprecated:
- `CWTH_AGED_CARE_DOSES_FIRST_DOSE`
- `CWTH_AGED_CARE_DOSES_SECOND_DOSE`
- `CWTH_AGED_CARE_FACILITIES_FIRST_DOSE`
- `CWTH_AGED_CARE_FACILITIES_SECOND_DOSE`
Department of Health is now publishing dose data for 12-15 year olds. This data is available in `air.csv` as:
- `AIR_12_15_<FIRST|SECOND>_DOSE_<COUNT|PCT>`
- `AIR_<STATE>_12_15_<FIRST|SECOND>_DOSE_<COUNT|PCT>`
All totals now include the 12-15 age group.
Department of Health is now publishing the number of individuals aged 16+ with 3 doses (or more) of COVID-19 vaccine recorded in AIR. This data is available in `air.csv`/`air.json` as:
- `AIR_AUS_16_PLUS_THIRD_DOSE_COUNT`
- `AIR_AUS_16_PLUS_THIRD_DOSE_PCT`
Vaccination rates for Aboriginal and Torres Strait Islander peoples (First Nations people) are now also expressed as a percentage of population. This data was uploaded on 25 November, dating back to 10 November 2021.
Department of Health updates this data weekly and has included it in the daily vaccination data pack since 10 November 2021.
These appear as `FIRST_NATIONS_<STATE|TERRITORY|AUS>_<FIRST|SECOND>_PCT_TOTAL` in `all.csv` and `all.json`, in addition to the existing `FIRST_NATIONS_<STATE|TERRITORY|AUS>_<FIRST|SECOND>_DOSE_TOTAL`, which counts the total number of doses.
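To make the key pattern concrete, here is a small sketch that expands it into actual column names. The function name is hypothetical, and the jurisdiction codes are an assumption on my part; check the header row of `all.csv` for the codes actually used.

```javascript
// Assumed jurisdiction codes; verify against the all.csv header before relying on them.
const JURISDICTIONS = ["AUS", "NSW", "VIC", "QLD", "WA", "SA", "TAS", "ACT", "NT"];

// Hypothetical helper: build the First Nations key names for one jurisdiction.
function firstNationsKeys(jurisdiction) {
  const keys = {};
  for (const dose of ["FIRST", "SECOND"]) {
    keys[dose] = {
      doseTotal: `FIRST_NATIONS_${jurisdiction}_${dose}_DOSE_TOTAL`,
      pctTotal: `FIRST_NATIONS_${jurisdiction}_${dose}_PCT_TOTAL`,
    };
  }
  return keys;
}

console.log(firstNationsKeys("AUS").SECOND.pctTotal);
// FIRST_NATIONS_AUS_SECOND_PCT_TOTAL
```

Remember that the `PCT_TOTAL` keys only have values from 10 November 2021 onwards, while the `DOSE_TOTAL` keys go back to 16 August 2021.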
Department of Health is now publishing the number of individuals aged 18+ with 3 doses (or more) of COVID-19 vaccine recorded in AIR. This data is available in `air.csv`/`air.json` as:
- `AIR_<STATE|TERRITORY|AUS>_18_PLUS_THIRD_DOSE_COUNT`
- `AIR_<STATE|TERRITORY|AUS>_18_PLUS_THIRD_DOSE_PCT`
Note: the key introduced in the 7 November 2021 change was `16_PLUS` for AUS. In this release, the key has changed to `18_PLUS`, in line with booster eligibility criteria. The `16_PLUS` column for AUS is maintained, and a duplicated `18_PLUS` column for AUS is also available for consistency.
`*_16_PLUS_THIRD_DOSE_PCT` uses the 16+ population as denominator, even though only the 18+ population is eligible. This field should be avoided and is retained for backwards compatibility.
`*_18_PLUS_THIRD_DOSE_PCT` uses the 18+ population as denominator. This is the current eligible age group for boosters.
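If you consume both columns, one way to handle the overlap is to prefer the `18_PLUS` value and fall back to `16_PLUS` only when needed. This is a sketch: the function name and sample values are made up, and I'm assuming each row is a flat object keyed by column name.

```javascript
// Hypothetical helper: choose the booster percentage for a jurisdiction from one row,
// and report which denominator the value is based on.
function thirdDosePct(row, jurisdiction) {
  const isPresent = (v) => v !== undefined && v !== null && v !== "";

  const preferred = row[`AIR_${jurisdiction}_18_PLUS_THIRD_DOSE_PCT`];
  if (isPresent(preferred)) {
    return { value: Number(preferred), denominator: "18+" };
  }

  // Legacy column: kept for backwards compatibility, uses the 16+ population.
  const legacy = row[`AIR_${jurisdiction}_16_PLUS_THIRD_DOSE_PCT`];
  if (isPresent(legacy)) {
    return { value: Number(legacy), denominator: "16+" };
  }
  return null;
}

// Illustrative rows, not real data.
const newerRow = { AIR_AUS_16_PLUS_THIRD_DOSE_PCT: 1.9, AIR_AUS_18_PLUS_THIRD_DOSE_PCT: 2.0 };
const olderRow = { AIR_AUS_16_PLUS_THIRD_DOSE_PCT: 1.5 };

console.log(thirdDosePct(newerRow, "AUS").denominator); // 18+
console.log(thirdDosePct(olderRow, "AUS").denominator); // 16+
```

Returning the denominator with the value makes it harder to chart the two series together by accident.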
Department of Health is now publishing the number of individuals aged 5-11 with at least 1 dose of COVID-19 vaccine recorded in AIR. This data is available in:
`air.csv`/`air.json`:
- `AIR_<STATE|TERRITORY|AUS>_5_11_FIRST_DOSE_COUNT`
- `AIR_<STATE|TERRITORY|AUS>_5_11_FIRST_DOSE_PCT`
- `AIR_<STATE|TERRITORY|AUS>_5_11_POPULATION`
The following fields have been provisioned for, and will be available when the government publishes the data:
- `AIR_<STATE|TERRITORY|AUS>_5_11_SECOND_DOSE_COUNT`
- `AIR_<STATE|TERRITORY|AUS>_5_11_SECOND_DOSE_PCT`
For consistency, the following fields have also been added:
- `AIR_AUS_12_15_<FIRST|SECOND>_DOSE_<COUNT|PCT>` (same value as `AIR_12_15_<FIRST|SECOND>_DOSE_<COUNT|PCT>`)
`air_residence.csv`/`air_residence.json`:
- Where `AGE_LOWER` is `5` and `AGE_UPPER` is `11`.
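For the residence files, picking out the 5-11 bucket is a filter on those two columns. A minimal sketch, assuming the JSON has already been loaded into an array of row objects; the `STATE` column and the helper name are assumptions for illustration.

```javascript
// Hypothetical helper: keep only rows for one age bucket.
// Number() is used because values may arrive as strings when read from CSV.
function rowsForAgeBucket(rows, ageLower, ageUpper) {
  return rows.filter(
    (row) => Number(row.AGE_LOWER) === ageLower && Number(row.AGE_UPPER) === ageUpper
  );
}

// Illustrative rows only; STATE is an assumed column name.
const sampleResidenceRows = [
  { STATE: "NSW", AGE_LOWER: 5, AGE_UPPER: 11 },
  { STATE: "NSW", AGE_LOWER: 12, AGE_UPPER: 15 },
  { STATE: "VIC", AGE_LOWER: "5", AGE_UPPER: "11" },
];

console.log(rowsForAgeBucket(sampleResidenceRows, 5, 11).length); // 2
```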
Note: Due to changes to the data layout, summary vaccination rates by jurisdiction for 50+ and 70+ are no longer available. I will add middleware in due course to estimate these figures.
Department of Health is now publishing the percentage of eligible individuals who have received a third dose of COVID-19 vaccine by geographical region. This differs from the other percentage data, which is generally expressed relative to the population. To report against a common metric, we have estimated the actual number of people with at least 3 doses, and the percentage of population, by comparing against second dose counts from 12 weeks earlier (approx 3 months).
This data is available in:
air_<lga|sa4|sa3>.csv
air_<lga|sa4|sa3>.json
The following fields have been added:
- `AIR_THIRD_DOSE_ELIGIBLE_PCT` - % of eligible people with at least three doses of vaccine, as provided by DOH
- `AIR_THIRD_DOSE_APPROX_COUNT` - estimated number of people with at least three doses of vaccine (derived by `AIR_THIRD_DOSE_ELIGIBLE_PCT * AIR_SECOND_DOSE_APPROX_COUNT` from 12 weeks ago)
- `AIR_THIRD_DOSE_PCT` - estimated % of population with at least three doses of vaccine (derived by `AIR_THIRD_DOSE_APPROX_COUNT / ABS_ERP_2019_POPULATION`)
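The derivation of these fields can be sketched as follows. The function and parameter names are hypothetical, the inputs are illustrative, and I'm assuming percentages are on a 0-100 scale and that the count is rounded; the repository's actual rounding may differ.

```javascript
// Sketch of the derivation described for the geographic third dose fields.
function estimateThirdDose({ eligiblePct, secondDoseCount12WeeksAgo, population }) {
  // AIR_THIRD_DOSE_APPROX_COUNT: eligible % applied to second doses from 12 weeks ago.
  const approxCount = Math.round((eligiblePct / 100) * secondDoseCount12WeeksAgo);
  // AIR_THIRD_DOSE_PCT: approximate count as a share of ABS_ERP_2019_POPULATION.
  const pctOfPopulation = (approxCount / population) * 100;
  return { approxCount, pctOfPopulation };
}

// Illustrative inputs only.
const estimate = estimateThirdDose({
  eligiblePct: 40,
  secondDoseCount12WeeksAgo: 50000,
  population: 80000,
});

console.log(estimate.approxCount); // 20000
console.log(estimate.pctOfPopulation); // 25
```

In this example, 40% of eligible people corresponds to 25% of the whole population, which is why the two percentages should not be compared directly.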
NOTE: The order of the columns in the CSV has changed to accommodate this additional data.
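Since column order can change like this, it's safer to read columns by header name than by position. Below is a deliberately naive sketch: it assumes no quoted fields or embedded commas, so use a proper CSV library if that assumption doesn't hold for the file you're reading. The two sample layouts are made up to show the idea.

```javascript
// Naive CSV reader: maps each data line to an object keyed by the header row.
// Assumes no quoted fields, embedded commas or embedded newlines.
function parseCsvByHeader(text) {
  const lines = text.trim().split(/\r?\n/);
  const header = lines[0].split(",");
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row = {};
    header.forEach((name, i) => {
      row[name] = cells[i] !== undefined ? cells[i] : "";
    });
    return row;
  });
}

// Same data in two different (illustrative) column orders.
const oldLayout = "ABS_ERP_2019_POPULATION,AIR_THIRD_DOSE_PCT\n1000,12.5\n";
const newLayout =
  "AIR_THIRD_DOSE_PCT,AIR_THIRD_DOSE_ELIGIBLE_PCT,ABS_ERP_2019_POPULATION\n12.5,50,1000\n";

console.log(parseCsvByHeader(oldLayout)[0].AIR_THIRD_DOSE_PCT); // 12.5
console.log(parseCsvByHeader(newLayout)[0].AIR_THIRD_DOSE_PCT); // 12.5
```

Values come back as strings, so convert with `Number()` before doing arithmetic.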
Fixes:
- Resolved an issue with booster by jurisdiction data being missing due to a change of layout
- Fixed LGA matching issue with Nambucca Valley / Nambucca
- Fixed LGA matching issue with DOH merged LGA: Mount Gambier (C) & Grant (DC)
You must attribute the source of the data as Department of Health (all data except second doses by state prior to 1st July 2021) and WA Health (second dose by state data prior to 1st July 2021).
When using this data extract, I'd appreciate it if you attribute data extraction to me (Ken Tsang) and link to this repository. This is not required.
Example:
Source: WA Health (second dose by state data prior to 1st July 2021) and Department of Health (all other data); Data extracted by Ken Tsang
The data is also available at the following locations:
Daily dose administration data (and weekly second dose by state data from 18 May 2021 onwards)
- CSV (all): https://vaccinedata.covid19nearme.com.au/data/all.csv
- JSON (all): https://vaccinedata.covid19nearme.com.au/data/all.json
Daily dose breakdown from AIR incl state of administration data (available from 30 June 2021 onwards)
- CSV (AIR - Administration): https://vaccinedata.covid19nearme.com.au/data/air.csv
- JSON (AIR - Administration): https://vaccinedata.covid19nearme.com.au/data/air.json
Daily dose breakdown from AIR incl state of residence data (available from 28 July 2021 onwards)
Note: AIR residence data is normalised differently from the other data. Each row represents a separate day, state and age bucket. Counts are estimated by working backwards from the published percentage and the ABS Estimated Resident Population.
- CSV (AIR - Residence): https://vaccinedata.covid19nearme.com.au/data/air_residence.csv
- JSON (AIR - Residence): https://vaccinedata.covid19nearme.com.au/data/air_residence.json
Weekly dose distribution data
- CSV (distribution): https://vaccinedata.covid19nearme.com.au/data/distribution.csv
- JSON (distribution): https://vaccinedata.covid19nearme.com.au/data/distribution.json
Index to raw data extracts
- Raw JSON data (index): https://vaccinedata.covid19nearme.com.au/data/publications.json
Geographical vaccination rates are updated weekly.
Vaccination rates by address of residence, grouped by ABS Statistical Area 4.
- CSV (AIR - SA4): https://vaccinedata.covid19nearme.com.au/data/geo/air_sa4.csv
- JSON (AIR - SA4): https://vaccinedata.covid19nearme.com.au/data/geo/air_sa4.json
Vaccination rates by address of residence, grouped by ABS Statistical Area 3. SA3s with fewer than 500 people aged 15 and over have been excluded.
- CSV (AIR - SA3): https://vaccinedata.covid19nearme.com.au/data/geo/air_sa3.csv
- JSON (AIR - SA3): https://vaccinedata.covid19nearme.com.au/data/geo/air_sa3.json
Vaccination rates by address of residence, grouped by ABS Local Government Areas. LGAs with large ‘very remote’ and ‘remote’ areas, where geo-coding addresses is difficult, are excluded.
- CSV (AIR - LGA): https://vaccinedata.covid19nearme.com.au/data/geo/air_lga.csv
- JSON (AIR - LGA): https://vaccinedata.covid19nearme.com.au/data/geo/air_lga.json
Vaccination rates of the Indigenous population by address of residence, grouped by ABS Statistical Area 4.
Note: `ABS_ERP_2019_POPULATION` represents the general estimated resident population for the SA4. `AIR_INDIGENOUS_POPULATION` is provided as an estimate of the Indigenous population in the SA4 based on records in the Australian Immunisation Register / Medicare. Percentages are calculated using `AIR_INDIGENOUS_POPULATION` as denominator.
- CSV (AIR - SA4 Indigenous): https://vaccinedata.covid19nearme.com.au/data/geo/air_sa4_indigenous.csv
- JSON (AIR - SA4 Indigenous): https://vaccinedata.covid19nearme.com.au/data/geo/air_sa4_indigenous.json
Vaccination rates by address of residence, grouped by ABS Postal Areas (POA). POAs with significant population change since the 2016 census are excluded as it is not possible to accurately provide vaccination rates.
Vaccination rates are expressed as percent ranges in 5% increments.
This data is obtained from https://www.coronavirus.vic.gov.au/weekly-covid-19-vaccine-data
- CSV (AIR - VIC POA): https://vicvaxdata.covid19nearme.com.au/data/vic_poa.csv
- JSON (AIR - VIC POA): https://vicvaxdata.covid19nearme.com.au/data/vic_poa.json
This is the legacy SA4 feed for backwards compatibility. The data contained in this file is the same as the new SA4 data feed; however, the column names are slightly different, as SA4-specific terminology has been removed from the new feed. This legacy feed will continue to be updated. There is no need to switch to the new feed if you've already integrated against the legacy one.
- CSV (AIR - SA4): https://vaccinedata.covid19nearme.com.au/data/air_sa4.csv
- JSON (AIR - SA4): https://vaccinedata.covid19nearme.com.au/data/air_sa4.json
Important note about data quality: This data is provided as-is. I'm not guaranteeing the timeliness or accuracy of any data provided above. Some basic validation steps are present (i.e. we test the data and see if the totals add up to expected values and if there are empty data values), but no manual checks are conducted. Use at your own risk.
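If you want to run similar sanity checks on your side, the two kinds of check mentioned above (empty values, and totals adding up) could look something like this. It is not the repository's actual validation code, and the column names are hypothetical; in the real data a headline total may include components beyond the states, so adapt the keys to the file you are checking.

```javascript
// Sketch of a row-level sanity check: flags empty values and a mismatched total.
function validateRow(row, partKeys, totalKey) {
  const problems = [];

  for (const [key, value] of Object.entries(row)) {
    if (value === "" || value === null || value === undefined) {
      problems.push(`empty value for ${key}`);
    }
  }

  const sum = partKeys.reduce((acc, key) => acc + Number(row[key] || 0), 0);
  if (sum !== Number(row[totalKey])) {
    problems.push(`parts sum to ${sum}, expected ${row[totalKey]}`);
  }
  return problems;
}

// Hypothetical column names and illustrative values.
const goodRow = { PART_A_TOTAL: 10, PART_B_TOTAL: 20, OVERALL_TOTAL: 30 };
const badRow = { PART_A_TOTAL: 10, PART_B_TOTAL: "", OVERALL_TOTAL: 30 };

console.log(validateRow(goodRow, ["PART_A_TOTAL", "PART_B_TOTAL"], "OVERALL_TOTAL").length); // 0
console.log(validateRow(badRow, ["PART_A_TOTAL", "PART_B_TOTAL"], "OVERALL_TOTAL").length); // 2
```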
The data files above are usually updated daily. GitHub Actions is configured to scrape and extract data from the Department of Health website every 5 minutes and publish it via GitHub Pages. The data is also available in this git repo under `docs/data`.
Documentation for these data files will come in due course.
In Google Sheets, you can use the `=IMPORTDATA()` formula:
=IMPORTDATA("https://vaccinedata.covid19nearme.com.au/data/all.csv")
You can also run this code yourself. You'll need:
- Yarn (or NPM) to install JS dependencies
- Node (not sure what version but I'm running v12.x)
git clone https://github.com/jxeeno/aust-govt-covid19-vaccine-pdf.git
cd aust-govt-covid19-vaccine-pdf
yarn # or: npm install
# prior to 16 Aug 2021
node index.js "https://www.health.gov.au/sites/default/files/documents/2021/04/covid-19-vaccine-rollout-update-19-april-2021.pdf"
# from 16 Aug 2021 onwards
node index.js "https://www.health.gov.au/sites/default/files/documents/2021/08/covid-19-vaccine-rollout-update-19-august-2021.pdf" "https://www.health.gov.au/sites/default/files/documents/2021/08/covid-19-vaccine-rollout-update-jurisdictional-breakdown-19-august-2021.pdf"
Because for some reason, our Health department reckons the best way to provide statistical data is in a PDF file generated from Microsoft PowerPoint.
This data should be available in machine-readable formats for transparency and to enable ease of access.
Yeah, that's probably going to happen. Every time the Health department decides to add some new disclaimers or tweak the layout/wording a little, this thing will break.
You can try and fix it and submit a PR. Or raise an issue and I'll have a look at it.
Yeah, it's spaghetti code because it's basically disposable code. I expect to need to rewrite this every few days.
Having said that, you're welcome to raise a PR if you want to make it better! :)
Department of Health occasionally revises its historical vaccine statistics to correct errors. See below for a summary of changes:
Date | Description of changes |
---|---|
2021-05-13 | NSW total revised from 266,514 (+10,321) to 264,135 (+7,942) |
2021-05-13 | National total revised from 2,980,644 (+85,874) to 2,978,265 (+83,495) |
2021-05-14 | National 24 hour difference revised from +76,153 to +78,532 |
2021-05-14 | NSW 24 hour difference revised from +8,237 to +10,616 |
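As a quick arithmetic check on the revisions above, the sketch below computes the size of each adjustment. The figures are copied from the table; the variable names are my own.

```javascript
// Strip thousands separators and leading plus signs before converting.
const toNumber = (text) => Number(text.replace(/[,+]/g, ""));

const revisions = [
  { label: "NSW total (2021-05-13)", from: "266,514", to: "264,135" },
  { label: "National total (2021-05-13)", from: "2,980,644", to: "2,978,265" },
  { label: "National 24 hour difference (2021-05-14)", from: "+76,153", to: "+78,532" },
  { label: "NSW 24 hour difference (2021-05-14)", from: "+8,237", to: "+10,616" },
];

const deltas = revisions.map((r) => toNumber(r.to) - toNumber(r.from));
console.log(deltas); // [ -2379, -2379, 2379, 2379 ]
```

All four adjustments are the same size: 2,379 doses were removed from the 13 May totals and added to the 14 May 24 hour differences. That is consistent with the cumulative totals for 14 May itself being unchanged.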