As a Wikidata advocate, I want to have accurate statistics about the amount and quality of Wikidata’s references when discussing Wikidata with other projects.
Problem:
The wikidata-datamodel-references dashboard currently claims that only about 3.8% of Wikidata references are “Wikimedia” references. While this sounds awesome, I don’t think it can possibly be true, based on my own experience with Wikidata references. Meanwhile, another panel on the same dashboard names P248 (stated in) as the most common property for references and P143 (imported from Wikimedia project) as the fourth most common, and P143 is used exclusively for Wikimedia sources nowadays.
A closer look at the code generating these statistics ([MetricProcessor in analytics/wmde/toolkit-analyzer](https://github.com/wikimedia/analytics-wmde-toolkit-analyzer/blob/master/analyzer/src/main/java/org/wikidata/analyzer/Processor/MetricProcessor.java)) reveals that it relies on hard-coded lists of properties and items, which have not been updated for at least three years. This desperately needs to be reworked.
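As a rough illustration of one possible rework (a sketch under stated assumptions, not how MetricProcessor actually works), a reference could be classified as a Wikimedia reference by the presence of P143 alone, instead of by matching values against a fixed item list that goes stale. The input shape (a mapping from property ID to values) and the names `is_wikimedia_reference`, `wikimedia_share` and `property_usage` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence

# Hypothetical input shape (assumption): each reference is a mapping from
# property ID (e.g. "P248") to the list of values used with that property.
Reference = Mapping[str, Sequence[str]]

# P143 ("imported from Wikimedia project") is only used for Wikimedia sources,
# so its presence alone marks a reference as a Wikimedia reference.
WIKIMEDIA_MARKER_PROPERTIES = frozenset({"P143"})


def is_wikimedia_reference(ref: Reference) -> bool:
    """Return True if the reference uses any Wikimedia-only property."""
    return any(prop in WIKIMEDIA_MARKER_PROPERTIES for prop in ref)


def wikimedia_share(refs: Iterable[Reference]) -> float:
    """Fraction of references classified as Wikimedia references (0.0 if none)."""
    total = 0
    wikimedia = 0
    for ref in refs:
        total += 1
        if is_wikimedia_reference(ref):
            wikimedia += 1
    return wikimedia / total if total else 0.0


def property_usage(refs: Iterable[Reference]) -> Counter:
    """Count how many references use each property (counted once per reference)."""
    counts: Counter = Counter()
    for ref in refs:
        counts.update(set(ref))  # keys of the mapping = properties used
    return counts
```

Because the classification only depends on which properties a reference uses, it does not need updating whenever a new Wikimedia project item appears. The share it reports can also be cross-checked against the P143 count in the property panel, since both numbers are derived from the same property.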
Besides updating the code, we will have to get the new version deployed.
To do this, the built jar file needs to be updated in https://github.com/wikimedia/analytics-wmde-toolkit-analyzer-build
Acceptance criteria:
- The dashboard’s data seems plausible to Wikidata people
Open questions: