Wikidata:Requests for permissions/Bot/EpidòseosBot
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Ymblanter (talk) 18:53, 13 June 2024 (UTC)
EpidòseosBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Epìdosis (talk • contribs • logs)
Task/s: improve incomplete references to Integrated Authority File (Q36578) for sex or gender (P21), and add new references containing GND ID (P227) to sex or gender (P21) statements that have no reference containing stated in (P248)
Code: User:EpidòseosBot/GND P21.py
Function details: the bot will run first on the results of query1 (presently about 170k items) and then on the results of query2 (presently about 174k items). Query1 finds items whose sex or gender (P21) contains an incomplete reference to Integrated Authority File (Q36578) (i.e. a reference to Integrated Authority File (Q36578) missing its most fundamental part, the ID from which the information is taken); the incomplete reference is replaced with a new one that also contains GND ID (P227). Query2 finds items whose sex or gender (P21) is referenced through imported from Wikimedia project (P143) but not through stated in (P248) and, if possible, adds to them a new reference containing GND ID (P227). In both operations the bot removes non-authoritative references, i.e. references containing imported from Wikimedia project (P143) and/or VIAF ID (P214) and/or based on heuristic (P887). No new values of sex or gender (P21) are added. These are 50 example edits. Any suggestions on the code are welcome, of course; it is in fact the first bot I have programmed. --Epìdosis 23:20, 8 June 2024 (UTC)
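The reference-cleanup rule described above can be sketched as follows. This is a simplified model, not the bot's actual code (see User:EpidòseosBot/GND P21.py for that): references are reduced to plain dicts of property id → value instead of pywikibot objects, the function name and the exact edge-case handling are illustrative assumptions, and the `gnd_id` argument assumes the item's GND ID (P227) value is already known.

```python
# Simplified sketch of the cleanup logic for one P21 statement's
# references, modelled as plain dicts {property_id: value}.
NON_AUTHORITATIVE = {"P143", "P214", "P887"}  # imported from / VIAF ID / based on heuristic

def clean_references(references, gnd_id):
    """Return a new reference list for a sex-or-gender statement:
    drop non-authoritative references, complete an incomplete GND
    reference (stated in Q36578 without P227), and, if no stated-in
    reference remains, add a fresh GND reference (the query2 case)."""
    cleaned = []
    for ref in references:
        if NON_AUTHORITATIVE & set(ref):
            continue  # drop non-authoritative reference
        if ref.get("P248") == "Q36578" and "P227" not in ref:
            ref = {**ref, "P227": gnd_id}  # complete incomplete GND reference
        cleaned.append(ref)
    if not any("P248" in ref for ref in cleaned):
        cleaned.append({"P248": "Q36578", "P227": gnd_id})  # query2 case
    return cleaned
```

Note that no value of sex or gender (P21) itself is touched here; only the reference list attached to an existing value changes, matching the task description.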
- Support I'm also interested in normalizing references, and I see these edits as improvements to the items. I think references should be atomic, so I agree with Query1. I have seen no issues in the example edits. I'm no Python expert, but I have two suggestions about the code. First, I think WikidataSPARQLPageGenerator returns pages including deprecated statements, so you need to check for those and ignore them; i.e. the code
gndID = item.claims['P227'][0].getTarget()
and
genderitem = item.claims['P21'][0].getTarget()
could return a deprecated statement. Secondly, if you execute the query without the LIMIT 50, you might get timeouts. In that case, you need to slice the query. Here is an example query. Difool (talk) 02:35, 12 June 2024 (UTC)
- @Difool: thanks very much for the hints. I have manually checked all P21 deprecated values (https://w.wiki/ANSR and https://w.wiki/ANSi); there were about 30, and I can say there will be no issues. For the deprecated P227 values, I have added a line to both queries
MINUS { ?item p:P227 ?std . ?std wikibase:rank wikibase:DeprecatedRank } .
to avoid them completely; there were a few hundred. As for slicing the queries: I was thinking of running with LIMIT 5000 or 10000 so as to escape timeouts, but if I have issues I will surely try the slicing. Epìdosis 22:40, 12 June 2024 (UTC)
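The LIMIT-based batching mentioned above could be sketched roughly as follows (a hypothetical illustration, not code from the bot or from Difool's example: the function name and parameters are assumptions, and only the query strings are built — actually sending them, e.g. through pywikibot's WikidataSPARQLPageGenerator, is left out):

```python
# Hypothetical sketch of running a large SPARQL query in slices so
# each request stays under the endpoint's timeout: wrap the base
# query in LIMIT/OFFSET batches and yield one query string per batch.
def sliced_queries(base_query, batch_size=5000, total=170000):
    """Yield one SPARQL query string per batch of results."""
    for offset in range(0, total, batch_size):
        yield f"{base_query}\nLIMIT {batch_size}\nOFFSET {offset}"
```

One caveat: plain LIMIT/OFFSET paging is only reliable when the result order is stable (e.g. with an ORDER BY clause); slicing on an ordered key instead is generally more robust against results shifting between requests.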
- Support -- Bargioni 🗣 10:26, 12 June 2024 (UTC)
- Support --Wüstenspringmaus talk 05:51, 13 June 2024 (UTC)