Wikidata:Requests for permissions/Bot/DifoolBot 3
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Ymblanter (talk) 18:35, 17 May 2024 (UTC)
DifoolBot
Operator: Difool
Task/s: fill in empty English/French/German labels and basic statements for persons with a VIAF ID (P214) and a VIAF authority source GND ID (P227), IdRef ID (P269), Bibliothèque nationale de France ID (P268) or Library of Congress authority ID (P244)
Code: at GitHub
Function details: The bot iterates through:
- persons with a VIAF ID (P214) and a GND ID (P227) and an empty German label,
- persons with a VIAF ID (P214) and an IdRef ID (P269) and an empty French label,
- persons with a VIAF ID (P214) and a Bibliothèque nationale de France ID (P268) and an empty French label,
- persons with a VIAF ID (P214) and a Library of Congress authority ID (P244) and an empty English label.
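The selection above could be expressed as one SPARQL query per authority property. The following is only a sketch, not the bot's actual code: the helper name `build_query`, the mapping table, and the use of instance of (P31) = human (Q5) to select "persons" are assumptions made for illustration. A query this broad may also need batching to stay within the query service's time limit.

```python
# Hypothetical helper, not taken from the bot's repository.
# Maps each authority property to the label language the bot fills from it.
AUTHORITY_LABEL_LANG = {
    "P227": "de",  # GND ID -> German label
    "P269": "fr",  # IdRef ID -> French label
    "P268": "fr",  # Bibliothèque nationale de France ID -> French label
    "P244": "en",  # Library of Congress authority ID -> English label
}


def build_query(authority_prop: str, limit: int = 100) -> str:
    """Build a SPARQL query for persons with a VIAF ID, the given
    authority ID, and no label in the matching language."""
    lang = AUTHORITY_LABEL_LANG[authority_prop]
    return f"""
SELECT ?item WHERE {{
  ?item wdt:P31 wd:Q5 ;
        wdt:P214 ?viaf ;
        wdt:{authority_prop} ?authority .
  FILTER NOT EXISTS {{
    ?item rdfs:label ?label .
    FILTER(LANG(?label) = "{lang}")
  }}
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    print(build_query("P227", limit=10))
```

The resulting string could then be sent to the Wikidata Query Service with whatever client the operator prefers.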
And:
- Fills an empty English label with the name from the Library of Congress authority ID (P244) page,
- Fills an empty French label with the name from the Bibliothèque nationale de France ID (P268) or IdRef ID (P269) page,
- Fills an empty German label with the name from the GND ID (P227) page.
Original proposal (superseded): If the bot determines that the name should be in Eastern name order, then the label is not filled and the item is written to a report for manual checking.
CHANGED AFTER FEEDBACK: If the country or the language associated with the person uses a non-Latin script, then the label is not added. The country/language is determined from the data in GND/IdRef/BnF/LoC, i.e. associated country, language, and script of variant names. If no country/language is found there, the bot looks at the Wikidata item itself and checks country of citizenship (P27), languages spoken, written or signed (P1412) and name in native language (P1559). If still no country/language is found, the bot looks at the sitelinks. The country/language found is added to the edit summary.
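The fallback order described in the changed rule can be sketched as below. This is an illustration under stated assumptions, not the bot's implementation: the function names, the tiny `LATIN_SCRIPT_LANGS` table, and the choice to skip the label when no language is found at all are hypothetical (the description above does not say what happens in that case).

```python
# Hypothetical, deliberately small table; a real run would need a fuller mapping
# from languages (and countries) to scripts.
LATIN_SCRIPT_LANGS = {"en", "fr", "de", "nl", "es", "it", "pt"}


def find_language(authority_langs, item_langs, sitelink_langs):
    """Return (language, source) from the first source that yields one.

    Sources are consulted in the order described above: authority file data,
    then the item's own statements, then the sitelinks.
    """
    candidates = (
        ("authority", authority_langs),
        ("item", item_langs),
        ("sitelinks", sitelink_langs),
    )
    for source, langs in candidates:
        if langs:
            return langs[0], source
    return None


def should_add_label(authority_langs, item_langs, sitelink_langs):
    """Return (add_label, note); the note is meant for the edit summary."""
    found = find_language(authority_langs, item_langs, sitelink_langs)
    if found is None:
        # Assumption: with no evidence about the script, skip the label.
        return False, "no country/language found"
    lang, source = found
    if lang not in LATIN_SCRIPT_LANGS:
        return False, f"non-Latin script language '{lang}' (from {source})"
    return True, f"language '{lang}' (from {source})"


if __name__ == "__main__":
    print(should_add_label([], ["ja"], ["en"]))
    print(should_add_label(["fr"], [], []))
```

Note that an earlier source wins even when a later one disagrees: in the first call the item's own data points to Japanese, so the English sitelink is never consulted.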
- Adds a sex or gender (P21), date of birth (P569), or date of death (P570) statement if the item doesn't have such a statement. Adds a reference if the statement exists but has no reference or only 'weak' references (based on heuristic (P887) or imported from Wikimedia project (P143)). The bot only adds the statement/reference if there is no conflict between the data in the GND/IdRef/BnF/LoC pages. For dates, only the most precise date is added. If multiple GND/IdRef/BnF/LoC pages list the same value, only one reference is added.
- Updates the external ID value of the GND/IdRef/BnF/LoC statement if the GND/IdRef/BnF/LoC page has a redirect or is not found.
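The date and reference rules above can be sketched as follows. This is a minimal illustration, not the bot's code: the `SourceDate` class and function names are hypothetical, and two details are assumptions: a year-only date is treated as compatible with a fuller date in the same year, and a reference counts as 'weak' when it contains P887 or P143. The precision codes (9 = year, 10 = month, 11 = day) are the standard Wikibase time precisions.

```python
from dataclasses import dataclass

# Properties that mark a reference as 'weak', per the description above.
WEAK_REFERENCE_PROPS = {"P887", "P143"}


@dataclass(frozen=True)
class SourceDate:
    source: str     # e.g. "GND", "IdRef", "BnF", "LoC"
    year: int
    month: int = 0  # 0 means unknown
    day: int = 0    # 0 means unknown

    @property
    def precision(self) -> int:
        """Wikibase time precision: 9 = year, 10 = month, 11 = day."""
        if self.day:
            return 11
        if self.month:
            return 10
        return 9


def compatible(a: SourceDate, b: SourceDate) -> bool:
    """Assumption: dates agree if they match on every part both specify."""
    if a.year != b.year:
        return False
    if a.month and b.month and a.month != b.month:
        return False
    if a.day and b.day and a.day != b.day:
        return False
    return True


def pick_date(dates):
    """Return the most precise date, or None if empty or the sources conflict."""
    for i, a in enumerate(dates):
        for b in dates[i + 1:]:
            if not compatible(a, b):
                return None
    return max(dates, key=lambda d: d.precision) if dates else None


def needs_reference(references) -> bool:
    """references: one set of property IDs per existing reference block.

    True when there are no references or every reference is weak.
    """
    return all(props & WEAK_REFERENCE_PROPS for props in references)


if __name__ == "__main__":
    dates = [SourceDate("GND", 1900), SourceDate("LoC", 1900, 5, 2)]
    print(pick_date(dates))
    print(needs_reference([{"P143"}]))
```

When several sources share the highest precision, `max` returns the first of them, which matches the rule that only one reference is added for identical values.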
Example edits that handle redirects of IdRef ID (P269) are here. If approved, I'll run the script every few months or so.
--Difool (talk) 07:00, 9 April 2024 (UTC)
- Support surely a good enrichment of the items; all the additions are IMHO uncontroversial, and improve the quality of the items; I have seen no issues in the example edits. Thanks very much for this! --Epìdosis 15:00, 15 April 2024 (UTC)
- Neutral @Difool: The DNB often contains transliterations of Hebrew or Cyrillic names which are invalid according to the German name conventions. Because I'm already wasting so much of my life correcting those, I would like to ask you to exclude German from this bot job for now. But I'm very open to contributing scripts to filter and/or autocorrect German names that are probably wrong. --Tadarrius Bean (talk) 16:24, 15 April 2024 (UTC)
- @Tadarrius Bean: thanks for the links; I assumed that the DNB and other VIAF authority sources took the author's name from a book leaflet or that they did some other checking, so that the name they provide carries weight. My main goal is to add some info to prevent duplication and to more easily check whether items are duplicates.
- If I understand the links you provided correctly, then it's not possible to automate the transcription of Hebrew names according to the German name conventions; but the transcription of Cyrillic seems possible, for example with https://pypi.org/project/translitua/. I'll investigate that.
- I'll also check if it is possible to determine whether the native language of the name has Cyrillic or Hebrew script, and to add name in native language (P1559). Excluding German altogether is also possible, of course. Difool (talk) 12:59, 16 April 2024 (UTC)[reply]
- Support - PKM (talk) 21:40, 15 April 2024 (UTC)[reply]
- Support Skipping over East Asian names makes sense to me since the Library of Congress transliteration scheme doesn't always match how individuals romanize their names, and I'm happy to see that the bot will still add the dates from the LC authority record even when it doesn't add the romanized name. I'd be happy to keep an eye on Japanese names in the report for manual checking. Mcampany (talk) 15:14, 22 April 2024 (UTC)
- Thanks @Tadarrius Bean, @Mcampany for the feedback: I modified the code so that names are only added if the language of the name has Latin script, see CHANGED AFTER FEEDBACK above. I'll create another 'bot request for permission' to do the other names. Example edits are here. Difool (talk) 06:43, 4 May 2024 (UTC)[reply]