Allow for total (t) CLI functionality to be run using a Wikidata lexemes dump #520

@andrewtavis

Description

This is the first issue to add dump processing functionality to the Scribe-Data CLI. In it, we'll do the following:

  • We'll add the --wikidata-dump (-wd) argument to the total command
  • If the user passes this argument, the check_lexeme_dump_prompt_download function from #518 (Add check_lexeme_dump_prompt_download function to cli/utils.py) will be called to make sure that a dump is available, or to download one if it is not
  • From there, the functionality of the total command will be run over the dump rather than via the Wikidata Query Service
    • This functionality will be added into a file src/scribe_data/wikidata/parse_dump.py and called from the CLI
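
As a rough illustration of the steps above, here's a minimal argparse sketch of how the flag could be wired up. The parser layout, the `build_parser` and `choose_total_backend` names, and the choice to let the flag take an optional path are all assumptions for illustration, not the actual Scribe-Data CLI code:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a `total` (alias `t`) subcommand.

    Hypothetical layout: the real CLI may organise its parsers differently.
    """
    parser = argparse.ArgumentParser(prog="scribe-data")
    subparsers = parser.add_subparsers(dest="command")

    total = subparsers.add_parser("total", aliases=["t"])
    total.add_argument(
        "-wd",
        "--wikidata-dump",
        nargs="?",      # the flag may be passed alone or with a path
        const=True,     # value used when the flag is passed without a path
        default=None,   # value used when the flag is not passed at all
        help="Run over a Wikidata lexemes dump instead of the query service.",
    )
    return parser


def choose_total_backend(args: argparse.Namespace) -> str:
    """Return which backend the total command would use (illustrative only)."""
    return "dump" if args.wikidata_dump else "query_service"


if __name__ == "__main__":
    parsed = build_parser().parse_args(["total", "-wd"])
    print(choose_total_backend(parsed))
```

In this sketch, the dump branch is where the check from #518 would be called before handing off to the parsing code in parse_dump.py.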

Before starting, we should map out the best way to process the dump, with a specific question being whether we need to uncompress the dump or whether we can work directly from the compressed .json.bz2 file.
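
To help with the compressed vs. uncompressed question, here's a minimal sketch of streaming the .json.bz2 file directly with Python's standard bz2 module. It assumes the dump follows the usual Wikidata JSON layout (a JSON array with one entity per line, each line ending in a comma) and that lexeme entities carry `language` and `lexicalCategory` fields. The function names are hypothetical and not part of Scribe-Data:

```python
import bz2
import json
from collections import Counter
from typing import Iterator


def iter_dump_entities(path: str) -> Iterator[dict]:
    """Yield one entity at a time from a compressed dump without unpacking it to disk."""
    with bz2.open(path, mode="rt", encoding="utf-8") as dump:
        for line in dump:
            # Drop surrounding whitespace and the trailing comma between entities.
            line = line.strip().rstrip(",")
            # Skip blank lines and the opening/closing brackets of the JSON array.
            if not line or line in ("[", "]"):
                continue
            yield json.loads(line)


def count_lexemes(path: str) -> Counter:
    """Count lexemes per (language QID, lexical category QID) pair."""
    totals: Counter = Counter()
    for entity in iter_dump_entities(path):
        if entity.get("type") != "lexeme":
            continue
        totals[(entity.get("language"), entity.get("lexicalCategory"))] += 1
    return totals
```

Streaming like this keeps memory use flat and avoids needing disk space for the uncompressed file, but decompression happens on every run, so it would be worth timing against working from an uncompressed copy before we commit to either approach.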

Contribution

@axif0 will be working on this as a part of Outreachy! 📶🚤

Metadata

Labels

Outreachy (Available for Outreachy participants), feature (New feature or request), help wanted (Extra attention is needed)

Projects

Status

Done
