User:Waldyrious

Babel user information
kea-N This user's native language is Cape Verdean Creole.
pt-5 This user has a professional level of Portuguese.
en-4 This user has near native speaker knowledge of English.
pt-BR-3 This user can contribute with an advanced level of Brazilian Portuguese.
es-2 This user has an intermediate knowledge of Spanish.
gl-2 This user has an intermediate knowledge of Galician.
ȹThis user uses quickpresets, with custom settings.
Users by language

About me: Waldir@meta.wikimedia

Work in progress

To do

Recurring tasks
  • Clean up Repology's Problems in Wikidata report
  • Clean up the tricky items in Mix'n'match Exoplanets list

Assorted todo

Geography

(Moved to Workflowy.)

Cape Verde

Lexemes (kea dictionary)
  • Books I own
    • Dicionário Caboverdiano—Português (Manuel Veiga)
    • Léxico do dialecto crioulo do Arquipélago de Cabo Verde (Armando Napoleão Rodrigues Fernandes)
  • Google Sheets file
  • Stats in the Ordia tool
  • Tools to work with lexemes (To experiment with)
  • TODO: Create Wikidata:Lexicographical data/Documentation/Languages/kea, documenting the intended structure for kea lexemes, showing examples of lexemes for various word classes, etc.
    • Examples
    • Contents
      • Guidelines: what language to use for "spelling variant"; how to model dialect variations; which properties to set in the senses, which grammatical features to set in the forms, etc.
      • Instructions for how to create lexemes, add pronunciation, etc.
  • TODO: set up completion dashboards for lexemes
    • Senses: translations, glosses in en/pt/kea, item for this sense, etc.
    • Forms: badiu & sampadjudu, IPA transcription, pronunciation audio
    • Perhaps a simplified interface (as a Toolforge tool?), similar to what ninjawords.com did for Wiktionary.
  • Automatic list of kea lexemes:
    SELECT ?item ?lexemeLabel WHERE {
      ?item a ontolex:LexicalEntry ;
            dct:language wd:Q35963 ;
            wikibase:lemma ?lexemeLabel .
    }
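
A rough Python sketch of how the query above could be parameterised by language and turned into a request URL. The function names are made up for illustration, the `LIMIT` is an optional addition that the original query does not have, and the endpoint is the public Wikidata Query Service one; nothing here actually sends the request.

```python
from typing import Optional
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def lexemes_query(language_qid: str, limit: Optional[int] = None) -> str:
    """Build a query listing the lexemes (and their lemmas) of one language."""
    query = (
        "SELECT ?item ?lexemeLabel WHERE {\n"
        "  ?item a ontolex:LexicalEntry ;\n"
        f"        dct:language wd:{language_qid} ;\n"
        "        wikibase:lemma ?lexemeLabel .\n"
        "}"
    )
    if limit is not None:
        # Not part of the original query; handy while experimenting.
        query += f"\nLIMIT {limit}"
    return query


def query_url(query: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


kea_query = lexemes_query("Q35963", limit=10)  # Q35963 is the QID used above
print(query_url(kea_query))
```

The resulting URL can be fetched with any HTTP client; a descriptive User-Agent header is advisable when doing so from a script.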

TEDxPraia


Goal: Add items for the TEDxPraia event (tedxpraia.com, TED event ID 19377) and talks

Scripts / gadgets

  • Auto-fill item titles (labels) with corresponding language's article title (using the same algorithm as link piping to cut out parentheticals, etc.)
  • Suggest content for unfilled descriptions, with:
    • First sentence of corresponding language's article
    • Automatic translation of description in other languages, in the order defined by translatewiki's fallback chain (should be accessible through API), ultimately falling back to English
  • Highlight (bolden) the label for the current interface language, or move it to the top
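
A minimal Python sketch of the first two gadget ideas, under loud assumptions: `strip_parenthetical` only handles a single trailing parenthetical (the real link-piping algorithm covers more cases), and `pick_description` takes the fallback chain as a plain list instead of fetching it from an API. Both function names and the sample data are hypothetical.

```python
import re
from typing import Mapping, Optional, Sequence, Tuple


def strip_parenthetical(title: str) -> str:
    """Drop one trailing parenthetical, e.g. 'Mercury (planet)' -> 'Mercury'."""
    return re.sub(r"\s*\([^()]*\)\s*$", "", title)


def pick_description(
    descriptions: Mapping[str, str],
    fallback_chain: Sequence[str],
) -> Optional[Tuple[str, str]]:
    """Return (language, text) for the first language in the chain that has
    a description, trying English last; None if nothing is available."""
    for lang in list(fallback_chain) + ["en"]:
        text = descriptions.get(lang)
        if text:
            return lang, text
    return None


print(strip_parenthetical("Mercury (planet)"))

# Made-up sample data: descriptions available for some item.
available = {"es": "capital de Cabo Verde", "en": "capital of Cape Verde"}
print(pick_description(available, ["pt", "gl", "es"]))
```

The description found this way would still need translating into the interface language before being suggested; the sketch only covers the lookup order.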

Musical chords


Goal: model musical chords in Wikidata.

Software data

  • Repology
    • See Comment by Repology's maintainer
    • See discussion in the property talk page
    • Automated report of outdated software versions in Wikidata
      • TODO: connect this with one of the software version updater tools
    • Automated reports of packages missing in Wikidata that are in other repos: Arch, DistroWatch, etc.
      • TODO: convert these into a Mix'n'match catalog. Ideally weighted/filtered by number of (unrelated) repos?

Unix distro manifests


I.e. the set of packages that come pre-installed with (specific versions of) Unix-like operating systems (distros)

Listeria


Try replacing the table at pt:Prémio Camões#Premiados with Listeria, based on a query like this: https://w.wiki/LJB

Lexemes


See also Useful stuff § Lexemes below, for general information about these.

Fonts and characters

Useful stuff

Assorted useful stuff
  • languages to skip on wikidata game: zh,ja,ru,uk,hu,ko,pl,tr,et,el,ar,bg,vi
  • languages to prefer on wikidata game: pt,gl,es,it,ro,fr
  • Useful templates
    • link to an item given one of its sitelinks: {{Item}} (there's no option for "item by label", since multiple items with the same label can't be automatically disambiguated)
    • link to an item given its ID: {{LinkedLabel|Q1}} → Universe
    • statement (full or partial semantic triple): {{statement||P31|Q5}} → instance of (P31) human (Q5)
    • graphical statement display, emulating Wikidata's interface: {{Statement+| P={{P-|27}} |V={{Q-|145}} }}
      • Use qp/qv to add a qualifier property and value, and rp/rv to add a reference property and value, as demonstrated here.
    • block query: {{SPARQL|query = ...}}
    • inline query: {{SPARQL Inline|label = foobar|query = ...}} → foobar (query)
  • Wikidata Diff: compare two entries (example: Portugal vs. Cape Verde)
    • Doesn't work with more than two entries :(
    • Perhaps there should be a way to filter out external identifiers (which by definition are going to be all different anyway)
  • Narrowing down search results: To search for Wikidata items by their title on a given site, use Special:ItemByTitle.
  • Cradle
    • Forms to create Wikidata items by filling all the relevant properties given the type of entity
    • Presets are listed at Wikidata:Cradle (like the query examples, this is a shared list that everyone uses)
    • Similar to QuickSettings, but has two major differences
      • It is used to create new items, rather than edit existing items
      • There's no way to create custom presets, so the full shared list is a little noisy, with e.g. multiple forms for the same kind of entity depending on what properties one wants to fill in
    • Feedback
      • It should be possible to create multiple items at once, e.g. a book as a work, its edition(s), and its author(s)
      • It should be possible to load custom presets, not just those from Wikidata:Cradle
      • The presets should allow a description field besides just the name
      • The presets should allow usage of the {{P}} template to make the page more readable
      • Optional fields should be collapsed under a "more" section, to make the forms less overwhelming
      • The + icons should stay in place, rather than be to the left of the - icon when multiple items are added
      • The form should not be fillable if one's not logged in, since logging in loses all input 😱
  • According to Special:MyLanguageFallbackChain, the languages that appear in item pages are determined by the contents of the {{#babel}} box in the userpage.

Data model

Modeling specific topics

EntitySchemas
Model items
Property constraints
  • Property constraints are scoped to a property, so they have to be broad/generic, and can't offer nuanced recommendations depending on the type of item (e.g. a different constraint for a human and a country)
  • EntitySchemas and model items, on the other hand, pertain to items as a whole, so they provide restrictions/recommendations as cohesive sets, and are more specific (they won't apply to items that don't match that schema).
  • Both property constraints and EntitySchemas are amenable to mechanical application/validation, whereas model items require manual lookup and application.
  • Property constraints are much easier to edit and better integrated with the Wikidata interface than EntitySchemas, though that could change in the future
WikiProjects
  • Many WikiProjects have lists of properties that can be applied to items in the subject areas they cover. They can also have explicit recommendations about which properties are mandatory, recommended or ancillary.
  • TODO: Add some examples
Recoin

OpenRefine

TABernacle

SPARQL queries

Query building interfaces

Notes:

  • Neither VizQuery nor Wikidata Query Builder allows combining conditions with OR
  • Neither VizQuery nor Wikidata Query Builder allows specifying non-property conditions (number of sitelinks, label/description, ...)

See also User:SpinachBot (more info below)

REST endpoint

Documentation

Introduction / general reference
Prefixes
  • General reference
    • Prefixes (wd:, wdt:, etc.) are used to qualify elements of a query (operators and operands) depending on their type (e.g. item, property, value, etc.)
    • About prefixes
    • Full list
  • List of prefixes
    • wd = Wikidata entity (e.g. ___)
      • wds = Wikidata statement (e.g. ___)
      • wdv = Wikidata value (e.g. ___)
      • wdt = Wikidata property (equivalent to p + ps as shown below)
        • "p: links to a statement node which has various things (the main value, rank, qualifiers, etc), you'll need to use p:P123/ps:P123 to get the value. wdt: is simpler to use if it does what you want (return just the main value of preferred rank statements if they exist, otherwise of normal rank ones), but for anything else it has to be p: and ps:" (Nikki in the Wikidata group on Telegram, 21 June 2023)
    • p = a property statement (e.g. ?item p:P123 ?prop.)
      • ps = prop/statement/ — the value of a property statement (e.g. ?item p:P123 ?prop. ?prop ps:P123 ?propValue.)
        • psv = prop/statement/value/ — the numeric value of a property as written in the statement (i.e. disregarding the unit)
        • psn = prop/statement/value-normalized/ — the numeric value of a property, normalized to the base unit of the measured quantity.
      • pq = prop/qualifier/ — a qualifier for a property statement (e.g. ?item p:P123 ?prop. ?prop pq:P456 ?propQualifier.)
        • pqv = prop/qualifier/value/ — ?
      • pr = prop/reference/ — ?
        • prv = prop/reference/value/ — ?
    • TODO: add examples above where missing
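
To make the prefix list more concrete, here is a small Python sketch that expands a prefixed name into a full IRI. The namespace IRIs are written from memory of the RDF dump format documentation and should be checked against the full list linked above; the `expand` helper is made up for illustration.

```python
# Namespace IRIs for the prefixes listed above (assumed; verify against
# the full prefix list before relying on them).
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wds": "http://www.wikidata.org/entity/statement/",
    "wdv": "http://www.wikidata.org/value/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "p": "http://www.wikidata.org/prop/",
    "ps": "http://www.wikidata.org/prop/statement/",
    "psv": "http://www.wikidata.org/prop/statement/value/",
    "psn": "http://www.wikidata.org/prop/statement/value-normalized/",
    "pq": "http://www.wikidata.org/prop/qualifier/",
    "pqv": "http://www.wikidata.org/prop/qualifier/value/",
    "pr": "http://www.wikidata.org/prop/reference/",
    "prv": "http://www.wikidata.org/prop/reference/value/",
}


def expand(prefixed_name: str) -> str:
    """Expand e.g. 'wdt:P31' into the IRI the query engine sees."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local_name


for name in ("wd:Q5", "wdt:P31", "p:P31", "ps:P31", "pq:P580"):
    print(name, "->", expand(name))
```

Seeing the IRIs side by side shows why the same property number appears under several prefixes: each prefix points at a different kind of node (entity, direct value, statement, qualifier, reference).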
Query syntax cheatsheet

(Condensed/edited from the excellent, but awfully verbose, Wikidata:SPARQL tutorial.)

  • The core structure of any query is a semantic triple (subject, predicate, object).
    The "predicate" represents the relationship between subject and object, so I'll call it "relation" to make this clearer:
    • ?subject wdt:relation wd:object.
  • The object of one triple can be the subject of another triple, which allows building more complex queries:
    • ?nephew wdt:child ?father. ?father wdt:brother wd:uncle.
    • There are also two shorthands for this:
      • ?nephew wdt:child/wdt:brother wd:uncle. — using the path separator character / to chain predicates together, creating a "property path" from the subject to the object.
      • ?nephew wdt:child [ wdt:brother wd:uncle ]. — using [] to nest a partial triple, where the omitted part is the missing piece in the outer triple.
  • Use , to append another object to the previous triple, reusing both the subject and the predicate:
    • ?subject wdt:relation wd:object1, wd:object2.
  • Use ; to append a predicate-object to the previous triple's subject:
    • ?subject wdt:relation1 wd:object1;
      wdt:relation2 wd:object2.
  • Predicates can be combined using regex-like syntax:
    • Use the regex-like quantifiers *, + and ? to represent how many times a predicate appears in the query:
      • ?descendant wdt:child+ ?ancestor.
    • The two constructs above are commonly used to specify the notion "instance of X or of any subclass of X":
      • ?subject wdt:P31/wdt:P279* ?object.
    • As in regex, () groups expressions.
    • As in regex, | means OR:
      • ?itemA wdt:relation1|wdt:relation2 wd:itemB.
      • Note that this is not an OR for entire triples, but for parts thereof!
      • Parentheses may be needed to mark the limits of the OR expression: ?itemA (wdt:prop1|wdt:prop2)/wdt:prop3 wd:itemB.
        • Note: for OR-ing entire "sentences", wrap them in curly brackets and use the UNION operator between the two blocks.
  • Sorting results
    • Add ORDER BY ?fooBar after the closing } of the WHERE block of the SELECT statement
  • Negative assertions
    • MINUS { ?item wdt:P3999 ?closure_date }
    • FILTER NOT EXISTS { ?item wdt:P3999 ?closure_date }
    • Both of the above work... not sure if one is preferable over the other.
    • Remove specific items
      • FILTER(?item != wd:Q12345) for a single item
      • FILTER(?item NOT IN (wd:Q123,wd:Q456,wd:Q789)) for multiple items
  • Optional assertions
    • OPTIONAL { ?city wdt:P1082 ?population. }
  • More useful info: https://www.slideshare.net/LeeFeigenbaum/sparql-cheat-sheet
    • UNION / MINUS (slide 8)
    • literal values (strings, numbers, ...)
    • comparison operators (!, &&, ||, <, =, !=, ...)
    • more predicate path operators (^, !, ...)
    • underspecified triples (e.g. two or even three variables)
  • Wikidata-specific helpers
    • label and description
      • Include SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
      • For a variable ?foo representing an item, that automatically binds its label to ?fooLabel, and its description to ?fooDescription
      • To add custom names for the label variables (other than ?fooLabel), use rdfs:label:
        SERVICE wikibase:label {
          bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
          ?variableName rdfs:label ?customLabel .
        }
    • Associated Wikipedia article ("sitelink")
    • lexeme representations (lemmas)
      • ?item wikibase:lemma ?lemma.
      • Filter lemmas by plain string matching: FILTER (str(?lemma) = "foobar")
      • Filter lemmas by regex matching: FILTER (regex(?lemma, '^foobar$'))
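
As a way of tying several cheatsheet entries together, here is a Python sketch that assembles a query from the pieces above: the P31/P279* path, FILTER NOT EXISTS, NOT IN, the label service and ORDER BY. The function name and its parameters are invented for this example, and the IDs passed in are the same placeholders used in the cheatsheet, not meaningful items.

```python
from typing import Optional, Sequence


def instances_query(
    class_qid: str,
    exclude: Sequence[str] = (),
    without_property: Optional[str] = None,
) -> str:
    """Build a query for instances of a class (or of any subclass of it)."""
    lines = [
        "SELECT ?item ?itemLabel WHERE {",
        # "instance of X or of any subclass of X"
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid}.",
    ]
    if without_property:
        # Negative assertion: keep only items lacking this property.
        lines.append(
            f"  FILTER NOT EXISTS {{ ?item wdt:{without_property} ?value }}"
        )
    if exclude:
        excluded = ", ".join(f"wd:{qid}" for qid in exclude)
        lines.append(f"  FILTER(?item NOT IN ({excluded}))")
    lines.append(
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }'
    )
    lines.append("}")
    lines.append("ORDER BY ?itemLabel")
    return "\n".join(lines)


print(instances_query("Q12345", exclude=["Q123", "Q456"], without_property="P3999"))
```

The printed text can be pasted into the query service as-is, since the service predefines the wd:, wdt:, wikibase: and bd: prefixes.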

Example queries

Written works

books, scholarly/academic/scientific publications (papers, reports, theses/dissertations), etc.

Items of personal interest

Authors
Publications

Books

Creating new books
  • Check the Wikidata section below for properties
  • Ideally there should be an interface to create, at once, all items related to a book (the work, edition and author items), all with the appropriate connections, with a good autocomplete system, and able to import public data given e.g. an ISBN.
  • For now, we can use the Cradle forms for book (work), book (edition) and author
    • There's a strict book edition preset, but it requires authors to exist first (it doesn't accept author name string (P2093), nor empty author fields); the flexible preset, linked above, accepts both, and also has a field for Google Books ID.
Data models
  • Re-thinking Open Library’s Book Pages
    • Explains very well the intricacies of distinguishing between a book as a work with multiple representations, a specific edition of a book, physical instances of books, etc.
    • Slightly edited quote: To simplify the experience for readers, we released a new type of Book Page which combines the affordances of the Work and Edition Pages into a single view. Two pages become one: By default, the Book Page attempts to automatically feature the “best” (previewable, available) edition of a book and places an editions table front-and-center to enable readers to quickly switch which edition is selected.
  • Wikidata's model is based on this
  • From What is FRBR? A conceptual model for the bibliographic universe, quoted in b:Introduction to Library and Information Science/Information Organization#Cataloging:
    • When we say the word book in everyday language, we may actually mean several things.
    • For example, when we say book to describe a physical object that has paper pages and a binding and can sometimes be used to prop open a door or hold up a table leg, FRBR calls this an "item."
    • When we say book we also may mean a "publication" as when we go to a bookstore to purchase a book. We may know its ISBN but the particular copy does not matter as long as it's in good condition and not missing pages. FRBR calls this a "manifestation."
    • When we say book as in "who translated that book," we may have a particular text in mind and a specific language. FRBR calls this an "expression."
      • This is the most confusing aspect to me. It may help to think of different versions of a work by the same author as it evolves in time (draft/manuscript, first edition, revised edition, etc.) as different expressions of the work; and similarly, a translation is a separate expression, authored by the translator. But, as hinted in v:pt:Transcrição digital#Definição formal, a transcription (e.g. from a manuscript to a printed book) transforms one manifestation into another, but preserves the expression.
    • When we say book as in "who wrote that book," we could mean a higher level of abstraction, the conceptual content that underlies all of the linguistic versions, the story being told in the book, the ideas in a person's head for the book. FRBR calls this a "work."
  • Diagram:
  • Meant to replace MARC
  • Diagram:
  • (MARC, UNIMARC, MARC 21...)
  • Diagram:

    (Probably a simplified diagram would be helpful, containing just the work/expression/manifestation/item chain, plus the agent box (with an indication that this could be a single person or a collective).)
  • Glossary
  • Getting started — describes the data model composed of the following elements:
    • Author
    • Work
    • Edition
    • Edition group (e.g. paperback, hardcover, e-book)
    • Publisher
    • Series (sequential grouping of works)
  • Allows creating private or public collections
    • In this sense, it's similar to Inventaire, although the latter tries to rely more on Wikidata

QuickStatements reference

  • Help:QuickStatements
  • QuickStatements v1 (deprecated)
  • QuickStatements v2 (recommended)
    • Can import commands in the v1 format
    • The CSV format is pretty straightforward (and actually easier to author than v1):
      • cells are comma-separated instead of tab-separated
      • there's a header row, which allows avoiding the repetition of the same prefixes in every row (thus halving the number of fields per row)
      • Queries in the query service can be tweaked to produce near-ready QuickStatements commands (see e.g. these steps to remove the "NAME" prefix from the label of exoplanets).
    • The batch mode ("run in background") doesn't seem too reliable; I got some errors, but then wasn't able to see what they were
      • Update Nov 2021: still getting some "no API success flag set" errors. Better just stick to the sequential mode.

Examples


Example 1 → RescueTime (Q34637733): software version identifier (P348) = "2.12.5.1503"; publication date (P577) = +2017-06-09T00:00:00Z/11; platform (P400) = Microsoft Windows (Q1406); version type (P548) = stable version (Q2804309) (others listed here); reference URL (P854) = "https://www.rescuetime.com/updates/win_release_notes.html"; title (P1476) = "RescueTime for Windows Release Notes" (English).

Q34637733	P348	"2.12.5.1503"	P577	+2017-06-09T00:00:00Z/11	P400	Q1406	P548	Q2804309	S854	"https://www.rescuetime.com/updates/win_release_notes.html"	S1476	en:"RescueTime for Windows Release Notes"

Simplified template:

<item>	P348	"<version number>"	P577	+<date>T00:00:00Z/11	S854	"<url>"	S1476	en:"<title>"

Observations:

  • Note how source (reference) properties must be provided using the nonstandard "S" prefix — so "S854" instead of "P854".
  • Note that the whitespace characters are tabs, not spaces
  • Note that timestamps must have zero time
  • Note that the reference title requires a language specifier, here indicated by the en: prefix.
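
The observations above can be sketched as a small Python helper that assembles a v1 command line. All function names are made up for illustration; the only conventions relied on are the ones visible in Example 1 (tab separators, quoted strings, zero-time dates ending in /11, the "S" prefix for references, and the language prefix for monolingual text).

```python
from typing import Iterable, Tuple

Pair = Tuple[str, str]  # (property ID, already-formatted value)


def qs_string(text: str) -> str:
    """Plain string values are wrapped in double quotes."""
    return '"' + text + '"'


def qs_date(iso_date: str) -> str:
    """Day-precision date: zero time, with /11 marking the precision."""
    return f"+{iso_date}T00:00:00Z/11"


def qs_monolingual(lang: str, text: str) -> str:
    """Monolingual text needs a language prefix, e.g. en:"..."."""
    return f'{lang}:"{text}"'


def qs_v1_line(
    item: str,
    prop: str,
    value: str,
    qualifiers: Iterable[Pair] = (),
    references: Iterable[Pair] = (),
) -> str:
    fields = [item, prop, value]
    for qualifier_prop, qualifier_value in qualifiers:
        fields += [qualifier_prop, qualifier_value]
    for reference_prop, reference_value in references:
        # References reuse the property number but swap the "P" for an "S".
        fields += ["S" + reference_prop[1:], reference_value]
    return "\t".join(fields)  # tabs, not spaces


line = qs_v1_line(
    "Q34637733",
    "P348",
    qs_string("2.12.5.1503"),
    qualifiers=[("P577", qs_date("2017-06-09")), ("P400", "Q1406"), ("P548", "Q2804309")],
    references=[
        ("P854", qs_string("https://www.rescuetime.com/updates/win_release_notes.html")),
        ("P1476", qs_monolingual("en", "RescueTime for Windows Release Notes")),
    ],
)
print(line)
```

Building the line from a list of fields avoids the easy mistake of typing spaces where tabs are required.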

Example 2 → (TODO: human-readable translation)

CREATE
LAST	Len	"Buying Lumber"
LAST	Den	"song from the soundtrack of the 2000 game The Sims"
LAST	P361	Q7764364	P1545	"4"	P2047	306U11574
LAST	P31	Q217199
LAST	P86	Q943225
CREATE
LAST	Len	"Mall Rat"
LAST	Den	"song from the soundtrack of the 2000 game The Sims"
LAST	P361	Q7764364	P1545	"5"	P2047	164U11574
LAST	P31	Q217199
LAST	P86	Q943225

Observations:

  • Note the usage of CREATE and LAST directives, since we're creating new items, rather than adding statements to an existing item
  • Note the Len and Den, for the English label and description
  • Note how each line can only contain a single statement triplet, but a given statement (e.g. part of (P361)) can have any number of properties/qualifiers.
  • Note how the duration (P2047) is provided as seconds which are marked U11574, when in reality the item is second (Q11574).
  • Note how the number for series ordinal (P1545) is provided as a string, even though a plain number should work as a quantity, according to the docs ("unit is optional")
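
When many similar items need creating, blocks like the ones in Example 2 can be generated rather than typed. A hedged Python sketch follows; the function name is made up, and the property and item IDs are simply the ones appearing in Example 2, passed through unchanged.

```python
def create_track_block(label: str, ordinal: int, seconds: int) -> str:
    """Return a CREATE/LAST block for one track, mirroring Example 2."""
    rows = [
        ["CREATE"],
        ["LAST", "Len", f'"{label}"'],
        ["LAST", "Den", '"song from the soundtrack of the 2000 game The Sims"'],
        # One statement per line; qualifiers follow on the same line.
        # The ordinal is quoted (a string) and the duration carries the
        # unit as U11574, as in the example.
        ["LAST", "P361", "Q7764364", "P1545", f'"{ordinal}"', "P2047", f"{seconds}U11574"],
        ["LAST", "P31", "Q217199"],
        ["LAST", "P86", "Q943225"],
    ]
    return "\n".join("\t".join(row) for row in rows)


tracks = [("Buying Lumber", 4, 306), ("Mall Rat", 5, 164)]
batch = "\n".join(create_track_block(*track) for track in tracks)
print(batch)
```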

Example 3 → (TODO: human-readable translation)

Q2986828	P348	"CLDR 30.0.1"	S854	"http://cldr.unicode.org/index/downloads/cldr-30#TOC-CLDR-30.0.1-Maintenance-Release"	S1476	en:"CLDR 30 Release Note"	S958	"CLDR 30.0.1 Maintenance Release"

Example 4 → (TODO: human-readable translation)

Q839063	P1324	"http://git.savannah.gnu.org/cgit/oddmuse.git/"	P8423	Q186055

Example 5 → Add software versions, release dates and reference URLs

qid,P348,qal577,S854
Q109462071,"""v0.0.3""",+2020-05-17T00:00:00Z/11,"""https://github.com/bigskysoftware/htmx/releases/tag/v0.0.3"""
Q109462071,"""v0.0.4""",+2020-05-26T00:00:00Z/11,"""https://github.com/bigskysoftware/htmx/releases/tag/v0.0.4"""
Q109462071,"""v0.0.5""",+2020-06-19T00:00:00Z/11,"""https://github.com/bigskysoftware/htmx/releases/tag/v0.0.5"""

Observations:

  • Note the usage of triple quotes for string values
  • Note the same clunky v1 syntax for dates
  • Other than that, this is actually quite an improvement: less repetition, and no reliance on tabs
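
The triple quotes are less mysterious when the CSV is produced by a library: the value itself is a quoted string, and standard CSV escaping then wraps it in quotes and doubles the inner ones. A minimal Python sketch using the standard csv module (the function name is hypothetical; the header and data are taken from the example above):

```python
import csv
import io
from typing import Iterable, Tuple


def versions_csv(rows: Iterable[Tuple[str, str, str, str]]) -> str:
    """Build QuickStatements CSV from (qid, version, ISO date, URL) tuples."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["qid", "P348", "qal577", "S854"])
    for qid, version, iso_date, url in rows:
        writer.writerow([
            qid,
            f'"{version}"',               # string value, quotes included
            f"+{iso_date}T00:00:00Z/11",  # same date syntax as in v1
            f'"{url}"',
        ])
    return buffer.getvalue()


print(versions_csv([
    ("Q109462071", "v0.0.3", "2020-05-17",
     "https://github.com/bigskysoftware/htmx/releases/tag/v0.0.3"),
]))
```

Letting the csv module do the escaping also covers values that contain commas, which would otherwise break the row.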

Lexemes


See also To do § Lexemes above, for tasks related to lexemes.

FAQ


TODO: Create a quickstart / FAQ / examples page in Wikidata:Lexicographical data. See also Wikidata:Lexicographical data/Glossary (which isn't linked from the main page, for some reason)

  • What are lexemes?
    • words, phrases/expressions, prefixes, acronyms, etc.
  • Wikidata vs. Wiktionary
    • User:Rua/Wikidata for Wiktionarians
    • In Wiktionary each page contains all homographs of a word, with sections for each language, and subsections for each lexical category (verb, noun, etc.)
    • In Wikidata, each Lexeme page contains the homographs that share the same spelling+language+grammatical class (verb, noun, etc.)
      • The same Wikidata Lexeme page groups the different forms of the same word — e.g. "houses" is represented as a form in the house (noun) lexeme
      • Words that are spelled the same but belong to different languages are placed in different Lexeme pages. These homographs in other languages can be connected via homograph lexeme (P5402)
      • Additionally, there can be the same word spelled in alternative ways (e.g. loiça/louça). These can be connected via alternative form (P8530) (or probably synonym (P5973), for those that aren't similar, e.g. cruzeta/cabide)
  • Lexemes (L...) vs. items (Q...)
    • A lexeme has statements that describe the word
    • An item has statements that describe the concept
    • noun-type lexemes link to items via the item for this sense (P5137) property; verb-type lexemes link to items via the predicate for (P9970) property
  • Senses and forms
  • Spelling variations
    • A lexeme can have different representations in different spelling variants (e.g. color vs. colour in en-us and en-gb).
    • These spelling variants need to have an official language code assigned, so things like Sampadjudu (Q2217638) and Sal Creole (Q18707467) can't be used.
    • A poor man's spelling variant can be done with separate lexemes connected via alternative form (P8530) (in the Forms section of the lexeme)

Tools

Data model
  • Lexeme
    1. Top level
      1. Lemma
        • e.g. "run"
      2. Language
      3. Lexical category
      4. Statements (properties of the lexeme that are not specific to a Form or Sense). E.g. derived from, region, period, homonym, etc.
    2. Forms (i.e. inflections)
      • string representations of variants per gender, number, conjugation, etc.
      • one for each combination, tagged with the relevant qualifiers/properties (e.g. 2nd person, singular, past tense...)
      • Representation
      • Grammatical features
      • e.g. (TODO)
    3. Senses
      • string representations of different meanings
      • link to items for the actual concepts (see "lexemes vs. items" above)
      • e.g. the lexeme "bank" (English noun) would have the senses "financial institution" and "edge of a body of water"
      • Gloss
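
The outline above can be mirrored in a few Python dataclasses. This is only an illustrative sketch of the structure, not Wikibase's actual JSON format: the class and field names are invented, and language, lexical category and grammatical features are plain strings here, where Wikidata would use item IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Form:
    representations: Dict[str, str]  # language code -> spelling
    grammatical_features: List[str] = field(default_factory=list)


@dataclass
class Sense:
    glosses: Dict[str, str]  # language code -> short definition
    item: str = ""           # linked concept, if any (left empty here)


@dataclass
class Lexeme:
    lemmas: Dict[str, str]   # language code -> lemma
    language: str
    lexical_category: str
    forms: List[Form] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)


bank = Lexeme(
    lemmas={"en": "bank"},
    language="English",
    lexical_category="noun",
    forms=[
        Form({"en": "bank"}, ["singular"]),
        Form({"en": "banks"}, ["plural"]),
    ],
    senses=[
        Sense({"en": "financial institution"}),
        Sense({"en": "edge of a body of water"}),
    ],
)
print([sense.glosses["en"] for sense in bank.senses])
```

Keeping representations and glosses as per-language dictionaries reflects the spelling-variant and multilingual-gloss points from the FAQ above.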

Diagrams:

Also:

Problems

  • The creation form shows a "language variant" field when the entered language is not recognized.
    • See Help:Monolingual text languages and the tracking ticket phab:T144272.
    • For some reason the list of languages is restricted to the one approved by the Language committee for new Wikipedias, rather than e.g. the full list of languages from CLDR
    • To request a language to be supported, a new Phabricator task needs to be created in the same model as one of the child tasks of the one linked above
    • As a workaround, new lexemes can still be created by using the mis code, as described in the help page linked above.
    • For Kabuverdianu, the code is actually already available/linked (since phab:T127435), but the extra field was appearing nonetheless in the creation form; possibly that was due to a "no value" value for the language code, which was just removed in this edit.
      • TODO: if the problem persists, maybe a new Phabricator issue needs to be created.
      • Update Jun 2021: the problem still occurs — may be related to phab:T284870?

Question-answering


Tools:

See also:

Benchmarks:

Maybe there should be a completeness/coverage/parity dashboard, similar to w:Wikipedia:WikiProject Missing encyclopedic articles, that maps how much of the Wolfram Language can be modeled in terms of properties/qualifiers.

Mini-bios


Thanks to the Wikidata Game, it will be possible to move quickly to a state where Wikidata will have all the information needed to build automated mini-bios in the form

<label> (<place of birth>, <date of birth> – <place of death>, <date of death>) was a <country of citizenship> <occupation> who <description>.

In fact, the description field for people in Wikidata should probably forgo occupation and nationality, and go straight to their claim to notability, since the former are redundant with the corresponding fields.
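
A minimal Python sketch of the template above, assuming the values have already been fetched and rendered as text. The function name and the sample person are invented placeholders; a real version would also need to turn the country into an adjective and pick "a" or "an".

```python
def mini_bio(
    label: str,
    birth_place: str,
    birth_date: str,
    death_place: str,
    death_date: str,
    citizenship: str,
    occupation: str,
    description: str,
) -> str:
    """Fill the mini-bio template with already-formatted values."""
    return (
        f"{label} ({birth_place}, {birth_date} – {death_place}, {death_date}) "
        f"was a {citizenship} {occupation} who {description}."
    )


# Invented sample values, for illustration only.
print(mini_bio(
    "Jane Doe", "Praia", "1 January 1900", "Mindelo", "2 February 1980",
    "Cape Verdean", "writer", "wrote placeholder novels",
))
```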

This proposal was originally posted here.
