WikiCite/Shared Citations

From Meta, a Wikimedia project coordination wiki

This is a proposal for the Wikimedia Foundation to create a database of Wikimedia citation records; and associated improvements to cross-wiki monitoring and editing. These two pillars would empower community-managed workflows and tools to:

Make citations easier for the editor,
more useful for the reader,
and more efficient for our architecture.

Overview

Slides of this proposal, presented at WikidataCon 2021 (Etherpad)

Problem statement

Citations are the core of verifiability, anti-disinformation and knowledge integrity in our movement. Ensuring Knowledge equity extends to ensuring representative diversity of references – forms, languages, and author demographics.
Our Verifiability policy has become the backbone of the reliable web. Wikimedia’s citations are one of our greatest assets. However, because they are stored as raw “inline” text in each content page, references are also one of our biggest burdens.
Our references are high in maintenance, technical complexity, and duplication of effort. This results in knowledge gaps and biases that are difficult to quantify and address.
This burden of reference creation and maintenance is shouldered by repetitive, manual, volunteer effort which is disproportionately felt by smaller communities.
In order to reach 2030 goals of eliminating systematic knowledge and contributor demographic gaps, better systems for understanding, monitoring, and redressing these gaps are required.

Citations are a simple, critical interconnection mechanism for all modern knowledge in the digital, Internet-connected world... Arguably the most important ingredient of open knowledge, sources and references have ironically received little technical attention in the Wikimedia movement up until now.

Not only is there duplication of effort, but also massive duplication of the content itself. According to March 2018 research, every source used in Wikipedia is cited approximately 3.5 times,[1] and the most reused reference appeared across all Wikipedias more than 2.8 million times.[2] It is impossible to determine these numbers definitively, precisely because of a lack of consistent or centralised citation management. At 450+ characters per citation, this results in at least 1 GB of HTML to refer to a single publication, hand-curated by volunteers, often with semi-automated tools. Moreover, due to small stylistic variations and data-entry errors, there are probably many more instances of this one "single" citation unaccounted for. Wikimedia Commons was created in 2004 to host media files for all Wikipedias rather than having to reupload the same file to each language to reuse it; and yet in 2020 we still reinsert the same metadata every time we wish to reuse a footnote in a different page.
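
The "at least 1 GB" figure follows from simple arithmetic on the two numbers quoted above. A minimal sketch, assuming one byte per character of markup and decimal gigabytes (both are illustrative assumptions, not measurements):

```python
# Back-of-the-envelope check of the "at least 1 GB" estimate.
# Assumption: each character of citation markup occupies one byte.
REUSE_COUNT = 2_800_000      # appearances of the most reused reference (lower bound)
CHARS_PER_CITATION = 450     # lower bound quoted in the text ("450+")

total_bytes = REUSE_COUNT * CHARS_PER_CITATION
total_gb = total_bytes / 1_000_000_000  # decimal gigabytes

print(f"{total_bytes:,} bytes = {total_gb:.2f} GB")
# prints: 1,260,000,000 bytes = 1.26 GB
```

Because both inputs are lower bounds, the result is a lower bound too.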

Background

Wikimedians have been thinking about how to handle citations for many years... – WikiCite Conf (2017)

Many proposals have been made for centralising and simplifying reference management in Wikimedia projects, since at least 2005. In 2022 “reusing references” was the most popular request in Wikimedia Deutschland’s technical wishes poll. There are also various methods currently in use which try to reduce both the visual complexity of the source code, and the workload of curating frequently reused references. These include:

  • Specific source templates – many Wikipedias. Template pages for hosting the citation details of a specific source, inserted directly into an article.
  • Cite Q template – several Wikipedias and other projects. A citation template which, when inserted, calls upon a nominated Wikidata item to display the source metadata.
  • References namespace – French Wikipedia. A specific namespace where all notable published editions of popular works and authors are collated for easy comparison.

With the launch of Wikidata as a technical solution for representing and managing structured data in a single, centralised knowledge base, the WikiCite initiative was launched in 2016. Supported by the WMF, several Wikimedia affiliates, technical and philanthropic organisations, the initiative's aim was to foster a community of people and ecosystem of projects to support workflows, tools, datamodels, and best practices that seed a universal repository of open bibliographic data of citable sources within Wikidata. Aside from engaging a broad community of volunteers, tools developers, library and GLAM professionals, and linked data experts, the WikiCite initiative succeeded at creating a very large corpus of well-modelled items about sources in Wikidata, including 36+ million items about scholarly articles and 26+ million items with a DOI.[3] It also showcased the power of running large scale queries and analyses on this bibliographic corpus using the Wikidata Query Service with services such as Scholia.

However, the practical use of the WikiCite corpus as a resource for supporting citation management in Wikimedia projects (including Wikidata itself) has been hindered by multiple factors including, but not limited to:

  • the difficulty of mapping individual sources used as references in Wikimedia projects' pages to the corresponding Wikidata items about their works, authors, publishers, and main subjects.
  • the lack of tools to transparently monitor how changes made in Wikidata propagate to other wikis (for example when metadata stored in Wikidata are used to populate reference templates in a Wikipedia article).
  • the lack of solutions to integrate Wikidata with tools like Citoid, which enormously reduce barriers to creating new references in Wikipedia articles.
  • the undesirability of storing the sheer volume of sources that are used in Wikipedia and sister projects as individual Wikidata items. For example, tens of millions of items about individual URLs and newspaper articles.
  • the lack of a good definition of "completeness", making the project very open-ended and its scope unclear.

While the WikiCite initiative has demonstrated the value of creating and managing a rich database of sources in Wikidata, through the development of countless tools and applications that leverage its data or facilitate its curation, Wikidata's practical use to support the daily needs of volunteer editors working on references in Wikimedia projects has not been realized to date. What has been missing so far is a bridging layer that enables managing references in individual Wikimedia projects, while connecting them to the rich, structured information about these sources, authors, publishers, main subjects etc. that exists in Wikidata.

Shared Citations

Shared Citations is a proposed new database, aiming to centralise the hosting and metadata-management of individual references used in any Wikimedia project as structured data "records". Each Wikimedia project could then call upon these records and, according to its own citation style preferences, display them to their readers.

By centralising, structuring, and sharing the content, the following can be achieved:

  • much duplication of content curation effort can be reduced
  • many workflows which improve knowledge integrity can benefit all sister projects simultaneously
  • new processes and research can be undertaken to improve our understanding of Wikimedia references; and
  • the architecture of how we store and update reference information can be made much more efficient.

Additionally, this proposed database would enable linking entities mentioned in its records (authors, publishers, main subject, journals, etc.) to the corresponding Wikidata items, making cross-language reconciliation and the analysis of sources across projects a practical possibility. This would enable significant research (both internal, and academic) to be undertaken for the first time.

Examples of how a Shared Citation record might work

A citation record of a book edition

Of note in the Book example:

  • References to different editions of the same work could be linked to each other.
  • References to different sub-sections of the same work (e.g. Page/pagerange/Chapter) can be combined into one Shared Citation record, providing faceted search.
  • Multiple references to the same citation record can be sorted and grouped in many ways. By extension, different content management workflows can be created.
A citation record of a scholarly-journal article.

Of note in the journal article example:

  • The difference between an author's name (as described on the publication itself), the Wikidata item for that author, and any given structured format (F.Last/First Last/Last, M. First) that we might like to display to our readers.
  • The ability to store and update the various ways which a source might be accessed by different users (e.g. for free, or those with academic institutional logins).
  • Easy and precise access to version(s) of a citation record can be maintained.
  • The ability to know a source's status (e.g. preprint, peer review published, self-published, retracted). By extension, to treat the reference differently as a result.
  • Traceability of where current, or former, instances of references to a citation record are used in Wikimedia projects.
A citation record of a newspaper article.

Of note in the newspaper article example:

  • The capacity to identify frequently reused text strings (author, publisher...) which have no Wikidata item as creation-candidates, yet still be able to host citations which refer to them.
  • Different citations which refer to the same author/publisher/location/publication date (etc.) across Wikimedia projects can be searched and filtered.
  • Wikimedia editing campaigns which worked with any given citation could be identified.
A hypothetical implementation of a reference template in Source editor mode.

Of note in the reference template example:

  • The reference's content and format for the reader is independent of the data. The display format could be determined by the choice of citation template, or a parameter in that template.
  • Quick and easy access to where the citation record "lives" is provided in any editing environment (see #Principles below).
  • The ability to have locally hosted "freetext" fields which could include contextual information not stored in the central citation record. Equally, the ability to "suppress" centrally hosted fields and replace them with local data.
  • [Not shown] The ability to search, create new, and modify existing references should be incorporated into editing workflows such as Citoid (see #Principles below).

Many other undescribed and unforeseen use-cases, metadata fields, and possibilities are certain to exist. Please describe them on the talkpage!
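
To make the examples above more concrete, here is one possible shape for a citation record. It is a sketch only: the class names, field names, the `SC-` identifier format, and the placeholder Wikidata ID are all invented for illustration and are not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Contributor:
    """An author or publisher as named on the publication itself."""
    name_as_published: str                 # literal string from the source
    wikidata_id: Optional[str] = None      # linked Wikidata item, if one exists


@dataclass
class CitationRecord:
    """Hypothetical shape of one Shared Citation record."""
    record_id: str
    source_type: str                       # e.g. "book", "journal article"
    title: str
    authors: list[Contributor] = field(default_factory=list)
    status: str = "published"              # e.g. "preprint", "retracted"
    identifiers: dict[str, str] = field(default_factory=dict)  # DOI, ISBN...
    access_urls: dict[str, str] = field(default_factory=dict)  # access route -> URL

    def unlinked_authors(self) -> list[str]:
        """Names with no Wikidata item yet: candidates for item creation."""
        return [a.name_as_published for a in self.authors if a.wikidata_id is None]


record = CitationRecord(
    record_id="SC-0001",                   # invented identifier format
    source_type="newspaper article",
    title="An example headline",
    authors=[
        Contributor("A. Example", wikidata_id="Q0"),  # placeholder, not a real item
        Contributor("B. Sample"),                     # text string only
    ],
)
print(record.unlinked_authors())  # ['B. Sample']
```

Keeping the published name separate from the optional Wikidata link is what lets a record exist before any matching item does, as in the newspaper example.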

Scale

With regard to the expected scale of the Shared Citations database: estimates vary greatly of the potential "total" number of de-duplicated references which exist across all Wikimedia projects and have the potential to be turned into Shared Citation records. A 2020 dataset extracted 29m references from English Wikipedia "to books, journal articles or Web contents",[4] while the Internet Archive tracks 23m archive-URLs across 46 Wikipedias.[5] These datapoints indicate that the scale should be in the same order of magnitude as Wikidata itself, which currently hosts 90m items. Moreover, these datasets overlap, and neither includes Wikidata's references in scope. Being able to know just how many separate references exist across Wikimedia projects would be a benefit of the Shared Citations project itself.

Relationship to Movement Strategy

Knowledge Equity

Real-time visibility into our citation graph would allow us to help marginalized languages, communities and subject specialists to curate their reliable sources for easy reuse. It would also dramatically reduce the citation management workload, especially for smaller communities. The challenge is: How might we identify citation gaps or imbalances, while not supercharging existing inequities?

  • “This will help us achieve epistemological decolonisation” João Peschanski
  • “If libraries could look at global Wikimedia citations, it would help break a self-reinforcing cycle of certain sources’ popularity in library holdings” Phoebe Ayers
  • “Standardized templates and references benefit products designed for emerging markets, which need high interoperability of content formfactor, language, and projects.” Runa Bhattacharjee
  • “Having systematic insight into where our knowledge comes from will help us to diagnose what kinds of sources, languages, voices are missing.” Ben Vershbow

Knowledge as a Service

Shared Citations would underpin the reference ecosystem inside and around Wikimedia by making relatively raw data into semantic knowledge. The challenge is: How might reusers take advantage of a language-agnostic citation graph?

  • “Commons makes it easy to use the same image on different wikis. But to copy a citation is very hard. Everything is manual and slow.” Amir Aharoni
  • “Abstract Wikipedia articles will be far more useful if references are formatted from structured data instead of plaintext.” Denny Vrandečić
  • “Imagine being able to recommend useful footnotes to editors, readers.” Sam Walton
  • “This adds integrity to our citations - ‘a fortified citation layer.’” Chris Albon

Individual strategic initiatives

1: Systematic approach to improve satisfaction and productivity [Increase the Sustainability of Our Movement]

9: Community engagement around product design and UX [Improve User Experience]

11: Resources for newcomers [Improve User Experience]

14: Cross-project tool development and reuse [Improve User Experience]

29: Enhance communication and collaboration capacity with partners and collaborators [Coordinate Across Stakeholders]

36: Misinformation [Identify Topics for Impact]

37: Bridging content gaps [Identify Topics for Impact]

38: Content initiatives in underrepresented communities [Identify Topics for Impact]

40: Policies for experimentation with projects for knowledge equity [Innovate in Free Knowledge]

41: Continuous experimentation, technology, and partnerships for content, formats, and devices [Innovate in Free Knowledge]

Use cases

Knowledge creators and knowledge users
As a Wikimedian editor, I would like to
  • Quickly and easily add references to my article by reusing references from another language Wikipedia
  • Benefit from the formatting done by others, so I can spend my time on research and writing, not on templates
  • Train new users to add footnotes in one session, in a way they can continue without intense support
  • Create ‘redlists’ for Wikidata of authors and publications which are frequently cited but don’t have a Wikidata item
  • Have my work on disambiguating authors on Wikidata cascade through to their references on other Wikimedia projects
As a reader of Wikimedia projects, I would like to
  • Understand if I can trust the citation I am reading
  • Be suggested other topics in Wikipedia which also reference this same author or book
  • See dictionary definitions which prioritise usage examples published in my country’s vernacular
  • Generate a list of primary, or newspaper, or local sources used on this topic, for my high school homework
As a library, I would like to
  • Track that external links to our collection from Wikimedia projects are well maintained
  • Notice which books on our key subjects are cited and ensure our library has holdings
As a content patroller, I would like to
  • Track all citations to an instance of disinformation or misinformation, or a retracted publication, across any project.[6]
  • See when someone is adding the same link across many different projects in quick succession, and revert
  • Identify and track any citations to predatory journals
As a technology company, I would like to
  • Be able to answer the question “says who?” when a customer asks me to verify a fact just given to them
  • Train my algorithm to show more reliable sources for languages and topics where I have limited other data
As a professional writer, I would like to
  • Have academic ‘impact factor’ reports include Wikipedia citations to my scholarly work[7]
  • Check when Wikidata is referencing my work to ensure its findings are accurately represented
  • Be notified when a Wikipedia article cites my journalism and be able to share it on social media
As a researcher, I would like to
  • Be able to extract parts of the citation corpus and analyse it, without massive pre-processing
  • Know how many references are behind paywalls, how much they cost, and if there are alternatives
  • See how many references are about a language, culture, or place but were published from outside it
  • Assess the demographics of authors cited in any given language or topic area, and changes over time

By way of illustration, the answers to the following questions could be useful to any or all of the above use-case groups. We currently lack the ability to answer basic questions about our own citation corpus without heavy one-time research investments. Questions such as:

  • How many times is the New York Times cited as a source across all Wikimedia projects? Is its newer content cited more frequently than older content, and in which languages?
  • Which Wikipedia articles in all languages reference a source in Russian that the Russian community has flagged as unreliable?
  • How many sources in languages other than Vietnamese are cited in Vietnamese Wikipedia? Which sources in Vietnamese are cited by a Wikimedia project, but not in Vietnamese Wikipedia itself?
  • How often are works by Naomi Klein used as references? Which chapters, of which editions, of which books, are her most cited?
  • Which articles exclusively cite paywalled scholarly articles, or articles which have been since retracted?
  • Other? <Add more here>

Without the ability to answer these kinds of questions, it is difficult to ensure accuracy, consistency, and currency of our citation corpus. It is also near impossible to investigate our citation corpus for biases.
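
With structured, centrally stored records, the first question above becomes a small aggregation instead of a one-off research project. A sketch under stated assumptions: the rows below are invented sample data, and the tuple layout is hypothetical.

```python
from collections import Counter

# Invented sample data: (project, page, publisher) for each use of a citation.
usages = [
    ("en.wikipedia", "Page A", "The New York Times"),
    ("vi.wikipedia", "Page B", "The New York Times"),
    ("en.wikipedia", "Page C", "Example Gazette"),
    ("en.wikivoyage", "Page D", "The New York Times"),
]


def count_by_project(rows, publisher):
    """Count how often a publisher is cited, grouped by project."""
    return Counter(project for project, _page, pub in rows if pub == publisher)


counts = count_by_project(usages, "The New York Times")
print(sum(counts.values()), dict(counts))
# prints: 3 {'en.wikipedia': 1, 'vi.wikipedia': 1, 'en.wikivoyage': 1}
```

The same grouping could be keyed on language, publication date, or status (for example "retracted") to address the other questions.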

Principles

Database

Combined, these principles are what makes the Shared Citations database a "service" project and not a new "sister" project.

  1. Pragmatic scope. Records in the citation database must have been used as a reference for a statement in any Wikimedia project page. It is not a place to compile completed sets of citation corpora (also known as "stamp collecting"[8]) or an attempt at a universal "bibliographic commons". Its existence makes no impact upon Wikidata's own discussions of scope [see also #Relationship to Wikidata].
  2. Create upon use. Citation records are created by editors, at the moment that they are being used for a reference on a Wikimedia project. The citation database is not pre-populated with records that are not [yet] used as references on a Wikimedia project page. There is no "create new citation record" workflow in the Shared Citations database itself. This is in direct contrast to other Wikimedia projects (e.g. Wikidata's "create new item" or Wikimedia Commons' "upload file" workflows). In effect, new records are created by being "pushed" to the Shared Citations database from one of the sister projects.
  3. Non-deprecation. Existing referencing systems remain. Enabling the Shared Citations database on a wiki does not prejudice the continued use and addition, removal, or modification of ‘traditional’ references. For example, an editor can add two new references to a Wikivoyage page: one from the Shared Citations database, and one in the 'traditional' way. Policy and technology allow for both to coexist on the same page. Equally, if another editor wishes to convert both to either style, they may do so, in accordance with all editorial and behavioural policies on that wiki. This is analogous to how each wiki determines its own multimedia "local uploads" policy. The many edge-cases of what can be used as a reference mean that some might not even be possible to make into Shared Citation records; this diversity of reference content should be encouraged, not stifled.
  4. Enabled upon readiness. Access to the Shared Citations database from a wiki should not be enabled before that wiki's workflows, policies, and templates are generally ready to use the new system. Access to the shared citation database is not enabled before a community has received reasonable support and time to be ready to benefit from and manage it. This means that enabling the system in a Wiki would not be subject to a fixed rollout-timeline, but rather to whether that community felt prepared. Nonetheless, this principle should not be construed to mean that all templates need to be converted, or all local administrators have to agree, before the system can be enabled. Because of the "non-deprecation" principle, no existing workflows would break when the new system is enabled.
  5. Style independence. Local wikis determine the display format of a reference for the reader. [See also: the en.wp editorial guideline "variation in citation methods" (CITEVAR)] Shared citations provide the content of a reference but do not determine the form, which can be structured by local tools and templates. For example, a shared citation of the type "scholarly article" might have a default display format for author names as "Last, First", but it must be possible to programmatically alter it to be displayed as "F. Last". This display format could be set by: an adjustable parameter in each individual reference, a template covering a whole page, the fixed formatting of the chosen template, standardised across the entire wiki, or even as a user preference (or any combination of these, at the discretion of the local community).
  6. Editorial independence. Local wikis determine their content standards. [See also: the en.wp editorial guideline "reliable sources" and "perennial sources"]. The existence of a shared citation in the database does not prejudice another project’s policies, and vice versa. For example, a shared citation exists, having been created from a reference in the Romulan Wikipedia which is using a website that the Vulcan Wikipedia has 'banned' as an unreliable source. The Vulcan Wikipedia is under no obligation to use the footnote merely because it is used by the Romulans, but equally the Romulans are not required to delete the citation merely because the Vulcans have banned it.
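
The style independence principle can be illustrated with a small formatter: the record stores the name parts once, and each wiki (or template, or user preference) picks the rendering. This is a sketch, assuming names are stored as separate given and family parts; the function and style names are hypothetical.

```python
def format_author(given: str, family: str, style: str = "last_first") -> str:
    """Render one structured author name in a locally chosen style."""
    if style == "last_first":          # "Last, First"
        return f"{family}, {given}"
    if style == "initial_last":        # "F. Last"
        return f"{given[0]}. {family}"
    if style == "first_last":          # "First Last"
        return f"{given} {family}"
    raise ValueError(f"unknown style: {style}")


print(format_author("Victor", "Hugo"))                  # Hugo, Victor
print(format_author("Victor", "Hugo", "initial_last"))  # V. Hugo
```

The stored data never changes between the two calls; only the presentation does. Real names do not always split cleanly into given and family parts, so a production data model would need to handle more cases than this sketch does.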

Finally, the general principles listed in “MediaWiki:Principles” apply.

Cross-wiki integration

Community feedback from #Precedents & related projects indicates two broad areas needing improvement before structured data integration efforts across wikis would be more widely embraced. These are exemplified in the template-deletion debates on English Wikipedia for "Cite DoI" (2015) and "Cite Q" (2017).

In all editing environments, editors must be able to:

  1. Monitor changes which affect how content is displayed to readers here, even when it was edited over there. This monitoring integration must be Granular, Arbitrary, and Cascading.
    • Granular means that the watchlist must be able to show every change that affects what is shown to a reader of that page, no more and no less. This will avoid watchlists being "flooded" with irrelevant changes. Also applying this principle to the Page History might be possible, though it is a separate engineering task.
    • Arbitrary means the citation information being changed and displayed on the client wiki (e.g. Wikipedia) might come from any Wikidata item referenced in the citation.
    • Cascading means a dependency tracking system must propagate notifications through any and all affected items across sites (Wikipedia article, Shared Citation record, Wikidata item), regardless of the origin of the change (T253026).
  2. Access the content which will be shown to readers here, even though it is hosted over there. This access takes three forms: to jump to where the citation is stored, to inspect the content in situ, and to edit the citation in situ. Given that references are an important part of any Wikimedia page's content, even when the text content of a reference is actually being hosted in a separate database, an editor of a page must be able to easily view [and text-search] all the content which will be displayed in the saved page whilst in editing mode. The nature of that access would differ depending on the editing environment [including mobile and screen-readers] and could be implemented in various ways. Nonetheless, in all editing environments, it must be made easy to 'jump' directly to where the reference's content is stored in the shared citation database, where it can be edited. Furthermore, and depending on the technical capabilities of the editing environment, an editor should be able to edit a 'shared citation' reference (including adding new ones) without leaving the page.
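
The "Cascading" requirement amounts to walking a dependency graph from the changed entity to every page whose rendering depends on it. A minimal sketch, assuming dependencies are tracked as a simple mapping; the entity names and the `page:` prefix convention are invented for illustration and do not describe any existing MediaWiki mechanism.

```python
from collections import deque

# Hypothetical edges: a change to KEY can alter how each VALUE renders.
dependents = {
    "wikidata:author-item": ["citation:SC-1", "citation:SC-2"],
    "citation:SC-1": ["page:vulcan/Logic", "page:klingon/Honour"],
    "citation:SC-2": ["page:vulcan/Logic"],
}


def affected_pages(changed: str) -> list[str]:
    """Breadth-first walk from the changed entity to every dependent page."""
    seen, queue, pages = {changed}, deque([changed]), []
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child in seen:
                continue                  # notify each page only once
            seen.add(child)
            queue.append(child)
            if child.startswith("page:"):
                pages.append(child)
    return pages


print(affected_pages("wikidata:author-item"))
# prints: ['page:vulcan/Logic', 'page:klingon/Honour']
```

The `seen` set means a page reachable through two citation records is still notified once, regardless of where the change originated.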

Finally, the general principles listed in "Risker's checklist for content-creation extensions" apply.

MediaWiki Development

Addressing these two areas of community feedback requires MediaWiki development to improve the cross-wiki integration of both Monitoring and Editing. The status quo can be understood as incomplete ecosystem integration. The "minimum viable product" needs of this work are:

Monitoring

Relevant subscribed watchlists will:

  • update when a change to any entry in the citations database affects how a page will display to a reader. e.g. The ISBN of a Shared Citation record is changed
  • update when any change to a Wikidata item which is used in a citation database entry affects how a page will display to a reader. e.g. the label of an author is changed in Wikidata in the same language as where the Shared Citation record is used in Wikipedia.
  • update when an editor on another Wikimedia project changes a shared citation that is also in use on a watched page, and the change affects how that page will display to a reader; the watchlists and recent changes relating to both articles are updated. e.g. When an editor on Klingon Wikipedia adds a publication date to a reference which is stored as a shared citation that is also used in an article on the Vulcan Wikipedia, the watchlists of people who monitor either article are updated.
  • not update when any change to the above would have no effect on how a page will display to a reader. e.g. 1: if a new field/property is added to the citation record, but that field is not displayed in the reference. e.g. 2: if a label is changed but in another language.
  • indicate to the user the origin and nature of the change for easy tracking. e.g. including edit summary, #tags or equivalent flags, the username, the sister project where the edit originated, and that the change was propagated "via shared citations database".
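
The update and not-update rules above reduce to one test: does the change alter what a reader of this page sees? A sketch of that filter, assuming each page knows which fields it displays and in which language; the function name and parameters are hypothetical.

```python
from typing import Optional


def should_notify(changed_field: str,
                  change_language: Optional[str],
                  displayed_fields: set[str],
                  page_language: str) -> bool:
    """Return True only if the change alters what a reader of the page sees."""
    if changed_field not in displayed_fields:
        return False                      # field is not rendered on this page
    if change_language is not None and change_language != page_language:
        return False                      # label changed, but in another language
    return True


shown = {"title", "author", "isbn"}
print(should_notify("isbn", None, shown, "en"))       # True: ISBN is displayed
print(should_notify("author", "fr", shown, "en"))     # False: other language
print(should_notify("publisher", None, shown, "en"))  # False: field not shown
```

Here `change_language=None` stands for a language-independent value such as an ISBN.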

Work on this issue, or a close variation of it, is already underway within Wikidata and would need to be coordinated so as to not duplicate effort (T90435, T191831). This monitoring functionality, once built, should be applicable to other cross-wiki use cases, e.g. if a Wikimedia Commons file is deleted or replaced [T91192].

Above and beyond watchlist monitoring, these updates might also be applicable to the relevant page history. While they serve a similar monitoring purpose, page history is constructed independently of watchlists and would require separate engineering effort. Further research is required.

Editing

A screenshot of the Citoid system in Visual Editor. As far as possible, editor-interaction with a "shared citation" within their wikis should utilise existing services and workflows.

The minimum viable product (MVP) for actions across the various editing environments should be as follows. The user interface and workflow of each action would be dependent on the editing environment.

All editing environments

[Including in the 2010 Wikitext editor]

  • Inspect an existing citation. View the content of what is shown to the reader and easily access the page where that data is stored and is editable. Implementations could include popup on click, appear on hover, dropdown from the toolbar, display as a group below the editable area, etc.
  • Jump to the Shared Citation record in the database. Simply and easily access the page in the shared citations database where the specific citation is actually hosted, in order to edit there – without interrupting the original editing workflow. Implementations could include a link to open a new tab, a popup...

Mobile editor

All of the above, and also:

  • Find. When using the browser's inbuilt text-search system (ctrl+F) while in editing mode, be able to search for text strings which would appear to the reader of the page (e.g. "Hugo, Victor") even though that string is hosted on the Shared Citation database and only the Shared Citation record's ID number is embedded in the article's wikitext.
  • Create
    • New. This ought to be integrated with the current "create a new reference" Citoid workflow, adding all the relevant metadata to a newly created record in the Shared Citations database. This requires searching the database before creating a new record to check that it doesn't already exist. This is expected to be a primary means by which new references are created in the shared citation database and is therefore a critical workflow.
    • Existing. Creating a new reference in the page to something which already has a citation record in the Shared Citations database. Essentially a widget that searches that database for the URL/ISBN/DoI/etc of the intended source, selecting the correct response and inserting that Shared Citation database ID directly.
  • Reuse
    • Repeat. Making another reference in the same page to something which is already referenced on that page. Essentially equivalent to the existing workflow that utilises <ref name="...">, but working on Shared Citation references.
    • New subsection. To be able to repeat an existing reference but to a different subsection - e.g. edition, page, chapter...
  • Modify existing reference. Equivalent to existing workflow, but also working on Shared Citation references.
    • Connect citation record fields about concepts to equivalent items in Wikidata (e.g. author, publisher, main subject), when they exist. This workflow also applies equally to the aforementioned "create" workflows. [See examples in the wireframes in the #Shared Citations section above].
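
The "New" and "Existing" creation workflows share one rule: search before creating. A sketch using an in-memory stand-in for the database; the class, its methods, the `SC-` ID format, and the sample ISBN are all invented for illustration.

```python
import itertools


class CitationStore:
    """In-memory stand-in for the proposed database (hypothetical API)."""

    def __init__(self):
        self._by_identifier = {}            # normalised identifier -> record ID
        self._records = {}                  # record ID -> metadata dict
        self._ids = itertools.count(1)

    @staticmethod
    def _normalise(identifier: str) -> str:
        # Crude normalisation: ignore case, spaces, and hyphens.
        return identifier.strip().lower().replace("-", "").replace(" ", "")

    def find(self, identifier: str):
        """Return the record ID for an identifier, or None if absent."""
        return self._by_identifier.get(self._normalise(identifier))

    def find_or_create(self, identifier: str, metadata: dict):
        """Return (record_id, created); only create when no match exists."""
        existing = self.find(identifier)
        if existing is not None:
            return existing, False
        record_id = f"SC-{next(self._ids)}"
        self._records[record_id] = dict(metadata)
        self._by_identifier[self._normalise(identifier)] = record_id
        return record_id, True


store = CitationStore()
print(store.find_or_create("978-0-00-000000-2", {"title": "Example Book"}))
# prints: ('SC-1', True)
print(store.find_or_create("9780000000002", {"title": "Example Book"}))
# prints: ('SC-1', False)
```

The second call reuses the first record because both spellings of the identifier normalise to the same key. Real de-duplication would need to be considerably more robust than this string normalisation.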

Visual Editor & Source Editor

All of the above, and also:

  • Convert an existing reference to a Shared Citations reference. A wholly new workflow, which allows a user to "push" an existing "traditional" reference's content to the Shared Citations database, to create a new record over there, and then bring that new record's ID back to the page to be saved. This is expected to be a primary means by which new references are created in the shared citation database and is therefore a critical workflow.
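
The conversion workflow can be sketched as three steps: parse the traditional reference, push its fields, and substitute the returned ID. Everything named here is an assumption for illustration: the `shared citation` template does not exist, `fake_push` stands in for whatever call would create the record, and the parser only handles simple templates with no nested braces or pipes.

```python
def parse_cite_template(wikitext: str) -> dict:
    """Extract |name=value pairs from a simple, un-nested citation template."""
    inner = wikitext.strip()[2:-2]                 # drop the {{ and }}
    parts = [p.strip() for p in inner.split("|")]
    fields = {"template": parts[0]}
    for part in parts[1:]:
        name, _, value = part.partition("=")
        fields[name.strip()] = value.strip()
    return fields


def convert(wikitext: str, push) -> str:
    """Push parsed fields to the central store; return replacement wikitext."""
    record_id = push(parse_cite_template(wikitext))
    return f"{{{{shared citation|id={record_id}}}}}"   # hypothetical template


pushed = []


def fake_push(fields: dict) -> str:
    """Stand-in for the real "push" call; records what it was given."""
    pushed.append(fields)
    return f"SC-{len(pushed)}"


old = "{{cite book |title=Example Title |last=Hugo |first=Victor}}"
print(convert(old, fake_push))
# prints: {{shared citation|id=SC-1}}
```

In practice the push step would first search for an existing record, so that converting the same source on two wikis yields one shared record rather than two.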

Read mode [optional]

All of the above. Wikidata Bridge is demonstrating that it is also possible to consider the "read" mode as an editing environment in its own right, where users can click upon an icon inline and, via a popup widget, edit structured data fields. This occurs without needing to click the "edit" button at all. This editing mode could also be implemented and could be used as a method to achieve some of the features described above if it is not feasible to achieve them in a particular editing environment (notably with the 2010 Editor).

Wikibase

All of the above. Wikidata, among all sister projects, uses a unique editing environment – one that does not group the fields of a reference together in a citation template. Furthermore, Citoid is not available in Wikidata (yet) T199197. As mentioned in #Examples of the status quo above, reference fields in Wikidata are edited independently even when two references' information is identical, on the same property, on the same item. A unique UX would need to be developed for the creation of new Shared Citation records within Wikidata, for their visualisation by readers, and for the conversion of existing references to Shared Citations.

Bots & tools

The existence of means of mass-editing in Wikimedia projects by community-designed and operated bots must be accounted for and supported as best as possible. Notably, this means supporting new workflows for the automated and semi-automated conversion of existing references (by community owned/operated tools and bots): moving the metadata of a reference to the Shared Citations database and replacing it with a Shared Citation record ID [in accordance with local bot-flag editing permissions]. Equally, linking or annotating a Shared Citation record with its associated Wikidata items (including authors, publisher, main subject etc.) is another candidate for automated or semi-automated tools, as demonstrated by the existence of several bots and tools already developed by the community for this purpose in Wikidata.

Considerations

Workflows

The following topics are a non-exhaustive list of workflows and processes in the Shared Citations database which need to be created and owned by the community to ensure the operation of a new database which hosts content which serves all Wikimedia projects.

  • Administrator, Bot and other requests for permission process. Equivalent to all other wiki projects [See also Wikidata's various "Request for permission" processes]
  • Accessibility (a11y). Need to ensure that screenreader software can still access the content of a Shared Citation reference in an article page. It would need to be at least as accessible as current locally-hosted Specific source templates. Other potential future software development work to help build a shared template schema will facilitate applying high accessibility standards across all wikis.
  • Blocking, banning, oversighting and related moderation workflows, policies, and associated user-rights. [See also Wikidata's "User access levels"]
  • Creation. All new records would be "pushed" from the sister projects, either through newly created references or through the conversion of existing references, via the #Editing environments described above. There would be no "create new record" workflow – either manually or mass-edit – in the Shared Citations database itself, as described in the #Principles section.
  • Deletion and under what circumstances. If a record is being used in any Wikimedia project that would make it automatically valid for retention. But what happens when a citation is no longer used in any Wikimedia page? See also "Spam" in the #Open questions, below.
  • Policy and Administration. As a separate project, the contents of the Shared Citations database would be subject to local wiki policies and administrative control of the editing community on the new site, not those of any individual Wikipedia (or other sister project). Workflows and systems would need to be created for appropriate propagation of edits to Shared Citations records where the local Wikipedia (etc.) community uses differing editing restrictions/page protections. [See also Mismatched page/item protection state]
  • Editorial dispute management. Given that a Shared Citation record can be used simultaneously on several wikis, and can also be edited from within those wikis, any changes will propagate back out to all the places where the citation is used (and with associated updates to local watchlists). Editorial disagreements would be fewer – since the scope of the database is purely the citation metadata, not the facts in the work itself – but not entirely absent. Wikimedians will still need to be able to easily identify the talkpage (or equivalent) of where to discuss the best way to structure that reference.
  • Merging. Presumably a workflow that can only happen when working in the database itself (not an action performable while editing an individual reference from within a sister project page). Equivalent workflow and outcome to the process of merging Wikidata items: redirecting the duplicate item number (and never reusing it) and concatenating the metadata. Gadgets and tools can be built to facilitate the process. Bots on wikis could have redirected item cleanup as a task.
  • Property creation/deletion proposal and approval process; and associated user rights [See also Wikidata's "Property creator" user right]
  • Identify commonly reused strings (authors, publishers) to create items-for-creation worklists for Wikidata. [See also English Wikipedia's "lists of redlinks" projects to counter systemic-bias]
  • Protecting or semi-protecting records, or individual fields within them which have high usage or are easily misused. [See also Wikidata's "Highly used items" page protection policy].
  • Schemas for defining the relevant and necessary properties (and ontology) for different types of sources which can be used as citation records. E.g. All types of sources should have some kind of "publisher" field, but the "ISBN" field is only relevant to some types of sources. [See also Wikidata's Project Schemas/Shape Expressions project]
  • Other? <Add more here>
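
The merging workflow in the list above (redirect the duplicate ID, never reuse it, concatenate the metadata) can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory dictionaries, the "C1"-style IDs and the function names merge_records and resolve are all hypothetical, and no storage model has been decided for the Shared Citations database.

```python
def merge_records(records, redirects, keep_id, duplicate_id):
    """Merge duplicate_id into keep_id (hypothetical sketch).

    records:   citation ID -> {field name -> list of values}
    redirects: retired citation ID -> surviving citation ID
    """
    if keep_id == duplicate_id:
        raise ValueError("cannot merge a record into itself")

    kept = records[keep_id]
    duplicate = records.pop(duplicate_id)

    # Concatenate the metadata, skipping values already present.
    for field_name, values in duplicate.items():
        merged = kept.setdefault(field_name, [])
        for value in values:
            if value not in merged:
                merged.append(value)

    # The retired ID stays in the redirect table for good, so an ID
    # allocator can check this table and never hand the ID out again.
    redirects[duplicate_id] = keep_id

    # Re-point older redirects that targeted the retired record.
    for old_id, target in list(redirects.items()):
        if target == duplicate_id:
            redirects[old_id] = keep_id


def resolve(redirects, citation_id):
    """Return the surviving ID for a possibly retired citation ID."""
    return redirects.get(citation_id, citation_id)
```

A bot doing "redirected item cleanup" on a wiki could call something like resolve for each citation ID it finds in a page and rewrite any ID that has been retired.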

Relationship to Wikidata

In Wikidata there is a vibrant community, extensive data modelling, and a corpus of existing citation content (for example, Wikidata's "WikiProject Source MetaData" and WikiCite more generally). It has often been suggested that Wikidata should include within its scope either a universal citation database, or a database of all Wikimedia citations. [See also the WikiCite "Roadmap" discussions and the #Background section, above] However, as "Shared Citations" is proposed to also support the references workflow in Wikidata, it is more accurately seen as a service project which sits "behind" and "among" the Wikimedia sister projects. The Shared Citations database is different from, and non-competing with, Wikidata in the following ways:

  • Scale: A separate database means that all Wikimedia citations can “fit”. E.g. the 10s of millions of citations to specific URLs.
  • Service: By serving all Wikimedia projects, Wikidata itself can equally be a beneficiary of the shared citations for references used in its own statements.
  • Scope #1: Restricting to only Wikimedia citations ensures the ontology remains practical.
  • Scope #2: Restricting to only creation-upon-need ensures the community understands the difference from WD.
  • Sovereignty: Wikidata and the Shared Citations database would be editorially independent. There would be some content overlap, but it would be overwhelmingly limited to individual scholarly journal articles and some book editions. WD does not often have items about URLs, newspaper articles etc. Content policies would nonetheless need to adapt.

Comparison of Shared Citation and Wikidata scope and content

Shared Citations database
  • Service project
  • Records about specific individual publications: scholarly publications, URLs, editions of books, newspaper articles, archival records...
  • Only that which is cited as a reference in a Wikimedia project (including Wikidata).
  • Only created upon their being used in a Wikimedia project, individually or semi-automated.
  • Example Shared Citation record types:
    • Metamorphoses. Ovid. Translated by A. D. Melville; introduction and notes by E. J. Kenney. Oxford: Oxford University Press. 2008. ISBN 978-0-19-953737-2.
    • https://www.theguardian.com/technology/2018/mar/13/youtube-wikipedia-flag-conspiracy
    • Mérimée: IA68002588
  • Estimated size: ~50-100m records

Wikidata
  • Sister project
  • Items about concepts: authors, publishers, newspapers, websites, main subjects....
  • Any works.
  • Created in advance, en masse, for any use case.
  • Examples Wikidata Items:
  • Current size: ~90m items

Publications and citations in Wikidata over time, from Wikicite.org/statistics

There is an important relationship between the Shared Citations and Wikidata.

Due to its unique data format among the Wikimedia sister projects, the interaction of this new references format with the Wikidata Query Service would need to be considered so the content could still be queried, perhaps with federated search.

Unlike on the other Wikimedia sister projects, Wikidata references are all structured data and are all in the same format, and there are no reference templates. It is therefore theoretically possible that all Wikidata references could be converted to Shared Citation records. It would then be possible to require that all new references be made in the new format and to deprecate the old format. However, while technically possible, these are both editorial decisions which would rest entirely with the Wikidata community as per #Principles, above.

Integration with other Wikimedia software

The following is a non-exhaustive list of other pieces of software in the Wikimedia ecosystem which Shared Citations would need either to be integrated with or, at the very least, to take into account because of its impact upon them.

WMF-developed

  • APIs and third party reuse features (such as the EventStream for "page-links-change") would need to be aware of new workflows for the insertion and deletion of references in Wikimedia projects. The dataset (live-changes, or archive-dump) could be particularly useful for academic research purposes.
  • Cite extension – the system which manages the <ref> and <references /> tags.
  • Citoid. The primary tool for the creation and modification of references in Wikis in the Visual Editor and Source editor modes. As per the non-deprecation #Principle the capability to insert traditional references should be retained. As described above in the #Editing section, integrating Shared Citations with Citoid would allow the most seamless user experience. For example, creation of a new "shared citation" in Citoid would require the user to paste a URL, DOI, or ISBN into the wizard (as usual) but instead of going directly to the web to look for information to scrape, Citoid would first check the Shared Citations database to see if that reference has already been used elsewhere and therefore is in the database. If so, it would collect the citation ID number and insert that into the Wiki page's code. If that record did not exist in the Shared Citation database yet, Citoid would create it, with the information that it would normally have saved directly into the wiki page, and then bring the Citation ID back and save.
  • ContentTranslation (CX). An important tool for creating new Wikipedia articles in smaller languages and a common workflow for training events. Currently, the tool has to "guess" what the relevant equivalent citation template is when converting from one language to the other – resulting in lost data (and lower data quality) in the conversion process. If both source and target wikis have Shared Citations enabled it will become a trivial process.
  • The Wikipedia Library. TWL allows Wikimedians to have privileged access to otherwise closed-access academic databases. The existence of Shared Citations would permit new features such as "Recommended citations" (based on equivalent articles in other languages) and a field for proxied URL access from the database (see wireframe examples above).
  • Visual Editor – the most feature-rich editing environment and the assumed primary means of creating and editing Shared Citation references.
  • Wikidata Bridge – currently under development by Wikimedia Deutschland. A project to enable Wikidata editing from within Wikipedia infoboxes.
  • Other? <Add more here>
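
The Citoid integration described in the list above (check the Shared Citations database first, reuse the existing citation ID if there is one, otherwise create the record and bring the new ID back) is a "look up, else create" pattern. The sketch below illustrates it with a hypothetical in-memory store. SharedCitationStore, get_or_create_citation, normalise and the "C1"-style IDs are assumptions made for illustration only, not an existing Citoid or MediaWiki API, and the scrape argument stands in for whatever metadata lookup Citoid already performs.

```python
from dataclasses import dataclass, field
from itertools import count


def normalise(identifier: str) -> str:
    """Assumed normalisation: trim surrounding whitespace only."""
    return identifier.strip()


@dataclass
class SharedCitationStore:
    """Hypothetical in-memory stand-in for the Shared Citations database."""
    records: dict = field(default_factory=dict)        # citation ID -> metadata
    by_identifier: dict = field(default_factory=dict)  # URL/DOI/ISBN -> citation ID
    _next_id: count = field(default_factory=lambda: count(1))

    def find(self, identifier: str):
        """Return the citation ID for this identifier, or None."""
        return self.by_identifier.get(normalise(identifier))

    def create(self, identifier: str, metadata: dict) -> str:
        """Store a new record and return its newly assigned citation ID."""
        citation_id = f"C{next(self._next_id)}"
        self.records[citation_id] = metadata
        self.by_identifier[normalise(identifier)] = citation_id
        return citation_id


def get_or_create_citation(store, identifier, scrape):
    """Return (citation_id, was_created) for a pasted URL, DOI or ISBN.

    The web is only consulted (via scrape) when the database has no
    record for the identifier yet.
    """
    existing = store.find(identifier)
    if existing is not None:
        return existing, False
    metadata = scrape(identifier)
    return store.create(identifier, metadata), True
```

In either case only the returned citation ID would be saved into the wiki page; the metadata itself stays in the shared store.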

Community-developed

  • Bots (e.g. User:Citation bot) which undertake pre-approved, autonomous, editing "cleaning" processes, often relating to standardising template usage.
  • Cite Q. A citation template that enables relevant Wikidata items to be used as references in Wikipedias in locally defined formats.
  • Citation Styles (e.g. CS1). Not software per se, but important code-consistency efforts to ensure that a Shared Citation record renders to the reader appropriately.
  • Global Templates – current proposal to centralise the creation and hosting of templates used across many wikis.
  • Tools. Various semi-automated editing aid tools which focus on improving or adding references in Wikimedia projects e.g. AutoWikiBrowser and many others listed at English Wikipedia; Hay's directory. A new database focusing on structured data will cause new tools to be developed, and this must be accounted for and supported. It can be assumed that tools for the semi-automated "conversion" of many 'traditional' references to Shared Citations records will become a critical workflow for the growth of the Shared Citations corpus and wider uptake on sister projects.
  • WikEd. A text editor extension providing advanced editing tools (including content-folding of citations) for the 2010 Wikitext editor.
  • Semantic Cite is a MediaWiki extension providing a simple way of organizing citation resources with the help of semantic annotations.
  • Other? <Add more here>

Open questions

The following is a non-exhaustive list of decisions that need to be taken by the community as the project gets built.

  • Spam management. A service that centrally hosts content visible to reusers (e.g. search engines) creates a new vector for spam link promotion. Centralising citations allows easier identification and followup of poor quality links from common domains. Being a "service project" (rather than a public-facing "sister project"), there is minimal direct audience and hopefully reduced spammer motivation, and the use of "Nofollow" tags decreases SEO spam. Nonetheless, spammers will undoubtedly find innovative ways to stress the system. A potential way to mitigate the risk would be to have a "speedy delete" policy for newly created Shared Citation database records where the associated and sole reference on the originating wiki was itself removed within a short period of time.
  • Cross-wiki "Perennial sources" management. Smaller language communities often don't have the community scale for local moderation projects of external source quality so centralising the reference management would logically lead towards centralised monitoring of source quality. What kinds of properties or processes might need to be invented to support that (without overriding local wiki editorial autonomy)? For example, some sources may be allowed only for particular kinds of topics (e.g. sport but not politics), some wikis might restrict them to being added by users with extended permissions, some may be banned on particular wikis, some may be banned for all wikis....
  • To which namespaces on the sister project wikis should the Shared Citation system be associated? Any reference—even on policy pages, talkpages? userpages? Related to the aforementioned spam management, one option is to only create Shared Citation records when the reference is used in the main namespace of a wiki (to avoid the potential of a spammer creating a user-subpage filled with external URLs simply to create them in the database). However, this solution would hinder the creation of legitimate userspace drafts.
  • The relationship of the Shared Citation database's properties to those in Wikidata. Is the localisation (L10n) of property names for all languages most appropriately done in Wikidata (with the properties then imported via "federation"), or should it be done in the Shared Citations database itself? Using Wikidata's properties would benefit from shared effort, but it would also mean that some properties only needed for Shared Citations' use case would be created in Wikidata: Would the WD community accept that?
  • Referencing different subsections (e.g. different pages, chapters, editions) of the same citation in a single article. How should the system handle different references in a page which are to the same Shared Citations record but different subsections? The Shared Citation record ought to be able to group all references to different subsections onto the one record (see wireframe examples above). How this would integrate with existing editor behaviour and reader expectations would need careful consideration to ensure that use cases and nuance are not lost (see the "reuse" workflow described above).
  • Re-using the same reference in a page. The functionality of the <ref name> feature – which allows a single reference to be called upon multiple times in one page – would need to be replicated. Equally, the ability to refer to a different page of a reference already in use in that page would need to be supported (see the "Reuse" workflow described above).
  • Relationship to Wikisource pages. Wikisource's transcribed documents rarely have citations in them, and when they do linking to the Shared Citations record should not alter the esoteric way that citation is displayed (as it should be an accurate transcription). Moreover, these transcribed documents often are the document being used in a reference elsewhere. Each Wikisource document should have a matching Wikidata item but it should not automatically have a Shared Citations database record until that document is being used for a reference in a Wikimedia project. When it is, that Shared Citation reference in a Wikipedia article to a document which also exists in Wikisource should be connected together seamlessly. What is the best way to do that - perhaps with a dedicated "equivalent"?
  • Relationship to Wikidata items about works which are also used for Wikimedia references. There are a large number of Wikidata items about specific works which are also things that have been used in Wikimedia projects, notably scholarly journal articles. These would now exist twice: once in Wikidata as an item in their own right, and once in the Shared Citations database as something which has been referenced from [for example] a Wikipedia article. There would need to be a community consensus built as to how these two interact with each other. Tools such as Cite Q would need to be adapted to support that new consensus.
  • Integration in wiki syntax. Currently, all references in MediaWiki use the XML-like element <ref>, provided by the Cite extension, to denote a reference. To differentiate Shared Citations from traditional references without 'overloading' that element [for the editor and the software] and without any interruption to the existing system, it might be necessary to use a new piece of syntax (perhaps <cite> or {{#cite}} ) to invoke the new functionality. The specific technical implementation, and the implications for accessibility, right-to-left (RTL) languages, and other localisation considerations, need to be explored.
  • Database Dumps. When the data in the references are hosted separately, can a complete database dump of a wiki still be generated which includes the references? Or, can a dump of the Shared Citations database be provided which provides the relevant information? This question equally applies to where Wikidata is generating significant quantities of content in another wiki.
  • Mismatched page/item protection state. How to deal with circumstances where a Shared Citation record is used in multiple pages (potentially across multiple wikis), but where some of those pages are protected from general editing. For example, if an anonymous editor edits a citation on a page in the Klingon Wikipedia, that change would be saved on the Shared Citation database and propagated out to all other locations where the same citation is used. However, if one of those uses is on a Vulcan Wikipedia article which is fully-protected, how should that change be handled? This is an equivalent issue to Wikidata facts being embedded in Wikipedias.
  • Multiple versions of the same work. Most journal articles likely to be cited exist in both print and electronic formats; most books, in several editions. Existing WP references are highly variable in which version they cite; even some standard "reliable" reference sources like OCLC are inconsistent and sometimes incorrect. Do we accept references however cited, or attempt to verify and unify them?
  • Oral citations and other uncommon or contested formats of knowledge references. There are many debates about how (or whether) to have references to unprinted sources. The Shared Citations database would be agnostic to the editorial policies of each language edition but it's likely that people would experiment with new formats in a way that others might find frustrating/disruptive.
  • Other? <Add more here>
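
The "speedy delete" mitigation suggested under spam management above could be expressed as a simple check on each newly created record. The sketch below is only an illustration: the function name, its parameters and the seven-day window are assumptions, since the proposal says only "a short period of time" and leaves the actual policy to the community.

```python
from datetime import datetime, timedelta


def is_speedy_delete_candidate(created_at, usage_count, last_use_removed_at,
                               window=timedelta(days=7)):
    """Flag a record whose only use was removed soon after creation.

    created_at:          when the Shared Citation record was created
    usage_count:         how many pages currently use the record
    last_use_removed_at: when its last remaining use was removed, or None
    window:              assumed length of the "short period" (7 days here)
    """
    if usage_count > 0:
        return False  # still in use somewhere, so it is retained
    if last_use_removed_at is None:
        return False  # no removal recorded, nothing to act on
    return last_use_removed_at - created_at <= window
```

A check like this only flags candidates. Whether a flagged record is actually deleted would remain a decision for the community's deletion workflow.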

Partners

Internet Archive

The InternetArchiveBot's current web archive statistics page; A project which could be greatly simplified if it could integrate with a single Shared Citations database.

For the better part of a decade the Internet Archive has been archiving the content of Web pages linked from Wikipedia Articles (and Web pages linked from those pages as well) in near-real-time. Millions of URLs/week. And, with the InternetArchiveBot we have helped rescue more than 14 million URLs that had been returning a 404 (page not found) by editing those broken links to point to Wayback Machine (and other Web archive) links. We have run InternetArchiveBot on more than 50 Wikipedia sites and are in the process of setting up the software to run on even more.

As part of our Turn All References Blue (TARB) project we recently started using InternetArchiveBot to help add links to digital versions of books referenced in Wikipedia articles. To date we have added more than 700,000 book links to 10 Wikipedia sites. As a demonstration of the commitment of the Internet Archive to this project, in 2019 one of our sister non-profits purchased the online book store Better World Books, allowing us to pull books cited in Wikipedia articles off conveyor belts, digitize them, and add links to them.

In the course of this work we have built databases of URLs, and book/academic paper citations, from more than 50 Wikipedia sites. We built, and maintain, those databases out of necessity, to power our efforts to fix broken links and to link to books and papers. Here is an inventory of some of the citation templates we imported metadata from (from just 2 Wikipedia sites) to help populate our databases. Our goal is simple… every link, and citation, in every Wikipedia article, in every Wikipedia site should be persistent, reliable and take the curious to (at least) a snippet of the cited work. Here is an incomplete report of our work to date. The Shared Citations Database proposal would be a godsend to help support the TARB project, help the Internet Archive acquire, digitize and link more works to citations and, in so doing, help make Wikipedia sites, and the Web, more useful and reliable.

Mark Graham. Director of Wayback Machine at Internet Archive

Scribe

Scribe aims to support low-resource Wikipedia editors in creating new articles.

Scribe is a project to support low-resource Wikipedia editors in creating articles in their language. One of the most important components of the project is to suggest references to the editor, ideally in their language. To give the reader an insight into the quality of the reference, we opted to show them (among other things) how much the reference is used on Wikipedia already, in their language Wikipedia and in the topic area they are writing about, with a focus on online references. This information is currently extracted from Wikipedia dumps, but to be more flexible would need to be stored in a central place.

While creating the database for Scribe about online references, we found that there is a general need for a centralised place for references, to understand, for example, how much references are used across languages. We, therefore, opened a part of our dataset with an API: Scribe's credibility API. This API supports analysis of references and is available in 10 Wikipedia languages. But the fact that it needs to process the dumps limits how current its data can be, a limitation which the Shared Citations project could address.

Lucie-Aimée Kaffee, Scribe

Other? <Do you represent another organisation which would like to consider itself a 'partner' of this project? Please add details here>

Timeline

Timeline estimate

Done: Research phase
  • Initial stakeholder interviews and research
  • Proposal to WMF management
  • Proposal publication on Meta
  • Seek community endorsements for a pilot phase

Current blocker: Annual Planning phase
  • Get proposal included in WMF annual plan

Future:

Q1 "Pilot approval" phase
  • Defining the resource requirements
  • Architectural exploration with WMF Architecture Team
  • Grant writing/applying

Q2
  • WMF ramp-up (e.g. hiring)
  • Researching "current state of citations" across Wikimedia
  • Community consultation and designing

Q3-4 "Design" phase [if pilot approval phase is successful]
  • Design research
  • Engineering design and prototype
  • Monitoring + Editing backend research

Q5-6 "Pilot" phase [if design phase is successful]
  • Launch database
  • Community building + properties
  • Iteration on Monitoring + Editing integration
  • First user actions in 1st round of supported wikis

Q7-8 "Beta" phase [if Pilot phase is successful]
  • Completed workflow support for 1st round wikis
  • Enable on 2nd round of wikis
  • Populating with content + support community tools...

As the proposal reaches further into the future, the timeline naturally becomes less clear/fixed.

* The concept of "approval" for this complex project is very difficult to define. As this project would require significant software development investment and also community involvement, approval would be required from both the WMF executive (through its annual planning process) and the volunteer editing community (perhaps through a proposal for new projects RfC process?). However, both the editing community and the Wikimedia Foundation executive would need to see that the proposal has received in-principle support from the other, and neither would wish to feel that the other has made a decision without consulting them first. This "delicate dance" is why the early-2021 rows have parallel WMF/Community elements. The most likely scenario is a series of phases which each require a more formal approval from either WMF, Community, or both. These have been tentatively described in the above table as the "pilot approval", "design", "pilot" and "beta" phases. Furthermore, the software development described in #MediaWiki Development would be beneficial to many aspects of Wikimedia projects whether or not the Shared Citations database itself was created, meaning it could be worked upon independently of any approval process for creating a new citation management system (or enabling it for use on any individual sister project wiki).

Rollout & Metrics

This is a suggested order for rollout prioritisation while recalling that the #principle of "enabled on readiness" means that rollout would follow the order of community demand (and therefore would not be a neat "all language editions of a sister project" in sequence).

The prioritisation of the rollout should focus on: 1) ensuring that the diversity of the community engaging in the Shared Citations database is broad from the beginning (so that the culture of the project is not overwhelmed by one 'form' of knowledge management), and 2) ensuring the corpus of the Shared Citations database grows to have a likely high overlap with the needs of the subsequent groups.

Pilot: Incubator and/or Abstract Wikipedia. Single MediaWiki instance representing multiple language communities which are the smallest in the movement – ensuring a diversity of cultural contexts of references built in from the start. Highest need for technical simplification, lowest risk of a hostile response to beta software.

  1. Wiktionaries & Wikiquotes. Community familiar with complex citation management and citation reuse. Diversity of citation types, template formats, and languages. Corpus of most oft-cited works. Built on MediaWiki.
  2. Wikidata. A large number of citations already in a consistent structured format. A single wiki with a multilingual and multicultural community, familiar with structured-data, properties etc. Necessary community discussion on how properties and content should be mapped to each other. Built on WikiBase which would require bespoke UI and workflows.
  3. Wikisources. Most active community entirely among “emerging communities” ensuring citation diversity built-in to the culture.
  4. Wikibooks & Wikivoyage. Smaller projects which are closest in style to Wikipedias.
  5. Wikipedias and other sister projects. Largest block of content – most benefit from the "network effect" of increased corpus size and community curation of the corpus. Largest variety of citation formats and edge cases.

Possible metrics and associated targets for determining "success" should focus on the number of individual editors using the system (and continuing to use); proportion of newly-created references being created in the new system on a wiki relative to 'traditional' style; the proportion of total references on a wiki using the new system; diversity of types of citations supported by the system; average number of reuses of a citation record (compared to an existing system baseline); and the number of Wikimedia communities widely/consistently using Shared Citations on their wiki. The metric of success should explicitly not include the speed at which rollout to new wikis occurs (this would encourage haste over readiness) nor total number of records in the Shared Citations database (which would encourage record-creation before the community is ready to support it).
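
Several of the proportion-based metrics above reduce to simple ratios. The following minimal sketch shows how three of them could be computed for one wiki. The function name, the parameter names and the idea that these counts are readily available are all assumptions for illustration; the proposal does not define how the underlying counts would be collected.

```python
def rollout_metrics(new_shared, new_traditional, total_shared, total_refs,
                    uses_per_record):
    """Compute three of the suggested success metrics for one wiki.

    new_shared / new_traditional: newly created references of each style
    total_shared / total_refs:    all references using the new system / all references
    uses_per_record:              list with the number of uses of each citation record
    """
    def ratio(part, whole):
        # Avoid dividing by zero on a wiki with no data yet.
        return part / whole if whole else 0.0

    return {
        "share_of_new_references": ratio(new_shared, new_shared + new_traditional),
        "share_of_all_references": ratio(total_shared, total_refs),
        "average_uses_per_record": ratio(sum(uses_per_record), len(uses_per_record)),
    }
```

Consistent with the paragraph above, the sketch deliberately reports neither rollout speed nor the total number of records in the database.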

References

Endorsements

If you would like to list yourself as supporting this proposal, please sign your username below. If/when this project moves into the stage of resource-investigation for software development in the WMF and WMF annual planning, a proposal for new projects listing would be published.
If you would like to provide substantive feedback, ask questions, or comment please use the talkpage.

  • Strong support Strong support. --Joalpe (talk) 01:58, 3 December 2020 (UTC) Justification: References are often not well structured in my context --which is something I'd assume happens in other Global South contexts--, which may lead to a lot of effort to create refined references for Wikipedia. It is fine to do it once and then have this structuring effort be reused in other articles; this is not what happens now, as editors end up redoing this every time they want to use the same reference in a different article.[reply]
  • Strong support Strong support. --Silva Selva (talk) 02:00, 3 December 2020 (UTC)[reply]
  • Strong support Strong support. --Erokhin (talk) 05:31, 3 December 2020 (UTC)[reply]
  • Support Support This would significantly improve citation management, reducing redundancy on any given wiki and between the different multilingual wikis, and also make it a lot easier to use different citation formats if people want. en:Template:Cite Q does something similar, but only for references with their own Wikidata item; extending it to all references would be great. Mike Peel (talk) 13:55, 3 December 2020 (UTC)[reply]
  • Strong support Strong support. This proposal is the natural evolution of the WikiCite initiative and a sensible solution to bridge the gap between the horrible mess of citation management in Wikipedia with the abstract world of source metadata in Wikidata. There are still many open ended questions to answer, but I am strongly supportive of this proposed direction, which I see as a pragmatic, incremental solution to managing sources across Wikimedia projects. (Full disclosure: I was formerly the PI of the WikiCite grant and a WMF employee. I currently serve on the WikiCite steering committee as a volunteer) --DarTar (talk) 03:47, 4 December 2020 (UTC)[reply]
  • Strong support Strong support - This is so badly needed in our movement and is the logical progression of centralizing our knowledge in images (Commons), factual claims, and language links (Wikidata). The implications for knowledge equity, verifiability, and reliable sourcing are profound. We would be able to maintain references and accurate information across all our projects and quickly ascertain how this information is being used. Structuring the references and citations in a modern, machine-readable database (rather than the current lexical, redundant, and inconsistent <ref> tags we have now) would be transformative. It would not just be an asset for Wikimedia projects, but would have implications for partners, GLAM organizations and scholarship globally. This is the next logical step for the WikiCite project and builds on great experiences and technology we have several years of activity within the working group. -- Fuzheado (talk) 14:57, 4 December 2020 (UTC)[reply]
  • Strong support I've been around with WikiCite since its inception, and followed countless discussions about reconciling the various directions this project could go. This is what we need to try to do in the next years, since we have the software and the mindset to make it evolve into a project that would serve Wikidata, all the other WMF projects and third parties. I am wholeheartedly all for it. Sannita - not just another it.wiki sysop 15:07, 4 December 2020 (UTC)
  • Strong support Since starting the Wikipedia Library in 2011, there has been a need for rational citation infrastructure. This sensible proposal takes a good crack at this hard problem. It would benefit hundreds of wikis and it promises to reduce time spent adding references to Wikimedia projects. Reliability and verifiability are at the heart of our reputation and functioning as a community; this proposal gets us one big step closer to our mission. Great work! Ocaasi (talk) 18:21, 4 December 2020 (UTC)
  • Strong support I have worked for years for this kind of project to be possible. It would be a crucial step in our community's work to verify content, it does not create unfunded mandates for communities, and has an information structure which allows tight integration with Wikidata without requiring or forcing it. It could potentially make the experience of editing Wikipedia more user-friendly and would vastly reduce the maintenance burden of Wikipedia's overworked labor. harej (talk) 19:35, 4 December 2020 (UTC)
  • Strong support I agree wholeheartedly with all the expressions of support outlined above. In my view this proposal has the potential to solve many of the citation and reference issues currently being faced by the plethora of WMF projects as well as aid the editing community contributing to the same. Ambrosia10 (talk) 21:22, 4 December 2020 (UTC)
  • Strong support The best time to fix this was fifteen years ago; the second-best is now. Whatever the eventual solution looks like, this is a nettle we should grasp as soon as possible. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:35, 4 December 2020 (UTC)
  • Strong support Great to see this problem being tackled so comprehensively. One of the hardest things for me to accept as a new editor was the way citations are handled in Wikipedia. DrThneed (talk) 22:24, 4 December 2020 (UTC)
  • Strong support You had me at "Make citations easier for the editor, more useful for the reader, and more efficient for our architecture." --Rosiestep (talk) 22:57, 4 December 2020 (UTC)
  • Strong support. Hugely needed - timely - I support as an editor & information science researcher. This is important for flagging misinformation/retractions as well as for improving the reliability ("says who?") of Wikipedia-based info in other web information systems. Would save editor & reader time. Excellent use cases! Jodi.a.schneider (talk) 22:58, 4 December 2020 (UTC)
  • Support per Fuzheado and Andy Mabbett, and for the potentially massive reduction in the bloat of Wikidata items (said bloat having been a burden to usability of the knowledge base in some cases) that implementing this database would represent for a large class of items. Mahir256 (talk) 23:26, 4 December 2020 (UTC)
  • Strong support. --Mlemusrojas (talk) 01:10, 5 December 2020 (UTC)
  • Strong support Yes please. Sounds like an excellent solution to a much lamented problem. Doctor 17 (talk)
  • Strong support As a Wiktionary editor, I'll be happy to easily add a new example with the reference already formatted properly. As a wiki newspaper editor, I'll be happy to know how many countries provide examples of use, how many examples were published prior to a certain date, how many examples are by women, and the division between examples from books, journals, websites, etc. All this information may help to balance an unfair provenance of information, for instance by adding more examples from Senegal or written by women. I also support the way this proposal was made, and the involvement of the authors in including perspectives from as many projects as possible! Great job here! -- Noé (talk) 14:35, 5 December 2020 (UTC)
  • Support Nicely compact and internationalizable. Helps us make footnotes in languages we don’t know. -- econterms (talk) 15:25, 5 December 2020 (UTC)
  • Strong support. This would be very valuable also for research purposes. Giovanni1085 (talk) 15:27, 5 December 2020 (UTC)
  • Strong support Amazing, comprehensive plan. I am glad to see such an important project taking off. TiagoLubiana (talk) 20:15, 5 December 2020 (UTC)
  • Strong support This proposal will enable Wikipedians to save and use citation data in a shared citations database, which I would compare to how reference management software is used for writing scientific articles. The advantages are numerous, as outlined here, and the strong focus on the integration and tools for the Wikipedia community is convincing. --Zuphilip (talk) 20:51, 6 December 2020 (UTC)
  • Support A better solution for citations is indeed needed. Interoperability must be taken into account in all aspects. I don't think it's a trivial task. From my science perspective, I see these needs (they may be touched upon, but I am still reading the full text and all details): 1. "main subject" ability: being able to link literature to other things is what makes citable content in Wikidata so important (think Scholia); 2. being able to use the citations as references in Wikidata; 3. author names are non-trivial; reference types are non-trivial: it must have a detailed, documented model (think RIS, BibLaTeX, CSL). Basically, the model must be a superset of all we need (articles, software, data, chapters, books, etc.), to ensure we will not get edit-wars and to allow full curation. ShEx comes to mind as a supporting tool; 4. seek connection with university libraries who are looking for open infrastructures to support their work. I hope to find time to read all the details next week. --Egon Willighagen (talk) 11:02, 7 December 2020 (UTC)
  • Strong support. This seems like a logical first step in structuring, standardising and improving citations on Wikipedia, and should make it easier for users to make that all-important journey from Wikipedia article to reliable source material. Jason.nlw (talk) 10:31, 9 December 2020 (UTC)
  • Strong support. It is a no-brainer that this would be enormously helpful. At the same time, spontaneously it is less clear how to do it exactly, but I am amazed about the herein proposed implementation which has obviously taken into account a lot of experiences with our other projects. Great job so far! --MisterSynergy (talk) 13:51, 9 December 2020 (UTC)
  • Strong support. A very valuable project. A lot of work but really worth doing. Raymond (talk) 09:00, 10 December 2020 (UTC)
  • Strong support Thank you for pushing it forward with this well thought out plan! – Susanna Ånäs (Susannaanas) (talk) 12:55, 14 December 2020 (UTC)
  • Strong support This seems like a great next step for WikiCite. Reusing citations across Wikimedia projects in a standard way makes so much sense and I think libraries would be excited to contribute and help move this forward. Chicagohil (talk) 06:34, 15 December 2020 (UTC)
  • Strong support I agree that it's a logical extension of centralized images, language links and data. Desperately useful even just from a synchronisation point of view, but also monitoring and improving source usage across the sites. Early experiments by the wikicite community have laid the groundwork for what this would look like, but significant technical integration is needed for usability (especially with citoid). T.Shafee(Evo﹠Evo)talk 11:28, 17 December 2020 (UTC)
  • Strong support. Such an important project. Also very well planned. I hope it starts working -- and soon! -- GiFontenelle (talk) 13:39, 19 December 2020 (UTC)
  • Strong support, this is logically the next best step for WikiCite to evolve from the existing scenario. -- Bodhisattwa (talk) 07:06, 3 January 2021 (UTC)
  • Strong support, we recently organized an event on WikiCite in Ghana where we partnered with a library and we were enlightened on how to get access to references using books etc. It really helped us on how to use citations. So this project is very important for the movement. DaSupremo (talk) 00:09, 4 January 2021 (UTC)
  • Strong support, it is a logical step; the evolution of the wikiverse after wikidata and wikifunctions makes a project like this mandatory. --TaronjaSatsuma (talk) 16:01, 4 January 2021 (UTC)
  • Strong support. What excites me most about this proposal is that it addresses the specific urgent need for better citation management in Wikipedia projects, while leaving the door open to ambitious open catalog efforts in Wikidata/Wikicite/etc in the future. The improvements in editing workflow, reading experience, library integrations, and epistemic confidence/quality are hard to overstate. --Blnewbold (talk) 21:41, 4 January 2021 (UTC)
  • Strong support - This is a wonderful proposal for me as an editor in a small language Wikipedia project. I use the content translation tool a lot for creating new pages. Usually, once I have translated the article, the cumbersome task begins of copying and adding citations to the new article, and inevitably migrating reference templates as well. This task of adding citations usually takes as much time as it takes to translate the article in the first place. This project would cut down my translation time in HALF!! I strongly support this proposal. -- Thuvack (talk) 16:05, 6 January 2021 (UTC)
  • Strong support. I am in full favour! Example: same reference used to support multiple Wikidata statements, the same references are used in the Wikipedia article. David Nind (talk) 21:48, 10 January 2021 (UTC)
  • Strong support of course we need to expand and improve our citations infrastructure: to make our collection of bibliographic open knowledge more robust, to give reliability in articles a stronger foundation, to make editing easier for editors, and to support small languages. -- phoebe | talk 20:05, 25 January 2021 (UTC)
  • Support All of this makes sense. Just a few days ago, a new article was created on the English Wikipedia about a scientist who is an author of several books and papers cited in other articles. To add links from existing appearances of her name in other articles to the new article, I had to search for several different spellings of her name and to use various template parameters like editor-link or author-link instead of the familiar [[link syntax]], even though most of the instances were from the same three books! This added up to about fifty edits, and this is just one of the times that I had to do it. In some other cases it's way more than fifty. Such things have to be done by lots of editors, and in many, many thousands of cases. The current infrastructure was good for adding a lot of references, but it doesn't scale. If we had the system proposed on this page, it would take far fewer edits: not fifty, but three or four, or maybe even zero. One important comment: To realize the proposed system's full potential, the global templates and modules repository needs to be implemented, too, because almost all structured references are inserted into articles using templates and modules. As I write this, global templates are already mentioned here, and it's good, but it bears repeating: the two projects will really help each other, and both should be implemented. --Amir E. Aharoni (talk) 14:14, 27 January 2021 (UTC)
  • Strong support --So9q (talk) 07:25, 10 February 2021 (UTC)
  • Strong support. It seems to me that this would be a magnificent and very important, possibly fundamental, addition to the Wikiverse. If nothing else, the accumulation and integration of this kind of data in the public domain might help to loosen the stranglehold that academic publishing has on academia, medicine, and scientific and technological progress in general. Once established and being used widely, I suspect it will generate a natural "pressure" or gradient, to expand and encompass more than just items cited in sister projects. -- Jonathanischoice (talk) 21:35, 14 February 2021 (UTC)
  • Strong support. This is a solid approach to an issue that has been festering for a very long time. As a Wikidatan who adds many identical references every day, I am well aware of the massive duplication of data and effort involved in the current processes. - PKM (talk) 22:49, 14 February 2021 (UTC)
  • Strong support This is core to our projects. Mauricio V. Genta (talk) 19:01, 15 February 2021 (UTC)
  • Support A fascinating idea which should help with verifiability and transparency. Richard Nevell (talk) 09:45, 17 February 2021 (UTC)
  • Of course. Commented on earlier drafts; this is looking tremendous. Hurry up and take my cites. –SJ talk 17:51, 19 February 2021 (UTC)
  • Strong support This makes a whole lot of sense. Working with citations through and across articles has taught me the difficulties of filing and duplicating templates, and maintaining references over time; translating across Wikipedia versions has shown the pain of remapping template fields; working with references on Wikidata has been a glimpse of what could be a 'nicer' reference editing experience, but crashed against the tediousness of reusing them across statements and items. The proposal is well researched and thought-through, drawing from relevant past projects (aaah, the Espace référence!) and reflects on the needs of various Wikimedia projects beyond Wikipedia (particularly glad to see the Wiktionaries involved here!). It’s a tremendous opportunity. Jean-Fred (talk) 17:05, 22 February 2021 (UTC)
  • Support an important and ambitious idea, the problem is real and solutions need to be explored. Cdlt, VIGNERON * discut. 21:29, 22 February 2021 (UTC)
  • Support Complex, but definitely useful and interesting proposal --LucaMauri (talk) 08:01, 23 February 2021 (UTC)
  • Strong support. It is high time this was deployed! Exec8 (talk) 08:52, 23 February 2021 (UTC)
  • Strong support. --Olea (talk) 12:39, 23 February 2021 (UTC)
  • Strong support. This kind of centralized management of citations would also greatly simplify research projects aiming at understanding how knowledge is constructed in Wikipedia, and how Wikipedia in turn influences scientific knowledge construction. --Diegodlh (talk) 17:01, 23 February 2021 (UTC)
  • Strong support. This is very much needed, and the proposal looks great! --Akorenchkin (talk) 17:40, 23 February 2021 (UTC)
  • Support This seems a neat idea, and if it leads to one being able to reuse citations of the same book (or journal article or legal decision or report) but with different page citations, as seems to be suggested, that'd be worth it generally. There are almost certainly going to be issues regarding community acceptance on-wiki if it is Wikidata based, and the Wikidata community probably need to have a bit of soul searching about how vandalism on Wikidata is handled, given that WikiCite potentially means vandalising one Wikidata object could affect thousands of Wikipedia articles across many projects and languages. The idea is good though and if the practical and community issues can be resolved, well worth pursuing. --Tom Morris (talk) 21:45, 26 February 2021 (UTC)
  • Support it was time...--Alexmar983 (talk) 12:03, 28 February 2021 (UTC)
  • Oppose I fail to see how it will make life easier. It rejects the existing information, particularly for scholarly works. When it is to start from scratch AND insists that it will include everything with citations in Wikidata, it will become a maintenance nightmare. With any luck, the structured data that is already in Wikidata will save the day. Thanks, GerardM (talk) 12:33, 28 February 2021 (UTC)
  • Oppose There are known citation biases. See "Wikipedia’s political science coverage is biased. I tried to fix it." I do not support this proposal in its current form, as the problem statement does not include bias at the highest level. For example: "Our references are high in maintenance, technical complexity, and duplication of effort." AND amplify and reinforce existing biases against groups that are already underrepresented in academia and elsewhere. This needs to be stated at the top as a significant component of the problem at hand, and metrics for success must include progress towards equity--and specifically gender equity. OpenSexism (talk) 22:36, 28 February 2021 (UTC)
    • Ensuring Community and epistemological diversity was 'baked in' to the proposed Rollout & Metrics and described somewhat in the Knowledge Equity subheading - but you're right that it did not explicitly name gender-equity (e.g. gender ratio of authors cited in references). In addressing content biases, it is hoped that being able to centralise, and thereby quantify/analyse, the currently used references will bring greater visibility to existing biases. This doesn't "solve the problem" by itself, but makes it harder to ignore by shedding light on the issue. For example, User:Noé had some interesting theoretical uses for French-Wiktionary in this direction – being able to highlight when a page didn't have any references from use-cases/pronunciations from beyond mainland France. p.s. for what it's worth, I've now added an explicit gender-analysis example to the table of use-cases. LWyatt (WMF) (talk) 15:23, 1 March 2021 (UTC)
      Yeah, that's right. As I am invoked here, I want to say I am still eager to see this project on the rails. Well, in Wiktionaries, we are concerned with references but also with quotations that show the uses of the words described. For those, we will be pleased to be able to map the provenance of the books quoted and the gender of the people who wrote those quotations, including more subtle data such as the genre of publication in relation to gender (are women writers more common in press articles or in theatrical pieces? are they more common in publications after the 1940s? are they more common in publications from Mali than from Senegal?). The Wiktionary community doesn't expect the Shared Citations project to contribute to the Wiktionaries on this basis. These are precious metrics to spur other communities' actions, if they want to, but I am convinced that the project can't be expected to make the edits in every project, when that could only be done through the involvement of the whole movement. Noé (talk) 17:36, 1 March 2021 (UTC)
    • Thanks, Liam and Noé. Gaps and biases are a significant problem, as large as the ones listed in the proposal's problem statement, and equally relevant. The findings of the writer in The Washington Post story speak to a very deep problem that isn't represented among those driving this work. If knowledge equity is fundamental to the goals of the movement, it is also fundamental to the problems that must be solved. It's great that you noted a use case for gender bias research, but the existence of gender bias and other harmful biases is known, and belongs beside the other established problems and goals that motivate this work (verifiability, anti-disinformation, knowledge integrity, duplication, repetition, manual effort). It's much easier to understand that something is a priority--one that people care deeply about and are committed to realizing/solving--when it's at the top. Is there a reason not to do this? OpenSexism (talk) 17:56, 2 March 2021 (UTC)
  • Liam has been graciously discussing these comments and incorporating changes. If you have thoughts about how to better align existing and proposed systems to the goal of understanding, monitoring, and redressing citation biases, please join the discussion here. I am leaving these comments as they stand to draw attention to the problem and the need for resources. OpenSexism (talk) 18:14, 14 March 2021 (UTC)
  • Strong support. As a keen reference user and contributor, and staunch believer in reliable sources of information, I support this. If it can be achieved, it would assist the whole project from the macro to the micro level. At the macro level - alignment with Wikipedia purpose, better functionality, enhanced reputation and ease of use; at the micro level - consistency down to the spelling of authors’ names. It would improve everything from contributing references to accessing them, making the encyclopaedia even more valuable. Whiteghost.ink (talk) 23:44, 28 February 2021 (UTC)
  • Strong support Kpjas (talk) 19:00, 1 March 2021 (UTC)
  • Support I have some questions on details, but this proposal makes a lot of sense to me. ArthurPSmith (talk) 21:00, 1 March 2021 (UTC)
  • Strong support Andrawaag (talk) 14:52, 2 March 2021 (UTC)
  • Strong support it'd be incredibly useful to have this for Wikidata. Nicereddy (talk) 01:48, 3 March 2021 (UTC)
  • Support The project page already sums up how meaningful it could be, no need to paraphrase it here. My only concern is the license, since it talks about Wikidata, which so far has only used the CC-0 license. As the project page states, "Wikimedia’s citations are one of our greatest assets", and my personal opinion is that putting this asset under such a license is not the most efficient way to protect and foster it as a perennial heritage. Now, that doesn't invalidate the clarity and importance of the issue this project means to address. Thank you for all that was already done. --Psychoslave (talk) 09:21, 17 March 2021 (UTC)
  • Strong support --Hfordsa (talk) 05:04, 19 March 2021 (UTC)
  • Strong support, and I've been urging this since I joined. But I just added the question about multiple versions to the open question list; from my experience as a serials librarian, this can be surprisingly difficult; from my experience as a Wikipedia editor, a great many existing citations are incomplete or in error. DGG (talk) 03:42, 26 March 2021 (UTC)
  • Strong support Bryandamon (talk) 18:17, 16 April 2021 (UTC)
  • Strong support, this is the most well-written and thoughtful proposal I've read on here. --Azertus (talk) 21:19, 8 June 2021 (UTC)
  • Strong support. I think this is a crucial proposal, that will avoid much duplication of effort and citation errors, as well as provide useful data on citation usage. InverseHypercube (talk) 15:07, 18 June 2021 (UTC)
  • Strong support Rtnf (talk) 12:42, 19 July 2021 (UTC)
  • Strong support --Thadguidry (talk) 16:42, 19 July 2021 (UTC)
  • Strong support --Kristbaum (talk) 16:54, 1 August 2021 (UTC)
  • Support: Robust structuring and reuse of citation sources in and across articles, across languages and across projects would be an important gain. AllyD (talk) 16:49, 7 August 2021 (UTC)
  • Support I finally understand why people are excited about WikiCite. Rachel Helps (BYU) (talk) 16:59, 18 August 2021 (UTC)
  • Strong support For years, I have longed for something like {{cite isbn|...}}, where supplying the ISBN would automatically fill in default values from a database. Finally I got around to submitting a request in the 2022 Community Wishlist Survey. In discussion there, someone referred me here. So I am here strongly to support functionality of the form {{cite isbn|...}}. Feline Hymnic (talk) 17:44, 29 January 2022 (UTC)
  • Strong support - definitely will be beneficial to the project. (talk) 16:26, 16 February 2022 (UTC)
  • Strong support I made a similar suggestion at enwiki a few weeks ago, particularly focusing on citebots feeding a references db with editors' manual corrections to article citations, but was unfortunately misunderstood. I now see this is contemplated here. Guarapiranga (talk) 02:10, 20 July 2022 (UTC)
  • Strong support This just makes so much sense, and would make the experience simpler for editors and readers alike. --Locke Coletc 05:52, 17 August 2022 (UTC)
  • Strong support Similar to uploading images on Wikipedia (indirectly via Commons) or interlinking Wikipedia articles (via Wikidata) this interface could make it easier to monitor usage of quality sourcing, and also ensure higher percentage of archived websites in cases of dead-urls, something that isn’t automated at the moment, perhaps because of scale. This would also normalize research on citation usage. I can imagine with wikidata usage, merging items would be an additional benefit too. Shushugah (talk) 16:22, 25 August 2022 (UTC)
  • Support With citations shared and (more importantly) centralized, we can have more structural data (like linking different translations of the same source together) and keep them consistent across articles and language editions. Also, centralizing citations makes them easier to monitor and maintain, and (for us) makes it easier to detect vandalism. Especially in the wake of the Zhemao hoaxes, where sources themselves were fabricated in large numbers, patrolling sources is becoming more important. Other solutions like BibTeX use a similar approach and I believe that makes sense. MilkyDefer 05:27, 26 August 2022 (UTC)
  • Support This would be such a better paradigm for understanding and analyzing references. Imagine being able to follow a link to a shared citation and then seeing all the pages that cite that same source. That alone would be enormously useful for readers.--Sage (Wiki Ed) (talk) 21:42, 7 September 2022 (UTC)
  • Support This would be very valuable to me as both an editor and a reader. TypistMonkey (talk) 22:35, 9 September 2022 (UTC)
  • Strong support I've spent countless hours adding/formatting references, adding quotations, etc., probably duplicating hundreds of hours of work by others who have used the same references to support different statements elsewhere. Things like <ref name="..."/> and list-defined references in Wikipedia, as well as the DuplicateReferences gadget in Wikidata, definitely help with the reuse (in the same Wikipedia article or Wikidata item) and wikicode legibility issues, but the problem is much deeper, as explained above in depth. I am also negatively affected by the thousands of scholarly publications in Wikidata (which still represent just a fraction of all the publications out there) that make it hard to find items about the concepts themselves. A proper database of reusable, citable works would be a great boon to Wikimedia's mission. --Waldyrious (talk) 22:50, 4 December 2022 (UTC)
  • Strong support We definitely need to have modular management of citations, as well as taking explicitly into account ways of doing this that can try to minimise the role of w:Geographical bias on Wikipedia and other known biases. Boud (talk) 13:57, 4 February 2023 (UTC)
  • Strong support AmandaSLawrence (talk) 05:39, 13 March 2023 (UTC) It would be great to see this move to the next stage.
  • Strong support I could see this project being incredibly popular with universities, academics, and the librarians who support them, as it would enable them to see where their research is being cited. I am looking forward to seeing how this would progress and I'd be happy to help in any way that I can. Drkirstyross (talk) 20:58, 21 March 2023 (UTC)
  • Strong support per above, this would make re-using and expanding references across the board so much easier. --SilverTiger12 (talk) 16:38, 22 August 2023 (UTC)
  • Strong support WikiCite would be a fantastic tool for so many reasons that go beyond Wikipedia and its sister projects at Wikimedia. Thanks to the wiki model of shared collaboration it could become what current tools like OCLC will never be able to be, a truly universal citation repository for any piece of retrievable information ever published. If such a tool became available, I can see virtually any indexing service (from Worldcat and LoC down to my neighborhood library) eventually using it or at least referring back to it, as the default cataloguing repository of knowledge. -- JudeFawley (talk) 09:41, 24 February 2024 (UTC)