Wikimedia Research/Showcase

From mediawiki.org

The Monthly Wikimedia Research Showcase is a public showcase of recent research by the Wikimedia Foundation's Research Team and guest presenters from the academic community. The showcase is hosted virtually every 3rd Wednesday of the month at 9:30 a.m. Pacific Time (18:30 CET) and is live-streamed on YouTube. The schedule may change; see the calendar below for a list of confirmed showcases.

How to attend

We live stream our research showcase every month on YouTube. The link will be in each showcase's details below and is also announced in advance via wiki-research-l, analytics-l, and @WikiResearch on Twitter. You can join the conversation and participate in Q&A after each presentation using the YouTube chat. We expect all presenters and attendees to abide by our Friendly Space Policy.

Upcoming Events

December 2024

Time
Wednesday, December 11, 17:00 UTC: Find your local time here
Theme
AI for Wikipedia

Archive

For information about past research showcases (2013-present), you can search below or see the listing of all months here.

2024

November 2024

Time
Wednesday, November 20, 16:30 UTC: Find your local time here
Theme
A Look at External Factors that Help Different Language Versions of Wikipedia Thrive

November 20, 2024 Video: YouTube

The social embeddedness of peer production: A comparative qualitative analysis of three Indian language Wikipedia editions
By Sejal Khatri
Why do some peer production projects do a better job at engaging potential contributors than others? We address this question by comparing three Indian language Wikipedias, namely, Malayalam, Marathi, and Kannada. We found that although the three projects share goals, technological infrastructure, and a similar set of challenges, Malayalam Wikipedia’s community engages language speakers in contributing at a much higher rate than the others. Drawing from a grounded theory analysis of interviews with 18 community participants from the three projects, we found that experience with participatory governance and free/open-source software in the Malayalam community supported high engagement of contributors. Counterintuitively, we found that financial resources intended to increase participation in the Marathi and Kannada communities hindered the growth of these communities. Our findings underscore the importance of social and cultural context in the trajectories of peer production communities.


Low-Resource Languages and Online Knowledge Repositories: A Need-Finding Study
By Hellina Hailu Nigatu, UC Berkeley
Online Knowledge Repositories (OKRs) like Wikipedia offer communities a way to share and preserve information about themselves and their ways of living. However, for communities with low-resourced languages, including most African communities, the quality and volume of content available are often inadequate. One reason for this lack of adequate content could be that many OKRs embody Western ways of knowledge preservation and sharing, requiring many low-resourced language communities to adapt to new interactions. In this talk, we will go through findings from two studies: (1) a thematic analysis of Wikipedia forum discussions and (2) a contextual inquiry study with 14 novice contributors who create content in low-resourced languages. We will focus on three Ethiopian languages: Afan Oromo, Amharic, and Tigrinya. Our analysis revealed several recurring themes; for example, contributors struggle to find resources to corroborate their articles in low-resourced languages, and language technology support, like translation systems and spellcheck, results in several errors that waste contributors’ time. Based on our analysis, we will also outline design opportunities for building better language support tools and interfaces for low-resourced language speakers.

October 2024

Time
Wednesday, October 16, 16:30 UTC: Find your local time here
Theme
Wikipedia for Political and Election Analysis

October 16, 2024 Video: YouTube

Throw Your Hat in the Ring (of Wikipedia): Exploring Urban-Rural Disparities in Local Politicians' Information Supply
By Akira Matsui, Yokohama National University
This talk explores the socio-economic factors contributing to disparities in the supply of local political information on Wikipedia. Using a dataset of politicians who ran for local elections in Japan, the research investigates the relationship between socio-economic status and creating and revising politicians' Wikipedia pages. The study reveals that areas with different socio-economic backgrounds, such as employment industries and age distribution, exhibit distinct patterns in information supply. The findings underscore the impact of regional socio-economic factors on digital platforms and highlight potential vulnerabilities in information access for political content.


Party positions from Wikipedia classifications of party ideology
By Michael Herrmann, University of Konstanz
We develop a new measure of party position based on a scaling of ideology tags supplied in infoboxes on political parties' Wikipedia pages. Assuming a simple model of tag assignment, we estimate the locations of parties and ideologies in a common space. We find that the recovered scale can be interpreted in familiar terms of "left versus right." Estimated party positions correlate well with ratings of parties' positions from extant large-scale expert surveys, most strongly with ratings of general left-right ideology. Party position estimates also show high stability in a test-retest scenario. Our results demonstrate that a Wikipedia-based approach yields valid and reliable left-right scores comparable to scores obtained via conventional expert coding methods. It thus provides a measure with potentially unlimited party coverage. Our measurement strategy is also applicable to other entities.

September 2024

Time
Wednesday, September 18, 16:30 UTC: Find your local time here
Theme
Curation of Wikimedia AI Datasets

September 18, 2024 Video: YouTube

Supporting Community-Driven Data Curation for AI Evaluation on Wikipedia through Wikibench
By Tzu-Sheng Kuo, Carnegie Mellon University
AI tools are increasingly deployed in community contexts. However, datasets used to evaluate AI are typically created by developers and annotators outside a given community, which can yield misleading conclusions about AI performance. How might we empower communities to drive the intentional design and curation of evaluation datasets for AI that impacts them? We investigate this question on Wikipedia, an online community with multiple AI-based content moderation tools deployed. We introduce Wikibench, a system that enables communities to collaboratively curate AI evaluation datasets, while navigating ambiguities and differences in perspective through discussion. A field study on Wikipedia shows that datasets curated using Wikibench can effectively capture community consensus, disagreement, and uncertainty. Furthermore, study participants used Wikibench to shape the overall data curation process, including refining label definitions, determining data inclusion criteria, and authoring data statements. Based on our findings, we propose future directions for systems that support community-driven data curation.


WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia
By Yufang Hou, IBM Research Europe - Ireland
Retrieval-augmented generation (RAG) has emerged as a promising solution to mitigate the limitations of large language models (LLMs), such as hallucinations and outdated information. However, it remains unclear how LLMs handle knowledge conflicts arising from different augmented retrieved passages, especially when these passages originate from the same source and have equal trustworthiness. In this work, we conduct a comprehensive evaluation of LLM-generated answers to questions that have varying answers based on contradictory passages from Wikipedia, a dataset widely regarded as a high-quality pre-training resource for most LLMs. Specifically, we introduce WikiContradict, a benchmark consisting of 253 high-quality, human-annotated instances designed to assess LLM performance when augmented with retrieved passages containing real-world knowledge conflicts. We benchmark a diverse range of both closed and open-source LLMs under different QA scenarios, including RAG with a single passage, and RAG with 2 contradictory passages. Through rigorous human evaluations on a subset of WikiContradict instances involving 5 LLMs and over 3,500 judgements, we shed light on the behaviour and limitations of these models. For instance, when provided with two passages containing contradictory facts, all models struggle to generate answers that accurately reflect the conflicting nature of the context, especially for implicit conflicts requiring reasoning. Since human evaluation is costly, we also introduce an automated model that estimates LLM performance using a strong open-source language model, achieving an F-score of 0.8. Using this automated metric, we evaluate more than 1,500 answers from seven LLMs across all WikiContradict instances.

August 2024

No showcase due to Wikimania.

July 2024

Time
Wednesday, July 24, 16:30 UTC: Find your local time here
Theme
Machine Translation on Wikipedia

July 24, 2024 Video: YouTube

The Promise and Pitfalls of AI Technology in Bridging Digital Language Divide
By Kai Zhu, Bocconi University
Machine translation technologies have the potential to bridge knowledge gaps across languages, promoting more inclusive access to information regardless of native languages. This study examines the impact of integrating Google Translate into Wikipedia's Content Translation system in January 2019. Employing a natural experiment design and difference-in-differences strategy, we analyze how this translation technology shock influenced the dynamics of content production and accessibility on Wikipedia across over a hundred languages. We find that this technology integration leads to a 149% increase in content production through translation, driven by existing editors becoming more productive as well as an expansion of the editor base. Moreover, we observe that machine translation enhances the propagation of biographical and geographical information, helping to close these knowledge gaps in the multilingual context. However, our findings also underscore the need for continued efforts to mitigate the preexisting systemic barriers. Our study contributes to our knowledge on the evolving role of artificial intelligence in shaping knowledge dissemination through enhanced language translation capabilities.


Implications of Using Inorganic Content in Arabic Wikipedia Editions
By Saied Alshahrani and Jeanna Matthews, Clarkson University
Wikipedia articles (content pages) are among the widely utilized training corpora for NLP tasks and systems, yet these articles are not always created, generated, or even edited organically by native speakers; some are automatically created, generated, or translated using Wikipedia bots or off-the-shelf translation tools like Google Translate without human revision or supervision. First, we analyzed the three Arabic Wikipedia editions, Arabic (AR), Egyptian Arabic (ARZ), and Moroccan Arabic (ARY), and found that they suffer from a few serious issues, like large-scale automatic creations and translations from English to Arabic, all without human involvement, generating content (articles) that lacks not only linguistic richness and diversity but also cultural richness and meaningful representation of the Arabic language and its native speakers. Second, we studied the performance implications of using such inorganic, unrepresentative articles to train NLP tasks or systems, where we intrinsically evaluated the performance of two main NLP upstream tasks, namely word representation and language modeling, using word analogy and fill-mask evaluations. We found that most of the models trained on the organic and representative content outperformed or, at worst, performed on par with the models trained with inorganic content generated using bots or translated using templates included, demonstrating that training on unrepresentative content not only impacts the representation of native speakers but also impacts the performance of NLP tasks or systems.
We recommend avoiding utilizing the automatically created, generated, or translated articles on Wikipedia when the task is a representation-based task, like measuring opinions, sentiments, or perspectives of native speakers, and also suggest that when registered users employ automated creation or translation, their contributions should be marked differently than “registered user” for better transparency; perhaps “registered user (automation-assisted)”.

June 2024

No Research Showcase due to Wiki Workshop.

May 2024

Time
Wednesday, May 15, 16:30 UTC: Find your local time here
Theme
Reader to Editor Pipeline

May 15, 2024 Video: YouTube

Journey Transitions
By Mike Raish and Daisy Chen
What kinds of events do readers and editors identify as separating the stages of their relationship with Wikipedia, and which of these kinds of events might the Wikimedia Foundation possibly support through design interventions? In the Journey Transitions qualitative research project, the WMF Design Research team interviewed readers and editors in Arabic, Spanish, and English in order to answer these questions and provide guidance to WMF Product teams making strategic decisions. A series of semi-structured interviews revealed that readers and editors describe their relationships with Wikipedia in different ways, with readers describing a static and transactional relationship, and that even many experienced editors express confusion about core functions of the Wikimedia ecosystem, such as the role of Talk pages. This presentation will describe the Journey Transitions research, as well as present its implications for the sponsoring Product teams in order to shed light on the way that qualitative research is used to inform strategic decisions in the Wikimedia Foundation.


Increasing participation in peer production communities with the Growth features
By Morten Warncke-Wang and Kirsten Stoller
For peer production communities to be sustainable, they must attract and retain new contributors. Studies have identified social and technical barriers to entry and discovered some potential solutions, but these solutions have typically focused on a single highly successful community, the English Wikipedia, been tested in isolation, and rarely evaluated through controlled experiments. In this talk, we show how the Wikimedia Foundation’s Growth team collaborates with Wikipedia communities to develop and experiment with new features to improve the newcomer experience in Wikipedia. We report findings from a large-scale controlled experiment using the Newcomer Homepage, a central place where newcomers can learn how peer production works and find opportunities to contribute, and show how the effectiveness depends on the newcomer’s context. Lastly, we show how the Growth team has continued developing features that further improve the newcomer experience while adapting to community needs.

April 2024

Time
Wednesday, April 17, 16:30 UTC: Find your local time here
Theme
Supporting Multimedia on Wikipedia

April 17, 2024 Video: YouTube

Towards image accessibility solutions grounded in communicative principles
By Elisa Kreiss
Images have become an omnipresent communicative tool, and Wikipedia is no exception. However, the undeniable benefits they carry for sighted communicators turn into a serious accessibility challenge for people who are blind or have low vision (BLV). BLV users often have to rely on textual descriptions of those images to participate equally in an increasingly image-dominated online lifestyle. In this talk, I will present how framing accessibility as a communication problem highlights important ways forward in redefining image accessibility on Wikipedia. I will present the Wikipedia-based dataset Concadia and use it to discuss the successes and shortcomings of image captions and alt texts for accessibility, and how the usefulness of accessibility descriptions is fundamentally contextual. I will conclude by highlighting the potential and risks of AI-based solutions and discussing implications for different Wikipedia editing communities.


Automatic Multi-Path Web Story Creation from a Structural Article
By Daniel Nkemelu
Web articles such as Wikipedia serve as one of the major sources of knowledge dissemination and online learning. However, their in-depth information, often in a dense text format, may not be suitable for mobile browsing, even in a responsive user interface. We propose an automatic approach that converts a structured article of any length into a set of interactive Web Stories that are ideal for mobile experiences. We focused on Wikipedia articles and developed Wiki2Story, a pipeline based on language and layout models, to demonstrate the concept. Wiki2Story dynamically slices an article and plans one to multiple Story paths according to the document hierarchy. For each slice, it generates a multi-page summary Story composed of text and image pairs in visually appealing layouts. We derived design principles from an analysis of manually created Story practices. We executed our pipeline on 500 Wikipedia documents and conducted user studies to review selected outputs. Results showed that Wiki2Story effectively captured and presented salient content from the original articles and sparked interest in viewers.

March 2024

Time
Wednesday, March 20, 16:30 UTC: Find your local time here
Theme
Addressing Gender Gaps

Wednesday, March 20, 2024 Video: YouTube

Leveraging Recommender Systems to Reduce Content Gaps on Wikipedia
By Mo Houtti
Many Wikipedians use algorithmic recommender systems to help them find interesting articles to edit. The algorithms underlying those systems are driven by a straightforward assumption: we can look at what someone edited in the past to figure out what they’ll most likely want to edit next. But the story of what Wikipedians want to edit is almost definitely more complex than that. For example, our own prior research shows that Wikipedians prefer prioritizing articles that would minimize content gaps. So, we asked, what would happen if we incorporated that value into Wikipedians’ personalized recommendations? Through a controlled experiment on SuggestBot, we found that recommending more content gap articles didn’t significantly impact editing, despite those articles being less “optimally interesting” according to the recommendation algorithm. In this presentation, I will describe our experiment, our results, and their implications - including how recommender systems can be one useful strategy for tackling content gaps on Wikipedia.


Bridging the offline and online: Offline meetings of Wikipedians

By Nicole Schwitter
Wikipedia is primarily known as an online encyclopaedia, but it also features a noteworthy offline component: Wikipedia and particularly its German-language edition – which is one of the largest and most active language versions – is characterised by regular local offline meetups which give editors the chance to get to know each other. This talk will present the recently published dewiki meetup dataset which covers (almost) all offline gatherings organised on the German-language version of Wikipedia. The dataset covers almost 20 years of offline activity of the German-language Wikipedia, containing 4418 meetups that have been organised with information on attendees, apologies, date and place of meeting, and minutes recorded. The talk will explain how the dataset can be used for research, highlight the importance of considering offline meetings among Wikipedians, and place these insights within the context of addressing gender gaps within Wikipedia.

February 2024

Time
Wednesday, February 21, 16:30 UTC: Find your local time here
Theme
Platform Governance and Policies

Wednesday, February 21, 2024 Video: YouTube

Sociotechnical Designs for Democratic and Pluralistic Governance of Social Media and AI
By Amy X. Zhang, University of Washington
Decisions about policies when using widely-deployed technologies, including social media and more recently, generative AI, are often made in a centralized and top-down fashion. Yet these systems are used by millions of people, with a diverse set of preferences and norms. Who gets to decide what the rules are, what should the procedures be for deciding them, and must we all abide by the same ones? In this talk, I draw on theories and lessons from offline governance to reimagine how sociotechnical systems could be designed to provide greater agency and voice to everyday users and communities. This includes the design and development of: 1) personal moderation and curation controls that are usable and understandable to laypeople, 2) tools for authoring and carrying out governance to suit a community's needs and values, and 3) decision-making workflows for large-scale democratic alignment that are legitimate and consistent.

January 2024

Time
Wednesday, January 17, 17:30 UTC: Find your local time here
Theme
Connecting Actions with Policy

January 17, 2024 Video: YouTube

Presenting the report "Unreliable Guidelines"
By Amber Berson and Monika Sengul-Jones
The goal behind the report Unreliable Guidelines: Reliable Sources and Marginalized Communities in French, English and Spanish Wikipedias was to understand the effects of the set of reliable source guidelines and rules on the participation of and the content about marginalized communities on three Wikipedias. Two years following the release of their report, researchers Berson and Sengul-Jones reflect on the impact of their research as well as the actionable next steps.


Why Should This Article Be Deleted? Transparent Stance Detection in Multilingual Wikipedia Editor Discussions
By Lucie-Aimée Kaffee and Arnav Arora
The moderation of content on online platforms is usually non-transparent. On Wikipedia, however, this discussion is carried out publicly and the editors are encouraged to use the content moderation policies as explanations for making moderation decisions. However, currently only a few comments explicitly mention those policies. To aid in this process of understanding how content is moderated, we construct a novel multilingual dataset of Wikipedia editor discussions along with their reasoning in three languages. We demonstrate that stance and corresponding reason (policy) can be predicted jointly with a high degree of accuracy, adding transparency to the decision-making process.