Reference Check is designed to increase the likelihood that newcomers and Junior Contributors who are editing from within Sub-Saharan Africa:
- Publish edits that they are proud of and experienced volunteers consider useful
- Return to edit again in the future
This task involves the work of running an A/B test (or perhaps a multivariate test [i]) to evaluate the extent to which this initial Edit Check has affected newcomers and Junior Contributors in the ways described above.
Decision(s) To Be Made
- 1. Decide whether the impact Edit Check is having on users' behavior is positive enough for it to be made available by default at all Wikipedias.
Hypotheses
ID | Hypothesis | Metric(s) for evaluation |
---|---|---|
KPI | The quality of new content edits newcomers and Junior Contributors make in the main namespace will increase because a greater percentage of these edits will include a reference or an explicit acknowledgement as to why these edits lack references. | 1) Proportion of published edits that add new content and include a reference or explicit acknowledgement of why a citation was not added, 2) Proportion of published edits that add new content (T333714) and are reverted within 48 hours (or have a high revision risk score, if we use the revision risk model (T317700, T343938)) |
Curiosity #1 | Newcomers and Junior Contributors will be more aware of the need to add a reference when contributing new content because the visual editor will prompt them to do so in cases where they have not done so themselves. | Increase in the proportion of newcomers and Junior Contributors that publish at least one new content edit that includes a reference. |
Curiosity #2 | Newcomers and Junior Contributors will be more likely to return to publish a new content edit in the future that includes a reference because Edit Check will have caused them to realize references are required when contributing new content to Wikipedia. | 1) Proportion of newcomers and Junior Contributors that successfully publish an edit in which Edit Check was activated and return to make an unreverted edit in the main namespace during the identified retention period, 2) Proportion of newcomers and Junior Contributors that publish an edit in which Edit Check was activated and return to make a new content edit with a reference in the main namespace during the identified retention period. |
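
Most of the metrics above are proportions compared between a control group and a group shown Edit Check. As a minimal sketch of how such a comparison could be made, the snippet below runs a pooled two-proportion z-test using only the Python standard library. This task does not specify the statistical method, so the choice of test, the function name `two_proportion_ztest`, and the counts in the usage example are all assumptions made for illustration, not results.

```python
from math import erf, sqrt


def two_proportion_ztest(success_a, total_a, success_b, total_b):
    """Compare two proportions with a pooled two-sided z-test.

    Group A is assumed to be the control group and group B the group
    shown Edit Check. Returns (difference, z statistic, p-value).
    """
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return p_b - p_a, z, p_value


# Illustrative counts only (not real data): new content edits that
# include a reference, out of all new content edits, per group.
diff, z, p = two_proportion_ztest(200, 1000, 260, 1000)
print(f"difference={diff:.3f}, z={z:.2f}, p={p:.4f}")
```

A real analysis would likely also need to account for multiple edits per contributor and per-wiki differences, which this sketch ignores.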
Leading indicators
See T352130.
Guardrails
This section describes the metrics we will use to make sure other important parts/dimensions of the "editing ecosystem" are not being negatively impacted by Edit Check. The scenarios named in the chart below emerged through T325851.
ID | Name | Metric(s) for Evaluation |
---|---|---|
1) | Edit quality decrease (T317700) | Proportion of published edits that add new content and are still reverted within 48 hours (or have a low revision risk score if we use the revision risk model (T317700)). Will include a breakdown of revert rate of published edits with and without a reference added. |
2) | Edit completion rate drastically decreases | Proportion of edits that are started (event.action = init) and are successfully published (event.action = saveSuccess) |
3) | Edit abandonment rate drastically increases | Proportion of contributors that are presented Edit Check feedback and abandon their edits (indicated by event.action = abort and event.abort_type = abandon). |
4) | People shown Edit Check are blocked at higher rates | Proportion of contributors blocked after publishing an edit where Edit Check was shown |
5) | High false positive or false negative rates | A) Proportion of new content edits published without a reference and without being shown Edit Check (indicator of false negative) and B) Proportion of contributors that dismiss adding a citation and select "I didn't add new information" or another indicator that their edit doesn't require a citation |
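
Guardrails 2 and 3 can be sketched directly from the event fields named in the table (`event.action = init`, `saveSuccess`, `abort`, and `event.abort_type = abandon`). The example below is a rough illustration, not the team's actual query: it assumes events arrive as plain dictionaries, and the `session_id` key, the `edit_check_shown` flag, and the `"other"` abort type are hypothetical names introduced here. It also counts editing sessions rather than contributors, which is a simplification of guardrail 3.

```python
def guardrail_rates(events):
    """Return (edit completion rate, abandonment rate after Edit Check).

    Each event is a dict with a hypothetical "session_id" key, an
    "action" key, and optionally "abort_type" and "edit_check_shown".
    A rate is None when its denominator is zero.
    """
    started, published, shown, abandoned = set(), set(), set(), set()
    for event in events:
        sid = event["session_id"]
        action = event.get("action")
        if action == "init":
            started.add(sid)
        elif action == "saveSuccess":
            published.add(sid)
        elif action == "abort" and event.get("abort_type") == "abandon":
            abandoned.add(sid)
        # Hypothetical flag marking sessions where Edit Check appeared.
        if event.get("edit_check_shown"):
            shown.add(sid)
    completion = len(published & started) / len(started) if started else None
    abandonment = len(abandoned & shown) / len(shown) if shown else None
    return completion, abandonment


# Illustrative events only (not real data).
sample_events = [
    {"session_id": "s1", "action": "init"},
    {"session_id": "s1", "action": "saveSuccess"},
    {"session_id": "s2", "action": "init"},
    {"session_id": "s2", "action": "abort", "abort_type": "abandon",
     "edit_check_shown": True},
    {"session_id": "s3", "action": "init"},
    {"session_id": "s3", "action": "saveSuccess", "edit_check_shown": True},
    {"session_id": "s4", "action": "init"},
    # "other" is a placeholder for any abort type that is not "abandon".
    {"session_id": "s4", "action": "abort", "abort_type": "other"},
]
completion_rate, abandonment_rate = guardrail_rates(sample_events)
print(completion_rate, abandonment_rate)
```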
A/B Test: Decision Matrix
ID | Scenario | Indicator(s) | Plan of Action |
---|---|---|---|
1) | Edit Check is disrupting, discouraging, or otherwise getting in the way of volunteers who are attempting to make edits in good faith. Read: people are less likely to publish the edits they start. | Significant drop in edit completion and spike in edit abandonment in | Pause scaling plans; investigate changes to UX |
2) | Edit Check is increasing the likelihood that people will publish destructive edits | Increase in the proportion of contributors blocked after publishing an edit where Edit Check was activated; increase in the proportion of published edits where Edit Check was activated that are reverted within 48 hours, relative to new content edits where Edit Check was NOT activated. | Pause scaling plans; review edits to try to identify patterns in abuse and propose changes to UX to mitigate them |
3) | Edit Check is causing people to publish edits that align with project policies | Increase in the proportion of edits where Edit Check was activated that include a reference and are not reverted within 48 hours, relative to new content edits without a reference where Edit Check was NOT activated | Move forward with scaling plans |
4) | Edit Check is effective at causing people to accompany new content edits with a reference, but those references are unreliable | Increase in the proportion of published edits where Edit Check was activated that include a reference, and an increase or no change in the proportion of these edits that are reverted within 48 hours | Block scaling plans on reference reliability work (T276857) |
5) | Edit Check is not effective at causing people to accompany new content edits with a reference, but is not disruptive to volunteers. | No change or decrease in the proportion of published edits where Edit Check was activated that include a reference and A) no significant drop in edit completion rate or spike in abandonment rate or B) no significant spike in block or revert rate | Move forward with scaling plans |
i. A "multivariate test" in this context could look like tests wherein we compare: A) multiple variations of Reference Check user experiences or B) people who are shown the source editor by default, people who are shown VE by default, and people who are shown VE by default with Edit Check activated, as @MNeisler and @DLynch raised offline
ii. See T331582#9132480
iii. Being able to distinguish edits made in good faith from those made in bad faith depends on T343938
iv. Per the reasons @MNeisler discovered and named in T343938#9368298, it is not feasible to use the revert risk model to assess whether an edit was made in good or bad faith: "This would require us to determine if it is a good-faith edit session while the user is attempting an edit, which is not feasible yet per engineering constraints @Pablo mentioned in T343938#9082581. The revision risk model requires a revision ID, which is only stored with published edits."
v. Per discussions with the Editing team, we have decided not to include the reference reliability check in this A/B test. We will review the impact of this feature in a separate deployment.