
Emit lint error or category when a page uses duplicate HTML IDs
Closed, Resolved, Public

Description

Spurred by a discussion at en.WP about citation templates emitting duplicate HTML IDs, it would be nice if we could find pages which have duplicate IDs so that we can fix them. That probably means a maintenance category or a lint error.
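To illustrate the kind of check being asked for, here is a minimal sketch in Python that reports `id` values appearing more than once in a chunk of HTML. It is not how Parsoid or the Linter extension implements the check; the names `IdCollector` and `find_duplicate_ids` and the sample markup are hypothetical and exist only for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Counts every non-empty id attribute value seen in the markup."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names; values keep their case.
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1


def find_duplicate_ids(html):
    """Return the sorted list of id values used more than once."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sorted(i for i, count in collector.ids.items() if count > 1)


# Hypothetical sample: two citations emitting the same anchor id.
sample = (
    '<p id="intro">Text</p>'
    '<cite id="CITEREFSmith2001">Smith 2001</cite>'
    '<cite id="CITEREFSmith2001">Smith 2001</cite>'
)
print(find_duplicate_ids(sample))
```

A real lint pass would presumably also record where each duplicate occurs so that editors can locate it; this sketch only reports which values collide.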

Event Timeline

Change 493116 had a related patch set uploaded (by Farida; owner: Farida):
[mediawiki/services/parsoid@master] Emit lint error when a page has duplicate HTML IDs

https://gerrit.wikimedia.org/r/493116

ssastry triaged this task as Medium priority. (Mar 11 2019, 5:04 PM)
cscott subscribed.

This reappeared in relation to T358588, and maybe we should re-triage it as maintenance work for Content-Transform-Team.

ihurbain moved this task from Needs Triage to Linting on the Parsoid board.
ihurbain removed a project: Content-Transform-Team.
ihurbain removed a project: MediaWiki-Parser.

Change #493116 abandoned by Subramanya Sastry:

[mediawiki/services/parsoid@master] Emit lint error when a page has duplicate HTML IDs

Reason:

No longer relevant -- partial patch and we are also in PHP land now.

https://gerrit.wikimedia.org/r/493116

Change #1073572 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/extensions/Linter@master] Add a "duplicate-ids" lint category

https://gerrit.wikimedia.org/r/1073572

Change #1073574 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/services/parsoid@master] Lint duplicate ids

https://gerrit.wikimedia.org/r/1073574

Change #1074253 had a related patch set uploaded (by C. Scott Ananian; author: Arlolra):

[mediawiki/extensions/Linter@wmf/1.43.0-wmf.23] Add a "duplicate-ids" lint category

https://gerrit.wikimedia.org/r/1074253

Change #1073572 merged by jenkins-bot:

[mediawiki/extensions/Linter@master] Add a "duplicate-ids" lint category

https://gerrit.wikimedia.org/r/1073572

Change #1074253 merged by jenkins-bot:

[mediawiki/extensions/Linter@wmf/1.43.0-wmf.23] Add a "duplicate-ids" lint category

https://gerrit.wikimedia.org/r/1074253

Mentioned in SAL (#wikimedia-operations) [2024-09-19T20:45:58Z] <dreamyjazz@deploy1003> Started scap sync-world: Backport for [[gerrit:1073871|Re-order arguments to DataAccess::addTrackingCategory]], [[gerrit:1074253|Add a "duplicate-ids" lint category (T200517)]]

Mentioned in SAL (#wikimedia-operations) [2024-09-19T21:00:53Z] <dreamyjazz@deploy1003> dreamyjazz, cscott: Backport for [[gerrit:1073871|Re-order arguments to DataAccess::addTrackingCategory]], [[gerrit:1074253|Add a "duplicate-ids" lint category (T200517)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-09-19T21:15:19Z] <dreamyjazz@deploy1003> Finished scap sync-world: Backport for [[gerrit:1073871|Re-order arguments to DataAccess::addTrackingCategory]], [[gerrit:1074253|Add a "duplicate-ids" lint category (T200517)]] (duration: 29m 20s)

Change #1073574 merged by jenkins-bot:

[mediawiki/services/parsoid@master] Lint duplicate ids

https://gerrit.wikimedia.org/r/1073574

Change #1075077 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a22

https://gerrit.wikimedia.org/r/1075077

Change #1075077 merged by jenkins-bot:

[mediawiki/vendor@master] Bump wikimedia/parsoid to 0.20.0-a22

https://gerrit.wikimedia.org/r/1075077

For some reason this new category is timing out for me on en.wp with 100k results. Other categories with more don't (such as misnested tags which has 300k). Is the category still filling, or is there something weird going on? (I can work around it by selecting a namespace in the URL, so it's not a critical problem, but not everyone is going to know that is viable.)

Separate comment: I think the name could use some adjustment. These aren't "Duplicate Ids" (see also https://en.wikipedia.org/wiki/Id,_ego_and_superego#Id ), they're (either) "duplicate IDs" or "duplicate id attributes". ("duplicate ids" is almost as bad as "duplicate Ids")

Can someone please perform the rest of the "add a new Linter condition" checklist before closing this ticket? The new condition needs to be added to the lists in which Linter errors appear, including the Page Information entry for each page, and the necessary documentation and help pages need to be completed.

Izno's comment is also correct; this problem does not relate to Freudian psychology. It's better to fix it now than to wait until reports and other systems depend on a suboptimal naming choice.

including the Page Information entry for each page

Do you have an example where it isn't showing up? It appears to be working here,
https://www.mediawiki.org/wiki/Extension:Scribunto?action=info#Lint_errors

Change #1076048 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/extensions/Linter@master] Change capitalization of duplicate IDs

https://gerrit.wikimedia.org/r/1076048

I think the name could use some adjustment

There's a patch up and this edit was made.

Change #1076048 merged by jenkins-bot:

[mediawiki/extensions/Linter@master] Change capitalization of duplicate IDs

https://gerrit.wikimedia.org/r/1076048

The new condition needs to be added to the lists in which Linter errors appear, ... , and the necessary documentation and help pages need to be completed.

These edits have been made,
https://www.mediawiki.org/w/index.php?title=Help%3ALint_errors%2Fduplicate-ids&diff=6772997&oldid=6772988
https://www.mediawiki.org/w/index.php?title=Help%3ALint_errors&diff=6772876&oldid=6616726

including the Page Information entry for each page

Do you have an example where it isn't showing up? It appears to be working here,
https://www.mediawiki.org/wiki/Extension:Scribunto?action=info#Lint_errors

They just started showing up on en.WP between the time of my comment and right now. Thanks for the response.

For some reason this new category is timing out for me on en.wp with 100k results. Other categories with more don't (such as misnested tags which has 300k). Is the category still filling, or is there something weird going on? (I can work around it by selecting a namespace in the URL, so it's not a critical problem, but not everyone is going to know that is viable.)

Hmm, I imagine what's happening is that the linter_cat_page_position index isn't being used; the primary key on linter_id is used instead. Since this is a new category, all the linter_id values for its errors are the newest ones, so quite a few rows need to be scanned before the ~50 requested rows are returned.

Adding a namespace probably forces linter_cat_namespace.

P70205#281191 kind of confirms that.

Doing,

SELECT page_id, page_namespace, page_title, page_is_redirect, page_is_new, page_latest, page_touched, page_len, page_content_model, page_namespace, page_title, linter_id, linter_params, linter_start, linter_end, linter_cat
FROM `page`
JOIN `linter` FORCE INDEX (linter_cat_page_position) ON ((page_id=linter_page))
WHERE linter_cat = 25
ORDER BY linter_id
LIMIT 51;

Goes from,

51 rows in set (47.742 sec)

to,

51 rows in set (1.708 sec)
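The effect can be reproduced in miniature. The sketch below uses Python's built-in sqlite3 module rather than the production MariaDB setup, so it is an analogy only: SQLite's `INDEXED BY` and `NOT INDEXED` stand in for `FORCE INDEX` and for the primary-key scan, the table is a simplified, assumed version of `linter`, and the column list of `linter_cat_page_position` is an assumption. It compares the query plans and confirms that both variants return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed schema; the real linter table has more columns.
    CREATE TABLE linter (
        linter_id    INTEGER PRIMARY KEY,
        linter_page  INTEGER NOT NULL,
        linter_cat   INTEGER NOT NULL,
        linter_start INTEGER NOT NULL,
        linter_end   INTEGER NOT NULL
    );
    -- Column list is an assumption made for this sketch.
    CREATE UNIQUE INDEX linter_cat_page_position
        ON linter (linter_cat, linter_page, linter_start, linter_end);
""")

# An older category fills the low ids; the new category (25) gets the
# newest, highest ids, mirroring the situation described above.
rows = [(page, 1, 0, 10) for page in range(1, 5001)]
rows += [(page, 25, 0, 10) for page in range(1, 101)]
conn.executemany(
    "INSERT INTO linter (linter_page, linter_cat, linter_start, linter_end) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

QUERY = (
    "SELECT linter_id FROM linter {hint} "
    "WHERE linter_cat = 25 ORDER BY linter_id LIMIT 51"
)


def plan(hint):
    """Return the EXPLAIN QUERY PLAN detail text for the hinted query."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + QUERY.format(hint=hint))
    return " | ".join(row[3] for row in cursor)


def run(hint):
    """Run the hinted query and return the linter_id values."""
    return [row[0] for row in conn.execute(QUERY.format(hint=hint))]


# Walk the table in id order, skipping every row of other categories.
primary_plan = plan("NOT INDEXED")
# Jump straight to the rows of category 25 via the category index.
forced_plan = plan("INDEXED BY linter_cat_page_position")

print("primary key scan:", primary_plan)
print("forced index:    ", forced_plan)
```

In the primary-key variant, every row belonging to an older category has to be read and discarded before the first row of category 25 is reached, which is the pattern suspected above. The exact plan wording differs between SQLite versions, so the output is not reproduced here.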

Change #1080845 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/extensions/Linter@master] [WIP] Force using an index when paging by category

https://gerrit.wikimedia.org/r/1080845

Change #1080845 merged by jenkins-bot:

[mediawiki/extensions/Linter@master] Force the use of the category index when paging by category

https://gerrit.wikimedia.org/r/1080845

For some reason this new category is timing out for me on en.wp with 100k results.

Post-deploy, that should no longer be the case.