Page MenuHomePhabricator

Some tables lack unique or primary keys, may allow confusing duplicate data
Closed, ResolvedPublic

Description

All database tables should have PRIMARY KEY's, or at minimum, a UNIQUE index

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.
StatusSubtypeAssignedTask
InvalidNone
DuplicateNone
ResolvedNone
Resolved RobLa-WMF
ResolvedNone
ResolvedLSobanski
ResolvedNone
DeclinedNone
Resolveddaniel
Resolvedmatthiasmullie
ResolvedTTO
ResolvedMarostegui
ResolvedMarostegui
ResolvedAddshore
StalledNone
OpenNone
OpenNone
ResolvedMarostegui
ResolvedReedy
ResolvedReedy
ResolvedMarostegui
OpenLadsgroup
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedMarostegui
ResolvedReedy
ResolvedMarostegui
ResolvedSep 9 2018Marostegui
ResolvedMarostegui
DuplicateNone
ResolvedDannyS712
ResolvedUmherirrender
ResolvedDannyS712
ResolvedDannyS712
DeclinedNone
OpenNone
OpenNone

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Results (including errors):

root@db1075.eqiad.wmnet[(none)]> use etwiki
Database changed
root@db1075.eqiad.wmnet[etwiki]> alter table templatelinks add primary key (tl_from,tl_namespace,tl_title);
Query OK, 0 rows affected (10.64 sec)               
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table pagelinks add primary key (pl_from,pl_namespace,pl_title);
Query OK, 0 rows affected (1 min 22.80 sec)         
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table categorylinks drop key cl_from, add primary key (cl_from,cl_to);
Query OK, 0 rows affected (5.76 sec)                
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table ep_users_per_course drop key ep_users_per_course, add primary key (upc_user_id,upc_course_id,upc_role);
ERROR 1146 (42S02): Table 'etwiki.ep_users_per_course' doesn't exist
root@db1075.eqiad.wmnet[etwiki]> alter table flaggedrevs_tracking drop key from_namespace_title, add primary key (ftr_from,ftr_namespace,ftr_title`);
    `> `\c
root@db1075.eqiad.wmnet[etwiki]> alter table flaggedrevs_tracking drop key from_namespace_title, add primary key (ftr_from,ftr_namespace,ftr_title);
ERROR 1146 (42S02): Table 'etwiki.flaggedrevs_tracking' doesn't exist
root@db1075.eqiad.wmnet[etwiki]> alter table imagelinks drop key il_from, add primary key (il_from,il_to);
Query OK, 0 rows affected (2.61 sec)                
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table iwlinks drop key iwl_from, add primary key (iwl_from,iwl_prefix,iwl_title);
Query OK, 0 rows affected (0.60 sec)                
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table langlinks drop key ll_from, add primary key (ll_from,ll_lang);
Query OK, 0 rows affected (25.97 sec)               
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table log_search drop key ls_field_val, add primary key (ls_field,ls_value,ls_log_id);
Query OK, 0 rows affected (0.18 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table math drop key math_inputhash, add primary key (math_inputhash);
Query OK, 0 rows affected (0.28 sec)                
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table module_deps drop key md_module_skin, add primary key (md_module,md_skin);
Query OK, 0 rows affected (0.14 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table objectcache drop key keyname, add primary key (keyname);
Query OK, 0 rows affected (0.16 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table profiling drop key pf_name_server, add primary key (pf_name,pf_server);
ERROR 1146 (42S02): Table 'etwiki.profiling' doesn't exist
root@db1075.eqiad.wmnet[etwiki]> alter table querycache_info drop key  qci_type, add primary key (qci_type);
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table site_stats drop key ss_row_id, add primary key (ss_row_id);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table text drop key old_id, add primary key (old_id);
Query OK, 0 rows affected (26.58 sec)               
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table transcache drop key tc_url_idx, add primary key (tc_url);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table user_former_groups drop key ufg_user_group, add primary key (ufg_user,ufg_group);
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table user_properties drop key user_properties_user_property, add primary key (up_user,up_property);
Query OK, 149551 rows affected (1.16 sec)              
Records: 149551  Duplicates: 0  Warnings: 0

root@db1075.eqiad.wmnet[etwiki]> alter table vote_log drop key log_id, add primary key (log_id);
ERROR 1146 (42S02): Table 'etwiki.vote_log' doesn't exist

No errors on labs or sanitariums.

Main concerns:

  • Duplication of indexes on tl and pl can be an issue, not only disk space but also in cache overhead
  • *links tables and text will take quite some time to run on large wikis

Above list minus the errors:

alter table templatelinks add primary key (tl_from,tl_namespace,tl_title); -- duplicate index, to be fixed on mediawiki later
alter table pagelinks add primary key (pl_from,pl_namespace,pl_title); -- duplicate index, to be fixed on mediawiki later
alter table categorylinks drop key cl_from, add primary key (cl_from,cl_to);
alter table imagelinks drop key il_from, add primary key (il_from,il_to);
alter table iwlinks drop key iwl_from, add primary key (iwl_from,iwl_prefix,iwl_title);
alter table langlinks drop key ll_from, add primary key (ll_from,ll_lang);
alter table log_search drop key ls_field_val, add primary key (ls_field,ls_value,ls_log_id);
alter table math drop key math_inputhash, add primary key (math_inputhash);
alter table module_deps drop key md_module_skin, add primary key (md_module,md_skin);
alter table objectcache drop key keyname, add primary key (keyname);
alter table querycache_info drop key  qci_type, add primary key (qci_type);
alter table site_stats drop key ss_row_id, add primary key (ss_row_id);
alter table text drop key old_id, add primary key (old_id);
alter table transcache drop key tc_url_idx, add primary key (tc_url);
alter table user_former_groups drop key ufg_user_group, add primary key (ufg_user,ufg_group);
alter table user_properties drop key user_properties_user_property, add primary key (up_user,up_property);

With this change + watchlist + tags, the list of tables without a primary key that are NOT clearly scheduled for dropping are:

cur
hidden
interwiki
l10n_cache
logging_pre_1_10
oldimage
querycache
querycachetwo
searchindex
securepoll_lists
securepoll_msgs
securepoll_properties
site_identifiers
user_newtalk

Most are not too large, but oldimage will cause problem on commons in the future. We'll see.

ls_field_val is used in a FORCE INDEX, which is causing errors like:

[Exception Wikimedia\Rdbms\DBQueryError] (/srv/mediawiki/php-1.29.0-wmf.21/includes/libs/rdbms/database/Database.php:1075) A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? 
Query: SELECT  log_id,log_type,log_action,log_timestamp,log_user,log_user_text,log_namespace,log_title,log_comment,log_params,log_deleted,user_id,user_name,user_editcount,(SELECT  GROUP_CONCAT(ct_tag SEPARATOR ',')  FROM `change_tag`    WHERE ct_log_id=log_id  ) AS `ts_tags`  FROM `logging` FORCE INDEX (PRIMARY) LEFT JOIN `user` ON ((log_user=user_id)) INNER JOIN `log_search` FORCE INDEX (ls_field_val) ON ((ls_log_id=log_id))   WHERE log_type = 'delete' AND log_action = 'revision' AND ls_field = 'rev_id' AND ls_value = 'X' AND (log_type != 'suppress') AND log_namespace = 'X' AND log_title = 'X' AND ((log_deleted & 9) != 9)  ORDER BY log_timestamp DESC LIMIT 26  
Function: IndexPager::buildQueryInfo (LogPager)
Error: 1176 Key 'ls_field_val' doesn't exist in table 'log_search' (10.64.0.206)

  #0 /srv/mediawiki/php-1.29.0-wmf.21/includes/libs/rdbms/database/Database.php(933): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
  #1 /srv/mediawiki/php-1.29.0-wmf.21/includes/libs/rdbms/database/Database.php(1269): Wikimedia\Rdbms\Database->query(string, string)
  #2 /srv/mediawiki/php-1.29.0-wmf.21/includes/pager/IndexPager.php(368): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
  #3 /srv/mediawiki/php-1.29.0-wmf.21/includes/pager/IndexPager.php(225): IndexPager->reallyDoQuery(string, integer, boolean)
  #4 /srv/mediawiki/php-1.29.0-wmf.21/includes/logging/LogPager.php(428): IndexPager->doQuery()
  #5 /srv/mediawiki/php-1.29.0-wmf.21/includes/pager/IndexPager.php(422): LogPager->doQuery()
  #6 /srv/mediawiki/php-1.29.0-wmf.21/includes/logging/LogEventsList.php(623): IndexPager->getBody()
  #7 /srv/mediawiki/php-1.29.0-wmf.21/includes/specials/SpecialRevisiondelete.php(224): LogEventsList::showLogExtract(OutputPage, string, Title, string, array)
  #8 /srv/mediawiki/php-1.29.0-wmf.21/includes/specialpage/SpecialPage.php(522): SpecialRevisionDelete->execute(NULL)
  #9 /srv/mediawiki/php-1.29.0-wmf.21/includes/specialpage/SpecialPageFactory.php(578): SpecialPage->run(NULL)
  #10 /srv/mediawiki/php-1.29.0-wmf.21/includes/MediaWiki.php(287): SpecialPageFactory::executePath(Title, RequestContext)
  #11 /srv/mediawiki/php-1.29.0-wmf.21/includes/MediaWiki.php(862): MediaWiki->performRequest()
  #12 /srv/mediawiki/php-1.29.0-wmf.21/includes/MediaWiki.php(523): MediaWiki->main()
  #13 /srv/mediawiki/php-1.29.0-wmf.21/index.php(43): MediaWiki->run()
  #14 /srv/mediawiki/w/index.php(3): include(string)
  #15 {main}

Change 351653 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/core@master] Use IGNORE INDEX(ls_log_id) instead of FORCE INDEX(ls_field_val)

https://gerrit.wikimedia.org/r/351653

Mentioned in SAL (#wikimedia-operations) [2017-05-03T15:13:06Z] <catrope@naos> Synchronized php-1.29.0-wmf.21/includes/logging/LogPager.php: Replace FORCE INDEX(ls_field_val) with IGNORE INDEX(ls_log_id) (https://gerrit.wikimedia.org/r/#/c/351653/ for T17441) (duration: 01m 14s)

Change 351653 merged by jenkins-bot:
[mediawiki/core@master] Use IGNORE INDEX(ls_log_id) instead of FORCE INDEX(ls_field_val)

https://gerrit.wikimedia.org/r/351653

Change 351705 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/core@wmf/1.29.0-wmf.21] Use IGNORE INDEX(ls_log_id) instead of FORCE INDEX(ls_field_val)

https://gerrit.wikimedia.org/r/351705

Change 351705 merged by jenkins-bot:
[mediawiki/core@wmf/1.29.0-wmf.21] Use IGNORE INDEX(ls_log_id) instead of FORCE INDEX(ls_field_val)

https://gerrit.wikimedia.org/r/351705

Change 370190 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[mediawiki/core@master] tables.sql: Convert UNIQUE keys into PK

https://gerrit.wikimedia.org/r/370190

I would like someone to help getting this: https://gerrit.wikimedia.org/r/#/c/370190/ merged and deployed.

To sum up, we have finished converting UNIQUE into PRIMARY keys and we would like this to be reflected on tables.sql so future wikis are created with the same PKs that we have deployed in core.

Thanks!

So... After 1.30.0-wmf.17 is everywhere, our DBA can drop the other two primary indexes that were blocked on a code change

ALTER TABLE /*_*/pagelinks DROP INDEX /*i*/pl_from;
ALTER TABLE /*_*/templatelinks DROP INDEX /*i*/tl_from;

hitcounter removed from core. externalinks now has el_id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,

I suspect archive and oldimage should be left alone due to the other major db overhauls to be done...

So querycache querycachetwo user_newtalk probably still want sorting?

Thank you very much @Reedy , I will create a separate task for deployment tracking- so we do not spam wmf-deployment related tasks on this mediawiki task: T174509

@Reedy I understand that T162774 should be resolved, too?

@Reedy I understand that T162774 should be resolved, too?

I think so, yup. Will do so

I suspect archive and oldimage should be left alone due to the other major db overhauls to be done...

Archive should have ar_id already- if it doesn't have on wmf, it would be a deployment error, but I belive that is not the case or at least it is not widespread.

For image I would ask for an estimation to reformat the image tables -if anyone plans to work on it soon-, if it is long term it may be reasonable to deploy a PK shorter term (it may even facilitate further schema changes). Edit: it already has img_name ?

I have yet to see querycache querycachetwo user_newtalk, but I suspect they may be much easier to fix and deploy, and not much of a problem.

Image does have img_name

oldimage also has oi_name, but in this case, it's not unique, as an image can have multiple revisions... oi_archive_name may work (but is quite a long string)

querycache has a patch... querycachetwo doesn't yet, neither does user_newtalk (need to decide what to use as a PK, likely a composite)

If we use qc_type, qc_value for querycache... We can probably use qcc_value, qcc_type for querycachetwo

Oh, sorry, I confused image and oldimage, of course, silly me.

Still a work in progress.

LSobanski claimed this task.
LSobanski subscribed.

This task is too large in scope. Tasks for individual tables exist and can be tackled separately.