The most recent version of each image is stored in the image table, with historic versions stored in the oldimage table. This structure resembles cur and old, with all the nastiness that comes with it. In MW 1.5 we ditched cur/old in favor of page/revision, but we never did a similar thing for files.
We talked about this a bit at the 2011 Amsterdam Hackathon and decided that, while it's complex and hairy, we want to do it at some point in the future.
RFC
- Task: T589: RFC: image and oldimage tables
- Status: Approved
- https://www.mediawiki.org/wiki/Requests_for_comment/image_and_oldimage_tables
In a nut shell
Currently, the oldimage table contains information about non-current file revisions, and the image table contains information about the current file revisions as well as the unversioned information about a file in general. This means that to query all revisions, MediaWiki has to query two tables. It also means that whenever a file gains a new revision, the "current" data has to be moved across from one table to another.
The approved changes will rename oldimage to filerevision and add the missing rows for the current file revisions. Then, the image (now: file) table can be reduced to just containing the generic file information. This matches the way the page and revision tables have been organised in the database since MediaWiki 1.5 (before the page/revision tables existed, MW used curpage/oldpage tables which is similar to these old image tables, and is why today we still have a notion of curid=pageid and oldid=revid).
Approved changes
- Add fields:
- img_id: New primary key for image.
- img_latest: Pointer to oldimage.oi_id for the current revision of a file (similar to how page.page_latest points to revision.rev_id)
- oi_id: New primary key for oldimage.
- oi_img: Pointer to image.img_id for the file this revision corresponds to (similar to how revision.rev_page points to page.page_id)
- Rename tables:
- Rename image to file.
- Rename oldimage to filerevision.
- Add missing rows in filerevision for current revisions, based on rows from image.
- Reduce fields in file to only be the minimum amount of derived data we need indexed for the "current" revision. Currently indexed: img_timestamp, img_user_text, img_sha1, img_media_type, img_major_mime, img_minor_mime, img_size.
- img_timestamp: Remove. Only used for file history, redundant now with the missing rows now present in filerevision.
- img_user_text: Remove. Only used for file history (ApiQueryAllImages, SpecialMIMEsearch), these queries can instead query or join filerevision directly.
- img_sha1: Keep. Needed for reverse look-up in duplication detection on current file revisions.
- img_media_type img_major_mime, img_minor_mime: Keep. Needed for SpecialMIMESearch. Note that (while out of scope of this task), there has been talk about creating a separate RFC for removing these fields in favour of a SearchEngine-based approach.
- img_size: Remove. Not needed as indexed field. Only used by checkImages.php and Special:MediaStatistics (SUM query), these can directly query/join against filerevision.
- img_width, img_height: Remove. Not needed as indexed field. Only existed because oldimage didn't contain current revisions, these fields are redundant in filerevision.
- img_bits: Remove. Not needed. Redundant now with filerevision.
- img_description: Remove. Not needed. Redundant now with filerevision.
- img_user: Remove. Not needed. Redundant now with filerevision. Used by ApiQueryAllImages and NewFilesPager (SpecialNewimages) which could query/join filerevision instead. Though, should probably use recentchanges.rc_bot instead. (unrelated)
Proposed migration strategy
Exact schema migration script to be written as part of the implementation and fine-tuned as needed during code review. We may need two separate strategies due to the size of the migration (one for the default db updater, and an opt-in maintenance script for large farms such as Wikimedia Foundation). A few ideas so far:
- Tim Starling, T589#2747454:
- Rename image to filerevision. Create a view called image.
- Add new fields to filerevision.
- Create file table.
- (Disable uploads.)
- Populate file from filerevision.
- Also move oldimage rows into filerevision. (These will be made invisible from the image view.)
- (Deploy new MediaWiki version that uses file/filerevision.)
- (Re-enable uploads)
- Drop image and oldimage.
- Jaime Crespo mentioned the idea of potentially doing the migration offline while serving traffic from codfw instead of eqiad.