Author: visitor
Description:
Problem:
The conventions MediaWiki uses for uploaded images seem to prevent both the uploaded images and their description pages from being indexed by Google by default.
Suggested fix:
Add an optional configuration switch that forces the default link for every thumbnail to point to the URL of the original file rather than to its description page.
Thumbnails created using "File:" already include a small additional icon that always points to the description page, so each thumbnail would still have a description-page link alongside it. However, we would also need to add these description-page icons to the auto-generated thumbnails that appear on Category pages.
Not all wiki owners would want this change, so it'd need to be "opt-in".
Presumed (speculated) reason for failure:
On most websites, a clickable thumbnail image links directly to the full-size version of the file.
MediaWiki breaks this convention so that it can provide an intermediate page holding additional metadata and history information about the image. Unfortunately, the URL of this intermediate page ends in an image file extension (''e.g.'' ".jpg"), so search engines may understandably assume that the linked resource is an image file rather than an HTML/XML page.
It seems that the search engine's default behaviour is then NOT to explore the contents of the "faux .jpg" file, but to pass the URL to its image-indexing routines. These then load the resource at the description page's URL, recognise immediately from the header that it is ''not'' an image file, and discard it.
This can result in a three-way failure:
(1) the useful description page, with its image preview, a text copy of all the metadata, and any additional written description, is ignored by the search engine because it appears to be a malformed (and potentially malicious) image file;
(2) the original full-size image file, with its embedded metadata, is also not indexed, because Google never gets to read a page that links directly to it; and
(3) the article thumbnail ''is'' indexed, but it is low-quality and low-resolution, inherits none of the embedded metadata, and might be flagged by the search engine as associated with a bad (and potentially broken or malicious) link, so it is given a poor ranking.
As a result, Google will normally never find out that the original image files exist. Nor can Google make use of the wiki's thumbnail image listings (which ''do'' contain direct links to the images), because these are automatically given a NOINDEX tag, which specifically tells Google not to index them, and this flag does not seem to be overridable.
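The suffix-first routing speculated above can be illustrated with a toy model. This is only a sketch of the presumed behaviour, not how Google actually works: the function names, the example wiki URL, and the two-stage "guess from URL, then check the header" pipeline are all assumptions made for illustration. It uses Python's standard `mimetypes.guess_type`, which classifies purely by the URL's suffix, so a description page URL ending in ".jpg" is treated as a JPEG even though the server returns HTML.

```python
import mimetypes
from typing import Optional

# Leading-byte signatures for two common image formats.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def guess_type_from_url(url: str) -> Optional[str]:
    """Guess a MIME type from the URL's suffix alone, without fetching anything."""
    mime, _encoding = mimetypes.guess_type(url)
    return mime


def sniff_type(body: bytes) -> str:
    """Classify a response by its leading bytes rather than by its URL."""
    for signature, mime in IMAGE_SIGNATURES.items():
        if body.startswith(signature):
            return mime
    if body.lstrip().lower().startswith((b"<!doctype html", b"<html")):
        return "text/html"
    return "application/octet-stream"


def toy_crawl(url: str, body: bytes) -> str:
    """Hypothetical crawler: route by URL suffix, then validate images by header."""
    if (guess_type_from_url(url) or "").startswith("image/"):
        # Routed to the image indexer, which checks the file header...
        if sniff_type(body).startswith("image/"):
            return "indexed as image"
        # ...and drops the resource because the header is not an image's.
        return "discarded"
    # Anything that does not look like an image is parsed as a page.
    return "indexed as page"


description_page = "https://wiki.example.org/wiki/File:Sunset.jpg"
page_html = b"<!DOCTYPE html><html><body>Sunset.jpg metadata</body></html>"
print(toy_crawl(description_page, page_html))  # discarded
```

Under this model the same HTML served from a suffix-free URL such as ".../wiki/Main_Page" would be indexed as a page, which is the asymmetry the report is describing.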
Presumed (speculated) reason for Google's behaviour:
We can argue that this problem is not down to a bug with MediaWiki, and is instead Google's fault - shouldn't Google analyse pages based on content rather than on apparent filename suffixes?
However, Google could counter-argue that ignoring apparent filenames would make its search routines less efficient, that authors should be encouraged to use appropriate filetype suffixes for their files, and that, since maliciously constructed JPG files are a known vector for malware, there is perhaps even an argument that Google ''should'' deliberately boycott, on principle, URLs that suggest they lead to image files but do not.
In any case, search engine optimisation is the job of the web page author, not of Google; if we make our web pages behave in a misleading way that results in them not being crawled, that is our problem rather than Google's.
See also T6421: Image file extension should not be part of the name
Partial temporary workarounds:
A wiki's owner can add direct (Google-followable) links to the original image files themselves, either (1) by manually compiling a separate listing page with direct links (which includes the images but lacks any surrounding referential context), (2) by manually setting the link= parameter on each individually embedded thumbnail (which can involve a lot of extra work), or (3) by replacing MediaWiki's "File:" link syntax with a custom thumbnail template that includes both a link to the image description page and a direct link to the original image.
However, the "link=" override method still does not create corresponding direct links for the Category-page thumbnails generated by MediaWiki.
If a wiki is used partly as a storage system for many large, high-quality images, it is quite possible that many of those images will mainly be accessed via Category-page thumbnails and will not have separate embedded thumbnails to which the "link=" override can be applied. We still need some way of telling MediaWiki to create Google-followable paths from the Category-page thumbnails to the full image files.
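For workarounds (1) and (2), the repetitive wikitext can be generated by a script rather than typed by hand. The sketch below is an assumption-laden helper, not part of MediaWiki: the function names are hypothetical, and it relies on the core `{{filepath:}}` parser function expanding to the original file's URL and on `link=` accepting that URL on your wiki, which is worth checking on your own installation. As noted above, none of this reaches the Category-page thumbnails.

```python
from typing import Iterable


def description_link(name: str) -> str:
    """Wikitext link to a file's description page (the leading colon links rather than embeds)."""
    return "[[:File:" + name + "|" + name + "]]"


def direct_link(name: str, label: str = "full-size original") -> str:
    """Bracketed external link to the original file, using the {{filepath:}} parser function."""
    return "[{{filepath:" + name + "}} " + label + "]"


def thumb_with_direct_link(name: str, caption: str) -> str:
    """Embedded thumbnail whose main link points at the original file (workaround 2)."""
    return "[[File:" + name + "|thumb|link={{filepath:" + name + "}}|" + caption + "]]"


def listing_page(names: Iterable[str]) -> str:
    """Bulleted listing page linking every file both ways (workaround 1)."""
    return "\n".join(
        "* " + description_link(n) + " (" + direct_link(n) + ")" for n in names
    )


print(listing_page(["Sunset.jpg", "Harbour at dawn.png"]))
```

Each generated bullet carries both a description-page link and a direct link, so the listing page offers a crawler a followable path to the full-size file while keeping the metadata page reachable.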
Implementation
The suggested fix would be to have a switch that makes image thumbnails link directly to the original files, regardless of whether they were created within the body of an article using the File: syntax, or were automatically generated near the end of a "Category" page.
A secondary link to the description page would then be provided below, on, or beside the thumbnail. This secondary link already exists for thumbnails generated by "File:", but if the new global override were implemented, a similar "info page" link icon would need to be added to Category-page thumbnails.
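The intended link routing can be summarised as a small decision function. This is a model of the proposal only: `link_to_original` stands in for the hypothetical configuration switch (no such MediaWiki setting exists yet), and `ThumbLinks`, `thumb_links`, and the "article"/"category" context labels are illustrative names, not MediaWiki code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThumbLinks:
    primary: str           # target when the thumbnail image itself is clicked
    info: Optional[str]    # secondary description-page link, or None if absent


def thumb_links(file_url: str, description_url: str, context: str,
                link_to_original: bool) -> ThumbLinks:
    """Link targets for one thumbnail under the proposed opt-in switch.

    context is "article" for thumbnails embedded with File: syntax, or
    "category" for the gallery MediaWiki generates on Category pages.
    link_to_original stands in for the hypothetical configuration switch.
    """
    if context not in ("article", "category"):
        raise ValueError(f"unknown thumbnail context: {context!r}")
    if not link_to_original:
        # Current behaviour: the image links to the description page; only
        # article thumbnails carry the separate description-page icon.
        info = description_url if context == "article" else None
        return ThumbLinks(primary=description_url, info=info)
    # Proposed behaviour: every thumbnail links to the original file, and
    # every thumbnail, including Category-page ones, gets an info link.
    return ThumbLinks(primary=file_url, info=description_url)
```

With the switch off, a Category-page thumbnail yields `ThumbLinks(primary=description_url, info=None)`, i.e. no crawlable path to the file; with it on, the same thumbnail yields the file URL as its primary link plus the description page as its info link.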
Possible enhanced implementation
If the bug-fixer wanted to be especially creative, the flag could support multiple options, for instance a choice of icon and icon placement. A wiki owner could then specify that an "INFO" strip sits below every thumbnail, or that a red dot or a "page corner curl" icon is superimposed on the bottom right corner of every thumbnail to link to the description page, while the rest of the exposed thumbnail links to the image itself.
Although the priority for this fix would be to let a wiki's administrator solve the current problem of Google not indexing full images (without really changing the look of the pages), an "enhanced" implementation with a choice of info-page icon and position would give MediaWiki additional visual customisation options.
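The "enhanced" option set splits naturally into two independent choices, icon and placement. The sketch below models them as enums and renders illustrative HTML; the enum values, CSS class names, and markup are hypothetical and are not existing MediaWiki or skin classes. In both placements the image itself links to the original file and a separate anchor links to the description page.

```python
import html
from dataclasses import dataclass
from enum import Enum


class InfoIcon(Enum):
    INFO_STRIP = "info-strip"   # a text strip reading "INFO"
    RED_DOT = "red-dot"
    PAGE_CURL = "page-curl"     # a "page corner curl" graphic


class Placement(Enum):
    BELOW = "below"                      # hangs beneath the thumbnail
    OVERLAY_BOTTOM_RIGHT = "overlay-br"  # superimposed on the bottom right corner


@dataclass(frozen=True)
class InfoLinkStyle:
    icon: InfoIcon
    placement: Placement


def render_thumb(img_src: str, file_url: str, description_url: str,
                 style: InfoLinkStyle) -> str:
    """Emit illustrative HTML: the image links to the file, the icon to the description page."""
    classes = f"thumb-info thumb-info--{style.icon.value} thumb-info--{style.placement.value}"
    label = "INFO" if style.icon is InfoIcon.INFO_STRIP else ""
    info = (f'<a class="{classes}" href="{html.escape(description_url)}" '
            f'aria-label="File description page">{label}</a>')
    image = (f'<a href="{html.escape(file_url)}">'
             f'<img src="{html.escape(img_src)}" alt=""></a>')
    if style.placement is Placement.BELOW:
        return f'<div class="thumb">{image}<div>{info}</div></div>'
    # Overlay: skin CSS would position .thumb-info over the image's corner.
    return f'<div class="thumb thumb--overlay">{image}{info}</div>'
```

Keeping icon and placement as separate settings means a wiki owner could combine, say, a page-curl icon with the overlay placement without a dedicated option for every pairing.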
Version: 1.22.0
Severity: normal