Wikipedia:Pruning article revisions

From Wikipedia, the free encyclopedia
Like a busy Wikipedia editor pruning an article, this arborist is doing a serious trimming on a tree.

Although Wikipedia today has fewer than 7 million articles (plus millions of red-link articles), the total number of revisions is over 1.2 billion. Currently, there are 6,915,627 articles and 1,254,679,641 total revisions, giving an average[1] of 181 revisions per article.

Sometimes people are worried that the number of articles or edits is a problem. It isn't, but it is friendly to your fellow humans to try to make it easy to use the article histories in a productive way. One way to do that is to avoid unnecessarily large numbers of revisions for a change, while another is to use more revisions than strictly required so that your edit comments can clearly say what you are doing and why.

Technical issues and costs

The revision count is not a technical or cost problem. The Wikimedia servers combine old versions into large batches and then compress them. Because so much is the same between revisions, the compression produces a huge reduction in storage space. The storage space is almost cost-free because, starting in late 2004 and early 2005, Wikimedia has used SATA hard disks on some of the web servers for this work. The SATA disks are extremely cheap, and because the servers are already used for page building, the additional cost is insignificant. As with the main database servers, several copies of each set are kept so that the failure of one machine will not cause trouble.
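The effect of batching similar revisions before compressing them can be illustrated with a small sketch. It does not reproduce the actual Wikimedia storage code: the helper names (make_revisions, individual_size, batched_size), the synthetic article text, and the choice of zlib are all assumptions made for illustration only.

```python
import random
import zlib


def make_revisions(count: int, seed: int = 0) -> list[bytes]:
    """Build `count` synthetic revisions that each differ by one sentence.

    The text is random filler, a hypothetical stand-in for a real article.
    """
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocabulary = [
        "".join(rng.choices(letters, k=rng.randint(3, 9))) for _ in range(500)
    ]

    def sentence() -> str:
        return " ".join(rng.choices(vocabulary, k=10)).capitalize() + "."

    lines = [sentence() for _ in range(120)]
    revisions = []
    for _ in range(count):
        # Each "edit" rewrites a single sentence and saves a full copy.
        lines[rng.randrange(len(lines))] = sentence()
        revisions.append("\n".join(lines).encode("utf-8"))
    return revisions


def individual_size(revisions: list[bytes]) -> int:
    """Total bytes when every revision is compressed on its own."""
    return sum(len(zlib.compress(rev, 9)) for rev in revisions)


def batched_size(revisions: list[bytes]) -> int:
    """Total bytes when all revisions are concatenated, then compressed once."""
    return len(zlib.compress(b"".join(revisions), 9))


if __name__ == "__main__":
    revs = make_revisions(20)
    print("raw bytes:              ", sum(len(r) for r in revs))
    print("compressed individually:", individual_size(revs))
    print("compressed as one batch:", batched_size(revs))
```

In the batched case the compressor can refer back to the previous revision for nearly all of the text, so each additional revision costs little more than the sentence that changed.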

Some people mistakenly believe that deleting articles or revisions saves space somewhere. Wikipedia keeps all old versions of articles, including those of deleted articles, so no space is saved by deleting. Editors with only basic rights cannot see these versions and might wrongly believe that they are gone. Administrators can see most of them, and those with Oversight permission can also see the few that are hidden even from most administrators.

There is a small cost for each edit. A tiny amount of storage is used for metadata and summary information. The edit also has to be sent to the replica (slave) database servers. At extreme edit rates this can sometimes cause short delays before the most recent revision is available, but today the servers are fast enough and the processes streamlined enough that this is no longer the problem it sometimes was in 2004 and 2005. These issues have essentially been engineered out of the MediaWiki design as they were encountered.

The licenses used by Wikipedia require that every revision is saved. If revisions were completely removed wholesale, instead of some merely being hidden, the article would become a copyright infringement and would need to be reverted to a blank page and rewritten. Thoughts of copying article text and then deleting the original are impractical for this reason. Don't do it: complying with copyright law is taken very seriously here.

Problems with numerous revisions

While there are no technical issues with lots of revisions, there are some human issues with careless use of lots of unnecessary revisions.

Article quality

An edit history that is clogged with experimental or "junk" edits may become confusing to humans who are trying to work out what happened and when. An editor who makes multiple edits to an article while working toward a final version may view any edits before the final one as temporary revisions that will not remain very long. Sometimes it is not easy, or even possible, to make the intended final revision in a single edit. This can be the case when the edit contains a huge amount of information, or when it is difficult to enter all the text at once. While Wikipedia is a work in progress and there is no deadline for completion, it is good to make life easier for those looking at the history to work out what was done in each edit and why.

If sensible, consider the following steps, none of which is required, but which might sometimes make life easier for others:

  1. Combine multiple edits together, as one "SAVE" operation. Use the article preview feature to see your work in progress.
  2. Don't compromise the clarity of the edit description; this is the first thing that those looking at the history will see. There are two solutions:
    • If this requires a lengthy summary, post the lengthy description to the talk page as a new topic/section; and (assuming you edit the article before the talk page) in the article revision's edit summary write something like "Edited per [[Talk:Article#Topic|Talk]], coming shortly."
    • Or, if multiple edits are the best way to make what you are doing clear, use multiple edits.
  3. If you are reverting vandalism, see whether there are other changes you should make at the same time, such as fixing out-of-date links or typos.
  4. Create new articles offline or in a sandbox, then copy online only for previewing.
  5. Avoid running fixer-bots too often, and beware of trivial auto-updates, especially changes to minor words or automatic grammar corrections inside vandalism jokes.

Notes

  1. ^ The average number of revisions per article is calculated as the total number of revisions divided by the total number of articles: avg = #total_edits / #articles, as:
    {{NUMBEROFEDITS:R}} / {{NUMBEROFARTICLES:R}}.
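
As a worked check of the figure quoted in the introduction, the same division can be written out in a few lines. The function name average_revisions is hypothetical; the two counts are the ones given in the text above.

```python
def average_revisions(total_edits: int, total_articles: int) -> float:
    """Return total_edits / total_articles, the average used in this note."""
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    return total_edits / total_articles


if __name__ == "__main__":
    # Counts quoted in the introduction of this page.
    avg = average_revisions(1_254_679_641, 6_915_627)
    print(round(avg))  # 181
```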