
Fix very slow performance of Gadget-Cat-a-lot.js
Open, Needs Triage, Public

Authored by Zache, Sep 23 2024, 5:25 AM
Referenced files: F57660031: image.png (Oct 30 2024, 4:47 PM)

Description

Originally, Cat-a-lot.js submitted all edits in parallel. This caused load spikes that led to outages (see T370304). The fix for that was to add a 1 s delay between edits, which made the tool very slow.
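To make the cost of the current throttle concrete, here is a minimal sketch of serial editing with a fixed pause between requests. `EditFn` is a hypothetical stand-in for one edit request; the gadget's actual API calls are not shown in this task, so the sketch only assumes that each edit returns a promise.

```typescript
// Hypothetical stand-in for one edit request; the gadget's real API call is not shown here.
type EditFn = (title: string) => Promise<void>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Serial editing with a fixed pause after each request: never more than one edit in flight.
async function editSequentially(
  titles: string[],
  edit: EditFn,
  delayMs = 1000, // the 1 s delay described above
): Promise<number> {
  let done = 0;
  for (let i = 0; i < titles.length; i++) {
    await edit(titles[i]);
    done++;
    if (i < titles.length - 1) {
      await sleep(delayMs); // no pause needed after the last edit
    }
  }
  return done;
}
```

For a full category page of 200 files, this spends at least 199 × 1 s (about 3.3 minutes) just waiting, before any request time is counted.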

The basic idea of Cat-a-lot is that the user selects files in a category view and then adds, removes, or moves them between categories. In the typical usage profile, users select some files (a category view shows at most 200 files) and then move them. However, in edge cases a user could also make edits at a sustained rate of roughly a thousand per minute (an average of 16 successful edits per second).

The proposed fix is to limit concurrent edits to 5 and, if maxlag is higher than 1.5 s, to limit them to 1. This would allow a reasonably fast user experience when there is no high load, while preventing large automated edit streaks from choking the system.
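A minimal sketch of the proposed rule, assuming each edit returns a promise and that the latest replication lag is available through some hook. `EditFn`, `LagProbe`, `concurrencyLimit`, and `editWithLimit` are hypothetical names for illustration, not functions from Cat-a-lot or libAPI; the limits (5, 1) and the 1.5 s threshold come from the proposal above.

```typescript
// Hypothetical stand-in for one edit request (not a real Cat-a-lot or libAPI function).
type EditFn = (title: string) => Promise<void>;
// Hypothetical hook returning the latest known replication lag in seconds,
// e.g. as reported through the API's maxlag mechanism.
type LagProbe = () => number;

// Proposed rule: up to 5 parallel edits, dropping to 1 when maxlag exceeds 1.5 s.
function concurrencyLimit(lagSeconds: number): number {
  return lagSeconds > 1.5 ? 1 : 5;
}

// Runs all edits, starting a new one only while fewer than the current limit are in flight.
function editWithLimit(titles: string[], edit: EditFn, currentLag: LagProbe): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    let next = 0;
    let inFlight = 0;
    let done = 0;
    let failed = false;

    const pump = (): void => {
      if (failed) return;
      if (next >= titles.length && inFlight === 0) {
        resolve(done);
        return;
      }
      // The lag is re-read on every loop check, so the limit adapts mid-batch.
      while (next < titles.length && inFlight < concurrencyLimit(currentLag())) {
        const title = titles[next++];
        inFlight++;
        edit(title).then(
          () => {
            inFlight--;
            done++;
            pump();
          },
          (err: unknown) => {
            failed = true; // stop starting new edits after the first failure
            reject(err);
          },
        );
      }
    };

    pump();
  });
}
```

When the lag rises above 1.5 s, edits already in flight are allowed to finish, and no new edit starts until the in-flight count drops below 1, so the batch drains down to serial editing instead of cancelling requests.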

Example code

Event Timeline

I suggest starting with a lower number of concurrent edits (say, two) and then bumping it up slowly while we monitor the databases.

We can do that, but just as a note:

Based on the edits in the revision table, SDC bots continuously edit at a higher rate than two concurrent edits without any issues, and their total edit rate is several times higher than what was observed with Cat-a-lot. The system was also able to handle short bursts of 200 simultaneous edits (i.e. "select all" and then "add/remove categories"); if that had been a problem, it would have been noticed more often.

Based on the edit data, the problem in August was an editing spike from Cat-a-lot that was five times larger than Cat-a-lot's daily baseline (180k edits in total that day), which overwhelmed Commons. However, SDC bots can make over 500k edits per day, with combined totals as high as 1M per day on Commons, so the number of edits itself was not the issue. The problem was that Cat-a-lot was coded to make requests continuously, as fast as possible, over an extended period of time. Most likely any limit would prevent this, and even a limit higher than 5 would not cause any issues.
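Spreading the daily totals quoted above evenly over a 86,400 s day gives rough average rates. This assumes a uniform distribution, which real traffic does not follow; the averages only show that the Cat-a-lot day total was below what SDC bots sustain:

```latex
\begin{align*}
\text{Cat-a-lot baseline} &\approx \frac{180\,000}{5} = 36\,000 \text{ edits/day} \\
\text{Cat-a-lot spike day} &\approx \frac{180\,000}{86\,400\,\text{s}} \approx 2.1 \text{ edits/s} \\
\text{SDC bots, typical} &\approx \frac{500\,000}{86\,400\,\text{s}} \approx 5.8 \text{ edits/s} \\
\text{Combined high} &\approx \frac{1\,000\,000}{86\,400\,\text{s}} \approx 11.6 \text{ edits/s}
\end{align*}
```

The Cat-a-lot daily average is lower than the SDC figures, which is consistent with the point that the sustained as-fast-as-possible request pattern, not the daily total, was the problem.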

The biggest difference between Cat-a-lot and SDC edits is that Cat-a-lot makes edits that take more locks: they lock rows in the category table to update the member count (for both removals and additions), which causes deadlocks or lock contention.
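To illustrate why a single Cat-a-lot batch concentrates on a few rows, here is a small sketch that counts how many edits in a batch touch each category's member-count row. It assumes, for illustration only, that every category added to or removed from a file touches that category's row once; `CategoryEdit` and `rowTouches` are hypothetical names.

```typescript
// Assumption for illustration: each category added to or removed from a file
// touches that category's member-count row once.
interface CategoryEdit {
  file: string;
  add: string[];
  remove: string[];
}

// Count how many edits in a batch touch each category's member-count row.
function rowTouches(batch: CategoryEdit[]): Record<string, number> {
  const touches: Record<string, number> = {};
  for (const e of batch) {
    for (const cat of [...e.add, ...e.remove]) {
      touches[cat] = (touches[cat] ?? 0) + 1;
    }
  }
  return touches;
}

// A "move" of 200 files from one category to another (hypothetical names).
const batch: CategoryEdit[] = Array.from({ length: 200 }, (_, i) => ({
  file: `File:Example ${i}.jpg`,
  add: ["Category:B"],
  remove: ["Category:A"],
}));
console.log(rowTouches(batch));
```

Under this assumption, moving a full page of 200 files hits the same two rows 200 times each, so every edit in the batch competes for the same locks, whereas the same number of edits spread across unrelated pages would not.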

Just FYI, the plan is to put this version into use next Monday with two concurrent edits.

Changelog for the current version:

  • Updated Cat-a-lot to use libAPI for editing, to manage the number of parallel edits.
  • Fixed the Special:Search selection user interface.
  • Fixed the incorrect dialog height bug.

Just as an update: the new version is live now.

@Ladsgroup, if there are no problems, how should we handle gradually increasing the number of parallel edits? I.e., would it be too fast to just increase it by 1 per week until the number of parallel writes is 5?

Also, as another idea, in the meantime I will try to add some integration with QuickCategories so that Cat-a-lot users could offload large edit batches to it instead of running them inside Cat-a-lot.
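The ramp asked about above (start at 2, add 1 per week, stop at 5) can be written out as a small schedule. This is a hypothetical sketch of the question being asked, not an agreed policy; `parallelEditsForWeek` and the constants are illustrative names.

```typescript
// Hypothetical ramp matching the question above: start at 2 parallel edits
// and add 1 per week until reaching 5. Not an agreed policy.
const START_PARALLEL = 2;
const TARGET_PARALLEL = 5;

function parallelEditsForWeek(weeksSinceDeploy: number): number {
  const steps = Math.max(0, Math.floor(weeksSinceDeploy)); // whole weeks only
  return Math.min(START_PARALLEL + steps, TARGET_PARALLEL);
}

for (let week = 0; week <= 4; week++) {
  console.log(`week ${week}: ${parallelEditsForWeek(week)} parallel edits`);
}
```

Under this schedule the limit reaches 5 three weeks after the initial deployment and stays there.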

Just as an update: the new version is live now.

@Ladsgroup, if there are no problems, how should we handle gradually increasing the number of parallel edits? I.e., would it be too fast to just increase it by 1 per week until the number of parallel writes is 5?

Let's wait a bit and see how it goes.

Also, as another idea, in the meantime I will try to add some integration with QuickCategories so that Cat-a-lot users could offload large edit batches to it instead of running them inside Cat-a-lot.

That'd be amazing. If we can get this out the door, that'd be awesome. Especially if you can just dump the work on QC instead.

Based on edit numbers, there is nothing suspicious in Cat-a-lot edits on Wikimedia Commons.

I.e., there were 2000 edits from Cat-a-lot between 2024-10-29 23:00 and 24:00 (server database time).

There are no suspicious peaks in the edits-per-minute stats. The total editing speed has been about 1-2 edits per second at most.
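As a rough check, the hourly count above corresponds to an average well below the observed 1-2 edits/s maximum (assuming, for the average only, that the edits were spread evenly over the hour):

```latex
\frac{2000 \text{ edits}}{3600\,\text{s}} \approx 0.56 \text{ edits/s}
```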

There is a peak in category changes on Wikimedia Commons between 2024-10-29 23:00 and 24:00, so that could be a possible cause, though.

There was a mass revert of bot edits, done on request at the admin noticeboard. It was done with some mass-reverting tool, as the system tagged the edits as rollbacks by a non-admin user who had been temporarily added to the noratelimit user group for the task. This caused 35k edits and 60k category changes in an hour, a level similar to the Cat-a-lot edit numbers that caused the problems.
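Averaged over the hour, these figures give rates in the same range as the 16 successful edits per second mentioned in the description for Cat-a-lot edge cases. This assumes a uniform rate within the hour; the real distribution is not known here:

```latex
\begin{align*}
\frac{35\,000 \text{ edits}}{3600\,\text{s}} &\approx 9.7 \text{ edits/s} \\
\frac{60\,000 \text{ category changes}}{3600\,\text{s}} &\approx 16.7 \text{ category changes/s}
\end{align*}
```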

Thanks for the investigation! 35K edits in an hour is a lot. It didn't cause issues, but it was close. T365303: Move update of category members count to CategoryMembershipChangeJob really should happen soon. I'll try to ask for some people's time on this.