Migrate servers in codfw racks D7 & D8 from asw to lsw
Closed, Resolved (Public)

Description

Currently scheduled for Thurs Sept 19th 2024 16:00 UTC

As part of the scheduled refresh of switch equipment in codfw rows C and D, we need to move the network connections for servers in racks D7 and D8 from the old switch (asw) to the new switch (lsw).

Hosts in these racks are managed by the following teams:

Collaboration Services
Core Platform
Data Persistence
Data Platform
Infrastructure Foundations
Observability
Search Platform
ServiceOps
Traffic
WMCS

A full list of the specific hosts can be found in the sheet linked below. We will use the sheet to plan the moves and coordinate with other SRE teams on the actions required to ensure things go smoothly:

https://docs.google.com/spreadsheets/d/16xoZuDeC_-o6s70uEMnvdgn4BlT1f8__WPYprRuduIA#gid=1116984552

Server links will be moved one by one from the old switch to the new one, so no two hosts will be offline at once.

Based on previous experience, each host is likely to lose connectivity for only ~10 seconds. Inevitably, however, a small number of the new cables will not work, or there will be some minor glitch in the move, so in an edge case a host may be offline for 2-3 minutes. On previous occasions this happened with about 1 in 20 hosts.
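
To put the "about 1 in 20" figure in context, here is a rough back-of-the-envelope sketch. It assumes the rate applies independently to every host, which is a simplification and not something measured for this task, and it takes the host count (22 in D7 plus 25 in D8) from the downtime entries recorded on this task. The function name `slow_move_estimate` is made up for illustration.

```python
# Rough estimate only: assumes the "about 1 in 20" rate from previous
# migrations applies independently to every host (an assumption).

def slow_move_estimate(n_hosts: int, slow_rate: float = 1 / 20) -> tuple[float, float]:
    """Return (expected number of slow moves, probability of at least one)."""
    expected = n_hosts * slow_rate
    p_at_least_one = 1 - (1 - slow_rate) ** n_hosts
    return expected, p_at_least_one


# 22 hosts in D7 plus 25 in D8, per the downtime entries on this task.
expected, p_any = slow_move_estimate(22 + 25)
print(f"expected slow moves: {expected:.2f}")
print(f"chance of at least one: {p_any:.0%}")
```

Under those assumptions, two or three hosts across both racks would be expected to see the longer 2-3 minute outage, so owners of sensitive services may still want to depool beforehand.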

Event Timeline

Traffic hosts (cp2041/cp2042) are depooled.

Mentioned in SAL (#wikimedia-operations) [2024-09-19T15:38:15Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'depool db2131 db2152 db2173 db2174 db2181 db2182 db2195 db2219 db2220 es2040 - T373105', diff saved to https://phabricator.wikimedia.org/P69336 and previous config saved to /var/cache/conftool/dbconfig/20240919-153815-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T15:38:19Z] <arnaudb@cumin1002> START - Cookbook sre.hosts.downtime for 0:30:00 on 10 hosts with reason: network maintenance T373105

Mentioned in SAL (#wikimedia-operations) [2024-09-19T15:38:28Z] <arnaudb@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 10 hosts with reason: network maintenance T373105

all data-persistence hosts have been depooled and downtimed

Icinga downtime and Alertmanager silence (ID=a040f2d9-1940-4aba-bd29-efa9aeec87fb) set by cmooney@cumin1002 for 0:30:00 on 22 host(s) and their services with reason: Move server uplinks in codfw rack D7

backup2007.codfw.wmnet,cloudbackup2004.codfw.wmnet,cp[2041-2042].codfw.wmnet,elastic[2060,2067-2068,2086,2108-2109].codfw.wmnet,ganeti2044.codfw.wmnet,kafka-stretch2002.codfw.wmnet,logging-sd2004.codfw.wmnet,mc[2053-2054].codfw.wmnet,ms-be[2056,2059,2073,2080].codfw.wmnet,thanos-be2004.codfw.wmnet,wdqs[2012,2022].codfw.wmnet

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:11:09Z] <topranks> migrating server uplinks in codfw rack D7 to new top-of-rack switch T373105

Icinga downtime and Alertmanager silence (ID=9d0dd9cc-ca9d-4736-b81c-6f32f4a0772d) set by cmooney@cumin1002 for 0:25:00 on 25 host(s) and their services with reason: Move server uplinks in codfw rack D8

arclamp2001.codfw.wmnet,bast2003.wikimedia.org,conf2006.codfw.wmnet,db[2131,2152,2173-2174,2181-2182,2195,2219-2220].codfw.wmnet,es2040.codfw.wmnet,ganeti2018.codfw.wmnet,gerrit[2002-2003].wikimedia.org,kubernetes[2052-2053].codfw.wmnet,mw2282.codfw.wmnet,parse[2018-2020].codfw.wmnet,puppetdb2003.codfw.wmnet,restbase[2023,2035].codfw.wmnet
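
The host lists in the two downtime entries above use a compact bracket notation (for example `cp[2041-2042].codfw.wmnet`). The sketch below is a minimal stand-in for expanding that notation, written only to cross-check the stated host counts. It is not the parser the tooling itself uses, it handles just one bracket group per name with comma-separated numbers and ranges, and the names `expand`, `split_top_level`, `D7` and `D8` are illustrative.

```python
import re

D7 = ("backup2007.codfw.wmnet,cloudbackup2004.codfw.wmnet,cp[2041-2042].codfw.wmnet,"
      "elastic[2060,2067-2068,2086,2108-2109].codfw.wmnet,ganeti2044.codfw.wmnet,"
      "kafka-stretch2002.codfw.wmnet,logging-sd2004.codfw.wmnet,mc[2053-2054].codfw.wmnet,"
      "ms-be[2056,2059,2073,2080].codfw.wmnet,thanos-be2004.codfw.wmnet,wdqs[2012,2022].codfw.wmnet")
D8 = ("arclamp2001.codfw.wmnet,bast2003.wikimedia.org,conf2006.codfw.wmnet,"
      "db[2131,2152,2173-2174,2181-2182,2195,2219-2220].codfw.wmnet,es2040.codfw.wmnet,"
      "ganeti2018.codfw.wmnet,gerrit[2002-2003].wikimedia.org,kubernetes[2052-2053].codfw.wmnet,"
      "mw2282.codfw.wmnet,parse[2018-2020].codfw.wmnet,puppetdb2003.codfw.wmnet,"
      "restbase[2023,2035].codfw.wmnet")


def split_top_level(expr: str) -> list[str]:
    """Split on commas that are not inside a [...] group."""
    parts, current, depth = [], [], 0
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current))
    return parts


def expand(expr: str) -> list[str]:
    """Expand names such as 'cp[2041-2042].codfw.wmnet' into individual hosts."""
    hosts = []
    for part in split_top_level(expr):
        match = re.fullmatch(r"(.*?)\[([^\]]+)\](.*)", part)
        if match is None:
            hosts.append(part)  # plain hostname, nothing to expand
            continue
        prefix, body, suffix = match.groups()
        for item in body.split(","):
            first, _, last = item.partition("-")
            last = last or first  # a single number is a range of one
            for number in range(int(first), int(last) + 1):
                hosts.append(f"{prefix}{number:0{len(first)}d}{suffix}")
    return hosts


print(len(expand(D7)), "hosts in D7")
print(len(expand(D8)), "hosts in D8")
```

Expanding both lists this way gives 22 and 25 hosts, matching the counts in the downtime messages.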

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:17:32Z] <topranks> migrating server uplinks in codfw rack D8 to new top-of-rack switch T373105

All hosts have been moved and are all now responding to ping again.

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:25:21Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2131 (re)pooling @ 25%: T373105', diff saved to https://phabricator.wikimedia.org/P69337 and previous config saved to /var/cache/conftool/dbconfig/20240919-162521-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:25:27Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2152 (re)pooling @ 25%: T373105', diff saved to https://phabricator.wikimedia.org/P69338 and previous config saved to /var/cache/conftool/dbconfig/20240919-162526-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:25:31Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2173 (re)pooling @ 25%: T373105', diff saved to https://phabricator.wikimedia.org/P69339 and previous config saved to /var/cache/conftool/dbconfig/20240919-162531-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:25:36Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2174 (re)pooling @ 25%: T373105', diff saved to https://phabricator.wikimedia.org/P69340 and previous config saved to /var/cache/conftool/dbconfig/20240919-162536-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:25:41Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2181 (re)pooling @ 25%: T373105', diff saved to https://phabricator.wikimedia.org/P69341 and previous config saved to /var/cache/conftool/dbconfig/20240919-162541-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:25:46Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2182 (re)pooling @ 25%: T373105', diff saved to https://phabricator.wikimedia.org/P69342 and previous config saved to /var/cache/conftool/dbconfig/20240919-162546-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:25:51Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2195 (re)pooling @ 25%: T373105', diff saved to https://phabricator.wikimedia.org/P69343 and previous config saved to /var/cache/conftool/dbconfig/20240919-162551-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:25:56Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2219 (re)pooling @ 25%: T373105', diff saved to https://phabricator.wikimedia.org/P69344 and previous config saved to /var/cache/conftool/dbconfig/20240919-162556-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:26:01Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2220 (re)pooling @ 25%: T373105', diff saved to https://phabricator.wikimedia.org/P69345 and previous config saved to /var/cache/conftool/dbconfig/20240919-162601-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:26:07Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es2040 (re)pooling @ 25%: T373105', diff saved to https://phabricator.wikimedia.org/P69346 and previous config saved to /var/cache/conftool/dbconfig/20240919-162606-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:27:43Z] <sukhe@puppetmaster1001> conftool action : set/pooled=yes; selector: name=(cp2041|cp2042).codfw.wmnet [reason: T373105 is done]

ms-nodes all good; thanos-be2004 seems OK (but checking that picked up an unrelated replication issue on thanos-be2002).

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:34:27Z] <effie> restarting confd on all servers on: codfw, ulsfo, eqsin - T373105

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:40:27Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2131 (re)pooling @ 50%: T373105', diff saved to https://phabricator.wikimedia.org/P69347 and previous config saved to /var/cache/conftool/dbconfig/20240919-164026-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:40:32Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2152 (re)pooling @ 50%: T373105', diff saved to https://phabricator.wikimedia.org/P69348 and previous config saved to /var/cache/conftool/dbconfig/20240919-164031-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:40:37Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2173 (re)pooling @ 50%: T373105', diff saved to https://phabricator.wikimedia.org/P69349 and previous config saved to /var/cache/conftool/dbconfig/20240919-164036-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:40:42Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2174 (re)pooling @ 50%: T373105', diff saved to https://phabricator.wikimedia.org/P69350 and previous config saved to /var/cache/conftool/dbconfig/20240919-164041-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:40:47Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: T373105', diff saved to https://phabricator.wikimedia.org/P69351 and previous config saved to /var/cache/conftool/dbconfig/20240919-164046-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:40:52Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2182 (re)pooling @ 50%: T373105', diff saved to https://phabricator.wikimedia.org/P69352 and previous config saved to /var/cache/conftool/dbconfig/20240919-164051-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:40:57Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2195 (re)pooling @ 50%: T373105', diff saved to https://phabricator.wikimedia.org/P69353 and previous config saved to /var/cache/conftool/dbconfig/20240919-164056-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:41:02Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2219 (re)pooling @ 50%: T373105', diff saved to https://phabricator.wikimedia.org/P69354 and previous config saved to /var/cache/conftool/dbconfig/20240919-164101-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:41:07Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2220 (re)pooling @ 50%: T373105', diff saved to https://phabricator.wikimedia.org/P69355 and previous config saved to /var/cache/conftool/dbconfig/20240919-164106-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:41:12Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es2040 (re)pooling @ 50%: T373105', diff saved to https://phabricator.wikimedia.org/P69356 and previous config saved to /var/cache/conftool/dbconfig/20240919-164111-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:55:32Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2131 (re)pooling @ 75%: T373105', diff saved to https://phabricator.wikimedia.org/P69357 and previous config saved to /var/cache/conftool/dbconfig/20240919-165531-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:55:37Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2152 (re)pooling @ 75%: T373105', diff saved to https://phabricator.wikimedia.org/P69358 and previous config saved to /var/cache/conftool/dbconfig/20240919-165537-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:55:42Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2173 (re)pooling @ 75%: T373105', diff saved to https://phabricator.wikimedia.org/P69359 and previous config saved to /var/cache/conftool/dbconfig/20240919-165542-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:55:47Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2174 (re)pooling @ 75%: T373105', diff saved to https://phabricator.wikimedia.org/P69360 and previous config saved to /var/cache/conftool/dbconfig/20240919-165546-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:55:52Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2181 (re)pooling @ 75%: T373105', diff saved to https://phabricator.wikimedia.org/P69361 and previous config saved to /var/cache/conftool/dbconfig/20240919-165551-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:55:57Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2182 (re)pooling @ 75%: T373105', diff saved to https://phabricator.wikimedia.org/P69362 and previous config saved to /var/cache/conftool/dbconfig/20240919-165556-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:56:02Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2195 (re)pooling @ 75%: T373105', diff saved to https://phabricator.wikimedia.org/P69363 and previous config saved to /var/cache/conftool/dbconfig/20240919-165602-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:56:07Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2219 (re)pooling @ 75%: T373105', diff saved to https://phabricator.wikimedia.org/P69364 and previous config saved to /var/cache/conftool/dbconfig/20240919-165606-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:56:12Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2220 (re)pooling @ 75%: T373105', diff saved to https://phabricator.wikimedia.org/P69365 and previous config saved to /var/cache/conftool/dbconfig/20240919-165611-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T16:56:17Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es2040 (re)pooling @ 75%: T373105', diff saved to https://phabricator.wikimedia.org/P69366 and previous config saved to /var/cache/conftool/dbconfig/20240919-165617-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T17:10:37Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2131 (re)pooling @ 100%: T373105', diff saved to https://phabricator.wikimedia.org/P69367 and previous config saved to /var/cache/conftool/dbconfig/20240919-171037-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T17:10:43Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2152 (re)pooling @ 100%: T373105', diff saved to https://phabricator.wikimedia.org/P69368 and previous config saved to /var/cache/conftool/dbconfig/20240919-171042-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T17:10:47Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2173 (re)pooling @ 100%: T373105', diff saved to https://phabricator.wikimedia.org/P69369 and previous config saved to /var/cache/conftool/dbconfig/20240919-171047-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T17:10:52Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2174 (re)pooling @ 100%: T373105', diff saved to https://phabricator.wikimedia.org/P69370 and previous config saved to /var/cache/conftool/dbconfig/20240919-171052-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T17:10:58Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: T373105', diff saved to https://phabricator.wikimedia.org/P69371 and previous config saved to /var/cache/conftool/dbconfig/20240919-171057-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T17:11:02Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2182 (re)pooling @ 100%: T373105', diff saved to https://phabricator.wikimedia.org/P69372 and previous config saved to /var/cache/conftool/dbconfig/20240919-171101-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T17:11:08Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2195 (re)pooling @ 100%: T373105', diff saved to https://phabricator.wikimedia.org/P69373 and previous config saved to /var/cache/conftool/dbconfig/20240919-171107-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T17:11:12Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2219 (re)pooling @ 100%: T373105', diff saved to https://phabricator.wikimedia.org/P69374 and previous config saved to /var/cache/conftool/dbconfig/20240919-171112-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T17:11:17Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db2220 (re)pooling @ 100%: T373105', diff saved to https://phabricator.wikimedia.org/P69375 and previous config saved to /var/cache/conftool/dbconfig/20240919-171117-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-09-19T17:11:23Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'es2040 (re)pooling @ 100%: T373105', diff saved to https://phabricator.wikimedia.org/P69376 and previous config saved to /var/cache/conftool/dbconfig/20240919-171122-arnaudb.json
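
The repool entries above follow a stepped pattern: each database host goes back to 25%, 50%, 75% and then 100%, with the steps landing roughly 15 minutes apart (16:25, 16:40, 16:55, 17:10 UTC). The sketch below only reproduces that timetable. The 15-minute interval is inferred from the log timestamps rather than taken from any tool's configuration, and `repool_schedule` is a hypothetical helper, not part of dbctl.

```python
from datetime import datetime, timedelta, timezone


def repool_schedule(start, steps=(25, 50, 75, 100), interval=timedelta(minutes=15)):
    """Return (time, percentage) pairs for a stepped repool.

    The interval is an assumption inferred from the timestamps in the log.
    """
    return [(start + i * interval, pct) for i, pct in enumerate(steps)]


# First repool step in the log was at about 16:25 UTC on 2024-09-19.
start = datetime(2024, 9, 19, 16, 25, tzinfo=timezone.utc)
for when, pct in repool_schedule(start):
    print(f"{when:%H:%M} UTC -> {pct}%")
```

Ramping traffic up in steps like this gives each host time to warm up before it takes its full share of load.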

cmooney claimed this task.