Server Admin Log

From Wikitech

2024-11-09

  • 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply

2024-11-08

  • 23:35 zabe: attach Sotiale's local accounts on newly created wikis
  • 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for T379398 because oad_user=0
  • 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
  • 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
  • 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
  • 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
  • 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
  • 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
  • 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:18 denisse: disabling Puppet on grafana2001 - T379043
  • 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
  • 21:08 mutante: cumin2002 [cumin2002:~] $ sudo systemctl reset-failed
  • 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly
  • 20:28 aude@deploy2002: Finished scap sync-world: Backport for Reviving "Update interwiki map" (duration: 10m 19s)
  • 20:24 aude@deploy2002: seddon, aude: Continuing with sync
  • 20:21 aude@deploy2002: seddon, aude: Backport for Reviving "Update interwiki map" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
  • 20:18 aude@deploy2002: Started scap sync-world: Backport for Reviving "Update interwiki map"
  • 20:15 aude@deploy2002: Finished scap sync-world: Backport for Enable Tabular data for test commons (T378127) (duration: 10m 55s)
  • 20:10 aude@deploy2002: aude: Continuing with sync
  • 20:06 aude@deploy2002: aude: Backport for Enable Tabular data for test commons (T378127) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:04 aude@deploy2002: Started scap sync-world: Backport for Enable Tabular data for test commons (T378127)
  • 20:02 aude@deploy2002: Finished scap sync-world: Backport for Reopen testcommonswiki for testing Chart extension (duration: 14m 33s)
  • 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: T371400
  • 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: T371400
  • 19:57 aude@deploy2002: aude: Continuing with sync
  • 19:50 aude@deploy2002: aude: Backport for Reopen testcommonswiki for testing Chart extension synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:47 aude@deploy2002: Started scap sync-world: Backport for Reopen testcommonswiki for testing Chart extension
  • 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm
  • 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm
  • 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm
  • 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm
  • 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm
  • 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm
  • 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm
  • 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002"
  • 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002"
  • 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm
  • 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage
  • 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage
  • 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage
  • 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage
  • 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage
  • 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage
  • 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage
  • 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage
  • 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage
  • 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage
  • 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage
  • 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm
  • 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage
  • 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage
  • 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002"
  • 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002"
  • 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm
  • 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm
  • 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
  • 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet
  • 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm
  • 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage
  • 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm
  • 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm
  • 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm
  • 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm
  • 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm
  • 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm
  • 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm
  • 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm
  • 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm
  • 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm
  • 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage
  • 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm
  • 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm
  • 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm
  • 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage
  • 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm
  • 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage
  • 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm
  • 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
  • 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage
  • 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage
  • 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
  • 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001
  • 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage
  • 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage
  • 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage
  • 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage
  • 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
  • 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye
  • 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm
  • 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002"
  • 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002"
  • 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage
  • 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors
  • 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors
  • 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002"
  • 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002"
  • 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage
  • 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
  • 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox
  • 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet
  • 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage
  • 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage
  • 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage
  • 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage
  • 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage
  • 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage
  • 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage
  • 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage
  • 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage
  • 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm
  • 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm
  • 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm
  • 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
  • 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm
  • 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
  • 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
  • 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
  • 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm
  • 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage
  • 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage
  • 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm
  • 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
  • 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
  • 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm
  • 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm
  • 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm
  • 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm
  • 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm
  • 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm
  • 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm
  • 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage
  • 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm
  • 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage
  • 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm
  • 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage
  • 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
  • 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye
  • 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
  • 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
  • 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
  • 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage
  • 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage
  • 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage
  • 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage
  • 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage
  • 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage
  • 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage
  • 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage
  • 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage
  • 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage
  • 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
  • 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage
  • 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage
  • 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage
  • 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage
  • 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage
  • 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm
  • 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm
  • 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm
  • 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
  • 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm
  • 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye
  • 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm
  • 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm
  • 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm
  • 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm
  • 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm
  • 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128']
  • 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128']
  • 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158']
  • 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158']
  • 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157']
  • 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157']
  • 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156']
  • 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156']
  • 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156']
  • 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156']
  • 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145']
  • 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145']
  • 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144']
  • 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144']
  • 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144']
  • 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144']
  • 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143']
  • 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143']
  • 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142']
  • 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142']
  • 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141']
  • 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141']
  • 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140']
  • 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140']
  • 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139']
  • 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139']
  • 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138']
  • 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138']
  • 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137']
  • 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137']
  • 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136']
  • 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136']
  • 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129']
  • 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129']
  • 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128']
  • 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128']
  • 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye
  • 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
  • 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
  • 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye
  • 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment
  • 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye
  • 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage
  • 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage
  • 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
  • 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye
  • 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
  • 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye
  • 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet
  • 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet
  • 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet
  • 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
  • 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye
  • 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet
  • 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye
  • 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
  • 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
  • 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
  • 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
  • 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel
  • 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye
  • 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
  • 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
  • 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure
  • 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure
  • 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage
  • 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage
  • 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye
  • 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye
  • 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw
  • 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw
  • 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw
  • 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw
  • 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw
  • 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw
  • 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw
  • 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw
  • 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye
  • 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw
  • 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw
  • 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw
  • 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw
  • 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw
  • 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw
  • 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw
  • 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw
  • 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw
  • 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw
  • 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw
  • 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw
  • 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw
  • 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw
  • 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C
  • 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw
  • 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw
  • 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw
  • 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw
  • 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C
  • 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw
  • 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw
  • 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw
  • 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw
  • 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw
  • 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw
  • 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad
  • 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad
  • 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad
  • 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad
  • 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad
  • 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad
  • 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin
  • 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin
  • 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1
  • 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts
  • 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - T347461
  • 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
  • 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001
  • 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
  • 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet
  • 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet
  • 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet
  • 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet
  • 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
  • 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C

2024-11-07

  • 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm
  • 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
  • 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
  • 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
  • 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye
  • 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye
  • 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002"
  • 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002"
  • 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage
  • 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002"
  • 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002"
  • 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage
  • 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage
  • 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage
  • 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002"
  • 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002"
  • 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002"
  • 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002"
  • 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm
  • 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye
  • 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye
  • 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027']
  • 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026']
  • 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027']
  • 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026']
  • 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
  • 21:11 jsn@deploy2002: Finished scap sync-world: Backport for Enable AutoModerator on viwiki (T378343) (duration: 08m 28s)
  • 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
  • 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync
  • 21:06 jsn@deploy2002: suecarmol, jsn: Backport for Enable AutoModerator on viwiki (T378343) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002"
  • 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002"
  • 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
  • 21:02 jsn@deploy2002: Started scap sync-world: Backport for Enable AutoModerator on viwiki (T378343)
  • 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
  • 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002"
  • 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002"
  • 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
  • 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for Enable Chart extension on testwiki and testcommonswiki (T378127) (duration: 13m 02s)
  • 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync
  • 20:25 cdanis@deploy2002: cdanis, aude: Backport for Enable Chart extension on testwiki and testcommonswiki (T378127) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:22 cdanis@deploy2002: Started scap sync-world: Backport for Enable Chart extension on testwiki and testcommonswiki (T378127)
  • 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for DB config for testcommonswiki deployment for Charts (T379199) (duration: 10m 45s)
  • 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync
  • 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for DB config for testcommonswiki deployment for Charts (T379199) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:10 cdanis@deploy2002: Started scap sync-world: Backport for DB config for testcommonswiki deployment for Charts (T379199)
  • 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts
  • 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002"
  • 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002"
  • 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 19:23 cdanis: T379199 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql
  • 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables
  • 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables
  • 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet
  • 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables
  • 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables
  • 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables
  • 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables
  • 19:08 mutante: VRTS - switching firewall provider from iptables to nftables
  • 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
  • 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet
  • 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
  • 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
  • 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002"
  • 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002"
  • 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors
  • 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors
  • 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002"
  • 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002"
  • 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox
  • 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet
  • 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002"
  • 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002"
  • 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - T356241
  • 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet
  • 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet
  • 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002"
  • 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002"
  • 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad
  • 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad
  • 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # T375508
  • 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # T375508
  • 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
  • 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
  • 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye
  • 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
  • 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
  • 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye
  • 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
  • 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
  • 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
  • 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
  • 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
  • 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
  • 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad
  • 15:54 moritzm: remove ganeti1010 from active ganeti nodes T378921
  • 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary (T378466) and tcywikisource (T378474)
  • 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
  • 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
  • 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s)
  • 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad
  • 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # T375950
  • 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https
  • 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw
  • 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase
  • 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
  • 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s)
  • 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided)
  • 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s)
  • 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided)
  • 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw
  • 14:55 hashar: Restarted CI Jenkins for plugins update
  • 14:41 moritzm: installing python-git security updates
  • 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
  • 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381) (duration: 09m 37s)
  • 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync
  • 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)
  • 14:13 kartik@deploy2002: Finished scap sync-world: Backport for Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420) (duration: 10m 08s)
  • 14:09 kartik@deploy2002: kartik: Continuing with sync
  • 14:06 kartik@deploy2002: kartik: Backport for Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s)
  • 14:03 kartik@deploy2002: Started scap sync-world: Backport for Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)
  • 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3]
  • 13:52 cwhite: running thanos bucket cleanup on titan1001 - T351927
  • 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048
  • 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048
  • 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047
  • 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047
  • 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s)
  • 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640]
  • 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s)
  • 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640]
  • 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s)
  • 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047
  • 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047
  • 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047
  • 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047
  • 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640]
  • 12:16 vgutierrez: repool liberica on lvs1013
  • 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync
  • 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync
  • 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync
  • 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync
  • 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync
  • 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync
  • 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
  • 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
  • 11:03 vgutierrez: depool liberica on lvs1013
  • 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
  • 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
  • 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye
  • 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
  • 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
  • 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
  • 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
  • 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
  • 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage
  • 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
  • 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
  • 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage
  • 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
  • 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
  • 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002"
  • 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002
  • 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002
  • 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002"
  • 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json
  • 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
  • 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye
  • 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json
  • 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia)
  • 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json
  • 09:21 moritzm: installing openjdk-8 security updates
  • 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia
  • 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs T375661
  • 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json
  • 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
  • 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:26 kartik@deploy2002: Finished scap sync-world: Backport for Translate: Enable message bundle Scribunto module on testwiki (T359918) (duration: 18m 39s)
  • 08:25 _joe_: running scap pull on mwdebug2001/2002
  • 08:19 kartik@deploy2002: kartik, abi: Continuing with sync
  • 08:13 kartik@deploy2002: kartik, abi: Backport for Translate: Enable message bundle Scribunto module on testwiki (T359918) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:07 kartik@deploy2002: Started scap sync-world: Backport for Translate: Enable message bundle Scribunto module on testwiki (T359918)
  • 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json
  • 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
  • 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
  • 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
  • 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
  • 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C
  • 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C
  • 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C
  • 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C
  • 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B
  • 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B
  • 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply

2024-11-06

  • 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm
  • 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm
  • 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm
  • 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm
  • 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm
  • 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm
  • 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage
  • 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm
  • 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm
  • 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage
  • 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage
  • 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage
  • 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage
  • 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage
  • 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage
  • 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage
  • 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage
  • 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage
  • 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage
  • 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage
  • 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage
  • 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage
  • 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage
  • 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage
  • 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm
  • 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm
  • 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm
  • 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm
  • 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm
  • 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm
  • 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm
  • 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm
  • 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155']
  • 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154']
  • 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153']
  • 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152']
  • 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151']
  • 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151']
  • 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152']
  • 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153']
  • 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154']
  • 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155']
  • 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002"
  • 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002"
  • 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002"
  • 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002"
  • 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm
  • 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm
  • 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm
  • 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm
  • 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm
  • 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage
  • 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced]
  • 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage
  • 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage
  • 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage
  • 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage
  • 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage
  • 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage
  • 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage
  • 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage
  • 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage
  • 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm
  • 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm
  • 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm
  • 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm
  • 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm
  • 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150']
  • 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149']
  • 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148']
  • 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147']
  • 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146']
  • 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150']
  • 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149']
  • 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148']
  • 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147']
  • 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146']
  • 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002"
  • 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002"
  • 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm
  • 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) (T375569)
  • 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
  • 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
  • 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:03 sukhe: dummy authdns-update to test CR 10857508
  • 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage
  • 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage
  • 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
  • 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm
  • 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:17 hnowlan: importing debs for mercurius-1.0.1
  • 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
  • 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
  • 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002"
  • 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002"
  • 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
  • 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:32 moritzm: remove ganeti1014 from active ganeti nodes T378921
  • 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
  • 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
  • 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002"
  • 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002"
  • 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236
  • 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
  • 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s)
  • 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002"
  • 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002"
  • 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0
  • 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work T358260
  • 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted
  • 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted
  • 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
  • 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts
  • 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236
  • 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" T379166
  • 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
  • 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Document available wbformatvalue options (T323778) (duration: 38m 45s)
  • 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet
  • 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Document available wbformatvalue options (T323778) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:51 moritzm: installing php7.4 security updates
  • 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
  • 14:48 moritzm: installing usb.ids updates from Bookworm point release
  • 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
  • 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046
  • 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046
  • 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Document available wbformatvalue options (T323778)
  • 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Cleanup for logo related file (duration: 15m 01s)
  • 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
  • 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
  • 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet
  • 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet
  • 14:19 sukhe: depool cp2031
  • 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for Cleanup for logo related file synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet
  • 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Cleanup for logo related file
  • 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045
  • 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045
  • 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
  • 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
  • 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
  • 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
  • 13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain
  • 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain
  • 13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
  • 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet
  • 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
  • 13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd
  • 13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet
  • 12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd
  • 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain
  • 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for T373579', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json
  • 12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain
  • 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236
  • 12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236
  • 12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisioning db2236.codfw.wmnet - T373579
  • 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisioning db2236.codfw.wmnet - T373579
  • 12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisioning db2236.codfw.wmnet - T373579
  • 12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisioning db2236.codfw.wmnet - T373579
  • 12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend
  • 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
  • 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
  • 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
  • 12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test 1087895
  • 12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test 1087895
  • 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json
  • 12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 T378940
  • 12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json
  • 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing
  • 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing
  • 12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing
  • 12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing
  • 12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool
  • 12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool
  • 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json
  • 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
  • 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
  • 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
  • 10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) (T378578)
  • 10:43 elukey: depool maps1005 to test an nginx config - T378944
  • 10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs T375661
  • 10:32 XioNoX: push new pfw policies - T379127
  • 10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain
  • 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain
  • 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
  • 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
  • 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
  • 09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
  • 09:59 jnuche@deploy2002: Finished scap sync-world: Backport for Fix automatic category creations by FuzzyBot (T285463) (duration: 08m 03s)
  • 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
  • 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
  • 09:54 jnuche@deploy2002: jnuche: Continuing with sync
  • 09:54 jnuche@deploy2002: jnuche: Backport for Fix automatic category creations by FuzzyBot (T285463) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B
  • 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B
  • 09:51 jnuche@deploy2002: Started scap sync-world: Backport for Fix automatic category creations by FuzzyBot (T285463)
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
  • 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
  • 09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
  • 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
  • 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044
  • 09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043
  • 09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043
  • 09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
  • 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
  • 08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - T336485
  • 05:52 kart_: Updated cxserver to 2024-10-25-044319-production (T377160, T375102, T371420)
  • 05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 01:30 zabe@deploy2002: Finished scap sync-world: T378260 (duration: 07m 34s)
  • 01:23 zabe@deploy2002: Started scap sync-world: T378260
  • 00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over
  • 00:21 ryankemper: T377594 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now
  • 00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for TextPassDumper: refresh content address on failure (T377594), TextPassDumper: refresh content address on failure (T377594) (duration: 08m 48s)

2024-11-05

  • 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over
  • 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm
  • 23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync
  • 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm
  • 23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:56 ebernhardson@deploy2002: ebernhardson: Backport for TextPassDumper: refresh content address on failure (T377594), TextPassDumper: refresh content address on failure (T377594) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm
  • 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm
  • 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm
  • 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm
  • 23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for TextPassDumper: refresh content address on failure (T377594), TextPassDumper: refresh content address on failure (T377594)
  • 23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage
  • 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage
  • 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage
  • 23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage
  • 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage
  • 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage
  • 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage
  • 23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage
  • 23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage
  • 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage
  • 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage
  • 23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage
  • 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm
  • 23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm
  • 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm
  • 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm
  • 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm
  • 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm
  • 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135']
  • 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134']
  • 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133']
  • 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132']
  • 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131']
  • 22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130']
  • 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135']
  • 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134']
  • 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133']
  • 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132']
  • 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131']
  • 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130']
  • 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134
  • 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135
  • 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133
  • 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132
  • 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131
  • 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130
  • 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135
  • 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134
  • 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133
  • 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132
  • 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131
  • 22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130
  • 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002"
  • 22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002"
  • 22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132
  • 22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for AbstractProvider: Normalize top level config correctly (T379094), AbstractProvider: Normalize top level config correctly (T379094) (duration: 12m 39s)
  • 21:34 urbanecm@deploy2002: Started scap sync-world: Backport for AbstractProvider: Normalize top level config correctly (T379094), AbstractProvider: Normalize top level config correctly (T379094)
  • 21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for cswiki: adding throttle rule for Editathon Czechoslovakia (T379060) (duration: 31m 18s)
  • 21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)
  • 21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet
  • 20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
  • 20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002"
  • 20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002"
  • 20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet
  • 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002"
  • 20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002"
  • 19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
  • 19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
  • 19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
  • 19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
  • 19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
  • 19:20 eileen: civicrm upgraded from 26d8013c to 65a8de90
  • 18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs
  • 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 (T376905)', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json
  • 18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance
  • 17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T376905)', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json
  • 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json
  • 17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json
  • 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T376905)', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json
  • 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 (T376905)', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json
  • 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
  • 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 (T376905)', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json
  • 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json
  • 16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Fixup paths to moved resources (T379080) (duration: 08m 02s)
  • 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json
  • 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Fixup paths to moved resources (T379080) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox
  • 16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Fixup paths to moved resources (T379080)
  • 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 (T376905)', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json
  • 16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 (T376905)', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json
  • 16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 (T376905)', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json
  • 16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm
  • 16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
  • 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json
  • 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet
  • 15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B
  • 15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B
  • 15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B
  • 15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B
  • 15:48 moritzm: remove ganeti1013 from active ganeti nodes T378921
  • 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
  • 15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json
  • 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # T359795
  • 15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 (T376905)', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json
  • 15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # T359795
  • 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 (T376905)', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json
  • 15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
  • 15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
  • 15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T376905)', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json
  • 15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
  • 15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project-wide config. That was forgotten in T359795 and blocks today's Jenkins upgrade (T379059)
  • 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm
  • 15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json
  • 15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
  • 15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
  • 15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
  • 15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
  • 15:01 hashar: Upgrading CI Jenkins | T379059
  • 14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json
  • 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs T375661
  • 14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
  • 14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
  • 14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T376905)', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json
  • 14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
  • 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm
  • away: UTC afternoon deploys done
  • 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 (T376905)', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json
  • 14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
  • 14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
  • 14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia)
  • 14:28 tgr@deploy2002: Finished scap sync-world: Backport for JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067) (duration: 17m 24s)
  • 14:24 tgr@deploy2002: tgr: Continuing with sync
  • 14:16 tgr@deploy2002: tgr: Backport for JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 14:11 tgr@deploy2002: Started scap sync-world: Backport for JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)
  • 14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian)
  • 14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release
  • 13:54 arnaudb: reimage pc1017 T378068
  • 13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
  • 13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia T379059
  • 13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
  • 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
  • 13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
  • 13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers T378173
  • 13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
  • 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
  • 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336) (duration: 11m 46s)
  • 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
  • 12:46 kharlan@deploy2002: kharlan: Continuing with sync
  • 12:42 kharlan@deploy2002: kharlan: Backport for Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
  • 12:39 kharlan@deploy2002: Started scap sync-world: Backport for Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)
  • 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93)
  • 12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93)
  • 12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; T378983)
  • 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
  • 12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation (T378983)
  • 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
  • 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
  • 12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150) (duration: 07m 39s)
  • 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing
  • 12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing
  • 12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing
  • 12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing
  • 12:16 urbanecm@deploy2002: Started scap sync-world: Backport for CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)
  • 12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs T375661 (duration: 07m 43s)
  • 12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B
  • 12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B
  • 12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs T375661
  • 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
  • 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042
  • 11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs T375661
  • 11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 (T376905)', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json
  • 11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042
  • 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041
  • 11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041
  • 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040
  • 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040
  • 11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs T375661 (duration: 36m 28s)
  • 11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json
  • 11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json
  • 11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 (T376905)', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json
  • 11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs T375661
  • 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 (T376905)', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json
  • 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 (T376905)', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json
  • 10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts
  • 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json
  • 10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s)
  • 10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts
  • 10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json
  • 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 (T376905)', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json
  • 10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - T378618
  • 10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 (T376905)', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json
  • 10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm
  • 10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
  • 09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
  • 09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
  • 09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs T375661
  • 09:21 _joe_: restarted rsyslog on deploy2002 T379044
  • 08:57 tchanders@deploy2002: Started scap sync-world: Backport for Revert "temp accounts: Enable temp account creation on second-round pilots"
  • 08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm)
  • 08:10 tchanders@deploy2002: Started scap sync-world: Backport for temp accounts: Enable temp account creation on second-round pilots (T378336)
  • 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828
  • 08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828
  • 08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593
  • 07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593
  • 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414
  • 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414
  • 05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s)
  • 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs T375661
  • 00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:10 rzl@deploy2002: Finished scap sync-world: 1085506 (duration: 02m 50s)
  • 00:08 rzl@deploy2002: Started scap sync-world: 1085506
  • 00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED

2024-11-04

  • 23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006
  • 23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006
  • 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm
  • 23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm
  • 23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm
  • 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage
  • 22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage
  • 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage
  • 22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage
  • 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm
  • 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm
  • 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm
  • 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006']
  • 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005']
  • 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004']
  • 22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006']
  • 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005']
  • 22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004']
  • 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:22 damilare: civicrm upgraded from 31f5cbdb to 26d8013c
  • 22:22 damilare: SmashPig upgraded from be47dddd to 601405dc
  • 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002"
  • 22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002"
  • 22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm
  • 22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T376905)', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json
  • 22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm
  • 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json
  • away: UTC late deploys done
  • 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage
  • 21:41 tgr@deploy2002: Finished scap sync-world: Backport for Set Flow to read-only on remaining phase 0 wikis (T377990) (duration: 08m 40s)
  • 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage
  • 21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync
  • 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage
  • 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage
  • 21:35 tgr@deploy2002: tgr, kemayo: Backport for Set Flow to read-only on remaining phase 0 wikis (T377990) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:32 tgr@deploy2002: Started scap sync-world: Backport for Set Flow to read-only on remaining phase 0 wikis (T377990)
  • 21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
  • 21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json
  • 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm
  • 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm
  • 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004']
  • 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003']
  • 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004']
  • 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003']
  • 21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T376905)', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json
  • 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
  • 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T376905)', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json
  • 21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
  • 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002"
  • 21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002"
  • 21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T376905)', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json
  • 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
  • 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json
  • 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet
  • 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
  • 20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
  • 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json
  • 20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox
  • 20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - T375243
  • 20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet
  • 20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T376905)', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json
  • 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T376905)', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json
  • 20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json
  • 20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for Message: Downgrade exception on bool/null param to warning (T378876) (duration: 09m 12s)
  • 19:55 urbanecm@deploy2002: urbanecm: Continuing with sync
  • 19:54 urbanecm@deploy2002: urbanecm: Backport for Message: Downgrade exception on bool/null param to warning (T378876) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json
  • 19:51 urbanecm@deploy2002: Started scap sync-world: Backport for Message: Downgrade exception on bool/null param to warning (T378876)
  • 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json
  • 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json
  • 19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json
  • 19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T376905)', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json
  • 19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json
  • 18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer
  • 18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer
  • 18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json
  • 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet
  • 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet
  • 18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez
  • 18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez
  • 18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T376905)', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json
  • 18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm
  • 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T376905)', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json
  • 18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T376905)', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json
  • 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json
  • 18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json
  • 17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
  • 17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - T377127
  • 17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
  • 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T376905)', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json
  • 17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet
  • 17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet
  • 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T376905)', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json
  • 17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json
  • 17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json
  • 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm
  • 16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json
  • 16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json
  • 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
  • 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json
  • 16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json
  • 16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet
  • 16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235
  • 16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235
  • 16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet
  • 16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json
  • 16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235
  • 16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235
  • 15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json
  • 15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
  • 15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json
  • 15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - T377127
  • 15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T376905)', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json
  • 15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: T378744
  • 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json
  • 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json
  • 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T376905)', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json
  • 14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T376905)', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json
  • 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json
  • 14:38 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252) (duration: 23m 39s)
  • 14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync
  • 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json
  • 14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build)
  • 14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)
  • 14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json
  • 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json
  • 13:51 marostegui: Start schema change on redacteddb1001:s8 T367856 (this will make replication in s8 lag for around 2-3 days)
  • 13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change T367856
  • 13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change T367856
  • 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json
  • 13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T376905)', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json
  • 13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B
  • 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json
  • 13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B
  • 13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json
  • 13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T376905)', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json
  • 12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - T378453
  • 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T376905)', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json
  • 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B
  • 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B
  • 12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
  • 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
  • 11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json
  • 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json
  • 11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json
  • 11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json
  • 11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json
  • 11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance
  • 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T376905)', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json
  • 10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - T377844
  • 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json
  • 10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:41 moritzm: installing libtool updates from Bookworm point release
  • 10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:31 moritzm: installing libseccomp updates from Bookworm point release
  • 10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json
  • 10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T376905)', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json
  • 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T376905)', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json
  • 10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:56 volans: deploying spicerack v8.15.2 to cumin[12]002
  • 09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables
  • 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables
  • 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
  • 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
  • 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization T373579
  • 08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionnization T373579
  • 08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589
  • 08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet
  • 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet
  • 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet

2024-11-02

2024-11-01

  • 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye
  • 19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot T374924
  • 19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage
  • 19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage
  • 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
  • 19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet']
  • 19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet']
  • 19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet']
  • 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet']
  • 18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet']
  • 18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet']
  • 18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet']
  • 18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet']
  • 18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002
  • 18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts
  • 18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002
  • 18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts
  • 18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye
  • 17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose`
  • 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage
  • 16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage
  • 16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye
  • 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill
  • 16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill
  • 16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production
  • 16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production
  • 16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet']
  • 16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for Revert "Dummy commit for testing" (duration: 07m 46s)
  • 16:00 thcipriani@deploy2002: thcipriani: Continuing with sync
  • 16:00 thcipriani@deploy2002: thcipriani: Backport for Revert "Dummy commit for testing" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:57 thcipriani@deploy2002: Started scap sync-world: Backport for Revert "Dummy commit for testing"
  • 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet']
  • 15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye
  • 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
  • 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
  • 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye
  • 14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye
  • 14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye
  • 14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm
  • 14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over
  • 13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm
  • 13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
  • 13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over
  • 12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
  • 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet
  • 12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shut down or move
  • 10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR 1085569
  • 02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json
  • 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json
  • 01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye
  • 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json
  • 01:42 urandom: Decommissioning Cassandra/aqs1013-{a,b} — T378725
  • 01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — T378725
  • 01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — T378725
  • 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json
  • 01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet
  • 01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet
  • 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json
  • 01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T376905)', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json
  • 01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage
  • 01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage
  • 01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json
  • 01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye
  • 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json
  • 00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
  • 00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
  • 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T376905)', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json
  • 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T376905)', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json
  • 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T376905)', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json
  • 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json
  • 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json

Archives

See Server Admin Log/Archives.