User Details
- User Since
- May 10 2021, 3:25 PM (184 w, 4 d)
- Availability
- Available
- IRC Nick
- topranks
- LDAP User
- Cathal Mooney
- MediaWiki User
- CMooney (WMF)
Yesterday
Thanks @VRiley-WMF! Seems ok so far, but we can make a call on Monday based on whether we see errors on the link or not (it has been clean since the swap).
This port bounced again overnight:
cmooney@cloudsw1-d5-eqiad> show log messages.1.gz | match "10.64.147.5|2620:0:861:fe0e::2|et-0/0/52" | match "ifOper" | except ".0$"
Nov 22 00:22:55 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_DOWN: ifIndex 682, ifAdminStatus up(1), ifOperStatus down(2), ifName et-0/0/52
Nov 22 00:22:56 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_UP: ifIndex 682, ifAdminStatus up(1), ifOperStatus up(1), ifName et-0/0/52
Nov 22 00:22:58 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_DOWN: ifIndex 682, ifAdminStatus up(1), ifOperStatus down(2), ifName et-0/0/52
Nov 22 00:22:59 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_UP: ifIndex 682, ifAdminStatus up(1), ifOperStatus up(1), ifName et-0/0/52
Nov 22 00:23:04 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_DOWN: ifIndex 682, ifAdminStatus up(1), ifOperStatus down(2), ifName et-0/0/52
Nov 22 00:23:06 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_UP: ifIndex 682, ifAdminStatus up(1), ifOperStatus up(1), ifName et-0/0/52
Nov 22 00:23:06 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_DOWN: ifIndex 682, ifAdminStatus up(1), ifOperStatus down(2), ifName et-0/0/52
Nov 22 00:23:07 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_UP: ifIndex 682, ifAdminStatus up(1), ifOperStatus up(1), ifName et-0/0/52
Nov 22 00:23:11 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_DOWN: ifIndex 682, ifAdminStatus up(1), ifOperStatus down(2), ifName et-0/0/52
Nov 22 00:23:13 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_UP: ifIndex 682, ifAdminStatus up(1), ifOperStatus up(1), ifName et-0/0/52
Nov 22 00:23:14 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_DOWN: ifIndex 682, ifAdminStatus up(1), ifOperStatus down(2), ifName et-0/0/52
Nov 22 00:23:16 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_UP: ifIndex 682, ifAdminStatus up(1), ifOperStatus up(1), ifName et-0/0/52
Nov 22 00:23:20 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_DOWN: ifIndex 682, ifAdminStatus up(1), ifOperStatus down(2), ifName et-0/0/52
Nov 22 00:23:20 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_UP: ifIndex 682, ifAdminStatus up(1), ifOperStatus up(1), ifName et-0/0/52
Nov 22 00:23:21 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_DOWN: ifIndex 682, ifAdminStatus up(1), ifOperStatus down(2), ifName et-0/0/52
Nov 22 00:23:22 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_UP: ifIndex 682, ifAdminStatus up(1), ifOperStatus up(1), ifName et-0/0/52
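For anyone wanting to tally bounces like these from a saved copy of the log, a rough Python sketch follows. It is not part of any existing tooling: the `count_flaps` helper and the regular expression are assumptions based only on the line format shown above, and the two sample lines are copied from that output.

```python
import re

# Matches the mib2d link trap lines in the format shown in the log above.
LINK_EVENT = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) mib2d\[\d+\]: "
    r"SNMP_TRAP_LINK_(?P<state>UP|DOWN): .*ifName (?P<ifname>\S+)$"
)


def count_flaps(lines, ifname):
    """Count LINK_DOWN events for one interface (hypothetical helper)."""
    downs = 0
    for line in lines:
        match = LINK_EVENT.match(line.strip())
        if match and match.group("ifname") == ifname and match.group("state") == "DOWN":
            downs += 1
    return downs


# Two lines copied from the output above, used only as sample input.
SAMPLE = [
    "Nov 22 00:22:55 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_DOWN: "
    "ifIndex 682, ifAdminStatus up(1), ifOperStatus down(2), ifName et-0/0/52",
    "Nov 22 00:22:56 cloudsw1-d5-eqiad mib2d[17046]: SNMP_TRAP_LINK_UP: "
    "ifIndex 682, ifAdminStatus up(1), ifOperStatus up(1), ifName et-0/0/52",
]

if __name__ == "__main__":
    print(count_flaps(SAMPLE, "et-0/0/52"))
```

Counting only the DOWN events treats each down/up pair as one bounce, which is a simplification if a log ends mid-flap.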
Thu, Nov 21
The Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it on this task with a more specific project tag. Thanks!
From Lumen:
2024-11-21 12:38:45 GMT - Splicing has commenced at the south end. Service affecting alarms are expected to begin clearing.
Seems to be back up:
Nov 21 13:51:02 cr4-ulsfo rpd[29932]: RPD_OSPF_NBRUP: OSPF neighbor 198.35.26.203 (realm ospf-v2 xe-0/1/1.0 area 0.0.0.0) state changed from Exchange to Full due to ExchangeDone (event reason: DBD exchange of slave completed)
Wed, Nov 20
Tue, Nov 19
Ok. So I've tested the "move server" script for this host and it worked as expected. We need to fill in some values even though they are not changing, such as the switch and the rack unit. For the "switch interface" I just picked a free number within a block of 10G ports, based on what was already in use.
Mon, Nov 18
Fri, Nov 15
Thu, Nov 14
Wed, Nov 13
Polling Netbox to find which switch each of those is connected to, it appears none of them in rows A-D are connected to a 10G-capable switch.
@RobH the migration work is now done, all that remains is to remove the old devices and any cables connecting to them and make sure netbox is updated to match. Not sure on the exact process for that?
@Jclark-ctr I've erased the config on all the old devices now, so feel free to remove them at any point. I'll check what the normal process is for that in terms of tasks etc.; for now I'll close this one as the work is complete.
Tue, Nov 12
Migration work is now complete, bastion and all hosts are reachable again following the moves. BGP is established to both core routers from the new firewall pair and everything looks good. Management is reachable directly from the firewalls on reth1, and the IPsec tunnel to codfw is also working with the config built by automation.
@Jgreen @Dwisehaupt I think we have broadly two options for how to proceed today:
Another complication I see is that on a SuperMicro device there is no ID_NET_NAME_ONBOARD populated, whereas on a Dell there is (though we can perhaps disable with above BIOS toggle). See P71018.
Certain NICs in our estate are not seen as 'onboard', and expose no 'acpi index'. This results in no ID_NET_NAME_ONBOARD being populated for them, in which case the system uses ID_NET_NAME_PATH for the name.
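The fallback described above can be sketched roughly in Python. This is a simplified model rather than the actual udev/systemd logic: it assumes the usual onboard, then slot, then path ordering and ignores the other policy entries, and the `pick_name` helper and the example names (`eno1`, `enp59s0f0`) are hypothetical.

```python
# Simplified preference order (assumption): onboard, then slot, then path.
NAME_POLICY = ("ID_NET_NAME_ONBOARD", "ID_NET_NAME_SLOT", "ID_NET_NAME_PATH")


def pick_name(props):
    """Return the first populated naming property, or None (hypothetical helper)."""
    for key in NAME_POLICY:
        value = props.get(key)
        if value:
            return value
    return None


if __name__ == "__main__":
    # NIC seen as onboard: the onboard name wins over the path name.
    print(pick_name({"ID_NET_NAME_ONBOARD": "eno1", "ID_NET_NAME_PATH": "enp59s0f0"}))
    # NIC with no ACPI index: nothing onboard, so the path name is used.
    print(pick_name({"ID_NET_NAME_PATH": "enp59s0f0"}))
```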
I discussed this briefly with @ayounsi on irc and while this is probably a good idea it won't, as things stand, prevent us getting emails from the automated homer diff. Reason being the order of commands causes a diff and homer will apply them in a different order (it makes no difference to device operation).
Mon, Nov 11
Fri, Nov 8
I'd a chat with @Jgreen on irc about the above and he confirmed all those hosts are decommed. We're a little perplexed as to what the switch ports are plugged into that makes them show "up", as the hosts don't appear to be in the rack, but for now we will plan to not migrate those ports to the new switches.
@Jgreen @Dwisehaupt I was doing some prep work on T377996 - looking at step 1 to import the existing data into Netbox (step 2 will be to get the new server provisioning flow in place which I'll tackle next).
Thu, Nov 7
@Jclark-ctr as discussed I believe we should have a load of copper SFPs from T369557.
Wed, Nov 6
All, just to be aware I hit another snag this evening which may be problematic.
@Jclark-ctr could you also let me know what ports on the fmsw these two were plugged into?
I personally don't think the current config is a bad thing to have in general (we have a lower pref/normal pref/higher pref community defined). None of the community assignments were specifically created for this use case.
Yeah, there is a potential problem for cloud VMs talking to UDP (or other non-TCP) services on the internet. For instance, if an internet server sends a 1500-byte packet to a VM with do-not-fragment set, and somewhere on the internet path there is an ICMP "blackhole" (preventing path MTU discovery from working), the packet won't get back to the machine.
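To make that failure mode concrete, here is a toy Python model. It is only a sketch: the `deliver` function is hypothetical, and the 1450-byte hop is an assumed example of a link with a smaller MTU, not a value taken from this discussion.

```python
def deliver(packet_size, dont_fragment, link_mtus, icmp_blackhole):
    """Toy model of one packet crossing a path (hypothetical helper).

    Returns (delivered, sender_learns_mtu).
    """
    for mtu in link_mtus:
        if packet_size > mtu and dont_fragment:
            # The router drops the packet and sends ICMP "fragmentation needed".
            # If that ICMP is blackholed the sender never learns the path MTU.
            return False, not icmp_blackhole
        # Without do-not-fragment the router fragments and forwards instead.
    return True, False


if __name__ == "__main__":
    path = [1500, 1450]  # assumed MTUs, in bytes, for each hop
    print(deliver(1500, True, path, icmp_blackhole=True))
    print(deliver(1500, True, path, icmp_blackhole=False))
    print(deliver(1400, True, path, icmp_blackhole=True))
```

In the first case the packet is lost and the sender gets no signal to retry with a smaller size, which is the problem for protocols that have no equivalent of TCP's MSS negotiation.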
Tue, Nov 5
I used the cookbook to provision the two new frack switches in eqiad this evening.
Thanks for the task Brian.
Mon, Nov 4
Fri, Nov 1
Yeah this is an odd one. And tbh I'm not sure I 100% understand the original reasoning for this and if we still need it.
We saw this (or at least something similar) today, see T378809.
Just stumbled on this task, Eric. Thanks for filing.
Mon, Oct 28
It seems that changing the "ip token" command from "pre-up" to "up" in /etc/network/interfaces makes things work as expected.