Incident Description: On September 7, 2021, IBM Network Specialists were alerted by monitoring to packet loss across multiple MZRs over the frontend and backend networks and began investigating. Initial troubleshooting identified that a change made to the network to return a backbone transport circuit to production contained an error, which resulted in traffic congestion on the backbone network. IBM Network Specialists rolled back the change; however, the corrected routes took time to fully propagate through the backbone network. Packet loss and high latency cleared once the routing changes had fully propagated, later in the day on September 7, 2021.
Root Cause: A change made to the network to return a backbone transport circuit into production contained an error. IBM Network Specialists have identified gaps in both the creation and the review of scheduled changes, which contributed to this service impact.
Future Actions: The IBM Cloud team has analyzed this incident for areas of improvement, including issue detection, identification, and future mitigation. To prevent this type of issue from recurring, IBM Network Specialists have updated the creation and review elements of the change process.