Resolved
After 2 hours and 30 minutes

All customers have been relocated to the replacement switch.

All customer services downstream of the affected switch appear to be functional at this time.

This incident is concluded, though we'll be staying on site for a while longer to monitor the situation.

Please accept our apologies for the inconvenience caused.

After 2 hours and 24 minutes

We're about to start moving customers to the replacement switch.

You can expect an outage of ~30s per device at most.

Recovering
After 1 hour and 28 minutes

Further assessment indicates that this is predominantly a control-plane failure and most customer traffic is still being switched as normal. Customers who were experiencing a loss of service have been temporarily patched elsewhere to bring them back online.

At this time we are not aware of any customers experiencing an outage. However, all customers still connected to sw9 remain "at risk": the switch could deteriorate further, or crash and fail to reboot.

We still plan to replace this switch shortly. Because there is room in the rack, we can bring up and configure the replacement switch while the old one remains in place, which will limit any outage to 30s or less for most connected hosts.

Preparing the replacement switch, upgrading its firmware, restoring config backups and double-checking everything will take a little while, so the next update will follow in 60 to 90 minutes.

As the criticality of the event has been downgraded, we will post further updates only to the status page at https://status.netcalibre.net until the incident is closed, at which point an update will be posted on all channels.

After 1 hour and 2 minutes

On site: assessing the issue and work required.

Next update: ~30-40 mins

Identified

Initial investigation of alarms in Volta suggests that a switch has failed there.

Dedicated server customers connected to sw9.vlt will currently be experiencing a total loss of service.

We're on our way with a replacement.

Apologies for any inconvenience caused.

Next update: ~30-40 mins

Began at:

Affected components
  • Core Network Functions
    • Layer 2 (VLT)
  • Hosting
    • Dedicated Servers