As I’m sure many of you are aware, we had an outage that began at about 2205 NZST last night and was finally fixed at about 0825 this morning.
About 10 minutes into the outage, my various ping systems notified me that the system was down. The main server with the database was completely off the net, including from the web consoles. So I checked everything I could, including requesting a reboot of the system, to no avail. Meanwhile the supplier’s own system had noticed that the server was inaccessible and informed me. So I sent off an email and received an automated acknowledgement of receipt.
I went questing to find out how far the problem extended. On the network, everything was reachable up to the last hop before the server. It was quite clear that at least a bank of other servers at the same node had also gone dark, which pointed to a structural problem that should have been making a big noise on their boards.
I figured that since it was nearly midnight, most of the readers would have gone offline, and the supplier would at least know of the problem from my email. I’d check later in the morning to find out when it had come back online. So I suppressed the server alarms on my cellphone and went to sleep.
Arggh. I woke up and there was still no response and no server. So at 0630 I rang their polite phone support monkey service, who said they’d make sure someone knew about it. A response from support finally arrived at 0707, about 8 hours after my first support email and about half an hour after I phoned:
Apologies for not getting back to you earlier. The underlying hosting environment had to be reinitialized.
Why do I get the distinct impression that this was the first time they’d realised they had a problem? Don’t they read their frigging emails? If they had, they could have rebooted eight hours earlier.
Sure, I’m a penny-pinching sysop. But this server still generates several thousand dollars a year in revenue for the supplier, and I only expect them to run the infrastructure. I don’t expect gold-plated support, but I do expect people to at least read the frigging emails I send them and to respond within hours.
Anyway, by that point I was busy warming up a backup system and refreshing it with the latest available data, from 1700 the previous day. I’m going to have to increase the frequency of the database backup deltas. I may even reimplement database replication, now that we’re out from under the horrendously expensive data charges on the Southern Cross cable.
But I’m also starting to look for a primary server provider with more responsive support.