Who, Me? “Expect the unexpected” is an oft-repeated cliché when planning for disasters. But how far should such plans go? Welcome to another episode of Who, Me?, in which a reader discovers an entirely new failure mode.
Today’s story comes from “Brian” (not his real name) and takes place during a time when the US state of California was facing blackouts.
Our reader worked for a struggling hardware vendor in the state, a once-mighty powerhouse downsized to just 1,400 employees thanks to that old favorite of the HR ax-wielder: “restructuring.”
Brian worked in the data center as a Unix/Linux system administrator while the only remaining facilities engineer was located at another site not far from the highway.
“We were warned that California was going to start having ‘blackouts,’” he told us, “but we were assured that the diesel generator had fuel and could provide electricity for a day or two, and we had battery backup for the data center part of the building for at least 30 minutes (just in case).”
What could go wrong?
On the day of the first blackout, the lights in Brian’s server room went out and the desktops died, as expected. Employees switched to machines running on the UPS as the large generator prepared to start.
Sure enough, the diesel started. However, the power did not come back. The building remained dark. Brian went to the back of the data center to check and, yes, the generators were definitely running. But for some reason the lights wouldn’t come on.
He was unable to enter the generator enclosure for further troubleshooting, as it was securely locked. And the key? With the facilities engineer. Who was at the other site.
It was now 4:30 p.m., and anyone who knows how bad traffic on that stretch of I-280 can be between 4 and 7 p.m. knows the chance of the engineer making the trip in 30 minutes was about zero.
“So our facilities manager was rushing to the data center at near-walking speed,” Brian said.
Worse still, the data center’s air conditioning was running off the mains, not the UPS. After all, there was only supposed to be a brief gap before the generator kicked in. But that wasn’t the case, and things were heating up.
The team began desperately shutting everything down. Development kit, test hardware, even redundant production systems. Anything that drew precious juice from the UPS, emitted heat, and wasn’t absolutely essential got a flick of the power button.
“By the time the facilities manager arrived at the data center, about an hour later, the UPS had run out,” Brian recalls, “even though we had cut the servers and network equipment down to a bare minimum, and we had thrown all the doors to the building and the data center wide open in a futile effort to keep things cool.”
But what had happened? The generators were working, but the changeover hadn’t happened. Using the enclosure key, the facilities engineer investigated and reported back.
“Turns out everything worked as expected, except for a switch in the generator enclosure that was supposed to switch the building to the generators.
“It was a neighborhood bird’s favorite perch, and it was so encrusted with poo it wouldn’t switch.”
At least that’s the explanation given by the engineer.
“Well, there’s always the unexpected, huh?” said Brian.
We’ve never seen “poo-encrusted switches” in any of our disaster recovery plans, but maybe we should have. Or maybe the facilities engineer used the antics of his feathered friends to cover up a cock-up of his own. Let us know your thoughts in the comments, and submit your own tech tales with an email to Who, Me? ®