Tuesday, May 27, 2008

Intermittent Failure

First, a definition from Dictionary.com:

intermittent
Pronunciation[in-ter-mit-nt]
–adjective
1. stopping or ceasing for a time; alternately ceasing and beginning again: an intermittent pain.
2. alternately functioning and not functioning or alternately functioning properly and improperly.

You all know something about this. It's the way in which a piece of equipment will work fine, do something unexpected, and then work fine again without anybody fixing it. This is more common with things that are electronic or mechanical in nature. This leads me to believe that intermittency is not natural.

I believe it's evil, actually.

You see, in my world something that's unreliable is the worst sort of problem. We do live shows. There generally aren't any second takes. When you hand the girl the microphone and she starts to sing the National Anthem, the sound needs to come out of the speakers. Now. "It worked fine an hour ago" isn't much consolation when your boss or, worse yet, the Client is looking at you like the world just fell down on him and it's your fault. Ever been booed by twenty thousand people on your birthday? I have. There were no T-shirts.

It is hard to explain to somebody who isn't in the event business why a problem that "fixes itself" is so troubling.
Consider this example: During one of the recent graduations, I was playing the National Anthem from the Instant Replay (hard disc recorder). Everything was going just fine when there's a loud POP from the PA and the house system is just gone. There's some question about it now, but I think the stage speakers were still working.
Now, we've got thousands of people in the stands and every one of them is very interested in seeing their family member graduate TODAY. Not to mention the five other events in the next three days that need sound. If the PA is dead... Well, nothing good can come from that.
The song ends (about twenty seconds, but it was a LONG time to me) and we're supposed to roll right into another song with no interlude, so I push the faders down, fade the IR back up, and hit the hot key for the next track. I'm VERY curious, my mind is racing through all the things to check and forming non-existent contingency plans, and the adrenaline is pumping pretty hard. Thankfully the song falls out of the speakers and the show goes on without any further trouble.

Whatever it was, we didn't fix it. It "Fixed Itself".

I've racked my brain trying to figure out what could have caused that temporary outage. Lots of possibilities, to be sure. No real good candidates have come forward though.

What I do know is:

Unless you positively identify the failure point and eliminate it, you are liable to have the problem again.

And

When it does fail again, it will not be at an opportune time for you.

I've spent the better part of a decade finding vulnerabilities in the equipment and practices that we use to put on events in this building. I know that we're much better off than we were years ago. We have more spare parts, better system designs, better equipment, and most of all, more years of experience. I remember the time we got our ass kicked because we hadn't learned "the lesson we learned on that show that time where that stuff didn't work".

And yet, we are still dealing with electronics and physics. We're vulnerable all the time to LOTS of things that could stop the show. And let's remember that without the PA system, you can't put seventy thousand people into this place. It's not safe.

Am I thankful that the system "fixed itself"? Hell yeah!

Do I believe it's "fixed"? Not on your life! I'm waiting for it to happen again. Maybe then we'll scrape together enough data to figure out what happened.

And so I've come to believe:

"Intermittent Failure is the hand of the Devil."
