I was describing my small business's computer layout to a friend of mine, and he made a comment to the effect of my being at high risk because I had so many "single points of failure". What is that, what risk is he talking about, and what do I do about it?

It's actually a concept that I've become just a little too familiar with in recent months, in part due to an accident down the road, and in part due to some aging computer equipment.

If you rely on your network or internet connection to run your business, then it's important to be aware of what might fail, what the ramifications of failure are, and what you plan to do when (not if) it fails.

A "single point of failure" is, to oversimplify a little, any single piece of equipment that, if it fails, can bring your entire operation to a halt.

You may not realize how many single points of failure you have; the number can be surprising.

One great example is your broadband modem. It's your gateway to the internet. If you run an internet-based business and your modem fails, your operation comes to a complete stop.

I had exactly this happen to me some months ago. My DSL modem decided it was done for, and died a quiet death. My internet connectivity was gone. Given that the internet is what I'm all about, that had a serious impact on my ability to do business and communicate with friends and clients. On top of that, for various reasons known only to my phone company, I'm actually stuck with an old style of DSL technology, which means that replacement DSL modems are difficult and time-consuming to locate. Fortunately I was able to locate one the next day through a local eBay seller.

I have now also purchased a second DSL modem as a backup. That's part of understanding the ramifications of that "single point of failure". By identifying it, you can make contingency plans to recover more quickly from the failure.

It happened again a couple of weeks ago. Apparently an auto accident down the road sent a power surge through the phone lines. All of a sudden not only was my phone dead, but the DSL disconnected once again. My single point of failure? The circuit board fuses that had blown in my DSL filter. I now have a replacement in place, and a spare.

By now you can see that almost anything dealing with my DSL line is, for me, a single point of failure. Larger companies mitigate this risk by having multiple different lines coming into the facility. That way any one line can fail and the operation continues using the others; multiple failures are required for the operation to be seriously affected.

In truly serious datacenter operations, even the conduit that carries the wire can be a single point of failure. If all those multiple different lines come in through the same conduit outside the building, a single backhoe can take them out all at once. (Never underestimate the power of a backhoe.) The solution that large datacenters take is to ensure that different lines come in to different physical locations.
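
To put rough numbers on why extra lines help, and why a shared conduit undoes much of the benefit, here is a minimal sketch in Python. It assumes each line fails independently of the others, and every probability in it is invented for illustration; the function name `outage_probability` is hypothetical, not part of any library.

```python
def outage_probability(line_failure: float, lines: int,
                       shared_failure: float = 0.0) -> float:
    """Chance that every line is down at the same time.

    line_failure   -- chance that any one line is down (assumed independent)
    lines          -- how many separate lines come into the building
    shared_failure -- chance that something common to all lines fails,
                      such as a single conduit (0.0 means nothing is shared)
    """
    if lines < 1:
        raise ValueError("need at least one line")
    all_lines_fail = line_failure ** lines
    # Either the shared element fails, or it survives and every line fails.
    return shared_failure + (1.0 - shared_failure) * all_lines_fail


# Illustrative numbers only: each line is down 1% of the time.
print(outage_probability(0.01, 1))          # one line
print(outage_probability(0.01, 2))          # two lines, separate entry points
print(outage_probability(0.01, 2, 0.001))   # two lines sharing one conduit
```

With these made-up figures the second line cuts the chance of a total outage by a factor of a hundred, but once both lines share a conduit, the conduit itself dominates the risk.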

But back to your small business. What are your potential single points of failure? Consider these, if your business is dependent on them:

  • Your internet connection. What happens if your internet connection goes away?

  • Your networking equipment. A router or switch could die. Do you know what you would do?

  • Your cash register or other data processing equipment. For example, my wife's retail store is dependent on a single computer as its cash register. We have a backup plan - do you?

  • Your printer. Are you required to print receipts or forms to process your sales? What happens when the printer dies, or you're simply out of ink?

  • Your credit card processing equipment. If you take credit card sales, the reader might die. (Ours did!) Do you know what you would do then? (We didn't.)

You get the idea ... there are lots of hidden places where a single failure can cause major disruption.
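
One way to hunt for those hidden places is to write the inventory down and flag anything that has neither a spare nor a fallback plan. The sketch below is only an illustration: the `Component` class, the `unprotected` function, and the sample inventory are all hypothetical, so substitute your own equipment.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    spares: int = 0          # replacement units already on hand
    fallback_plan: str = ""  # what you do while it is down, if anything


def unprotected(components):
    """Return the names of components with no spare and no fallback plan."""
    return [c.name for c in components
            if c.spares == 0 and not c.fallback_plan]


# Sample inventory, loosely modeled on the list above (entries are assumptions).
inventory = [
    Component("DSL modem", spares=1),
    Component("router"),
    Component("cash register computer",
              fallback_plan="calculators and handwritten receipts"),
    Component("receipt printer"),
    Component("credit card reader"),
]

print(unprotected(inventory))
# prints ['router', 'receipt printer', 'credit card reader']
```

Anything the function lists is a single point of failure you have not yet planned for.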

It boils down to a risk/cost analysis. It might be cost-effective to have a backup DSL modem for $50 or so, but a spare computer for $2000? Not so much. In that case, at least understanding that the risk is there might allow you to build an alternate plan. At my wife's retail store, for example, the plan is calculators and handwritten receipts until I can repair or replace a broken computer. That's appropriate for her business; your needs might be quite different.
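
That risk/cost comparison can be sketched as a back-of-the-envelope calculation: compare the price of a spare against the loss you would expect, on average, from going without one. Only the $50 and $2000 prices come from the examples above; the failure chances, downtime, and daily losses are invented assumptions, and the function names are hypothetical.

```python
def expected_yearly_loss(failure_chance: float, days_down: float,
                         loss_per_day: float) -> float:
    """Average yearly loss from one component failing with no spare on hand."""
    return failure_chance * days_down * loss_per_day


def spare_is_worth_it(spare_cost: float, failure_chance: float,
                      days_down: float, loss_per_day: float) -> bool:
    """True when the expected yearly loss exceeds the price of a spare."""
    return expected_yearly_loss(failure_chance, days_down, loss_per_day) > spare_cost


# DSL modem: $50 spare; assume a 20% yearly failure chance, 1 day down,
# and $500 of lost business per day without internet.
print(spare_is_worth_it(50, 0.20, 1, 500))

# Store computer: $2000 spare; assume a 10% yearly failure chance, 2 days
# down, and only $200 lost per day because handwritten receipts keep sales going.
print(spare_is_worth_it(2000, 0.10, 2, 200))
```

Under these assumptions the spare modem pays for itself and the spare computer does not, which matches the intuition above. Note that a cheap fallback plan lowers the loss per day, and that alone can tip the decision.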

The most important thing is to simply be aware, and as prepared as you can be.

Stuff happens.

Article C2628 - April 20, 2006

Greg Bulmash
April 21, 2006 12:49 AM

"Never underestimate the power of a backhoe."

Every once in a while, you'll see a story about a backhoe operator who didn't just cut off a business, but multiple city blocks by whomping through some buried cables.

But also worth noting is that a single point of failure can be a source of cascading failure. This was shown in 2003 when that one power plant's problems ended up knocking out power to 50 million people in the Northeastern U.S. and Canada. Technically, this shouldn't have happened, but it did.

June 27, 2007 1:49 PM

At the same time every hour I get disconnected from the internet. Verizon has not been any help. It always happens at 5 minutes before the hour, and I can't get back on for anywhere from 2 minutes to 15 minutes; the average is about 6 minutes. Please help me, I play poker for money and it's costing me.

Thank You,

April 15, 2008 11:57 PM

Thank you for this article.
While reading COBIT's DS4, I was so confused about single points of failure.

R.K.Mohan Rao
June 23, 2008 12:52 AM

I am reading an article on distributed processing associated with Echelon LonWorks technology. It was mentioned in that article that distributed processing mitigates the problem of Single Point Failure.

Lee Nelson Guptill
June 6, 2009 3:48 PM

Reminds me of HACCP (Hazard Analysis and Critical Control Points) in the food industry.

April 27, 2010 12:07 PM

My Zoom modem just up and stopped one day, then it worked for a day, then it stopped.
If I'd elected to use the ISP's modem, it's likely it would have been THEIR problem.

