Data Center Risk: CHILLERS!

November 12, 2007

Last night’s outage at the Rackspace data center at Heritage Business Park IV in Grapevine should be a wake-up call to colocation customers around the country. Why? First, we need to look at what happened. Yesterday evening a truck drove into a power transformer near the building, causing the power to go out. The backup generators started up and presumably allowed the facility to power its equipment. Despite the fact that the Liebert HVAC systems were running, the facility began to get hot ~ VERY HOT, VERY FAST! Rackspace had to make a quick decision to power down its servers, even though hundreds of customers would go offline, including 37Signals, Laughing Squid, and GigaOm, to name a few. Why did it get so hot so fast?

It turns out that in most multi-tenant commercial properties in the United States, the building owner provides chilled water so that tenants can run their HVAC systems. In general, most buildings do NOT put these chillers on generator-backed power. Tenants pay millions of dollars (and I mean millions) for backup power systems on the rare chance (rare in the U.S.) that the building might lose power. Most of them don’t realize that they won’t be able to cool their space after the power goes off. When Rackspace subleased the space, Grubb & Ellis (Qwest’s representative) made representations that the design and construction of the building would allow for operation for up to two weeks after a power failure. Robert Fulford told the Dallas Business Journal, "The good news for Rackspace is they got it for pennies on the dollar. Qwest spent a huge amount of money building out the space, I think around $900 per square foot. It’s unbelievable. If the world would end this center could run on its own for two more weeks."

Of course, as Robert notes, Rackspace didn’t build out the space. Qwest subleased 60,000 square feet of its 144,000-square-foot commitment in the park to Rackspace back in 2004. The building is owned by ING Clarion. Why is this important? Simple: no one is responsible. Was ING supposed to ensure the chillers were backed up? Did ING represent to Qwest that they were? Was Qwest responsible? By the time 37Signals sold me their hosted project management service (BaseCamp), they assumed they were doing everything right. They paid EXTRA for top quality space, power and pipe. Rackspace paid EXTRA for a top quality facility. Qwest paid EXTRA for a great building and so on. You get it? Everyone made guarantees, promises and representations about the power ~ did anyone think to ask about the air conditioning? Is 37Signals responsible to me for the outage? This one incident likely resulted in millions of dollars in losses. Who is going to pay?

Of course, this issue is not unique to Rackspace, Qwest or 37Signals ~ it is going on EVERYWHERE. Just last month the power to our building’s chillers was interrupted for more than a day and we learned that despite our expensive UPS system and generator, our HVAC system was going to blow hot air until the power could be restored (several hours later). You might as well not bother installing millions of dollars’ worth of generators and UPS systems if you are not going to put ALL of your systems (servers, lights AND the entire HVAC system) on generator power. You can’t run servers in the heat, period. My advice to data center operators (and data center customers): talk to your landlord today and ask him whether your HVAC system uses the building’s chiller and, if it does, whether that chiller is on a generator. Get him to show you, test the system and put it in writing. Just my two cents…
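
For what it’s worth, here is a rough sketch of that audit in Python. The system names and the dependency map are made up for illustration and are not taken from any real building. The only idea it encodes is the one above: a load is protected only if it is on generator power AND everything it depends on is too.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """One piece of facility infrastructure (all names here are hypothetical)."""
    name: str
    on_generator: bool
    depends_on: list = field(default_factory=list)  # names of upstream systems


def unprotected(systems):
    """Return the sorted names of systems that fail when utility power is lost.

    A system survives only if it is on generator power and every system
    it depends on also survives. Unknown or circular dependencies are
    treated as failures, since nobody can vouch for them.
    """
    by_name = {s.name: s for s in systems}

    def survives(name, seen=()):
        system = by_name.get(name)
        if system is None or name in seen:
            return False
        return system.on_generator and all(
            survives(dep, seen + (name,)) for dep in system.depends_on
        )

    return sorted(s.name for s in systems if not survives(s.name))


# Illustrative layout: the tenant's gear is on generator, the landlord's chiller is not.
facility = [
    System("servers", on_generator=True, depends_on=["ups", "hvac_units"]),
    System("ups", on_generator=True),
    System("hvac_units", on_generator=True, depends_on=["building_chiller"]),
    System("building_chiller", on_generator=False),
]

print(unprotected(facility))
```

In this made-up layout the chiller, the HVAC units and the servers all come back as unprotected, even though the servers and HVAC units are nominally "on generator". Flip the chiller to `on_generator=True` and the list comes back empty.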

UPDATE: One of my readers passed along Rackspace’s Zero-Downtime Network Guarantee. Talk about making promises you can’t keep:

100% Network Uptime Isn’t Wishful Thinking, It’s A Guaranteed Reality

We know that every second your network is down you’re losing opportunities, revenue and the confidence of your users and visitors. So we decided to do something about it. We designed and built the Zero-Downtime Network to minimize downtime and we are so confident about its capabilities that we guarantee we will give you money back if it goes down. And it works so well that we guarantee it.

Our 100% Network Uptime Guarantee — Every aspect of our Zero-Downtime Network makes it real:

  • We use the network only for our customers’ managed hosting needs, never sharing it with telecom services or cable TV services that would negatively affect your service.
  • The only bandwidth we use is high performance bandwidth, which usually isn’t the case with cheaper hosting providers.
  • To provide multiple redundancies in the flow of information to and from our data centers, we partner with nine network providers.
  • Every fiber carrier must enter our datacenters at separate points. This is to protect you from complete service failures caused by an unlikely network cut.
  • You get the fastest and most reliable network connections because our Proactive Network Management methodology monitors route efficiency and end-user performance, automatically improving the network’s topology and configuration in real-time.
  • We purposely underutilize the network, making it resilient to even the largest Internet routing issues.
  • The network’s configuration, co-developed with Cisco, guards against any single points of failure at the shared network level. We even provide you with the option to extend it to your VLAN environment.
  • Cisco and Arbor Networks continually work with us, creating ever-improving ways of monitoring and securing our network.

If we didn’t own and operate our four US and four UK data centers, we could never engineer them to the standards required to support the Zero-Downtime Network. We engineered them to never compromise security or redundancy. Ever.

Physical Security

  • Keycard protocols, biometric scanning protocols and round-the-clock interior and exterior surveillance monitor access to every one of our data centers.
  • Only authorized data center personnel are granted access credentials to our data centers. No one else can enter the production area of the datacenter without prior clearance and an appropriate escort.
  • Every data center employee undergoes multiple and thorough background security checks before they’re hired.

Precision Environment

  • Every data center’s HVAC (Heating Ventilation Air Conditioning) system is N+1 redundant. This ensures that a duplicate system immediately comes online should there be an HVAC system failure.
  • Every 90 seconds, all the air in our data centers are circulated and filtered to remove dust and contaminants.
  • Our advanced fire suppression systems are designed to stop fires from spreading in the unlikely event one should occur.
  • All cables are securely tied down with cable racks suspended from ceilings, providing dual routes for all cables.

Conditioned Power

  • Should a total utility power outage ever occur, all of our data centers’ power systems are designed to run uninterrupted, with every server receiving conditioned UPS (Uninterruptible Power Supply) power.
  • Our UPS power subsystem is N+1 redundant, with instantaneous failover if the primary UPS fails.
  • If an extended utility power outage occurs, our routinely tested, on-site diesel generators can run indefinitely.

Core Routing Equipment

  • Only fully redundant, enterprise-class routing equipment is used in Rackspace data centers.
  • All routing equipment is housed in a secured core routing room and fed by its own redundant power supply.
  • Fiber carriers can only enter our data centers at disparate points to guard against service failure.

Network Technicians

  • We require that the networking and security teams working in our data centers be certified. We also require that they be thoroughly experienced in managing and monitoring enterprise level networks.
  • Our Certified Network Technicians are trained to the highest industry standards.

Comments

3 Responses to “Data Center Risk: CHILLERS!”

  1. Indefinite Articles » Things to ask your hosting provider on November 13th, 2007 11:35 am

    [...] they have the HVAC on backup power as well? [...]

  2. Grace Farmer on November 13th, 2007 12:11 pm

    BaseCamp was down for a couple of hours as a result of this. See the notice at 37Signals.com.

    When the backup power kicked in, two chillers did not kick back on, the servers started heating up and the system shut down.

  3. Web Hosting Providers Directory on November 28th, 2007 6:42 am

    Sorry, it just sounds like a crazy idea for me :)
