The stakes are rising when it comes to data centre power outages
Electrically coupled UPS technology advances offer a safer path and more uptime
A power outage in a data centre that results in service loss can propel data centre operators into the headlines.
Two recent, separate power outages at different London data centres, both of which resulted in loss of services, did just that.
One outage lasted more than 14 hours before power and services were fully restored. Hundreds of service provider customers were directly impacted and thousands of their customers were without network-based services for a whole working day.
In the first of these outages, the operator identified the problem as originating with a faulty static UPS. A few days later, in a second incident at a facility run by a different operator, unconfirmed reports suggest the operator attributed a small fire to a faulty UPS.
Given the nature of the reports from London Fire Brigade, it seems most likely that the second incident also involved a static UPS.
For the data centre operator, a serious outage is initially viewed through the impact on its customers. How quickly was the problem spotted? How did the operator react? How quickly did it communicate with its customers? How quickly was the issue identified, and how fast and effective were the remediation measures? How quickly were services restored? Were effective processes in place, and were the protocols followed?
This initial stage of any major outage is when boards light up, social media goes into overdrive and journalists start asking questions.
Once the initial crisis has been addressed, the next step for the operator is to identify what went wrong and why. A thorough review ensues.
Don’t waste a crisis
All data centres, and especially commercial data centres, live or die by their uptime. If the equipment designed to protect power provision and backup was itself the cause of an incident, the investigation goes in a particular direction.
When a UPS is identified as the cause of an outage, it raises many questions. If the root cause was a technical failure, this in turn can raise serious long-term concerns. Whenever an outage occurs, the opportunity must be grasped to ensure it doesn’t recur.
If a UPS failed and caused an incident the questions asked will include:
Was it due to a service issue?
Is there a fundamental design issue?
Was the age of the particular unit a factor?
For an operator it is critical that there is no repeat of any failure.
For data centre management teams and the engineers who report to them, approaches to electrical engineering in large data centre environments are changing.
Operators are no longer constrained by fixed power topologies which were intended not to change once they were designed, deployed and commissioned.
As data centres become more important and uptime becomes even more vital, it seems odd that some operators feel unable to take advantage of advances in UPS technology that improve reliability, availability and safety and so deliver high levels of uptime.
There are real alternatives to paralleling multiple static UPSs throughout the data centre power chain, an approach that is inherently less safe and secure than deploying single modules with a high power rating. Using many static UPS units in a low power range introduces multiple points of failure into the data centre. What operators often end up with is dozens of units connected in a chain. And within each static UPS there is a high count of components such as capacitors and cooling fans.
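The point about unit count can be illustrated with a little arithmetic. The sketch below is a simplified model rather than a reliability study: it assumes every unit faults independently with the same probability over some period, and the function name and the 1% figure are hypothetical values chosen only for illustration. It also counts faults, not outages, since a redundant design may ride through a single unit fault, and it does not claim that a larger single module has the same fault probability as a small one.

```python
def prob_any_unit_faults(n_units: int, p_unit: float) -> float:
    """Probability that at least one of n_units faults in the period.

    Assumes faults are independent and every unit shares the same
    per-period fault probability p_unit (a simplifying assumption).
    """
    if n_units < 0:
        raise ValueError("n_units must be non-negative")
    if not 0.0 <= p_unit <= 1.0:
        raise ValueError("p_unit must be between 0 and 1")
    # P(at least one fault) = 1 - P(no unit faults)
    return 1.0 - (1.0 - p_unit) ** n_units


if __name__ == "__main__":
    # Hypothetical per-unit fault probability for the period; not vendor data.
    p_unit = 0.01
    for n_units in (1, 12, 36):
        chance = prob_any_unit_faults(n_units, p_unit)
        print(f"{n_units:>2} units: chance of at least one fault = {chance:.1%}")
```

Under these assumptions the chance of seeing at least one fault somewhere in the chain grows with every unit added, which is the sense in which a high unit count multiplies the points of failure.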
Any thorough review of an outage where the fault came from the UPS must explore alternative approaches to power provision and backup. This should include asking whether electrically coupled UPS technology such as Piller’s UB-V can provide safer, more efficient power protection at scale than traditional static UPS technologies.
Whether the root cause of the recent high-profile outages was a battery fault, a faulty capacitor or a worn-out fan is something that is unlikely ever to make it into the public domain.
What is clear is that, with rising capacity demands meaning more power will be needed, UPS selection becomes even more vital. With so many ageing static UPS units in operation, these outages are unlikely to be the last we hear of.
Data centre operations are about managing risk. Reducing power outages that put service at risk is the first duty of the M&E teams. Piller’s UB-V series is changing the way engineers view the provision of conditioned power in large-scale data centres right across the world.
If you are concerned about outages, looking at alternatives to the static UPS is a good place to start.