AI Case Study

Google achieved a 15% reduction in power usage effectiveness (PUE) overhead by using an ensemble of deep neural networks to improve the efficiency of data centre cooling

Google is focused on improving the power efficiency of its data centres. Traditional industrial cooling equipment such as pumps, chillers and cooling towers is hard to run optimally using fixed rules and human intuition. By training an ensemble of deep neural networks on historical sensor data such as temperatures and pump speeds, Google reduced the energy used for cooling by 40%, which equates to a 15% reduction in PUE overhead.

Industry

Technology

Internet Services Consumer

Project Overview

"To address this problem, we began applying machine learning two years ago to operate our data centres more efficiently. And over the past few months, DeepMind researchers began working with Google’s data centre team to significantly improve the system’s utility. Using a system of neural networks trained on different operating scenarios and parameters within our data centres, we created a more efficient and adaptive framework to understand data centre dynamics and optimize efficiency.

We accomplished this by taking the historical data that had already been collected by thousands of sensors within the data centre -- data such as temperatures, power, pump speeds, setpoints, etc. -- and using it to train an ensemble of deep neural networks. Since our objective was to improve data centre energy efficiency, we trained the neural networks on the average future PUE (Power Usage Effectiveness), which is defined as the ratio of the total building energy usage to the IT energy usage. We then trained two additional ensembles of deep neural networks to predict the future temperature and pressure of the data centre over the next hour. The purpose of these predictions is to simulate the recommended actions from the PUE model, to ensure that we do not go beyond any operating constraints."
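The PUE definition quoted above can be written out as a short calculation. This is a minimal sketch: the function names and the energy figures are illustrative assumptions, not values reported by Google.

```python
def pue(total_building_kwh: float, it_kwh: float) -> float:
    """PUE as defined in the quote: total building energy / IT energy."""
    if it_kwh <= 0:
        raise ValueError("IT energy must be positive")
    return total_building_kwh / it_kwh


def pue_overhead(pue_value: float) -> float:
    """Energy spent on everything except IT equipment, per unit of IT energy."""
    return pue_value - 1.0


# Hypothetical site: 1200 kWh drawn by the building, 1000 kWh of it by IT equipment.
example_pue = pue(total_building_kwh=1200.0, it_kwh=1000.0)
example_overhead = pue_overhead(example_pue)
print(f"PUE = {example_pue:.2f}, overhead = {example_overhead:.2f}")
```

A PUE of 1.0 would mean every unit of energy goes to IT equipment, so the overhead (PUE minus 1) is the part that cooling and electrical losses account for.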

Reported Results

"Our machine learning system was able to consistently achieve a 40 percent reduction in the amount of energy used for cooling, which equates to a 15 percent reduction in overall PUE overhead after accounting for electrical losses and other non-cooling inefficiencies. It also produced the lowest PUE the site had ever seen."
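The two percentages in the quote can be related with simple arithmetic. The sketch below assumes that PUE overhead means PUE minus 1 and that the cooling saving is the only change; under those assumptions a 40% cooling saving yielding a 15% overhead saving implies cooling made up roughly 37.5% of the overhead. The baseline PUE of 1.12 is a hypothetical figure, not one given in the source.

```python
def implied_cooling_share(cooling_reduction: float, overhead_reduction: float) -> float:
    """Share of PUE overhead attributable to cooling, assuming only cooling changed."""
    if not 0 < cooling_reduction <= 1:
        raise ValueError("cooling_reduction must be in (0, 1]")
    return overhead_reduction / cooling_reduction


def pue_after_saving(baseline_pue: float, overhead_reduction: float) -> float:
    """New PUE once the overhead (PUE - 1) shrinks by the given fraction."""
    overhead = baseline_pue - 1.0
    return 1.0 + overhead * (1.0 - overhead_reduction)


# Figures from the quote: 40% less cooling energy, 15% less PUE overhead.
share = implied_cooling_share(cooling_reduction=0.40, overhead_reduction=0.15)

# Hypothetical baseline PUE, for illustration only.
new_pue = pue_after_saving(baseline_pue=1.12, overhead_reduction=0.15)
print(f"implied cooling share of overhead: {share:.3f}")
print(f"PUE after saving (hypothetical baseline 1.12): {new_pue:.3f}")
```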

Technology

"Using a system of neural networks trained on different operating scenarios and parameters within our data centres, we created a more efficient and adaptive framework to understand data centre dynamics and optimize efficiency. We accomplished this by taking the historical data that had already been collected by thousands of sensors within the data centre -- data such as temperatures, power, pump speeds, setpoints, etc. -- and using it to train an ensemble of deep neural networks. Since our objective was to improve data centre energy efficiency, we trained the neural networks on the average future PUE (Power Usage Effectiveness), which is defined as the ratio of the total building energy usage to the IT energy usage. We then trained two additional ensembles of deep neural networks to predict the future temperature and pressure of the data centre over the next hour. The purpose of these predictions is to simulate the recommended actions from the PUE model, to ensure that we do not go beyond any operating constraints.

Because the algorithm is a general-purpose framework to understand complex dynamics, we plan to apply this to other challenges in the data centre environment and beyond in the coming months. Possible applications of this technology include improving power plant conversion efficiency (getting more energy from the same unit of input), reducing semiconductor manufacturing energy and water usage, or helping manufacturing facilities increase throughput."
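The interplay described in the quote, where one ensemble predicts PUE and two further ensembles check temperature and pressure, can be sketched as follows. This is not Google's implementation: the source does not say how ensemble outputs are combined or how actions are chosen, so averaging the members and picking the lowest-PUE candidate within limits are assumptions. The toy lambda models stand in for trained neural networks, and all names and thresholds are hypothetical.

```python
from statistics import mean
from typing import Callable, Optional, Sequence, Tuple

Action = Tuple[float, ...]            # e.g. (pump_speed,), hypothetical encoding
Model = Callable[[Action], float]     # stand-in for one trained network


def ensemble_predict(models: Sequence[Model], action: Action) -> float:
    """Average the predictions of all ensemble members (assumed combination rule)."""
    return mean([model(action) for model in models])


def pick_action(
    candidates: Sequence[Action],
    pue_models: Sequence[Model],
    temp_models: Sequence[Model],
    pressure_models: Sequence[Model],
    max_temp: float,
    max_pressure: float,
) -> Optional[Action]:
    """Return the candidate with the lowest predicted PUE whose predicted
    temperature and pressure stay within limits, or None if none is safe."""
    best: Optional[Action] = None
    best_pue = float("inf")
    for action in candidates:
        if ensemble_predict(temp_models, action) > max_temp:
            continue  # would violate the temperature constraint
        if ensemble_predict(pressure_models, action) > max_pressure:
            continue  # would violate the pressure constraint
        predicted = ensemble_predict(pue_models, action)
        if predicted < best_pue:
            best, best_pue = action, predicted
    return best


# Toy models: lower pump speed lowers PUE but raises temperature.
pue_models = [lambda a: 1.10 + 0.05 * a[0], lambda a: 1.12 + 0.05 * a[0]]
temp_models = [lambda a: 30.0 - 10.0 * a[0]]
pressure_models = [lambda a: 1.0 + a[0]]

candidates = [(0.2,), (0.5,), (0.9,)]
chosen = pick_action(candidates, pue_models, temp_models, pressure_models,
                     max_temp=26.0, max_pressure=1.8)
print(chosen)
```

In this toy setup the lowest pump speed is rejected for exceeding the temperature limit and the highest for exceeding the pressure limit, which illustrates why the extra predictors are needed alongside the PUE model.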

Function

Operations

Network Operations

Background

"One of the primary sources of energy use in the data centre environment is cooling. Just as your laptop generates a lot of heat, our data centres -- which contain servers powering Google Search, Gmail, YouTube, etc. -- also generate a lot of heat that must be removed to keep the servers running. This cooling is typically accomplished via large industrial equipment such as pumps, chillers and cooling towers. However, dynamic environments like data centres make it difficult to operate optimally for several reasons:

1. The equipment, how we operate that equipment, and the environment interact with each other in complex, nonlinear ways. Traditional formula-based engineering and human intuition often do not capture these interactions.
2. The system cannot adapt quickly to internal or external changes (like the weather). This is because we cannot come up with rules and heuristics for every operating scenario.
3. Each data centre has a unique architecture and environment. A custom-tuned model for one system may not be applicable to another. Therefore, a general intelligence framework is needed to understand the data centre’s interactions."

Data

"... historical data that had already been collected by thousands of sensors within the data centre -- data such as temperatures, power, pump speeds, setpoints, etc."
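One way such historical sensor logs could be turned into training examples for an "average future PUE" target is sketched below. The data layout, the function name and the averaging horizon are assumptions made for illustration; the source does not describe how the training set was built.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Snapshot = Tuple[float, ...]  # e.g. (temperature, pump_speed), hypothetical layout


def make_training_pairs(
    readings: Sequence[Snapshot],
    pue_series: Sequence[float],
    horizon: int,
) -> List[Tuple[Snapshot, float]]:
    """Pair the sensor snapshot at step t with the mean PUE over steps t+1..t+horizon."""
    if len(readings) != len(pue_series):
        raise ValueError("readings and pue_series must have the same length")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    pairs = []
    for t in range(len(readings) - horizon):
        target = mean(pue_series[t + 1 : t + 1 + horizon])
        pairs.append((readings[t], target))
    return pairs


# Hypothetical log: (temperature in degrees C, pump speed fraction) and measured PUE.
readings = [(21.0, 0.5), (21.5, 0.6), (22.0, 0.7), (22.5, 0.8)]
pue_series = [1.12, 1.10, 1.14, 1.16]

pairs = make_training_pairs(readings, pue_series, horizon=2)
for features, target in pairs:
    print(features, round(target, 3))
```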