
Using Machine Learning to Efficiently Cool Data Centers

Cooling accounts for 30–40% of power costs in a data center. This README describes a data-driven methodology to model, simulate, and automatically optimize cooling policies, reducing PUE (power usage effectiveness) while honoring operational constraints.



Problem Statement

Cooling accounts for 30–40% of power costs in a data center, and the average PUE of data centers in India is over 1.7. Although data centers have state-of-the-art cooling systems from leading vendors, these systems are often operated inefficiently, for three reasons:

  1. Hard-to-model environments: There’s no safe place to experiment with new settings (e.g., setpoints).
  2. Coarse, localized policies: Policies are often tuned for peak IT load and ignore cross-system interactions.
  3. Reactive/static control: Control responds after the fact, e.g., the AC temperature is lowered only after a zone heats up.
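
For reference, PUE is total facility power divided by IT equipment power, so every kilowatt saved on cooling lowers it directly. The sketch below works through that arithmetic. The 1,700 kW facility, the 35% cooling share, and the 20% savings figure are illustrative assumptions, not measurements from this project.

```python
def pue(total_facility_kw: float, it_kw: float) -> float:
    """PUE = total facility power / IT equipment power (1.0 is the ideal)."""
    return total_facility_kw / it_kw


def pue_after_cooling_savings(total_facility_kw, it_kw, cooling_kw, savings_fraction):
    """PUE after cutting cooling power by savings_fraction, with IT load unchanged."""
    new_total_kw = total_facility_kw - cooling_kw * savings_fraction
    return pue(new_total_kw, it_kw)


# Assumed example facility: 1,000 kW of IT load at a PUE of 1.7.
TOTAL_KW = 1700.0
IT_KW = 1000.0
COOLING_KW = 0.35 * TOTAL_KW  # assumed cooling share, inside the 30-40% range

baseline = pue(TOTAL_KW, IT_KW)
improved = pue_after_cooling_savings(TOTAL_KW, IT_KW, COOLING_KW, savings_fraction=0.20)
print(f"baseline PUE: {baseline:.3f}, after 20% cooling savings: {improved:.3f}")
```

Because IT load sits in the denominator and is unchanged, the PUE improvement scales linearly with the cooling power saved.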

Introduction

Past sensor and control data can be used to model data centers. These models can accurately predict the impact of changing setpoints and can be used to test or even auto-generate new policies.
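
As a rough illustration of the idea, the sketch below fits a small linear model to logged data and then asks it a "what if" question about a setpoint that was never tried. This is a minimal stand-in, not the modelling approach used in this project: the linear form, the feature choice (supply setpoint and IT load), and the `HISTORY` values are all assumptions, and the data is synthetic.

```python
# Hypothetical log rows: (supply_setpoint_c, it_load_kw, rack_inlet_temp_c).
# Synthetic values for illustration only.
HISTORY = [
    (18.0, 40.0, 20.2), (18.0, 80.0, 22.2), (20.0, 60.0, 23.0),
    (22.0, 40.0, 23.8), (22.0, 80.0, 25.8), (24.0, 60.0, 26.6),
]


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(history):
    """Least-squares fit of inlet = b0 + b1 * setpoint + b2 * load."""
    xs = [[1.0, setpoint, load] for setpoint, load, _ in history]
    ys = [inlet for _, _, inlet in history]
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
    return solve(xtx, xty)


def predict(coeffs, setpoint_c, load_kw):
    """Predicted rack inlet temperature for a candidate setpoint and load."""
    b0, b1, b2 = coeffs
    return b0 + b1 * setpoint_c + b2 * load_kw


coeffs = fit(HISTORY)
# "What if" query: a setpoint/load pair that does not appear in the log.
print(f"predicted inlet at 21 C setpoint, 70 kW load: {predict(coeffs, 21.0, 70.0):.2f} C")
```

A real deployment would likely need a richer model and many more inputs, but the workflow is the same: fit on history, then query the model instead of experimenting on the live floor.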

Background

Google successfully deployed machine-learning-based cooling control in their data centers, reportedly cutting cooling energy by about 40% and achieving a very low PUE (≈ 1.06).


Our Methodology

High-level workflow

Using data to model and then control

Modelling

Model as proxy

Model testing

Generating policies

Experiments

Strategy Simulations

Consider 4 racks in a cold-aisle/hot-aisle layout. Each rack intakes from the cold aisle and exhausts to the hot aisle.
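
To make the layout concrete, here is a toy steady-state energy balance for four racks: each rack heats the air it draws from the cold aisle by ΔT = P / (ṁ · c_p), and the hot aisle is the airflow-weighted mix of the exhausts. This is a simplified sketch, not the simulation used in these experiments. The rack powers, airflows, and the uniform 20 °C cold aisle are hypothetical, and recirculation between aisles is ignored.

```python
AIR_CP_KJ_PER_KG_K = 1.005  # specific heat of air at roughly room temperature


def exhaust_temp_c(inlet_c, it_power_kw, airflow_kg_s):
    """Steady-state energy balance: all IT power ends up as heat in the airflow."""
    return inlet_c + it_power_kw / (airflow_kg_s * AIR_CP_KJ_PER_KG_K)


# Hypothetical racks: (name, IT power in kW, airflow in kg/s).
RACKS = [
    ("rack-1", 4.0, 0.5),
    ("rack-2", 6.0, 0.5),
    ("rack-3", 5.0, 0.5),
    ("rack-4", 8.0, 0.5),
]
COLD_AISLE_C = 20.0  # assumed uniform cold-aisle temperature

# Exhaust temperature per rack, given the shared cold-aisle intake.
exhausts = {name: exhaust_temp_c(COLD_AISLE_C, power, flow) for name, power, flow in RACKS}

# Hot-aisle temperature as the airflow-weighted mix of the rack exhausts.
total_flow = sum(flow for _, _, flow in RACKS)
hot_aisle_c = sum(exhausts[name] * flow for name, _, flow in RACKS) / total_flow

for name, temp in exhausts.items():
    print(f"{name}: exhaust {temp:.1f} C")
print(f"hot aisle (mixed): {hot_aisle_c:.1f} C")
```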

What does the model learn?

EnergyPlus Simulations

We use EnergyPlus to simulate building energy consumption and override its control system:

Figure: Simulated DC floor

Integration with DCIM

Launch

Before launch, we collect operational constraints (e.g., temperature at rack < 22 °C). All policies enforce constraints at all times.
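
One simple way to enforce such a constraint is to discard any candidate setting whose predicted rack temperature violates it before comparing energy use. The sketch below shows that filter-then-optimize pattern. It is an assumption about how enforcement could look, not this project's implementation: `predicted_rack_temp_c` and `cooling_power_kw` are hypothetical stand-ins for learned models, and their coefficients are made up.

```python
MAX_RACK_TEMP_C = 22.0  # example constraint from the text: temperature at rack < 22 C


def predicted_rack_temp_c(setpoint_c, it_load_kw):
    """Hypothetical stand-in for a learned temperature model."""
    return setpoint_c + 0.02 * it_load_kw


def cooling_power_kw(setpoint_c):
    """Hypothetical cost model: warmer setpoints need less cooling power."""
    return 100.0 - 3.0 * setpoint_c


def best_safe_setpoint(candidates, it_load_kw):
    """Return the lowest-power setpoint that satisfies the constraint, or None."""
    safe = [
        s for s in candidates
        if predicted_rack_temp_c(s, it_load_kw) < MAX_RACK_TEMP_C
    ]
    if not safe:
        return None  # no candidate is safe; the caller should keep the current policy
    return min(safe, key=cooling_power_kw)


candidates = [16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0]
print(best_safe_setpoint(candidates, it_load_kw=90.0))
```

Filtering before optimizing means an unsafe setting can never win on energy alone. Returning `None` when nothing is safe leaves the fallback decision to the caller.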

Data Required for Modelling

The list below is illustrative. Missing or extra parameters do not materially affect savings potential.

Sensory data

Cooling system data

Overall metrics