5. Smart metering electricity data analytics

Challenge 5 is divided into two sub-challenges (5A and 5B), each of which is a separate task.

CHALLENGE 5A: Household Energy Disaggregation


Do you want to solve one of the hottest problems the electrical energy industry faces right now?

Distribution companies are currently replacing old energy meters with new, so-called smart meters on a massive scale. One of the greatest challenges right now is how to extract as much value as possible from the data these meters produce.

Your challenge is to build a supervised machine learning model that recognizes electrical appliances (refrigerator, oven, boiler, heat pump, etc.) from smart metering energy time series data.

Energy disaggregation, also called Non-Intrusive Load Monitoring (NILM), is an approach in which machine learning and statistical models are used to analyse the total energy consumption and determine the consumption signals originating from individual electrical appliances. Much work has been done on datasets with high-resolution data. Since smart meters measure consumed energy at 15-minute intervals, you should train a model that can recover the original signals of electrical appliances from actual smart metering data.


Task 1: Start with a freely available dataset called REDD [1]. The original dataset is recorded at 3 s resolution. Resample the data to 15 min and explore which devices can be recognized by a model trained on this low-resolution data.
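
The resampling step in Task 1 could be sketched roughly as follows. This is only an illustration, assuming the readings have already been loaded into a pandas Series of power values with a datetime index; the function name resample_to_15min, the choice of averaging within each window, and the synthetic data are assumptions, not part of the challenge materials.

```python
import numpy as np
import pandas as pd


def resample_to_15min(power: pd.Series) -> pd.Series:
    """Average power readings over 15-minute windows (hypothetical helper)."""
    return power.resample("15min").mean()


# Synthetic stand-in for one appliance channel: one hour sampled every 3 s.
# The appliance draws 100 W for the first 15 minutes, then switches off.
index = pd.date_range("2016-01-01 00:00:00", periods=1200,
                      freq=pd.Timedelta(seconds=3))
watts = pd.Series(np.where(np.arange(1200) < 300, 100.0, 0.0), index=index)

low_res = resample_to_15min(watts)
print(low_res)
```

Averaging keeps the values in the same unit as the input; short on/off cycles inside a window are smoothed away, which is part of what makes appliance recognition harder at this resolution.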

Task 2: After training a model on the REDD dataset, try to recognize electrical appliances in actual household smart metering data. The actual data does not include appliance labels.

Plot recognized behavioural patterns. Results have to be visually appealing and understandable to non-experts.

Python – Jupyter is the preferred environment.

External data about household appliances can be used (only for Task 2).

We are waiting for you to help us with your solution.

(The whole problem description and the given data are available on GitHub.)

CHALLENGE 5B: Identification/classification of Dangerous Events in the Power Grid


In this challenge you’ll be dealing with one of the major problems that high-voltage power transmission companies face today.

Electricity runs from large powerplants through transmission powerlines and transformer stations all the way to factories, your home and other consumers. But every once in a while, a dangerous event happens.


Some transformer stations in the transmission system have fast smart meters, called sAMI, which measure these events (a measurement is taken every 20 ms, synchronized with GPS).

Dangerous events can usually be identified and classified by looking at the values of the different variables that sAMI devices measure. The events appear as transients in these variables. Event classification and localization are done in post-mortem analysis.

Some examples of event classification and localization:

  • 3 pole short circuit at N1
  • Power plant outage at N4
  • Powerline outage between N1 and unmeasured N
  • Powerline outage between unmeasured Ns (close to N2)


Since the different parts of the power system are interconnected by powerlines, each sAMI in the system will measure the event. That is why some transformer stations and some powerlines do not have a sAMI. However, the further a sAMI is from the event’s location, the weaker the transient it will measure.
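
The observation that the transient weakens with distance suggests a simple first guess at localization: rank the measured nodes by how strongly each one saw the event. The sketch below is a minimal illustration under stated assumptions, not the organizers' method. The function names, the data layout (a dict of node name to list of samples), the 50-sample baseline window (1 s at 20 ms) and the synthetic values are all hypothetical.

```python
def transient_magnitude(samples, baseline_len=50):
    """Largest absolute deviation from the mean of the pre-event samples."""
    baseline = sum(samples[:baseline_len]) / baseline_len
    return max(abs(x - baseline) for x in samples[baseline_len:])


def rank_nodes_by_transient(measurements, baseline_len=50):
    """Return node names ordered from strongest to weakest transient."""
    scores = {node: transient_magnitude(series, baseline_len)
              for node, series in measurements.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic per-unit values for one measured variable at three nodes:
# 1 s of steady state, a 200 ms dip, then recovery.
measurements = {
    "N1": [1.0] * 50 + [0.40] * 10 + [1.0] * 40,
    "N2": [1.0] * 50 + [0.80] * 10 + [1.0] * 40,
    "N3": [1.0] * 50 + [0.95] * 10 + [1.0] * 40,
}

ranking = rank_nodes_by_transient(measurements)
print(ranking)
```

The top-ranked node is only the closest measured node; as the examples above show, the event itself may lie on an unmeasured node or powerline nearby.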


The objective of this challenge is to classify dangerous events from the available data and to localize them on nodes, so that when an event occurs in the real world the computer knows how to classify it and where to localize it.

Dataset descriptions:

The folder “REDD_low_freq” consists of 6 sub-folders. Each sub-folder contains the datasets for one household. Please read the exact description in [2] under the title “Low-Frequency Power Data”.

Smart metering data for 100 households is in the file “hack_elect_data_1.csv”. Consumers are anonymized and numbered from 0 to 99. Every row represents a daily load time series in kW for a particular consumer. Each consumer has load time series data for the year 2016.
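
Because the load values are given in kW, converting one day's series to energy requires multiplying each reading by the length of its interval in hours. The sketch below assumes 15-minute intervals, i.e. 96 readings per day; this is an assumption based on the smart meter resolution mentioned in Challenge 5A, so check it against the actual file. The function name daily_energy_kwh and the sample values are hypothetical.

```python
def daily_energy_kwh(load_kw, interval_minutes=15):
    """Sum average-power readings (kW) into energy (kWh) for one day."""
    hours_per_interval = interval_minutes / 60
    return sum(p * hours_per_interval for p in load_kw)


# Assumed layout: 96 readings per day, here a constant load of 0.5 kW.
day = [0.5] * 96
energy = daily_energy_kwh(day)
print(energy)
```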