Municipalities in Norway see an average water loss of more than 30% due to leakages in their water distribution networks, a significant cost that is ultimately passed on to consumers. During a week-long hackfest with Microsoft, Powel implemented an IoT solution, SmartWater, that will give maintenance staff the ability to discover and react to these leakages early. This has the potential for huge savings for the municipalities and their customers.

This report describes the technology choices made and the learnings from the hackfest.


Powel SmartWater operations dashboard


Key technologies used

  • Azure IoT Hub
  • Azure Stream Analytics
  • Azure Functions
  • Azure Storage (Table storage)
  • Azure Service Bus
  • Azure Machine Learning
  • Power BI

Core team

  • Pedro Dias – Technical Evangelist, Microsoft
  • Olav Tollefsen – Technical Evangelist, Microsoft
  • Project Lead, Powel
  • Domain Expert - Water, Powel
  • ML, IoT Ingestion, Powel

Customer profile

Powel spans Europe with a broad and sustainable customer base and a long history as a trusted supplier of software solutions for cities/municipalities, counties, the energy industry, as well as the contracting sector. It has approximately $50 million (USD) in revenue and some 400 employees. The company has offices across Norway as well as in Sweden, Denmark, Switzerland, Chile, Turkey, and Poland.

Problem statement

Water leakages in public water distribution systems waste resources and need to be detected and repaired as soon as possible. In Norway, an estimated 32% of the water supply is wasted due to leakages.

One major challenge with the water distribution system is the age of the various components in the infrastructure. Another challenge is that water pressure today is set to a constant, high level, so pipes that are leaking will leak more water than necessary. The pressure is set high in case of fire, when firefighters need that extra pressure for their water hoses.

Powel started this journey with the goal of using the Cortana Intelligence Suite to deliver a product that could help municipalities detect these issues. They engaged with Microsoft to deliver a working Minimum Viable Product (MVP).

The challenges in the project were:

  • Finding a secure, scalable way to deliver and store telemetry data.
  • Understanding how to work and prepare the telemetry data for Machine Learning.
  • Configuring Azure Machine Learning to do anomaly detection on the data.
  • Connecting the anomaly detection results to the SmartWater application.

Preparing for the hackfest

A few weeks before the hackfest, we had a full-day discussion in which we detailed the activities for the hackfest. Much of the day was spent planning the features for the SmartWater solution. The technologies and architecture were solidified during this session, and the scope of what we could achieve within a week was set.

There is also the problem of locating water leakages. In today’s solution, special microphones are attached to water pipes and used to triangulate an approximate location of the leakage source based on sound wave patterns. These systems are mobile and only used where leaks are suspected, so we decided to leave this out of the scope of this hackfest.

We also discussed whether regulating water pressure based on predictions would be possible, but decided to leave that as a future exercise.

The dataweek

One important decision that was made was to spend a full week, dubbed “the dataweek,” between the value stream mapping and the hackfest to gather as much data as possible and to look at different ways to correlate them. We wanted to see if traffic conditions, weather, or other external data sources could help us determine water leakages, and to find relevant “third party” sources of interesting data.

We also had the design team start work on sketching the user experience in the SmartWater application to have as much as possible in place before the hackfest itself.

During this time, we collaborated using Slack due to its informality and effortless cross-domain collaboration. Questions, comments, and feedback were discussed daily, and this helped make the hackfest the success it was.

Key learning
Had we not decided to spend time in the dataweek, we would not have been able to deliver a complete end-to-end solution like we did in this hackfest. Investigating correlations between different data was central in the work we did during the hackfest, which mainly revolved around creating a working Machine Learning model.


Discussing water grids, elevations, and valves


Solution, steps, and delivery

The SmartWater solution monitors current water flow into a water distribution system and, in near real time, detects anomalies compared to a Machine Learning-based prediction. The operators will then see alerts in their application, together with enough insight to take the appropriate actions. The goal, of course, is stopping the leakage as early as possible. When more sensors and smart water meters are installed, the solution will have the potential to provide even more intelligence.

A key part of the solution is the use of Azure Machine Learning. Without this, we would have nothing!


SmartWater solution design overview


Data available for analysis

For this project, we had the following data available:

  • Sensor data for water flow into a few of the water distribution zones in the Trondheim municipality. The sensor data had various frequencies and time spans.
  • Data covering most of the previous three years (2013–2015). The goal was to have a complete data set for those years, but due to issues with sensor availability, there were some gaps in the data.
  • Reports written by the municipality on various incidents and repairs done relative to the water distribution system.


SCADA sensor

Hypothesis

If we could accurately predict the “normal” water flow into a given zone for the next few hours, we could quickly detect anomalies in the current water flow that would likely be caused by water leakages.

Process

To be successful with data analysis and Machine Learning models, we needed to understand the whole process from raw data to a reasonably good model ready to be used by the application. The process we used is illustrated here.

Process

Telemetry transmission

We were able to obtain three years of telemetry from a friendly municipality, and in order to create a realistic data stream, we wrote a simulator application that replaced the timestamp of that dataset with the current time before sending the data to Azure IoT Hub.

Device simulator

The simulator was written as an Azure Function, which limited the transmit frequency to 1 minute. The SCADA sensors are able to transmit telemetry in 15-second intervals, but would require gateway software to connect to IoT Hub. We do not see any major problems in creating such a gateway, and the 1-minute interval was sufficient to go ahead.
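
A minimal sketch of such a simulator function is shown below. It assumes a timer-triggered C# Azure Function and a device already registered in IoT Hub; the class, setting, and field names, as well as the hard-coded reading, are illustrative and not Powel’s actual implementation.

using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.Devices.Client;
using Newtonsoft.Json;

public static class TelemetrySimulator
{
    // Device-specific connection string for a device registered in IoT Hub (hypothetical setting name)
    private static readonly DeviceClient deviceClient = DeviceClient.CreateFromConnectionString(
        Environment.GetEnvironmentVariable("DeviceConnectionString"), TransportType.Mqtt);

    [FunctionName("SimulateTelemetry")]
    public static async Task Run([TimerTrigger("0 */1 * * * *")] TimerInfo timer)
    {
        // Replay one historical reading, replacing its original timestamp with the current time
        var reading = new
        {
            deviceId = "flow-sensor-01",   // illustrative sensor id
            flowLitersPerSecond = 42.7,    // value taken from the historical data set
            timestamp = DateTime.UtcNow
        };

        var message = new Message(Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(reading)));
        await deviceClient.SendEventAsync(message);
    }
}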

Security

IoT Hub allows only registered devices to communicate with it, and each registered device gets its own “personal” connection string. In terms of security, this means that a compromised device’s connection string cannot be used to impersonate other devices. Additionally, device management in the Azure portal allows devices to be “disconnected” or “muted,” which stops a faulty device from transmitting inconsistent or corrupt data and helps prevent misreadings. Protecting this kind of data is vital to the success of the project, and we found that IoT Hub provided a more than adequate solution to our concerns.
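
To illustrate this model, the service-side SDK can register devices and disable a faulty or compromised one. The sketch below uses the RegistryManager from Microsoft.Azure.Devices; the connection string variable and device id are assumptions for illustration.

using Microsoft.Azure.Devices;   // service-side IoT Hub SDK

// Connect with an IoT Hub owner connection string (hypothetical variable)
var registryManager = RegistryManager.CreateFromConnectionString(iotHubOwnerConnectionString);

// Registering a device gives it its own identity and keys (its "personal" connection string)
Device device = await registryManager.AddDeviceAsync(new Device("flow-sensor-01"));

// "Muting" a faulty or compromised device: once disabled, IoT Hub rejects its telemetry
device.Status = DeviceStatus.Disabled;
await registryManager.UpdateDeviceAsync(device);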

Storing telemetry using Stream Analytics job

The Azure Stream Analytics service was one of the easiest things to deal with during the hackfest. Because we had already set up the IoT Hub stream and created a storage account in Azure Storage, we only needed to connect the two with the following statements:

SELECT * INTO [SmartWaterSimulatorOutput] FROM [SmartWaterSimulatorInput]

SELECT * INTO [SCADAQueue] FROM [SmartWaterSimulatorInput]

As you can see, there are two statements in the code sample. One takes the data from the IoT Hub stream and stores it directly in Table storage, and the second takes the same telemetry and outputs it to a Service Bus queue. The idea is that the Service Bus queue gives the application a near-real-time feed of actual telemetry values, while Table storage is used by the Machine Learning experiment for anomaly detection. The Alert Service subscribes to the queue to visualize current values.
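
A minimal sketch of such a queue subscriber is shown below, using the classic Service Bus client library; the connection string variable and the queue name "scadaqueue" are assumptions for illustration, not Powel’s actual Alert Service code.

using System.IO;
using Microsoft.ServiceBus.Messaging;   // WindowsAzure.ServiceBus package

// Listen to the queue behind the [SCADAQueue] output of the Stream Analytics job
var queueClient = QueueClient.CreateFromConnectionString(serviceBusConnectionString, "scadaqueue");

queueClient.OnMessage(message =>
{
    // Stream Analytics delivers each telemetry reading as a JSON payload
    using (var reader = new StreamReader(message.GetBody<Stream>()))
    {
        string json = reader.ReadToEnd();
        // ... hand the current value to the dashboard and alerting logic
    }
});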

Pre-processing of data files

The files exported from the SCADA system were in Excel format, each representing the water flow for a single sensor. Each file contained several Excel tabs with data for different date ranges. Some of the sensor values represented water flowing into a zone and others water flowing out. The files also differed in update intervals, and some even contained sensor data that only reported changes from a previous state, with no predictable frequency at all.

To simplify this processing, we created a C# program to pre-process the data by doing the following (a sketch of the timestamp split appears after the list):

  • Combining different Excel files and tabs for various sensors for a water distribution zone into one consolidated CSV file.
  • Calculating the total water flow into a zone by correctly considering the sensors measuring water flowing in and out of a zone.
  • Creating a uniform frequency for the sensor data by aggregating and distributing values on fixed intervals.
  • Splitting the Datetime column into Year, Weekday (Monday through Sunday), Quarter, Month, Hour, FiveMinute (each 24 hours split into 5-minute intervals), and Holiday (flag for public holidays).
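
The last step is the one that produces the Machine Learning features. A minimal sketch of how such a split could be implemented is shown below; the class and method names and the holiday lookup are illustrative, not Powel’s actual code.

using System;
using System.Collections.Generic;

static class FeatureSplitter
{
    // Derive the model features from a single sensor timestamp.
    // publicHolidays is assumed to hold the Norwegian public holiday dates.
    public static (int Year, DayOfWeek Weekday, int Quarter, int Month, int Hour, int FiveMinute, bool Holiday)
        SplitTimestamp(DateTime t, ISet<DateTime> publicHolidays)
    {
        return (
            Year: t.Year,
            Weekday: t.DayOfWeek,
            Quarter: (t.Month - 1) / 3 + 1,
            Month: t.Month,
            Hour: t.Hour,
            FiveMinute: (t.Hour * 60 + t.Minute) / 5,   // index 0..287 within each 24 hours
            Holiday: publicHolidays.Contains(t.Date)
        );
    }
}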

Lesson learned
This process was not strictly sequential and we had to go back and iterate several times before a good model was achieved.


Visualizing the data

Visualization is always a fantastic way to understand data, so we chose Power BI Desktop to explore the data using various kinds of visualizations. This allowed us to quickly look for patterns and get a lot of insight into the data. We identified the fields that we needed to include in the dataset and also what kind of cleaning needed to be done before we started to experiment with data in Machine Learning.


Data visualization in Power BI


Azure Machine Learning studio experiments

After spending some time in Power BI, we learned that the water flow into the water distribution system follows a daily pattern, with some exceptions on weekends and possibly public holidays. To allow for a somewhat granular model of water flow within a 24-hour period, we decided to split every 24 hours into 5-minute intervals. We also included a flag to indicate whether a given sample time falls on a public holiday.

We created the following Azure Machine Learning experiment to test various features and Machine Learning algorithms.


Azure Machine Learning experiment


Eureka!

Testing with different sets of features and Machine Learning algorithms in Azure Machine Learning was very easy!

To separate noise from real anomalies, we experimented with various ways to visualize the residuals between the predicted values and the actual values. We settled on a model in which we created a sliding window of residuals. Within the sliding window, we ignored all values within one standard deviation of the predicted value. We then calculated the percentage difference between the actual value and the predicted value and summed it over the time window. When visualizing the residuals, we also filtered out the low values.
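
A minimal sketch of this scoring, as we understood and applied it, is shown below; the class, method, and parameter names are illustrative, and the exact thresholds used in the experiment may differ.

using System;
using System.Collections.Generic;

static class AnomalyScoring
{
    // Score a sliding window of (actual, predicted) pairs:
    // residuals within one standard deviation are treated as noise, the rest are
    // summed as percentage deviations from the prediction.
    public static double ScoreWindow(
        IReadOnlyList<(double Actual, double Predicted)> window, double residualStdDev)
    {
        double score = 0;
        foreach (var (actual, predicted) in window)
        {
            double residual = actual - predicted;
            if (Math.Abs(residual) <= residualStdDev)
                continue;                                   // ignore small deviations

            score += Math.Abs(residual) / predicted * 100;  // percentage deviation
        }
        return score;   // a high score over the window flags an anomaly candidate
    }
}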

Key takeaway
It is very important to use a tool such as Power BI for visualizing the results of Machine Learning!


Overlaying historical data on top of the Machine Learning prediction


With this visualization, we could quickly identify anomalies in the water flow values compared with the predicted values. We wanted to compare our findings against the history of maintenance reports to see if we could somehow correlate those with the detected anomalies, but this taught us something.

Lesson learned
Trying to correlate time series data from sensors with manually entered maintenance reports is a challenge because the reports are only loosely coupled to the sensor data. We therefore decided to keep the manual report data out of the Machine Learning models.


Even though we decided to leave the manually entered reports out of the Machine Learning experiment, we still used them extensively to explore and understand the data in Power BI. They helped greatly in strengthening our hypothesis that we would be able to build an application to monitor the current actual water flow and trigger alarms based on deviations from the predicted curve.

Conclusion

It is nothing short of amazing that in the course of five days we were able to put together a secure IoT solution in which every working part required little effort.

We were able to create a reasonably good Machine Learning model to predict future water flow values based on simple inputs such as date, time, and water flow. We were also able to create a good visualization of historical data to highlight the anomalies in our SmartWater application.

Correlating anomalies with the manual inspection reports was done in the application. We also created an end-to-end demo application for monitoring and alerting of potential water leakages. This provided us with the confidence needed to take this to the next phase.

Lessons learned

  • Setting up a secure, scalable IoT system using Azure is a breeze. Azure provides all the components required to host such a system, and if a proof of concept does not pan out, it can just as easily be taken down again, ensuring a low investment cost and an easy exit.
  • The science behind Machine Learning still requires significant knowledge of the problem domain as well as some experience in statistical analysis. Powel is making the necessary investments to ensure they have this competency in place.
  • The architecture and structure in this solution can be applied to many of Powel’s other projects.

Going forward

The next step for Powel is to finish the last bits and pieces of the application and use it in discussions with customers to bring the solution to market.

A data analyst has joined the project to work with the Machine Learning model. Together with Microsoft, Powel is making significant investments in acquiring expertise in data science, Machine Learning integration, data analysis, and the problem domain.

Further down the line, we will look at automating water pressure regulation in the various grids based on Machine Learning predictions, providing just enough water pressure at any given time.


Leaking water


Additional resources

A separate, DevOps-focused Powel team worked with Microsoft to implement the SmartWater application. For more details about the solution architecture, see Improving water monitoring through DevOps practices with Powel AS.