Challenge 5: Process Streaming Data

Introduction

Now that we have data streaming into Azure and are able to gather general insights from our Industrial IoT environment, the next step is to analyze real-time telemetry streams, perform geospatial analysis, and perform advanced anomaly detection. Complex event-processing engines are designed to analyze high volumes of fast-streaming data from multiple sources simultaneously. They often run in the cloud, but they can also run at the edge.
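
For example, Azure Stream Analytics includes built-in machine learning functions for anomaly detection. The query below is a minimal sketch of spike-and-dip detection over a two-minute sliding window; the input alias iothubinput and the temperature field are hypothetical names standing in for whatever your IoT Hub route actually delivers:

```sql
-- Score each temperature reading against the previous 120 seconds of history.
-- 95 is the confidence level; 120 is the number of events kept as history.
WITH AnomalyDetectionStep AS (
    SELECT
        EventEnqueuedUtcTime AS time,
        CAST(temperature AS float) AS temp,
        AnomalyDetection_SpikeAndDip(CAST(temperature AS float), 95, 120, 'spikesanddips')
            OVER (LIMIT DURATION(second, 120)) AS SpikeAndDipScores
    FROM iothubinput
)
SELECT
    time,
    temp,
    CAST(GetRecordPropertyValue(SpikeAndDipScores, 'Score') AS float) AS SpikeAndDipScore,
    CAST(GetRecordPropertyValue(SpikeAndDipScores, 'IsAnomaly') AS bigint) AS IsSpikeAndDipAnomaly
FROM AnomalyDetectionStep
-- Add an INTO <output> clause when running this as part of a job.
```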

Stream processing is useful in a number of scenarios, including real-time dashboarding, alerting, geospatial analysis, and streaming ETL into a data lake.

Description

In this challenge we’ll create an Azure Stream Analytics job, use it to read from the message route coming from IoT Hub, filter or aggregate the data, write the output to the data lake, and then visualize the data with Microsoft Power BI.

  1. In your Azure resource group create a Stream Analytics job.
  2. Set the input of the Stream Analytics job to be the route defined in your IoT Hub.
  3. Create 1 output to save a copy of all data into your data lake.
  4. Create a query that filters or aggregates your data based upon business requirements (a HAVING clause, for example; see the sketch after this list). Test this filter/aggregation in the query editor, but don’t save it into the job query yet.
  5. Test the query to ensure it is functioning as expected.
  6. Create a Power BI workspace (or identify an existing workspace for this workload).
  7. Create an additional output to the previously created/identified Power BI workspace.
  8. Run the Stream Analytics job and note the output files created in the data lake & data in the Power BI dataset.
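
For reference, a Stream Analytics query along the following lines could satisfy steps 3, 4, and 7. It is a minimal sketch: the input alias iothubinput and the output aliases datalake and powerbi are hypothetical names you define when creating the job’s inputs and outputs, and the deviceId/temperature fields and the threshold of 70 are placeholders for your own telemetry schema and business requirement:

```sql
-- Output 1: pass every event through to the data lake unchanged.
SELECT *
INTO datalake
FROM iothubinput

-- Output 2: average temperature per device over a 30-second tumbling window,
-- keeping only windows whose average exceeds a business threshold.
SELECT
    deviceId,
    System.Timestamp() AS windowEnd,
    AVG(CAST(temperature AS float)) AS avgTemperature
INTO powerbi
FROM iothubinput
GROUP BY deviceId, TumblingWindow(second, 30)
HAVING AVG(CAST(temperature AS float)) > 70
```

When the job runs, the first statement lands raw files in the data lake while the second pushes aggregated rows into the Power BI dataset for dashboarding.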

Success Criteria

Learning Resources

  1. Azure Stream Analytics Introduction
  2. Common query patterns in Azure Stream Analytics
  3. Stream Analytics Query Language Reference
  4. Stream Analytics and Power BI: A real-time analytics dashboard for streaming data
  5. Blob storage and Azure Data Lake Gen2 output from Azure Stream Analytics

Taking it Further

There are other What The Hack hackathons that explore using data in a data lake for other purposes, such as data warehousing and machine learning. Below are some recommended follow-up hackathons to keep learning:

  1. This Old Data Warehouse
  2. Databricks Intro ML