Challenge 5: Staging and Transformation


Introduction

Now that Caladan has all the data from their cloud and on-premises data stores, it is time to create a usable Operational Data Store.

The next step is to conform the source data in the data lake into a more usable dataset. Downstream consumers of the data should not need to reconcile document data with SQL data, and they want a one-stop shop for all of it. This becomes especially important as new source systems are brought into the lake, so that downstream consumers and the Azure Data Warehouse you will build can react to new data sources coming online. However, the original source data must also be preserved in the Operational Data Store for audit and review purposes. Preserving it allows alternative intermediate datasets to be created at any time, and it enables deeper exploration for use cases such as comparing global COVID-19 data with local COVID-19 data.
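As a rough illustration of what "conforming" can mean here, the sketch below maps one document-style record and one SQL-style row into a single shared shape while leaving the raw inputs untouched. It is a minimal, tool-agnostic sketch and not the expected solution: the field names (`location`, `metrics`, `country_name`, `updated`), the conformed schema, and the `source_system` tag are all hypothetical, and your actual source data will look different.

```python
from typing import Any, Dict, List


def conform_document(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a nested document record into the (hypothetical) conformed shape."""
    return {
        "country": doc["location"]["country"],
        "report_date": doc["date"],
        "confirmed_cases": int(doc["metrics"]["confirmed"]),
        "source_system": "document_store",  # assumed label, keeps lineage visible
    }


def conform_sql_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Rename the columns of a flat SQL row into the same conformed shape."""
    return {
        "country": row["country_name"],
        "report_date": row["updated"],
        "confirmed_cases": int(row["confirmed"]),
        "source_system": "sql",  # assumed label
    }


def build_conformed(
    raw_documents: List[Dict[str, Any]],
    raw_sql_rows: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Return new conformed records; the raw inputs are read but never modified."""
    conformed = [conform_document(d) for d in raw_documents]
    conformed.extend(conform_sql_row(r) for r in raw_sql_rows)
    return conformed


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    docs = [{"location": {"country": "Exampleland"}, "date": "2020-04-01",
             "metrics": {"confirmed": "12"}}]
    rows = [{"country_name": "Samplestan", "updated": "2020-04-01", "confirmed": 7}]
    for record in build_conformed(docs, rows):
        print(record)
```

Because each function builds a new dictionary, the raw records stay available for audit, and a different intermediate dataset can be derived from them later without re-ingesting anything.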

As Caladan looks toward the future, they would also like to introduce a review process so that every change to the solution under source control must be approved by a second developer.

Description

Success Criteria

Tips

Learning Resources

Ramp Up

Choose Your Tools

Dive In

Azure Databricks

HDInsight

Polybase

GitHub