The overall goal of this hackathon is to take data from two different data sources and combine it into consolidated tables, so that the business users consuming this data do not realize that it originated from different data sets.
We will use the AdventureWorksLT and WideWorldImporters SQL databases for our source data.
For this hackathon you will use Azure Synapse and/or Azure Databricks to copy data from these source systems and format it appropriately for business user consumption, using a three-tiered data architecture.
We are now ready to set up the environment and populate the data into the Bronze Data Layer. For this challenge, we want to bring in the data “as is”. No data transformation is needed at this layer.
Environmental Setup
We need to set up the proper environment for the hackathon. Everyone on the team needs access to the Azure Synapse and Databricks environments, as well as to any ancillary resources such as Power BI, the Azure Storage Accounts, and Key Vault.
It would be a good idea for each team to host the solution in a new Resource Group in a subscription that all participants have access to. At least one person should be the owner of the Resource Group and then grant the rest of the team the proper access to it. For more information on this topic, see the Learning Resources below.
This way, each person on the team can take a turn leading the hack, and if one person has to drop out, the rest of the team can still progress through the challenges.
Hydration of Data in the Bronze Data Lake
For this challenge we will be working with two SQL data sets: AdventureWorksLT and WideWorldImporters.
You will not set up the source databases for this challenge; they are already set up and configured. Your coaches will provide the connection details for these data sources.
The goal is not to import all the data from these databases. Choose either the Customer or the Sales Order data to bring in; there is no need to do both.
HINT: Customers have addresses and Sales Orders have header, detail and product associated with them. Both have their own complexities, so one is not easier than the other. See the graphic below for reference.
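To make the scoping decision concrete, here is a minimal Python sketch that lists the tables to copy for one subject area and the Bronze folder each would land in. Everything in it is an assumption for illustration: the table names are taken from the public sample versions of AdventureWorksLT and WideWorldImporters and should be checked against the databases your coaches provide, and the `build_copy_plan` helper and the `bronze/<database>/<schema>/<table>/ingest_date=...` folder layout are hypothetical, not a required convention.

```python
from datetime import date

# Assumed table names from the public sample databases; verify them
# against the source systems your coaches give you.
SOURCE_TABLES = {
    "customer": {
        "AdventureWorksLT": [
            "SalesLT.Customer", "SalesLT.CustomerAddress", "SalesLT.Address",
        ],
        "WideWorldImporters": ["Sales.Customers", "Application.Cities"],
    },
    "sales_order": {
        "AdventureWorksLT": [
            "SalesLT.SalesOrderHeader", "SalesLT.SalesOrderDetail", "SalesLT.Product",
        ],
        "WideWorldImporters": [
            "Sales.Orders", "Sales.OrderLines", "Warehouse.StockItems",
        ],
    },
}


def build_copy_plan(subject_area, ingest_date, root="bronze"):
    """Return one (database, table, target_folder) tuple per table to copy as is.

    The folder layout is a hypothetical convention: one folder per source
    table, partitioned by the date the data was ingested.
    """
    if subject_area not in SOURCE_TABLES:
        raise ValueError(f"Unknown subject area: {subject_area!r}")
    plan = []
    for database, tables in SOURCE_TABLES[subject_area].items():
        for table in tables:
            schema, name = table.split(".")
            folder = (
                f"{root}/{database}/{schema}/{name}/"
                f"ingest_date={ingest_date.isoformat()}"
            )
            plan.append((database, table, folder))
    return plan


if __name__ == "__main__":
    # Print the plan for the Customer subject area.
    for row in build_copy_plan("customer", date(2024, 1, 15)):
        print(row)
```

A plan like this could drive a metadata-based copy loop in Synapse or a Databricks notebook. Keeping the source database and schema in the path preserves where each file came from, which suits a layer that holds data untransformed.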
Things to keep in mind about data in the Bronze layer:
Now that we know what we need to do, it’s also important to understand why we are doing this.
From an organizational standpoint, the Bronze layer serves two main purposes:
Even though the Bronze layer is generally locked down in most organizations, some teams are given access to it for quick discovery work. This is often the case with Data Science teams prototyping a new solution.
To complete this challenge successfully, you should be able to:
The following links may be useful in achieving the success criteria listed above.