WWI realizes they need to further modernize their data warehouse and want to proceed to the second stage. They are starting to reach capacity constraints on their data warehouse and need to offload data files from the relational database. Likewise, they are receiving more data in JSON and CSV file formats. They have been discussing re-engineering their data warehouse to accommodate larger data sets, semi-structured data, and real-time ingestion of data. They would like to conduct a POC on the Data Lake and see how best to design it for integration into the Data Warehouse.
For this challenge, WWI wants us to build out the data lake and show how to load data into the lake from an on-premises data source.
The objective of this challenge is to build a Data Lake with Synapse Analytics or Azure Data Lake Storage (ADLS) Gen2. The Data Lake will be a staging area where all our source system data files reside. We need to ensure this Data Lake is well organized and doesn't turn into a swamp. This challenge will help us organize the folder structure and set up security to prevent unauthorized access. Lastly, we will extract data from the WWI OLTP platform and store it in the Data Lake.
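As a purely illustrative example of a well-organized lake, many teams split the hierarchy into zones by refinement level, for instance a raw zone for unmodified extracts (e.g., /raw/wwi-oltp/cities/<load date>/), a staged zone for cleansed and conformed copies, and a curated zone for data that is ready to load into the warehouse, then partition within each zone by source system and load date. The zone and path names here are assumptions for illustration, not a structure prescribed by this challenge.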
The OLTP platform is on-premises, so you will need to build a hybrid architecture to integrate it into Azure. Keep in mind that the pipeline you build will become the EXTRACT portion of the new E-L-T process. The first requirement is to build a functional POC that is able to move a single dataset to the new ADLS Gen2 data lake. Ideally, the process would be table driven so that new pipelines do not need to be created for each additional table that needs to be copied (optional; shared to give insight into the end state).
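One common way to make the extract table driven is to keep a small metadata table that the pipeline can iterate over with a Lookup plus ForEach activity, running one parameterized copy per row. The sketch below is only a minimal example of that idea; the [Integration].[ExtractControl] table, its columns, and the folder paths are hypothetical names chosen for illustration, not objects that ship with WWI or this challenge.
-- Hypothetical control table that drives a generic, parameterized copy pipeline.
-- Each row describes one source table to extract and where to land it in ADLS Gen2.
CREATE TABLE [Integration].[ExtractControl]
(
     ExtractControlId  int IDENTITY(1,1) PRIMARY KEY
    ,SourceSchema      sysname       NOT NULL  -- e.g. 'Application'
    ,SourceTable       sysname       NOT NULL  -- e.g. 'Cities'
    ,TargetFolder      nvarchar(400) NOT NULL  -- e.g. '/raw/wwi-oltp/cities/'
    ,IsEnabled         bit           NOT NULL DEFAULT 1
);

-- Register the single dataset used for the POC.
INSERT INTO [Integration].[ExtractControl] (SourceSchema, SourceTable, TargetFolder)
VALUES (N'Application', N'Cities', N'/raw/wwi-oltp/cities/');
With a table like this, the pipeline reads the enabled rows with a Lookup activity and passes each row's schema, table, and target folder into a single parameterized Copy activity inside a ForEach loop, so adding a table to the extract becomes an INSERT rather than a new pipeline.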
Prior to starting this challenge, you should ensure that there are changes to the City data captured from the Wide World Importers OLTP database. Execute the scripts below to insert/change data in the source and update the necessary configuration values.
Execute the queries below in the Wide World Importers database to update 10 existing records and insert 1 new record. Go to the Resource.zip file, open the /Challenge02/ folder, and look for the T-SQL script generateCityData.sql. The code snippet is for display purposes only; we recommend using the T-SQL script for execution.
-- Update 10 existing rows: increase the recorded population by 1,000
UPDATE T
SET [LatestRecordedPopulation] = LatestRecordedPopulation + 1000
FROM (SELECT TOP 10 * FROM [Application].[Cities]) T;

-- Insert 1 new row with a timestamped city name
INSERT INTO [Application].[Cities]
(
     [CityName]
    ,[StateProvinceID]
    ,[Location]
    ,[LatestRecordedPopulation]
    ,[LastEditedBy]
)
VALUES
(
     'NewCity' + CONVERT(char(19), getdate(), 121)
    ,1
    ,NULL
    ,1000
    ,1
);
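If you want to verify the changes before wiring up the pipeline, a quick sanity check like the one below (using the standard WideWorldImporters column names; it is not part of the provided scripts) should show the population bump and the new 'NewCity...' row:
-- Sanity check: the most recently edited rows should be the 10 updated cities
-- plus the newly inserted 'NewCity...' row.
SELECT TOP 11 CityID, CityName, LatestRecordedPopulation, ValidFrom
FROM [Application].[Cities]
ORDER BY ValidFrom DESC;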
Modify the [Integration].[GetCityUpdates] stored procedure in the same OLTP database to remove the Location field from the result set it returns. Go to the Resource.zip file, open the /Challenge02/ folder, and look for the T-SQL script GetCityUpdates.sql. The code snippet is for display purposes only; we recommend using the T-SQL script for execution.
SELECT [WWI City ID], City, [State Province], Country, Continent, [Sales Territory],
       Region, Subregion,
       -- [Location] geography, -- removed due to data type compatibility issues
       [Latest Recorded Population], [Valid From],
       [Valid To]
FROM #CityChanges
ORDER BY [Valid From];
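For context, the snippet above is only the final SELECT inside the procedure; GetCityUpdates.sql contains the full ALTER PROCEDURE statement. A rough sketch of where the change lands is shown below; the parameter names follow the standard WideWorldImporters sample database and are stated here as an assumption, so rely on the shipped script rather than this sketch.
-- Sketch only; see GetCityUpdates.sql for the authoritative procedure body.
ALTER PROCEDURE [Integration].[GetCityUpdates]
    @LastCutoff datetime2(7),
    @NewCutoff datetime2(7)
AS
BEGIN
    -- ... temporal query that populates #CityChanges (unchanged, omitted here) ...
    SELECT [WWI City ID], City, [State Province], Country, Continent, [Sales Territory],
           Region, Subregion,
           -- [Location] geography, -- removed due to data type compatibility issues
           [Latest Recorded Population], [Valid From], [Valid To]
    FROM #CityChanges
    ORDER BY [Valid From];
END;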
Here is the query you will execute in the Azure Synapse SQL pool to update the parameter used as the upper bound for the ELT process. Go to the Resource.zip file, open the /Challenge02/ folder, and look for the T-SQL script UpdateLoadControl.sql. The code snippet is for display purposes only; we recommend using the T-SQL script for execution.
-- Advance the upper-bound watermark for the ELT process to the current date/time
UPDATE INTEGRATION.LOAD_CONTROL
SET LOAD_DATE = getdate();
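To confirm the watermark moved, you can read the value back with a simple query against the same table:
-- Verify the new upper-bound watermark
SELECT LOAD_DATE FROM INTEGRATION.LOAD_CONTROL;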
Review the data pipeline with your coach and execute it after the coach approves.
Note: you can execute your new pipeline by clicking the "Debug" button or by adding a trigger from the UI designer.
Too comfortable? Eager to do more? Try these additional challenges!