Challenge 01 - Ingest Data From Blob Storage

Home - Next Challenge >

Pre-requisites

Introduction

In this challenge you will import several Parquet files from Blob Storage into your data warehouse and then move the data from the staging area to the production one.

You can find all the files for this challenge in the Zip file provided by your coach at this path: ./Resources/Challenge-01/

Description

Learning objectives:

Ingest Data from Azure Data Lake Gen2 – Blob container

Define the staging table architecture, then import all Parquet files using the link your coach will provide. Data should land in a staging area (schema “Staging”) optimized to ingest data at maximum speed using the “COPY INTO” T-SQL command, as in the sketch below.
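A minimal sketch of the load pattern, assuming a hypothetical table name (Staging.FactSales), placeholder columns, and a placeholder storage URL; substitute the actual link and schema your coach provides. Staging tables are typically created as round-robin distributed heaps so the load is not slowed by index maintenance or data movement:

```sql
-- Hypothetical staging table: round-robin heap for fastest possible ingestion
CREATE TABLE [Staging].[FactSales]
(
    CustomerKey  INT            NOT NULL,
    ProductKey   INT            NOT NULL,
    DateKey      INT            NOT NULL,
    SalesAmount  DECIMAL(18, 2) NOT NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);

-- Bulk load the Parquet files straight from the Blob / ADLS Gen2 container
COPY INTO [Staging].[FactSales]
FROM 'https://<storage-account>.dfs.core.windows.net/<container>/factsales/*.parquet'
WITH (
    FILE_TYPE  = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')  -- or a SAS token, depending on your access setup
);
```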

Move data from staging area to production tables

Optimize each table structure based on whether it is a “dimension” or a “fact” table. Production tables should belong to the “Sales” schema. Check the database diagram (download available here) to identify the relationships between tables and decide on the proper distribution method. Also consider that tables will be queried by filtering on the CustomerKey, ProductKey, and DateKey columns; choose the one that guarantees an even distribution of data across all distributions. Use the suggested “CREATE TABLE AS SELECT” (CTAS) T-SQL command, as sketched below.
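A minimal CTAS sketch, assuming hypothetical table names and assuming the fact table is hash-distributed on one of the key columns while a small dimension is replicated; which column to hash on is the decision this challenge asks you to make:

```sql
-- Hypothetical fact table: hash-distributed on the key chosen for an even data spread
CREATE TABLE [Sales].[FactSales]
WITH (
    DISTRIBUTION = HASH(CustomerKey),  -- assumption: replace with the key you select
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM [Staging].[FactSales];

-- Hypothetical small dimension: replicated to avoid data movement on joins
CREATE TABLE [Sales].[DimProduct]
WITH (
    DISTRIBUTION = REPLICATE,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM [Staging].[DimProduct];
```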

Investigate slowness due to data skew

Users are complaining that a query is taking too long to complete and need your assistance to fix the issue. Investigate why the query is so slow and make it faster; the sketch below shows one way to check for skew.
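As a starting point, a sketch against a hypothetical Sales.FactSales table: compare how rows are spread across the distributions. If a few distributions hold most of the rows, the table is skewed, and rebuilding it with CTAS on a better distribution column is the usual fix.

```sql
-- Row count and space used per distribution; large imbalances indicate data skew
DBCC PDW_SHOWSPACEUSED('Sales.FactSales');
```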

Success Criteria

Learning Resources