Challenge 2: Load data into a relational database/warehouse


Introduction

Now that the data is in a central location, how do you work with it?

Description

The data has been loaded into a cloud storage service. Your team’s next task is to load the data into a platform where querying and reporting tools can be used. In later challenges, you will need to visualize the data using tables, graphs, and maps. You will also need to be able to secure the data, potentially with data masking, entity-level and/or column-level access control, and encryption.

Your loading method should anticipate that later files with the same structure will need to be loaded to keep the database/warehouse up to date. You can make the simplifying assumption that all future files will only contain new records (i.e., INSERTs).
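One way to picture the insert-only, repeatable load described above is a small load log that records which files have already been appended. The sketch below is illustrative only: it uses SQLite from the Python standard library as a local stand-in for whatever database/warehouse your team chooses, and the table names (`records`, `load_log`), the column names (`id`, `value`), and the function `load_new_files` are all hypothetical. At the target data size you would most likely rely on your platform's own bulk-load tooling rather than row-by-row inserts.

```python
import csv
import sqlite3
from pathlib import Path


def load_new_files(conn: sqlite3.Connection, data_dir: Path) -> list:
    """Append rows from CSV files not loaded before; return the names of files loaded."""
    # Hypothetical schema: a target table plus a log of files already processed.
    conn.execute("CREATE TABLE IF NOT EXISTS load_log (file_name TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT, value TEXT)")
    already_loaded = {row[0] for row in conn.execute("SELECT file_name FROM load_log")}

    newly_loaded = []
    for path in sorted(data_dir.glob("*.csv")):
        if path.name in already_loaded:
            continue  # loaded in an earlier run, so skip it
        with path.open(newline="") as handle:
            rows = [(r["id"], r["value"]) for r in csv.DictReader(handle)]
        # One transaction per file: the rows and the log entry commit together,
        # so a failed load leaves no partial data behind.
        with conn:
            conn.executemany("INSERT INTO records (id, value) VALUES (?, ?)", rows)
            conn.execute("INSERT INTO load_log (file_name) VALUES (?)", (path.name,))
        newly_loaded.append(path.name)
    return newly_loaded
```

Running the function again after new files arrive loads only those files, which matches the assumption that future files contain only new records. It does not detect a file that was changed after being loaded; that case is outside the simplifying assumption.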

Although the data used in the Hack is less than 10 GB in size (think of this as your test data set), you need to design for an expected data size of 20 TB or more. Choose your tools appropriately.

Success Criteria

Bonus

Learning Resources