Read and Write Data in Azure Databricks
Databricks is a platform for working with big data. Azure has solid integration with Databricks, making it easy to set up and start working with datasets.
Module Sources
Read and write data in Azure Databricks
Goals
In this workshop, you’ll use Databricks, a powerful platform for working with big data. We’ll go over reading and writing data in different formats, such as JSON, Parquet, and CSV, in Azure Databricks. You’ll also learn how to read and operate on data stored in Databricks.
| Goal | Description |
|---|---|
| What will you learn | Read and write different types of data in Azure Databricks |
| What you’ll need | An Azure subscription: <ul><li>Free Azure Trial</li><li>Azure For Students</li></ul> |
| Duration | 1 hour |
| Slides | PowerPoint |
Video
🎥 Watch the workshop video to learn how to deliver this workshop
Pre-Learning
Prerequisites
Attendees should have an Azure account. There are several options that can give you free credits:
- Free Azure Trial
- Azure For Students
For this workshop, students will work through the Read and write data in Azure Databricks Learn module, which directs you to import Jupyter notebooks and run them in Databricks. These notebooks show you how to read and write data in Azure Databricks.
What students will learn
Databricks is one of the most prominent platforms for working with big data and collaborating on data science tasks. In this workshop, you will learn how to get started with the platform in Azure and perform common data interactions, including reading, writing, and analyzing datasets.
- Read data
- Work with notebooks
- Import notebooks
- Attach a cluster to a workspace
Milestone 1: Read data in CSV format
Start here to create an Azure Databricks cluster and go through the notebook to read data. This notebook will cover the following:
- Cover the `SparkSession` class
- Read a very simple TSV (tab-separated values) file
- Transform the TSV into a CSV (comma-separated values) file
- Read the CSV file with an inferred schema and then with a user-defined schema
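To get a feel for what the notebook does, here is a minimal sketch of reading a TSV with an inferred schema and again with a user-defined schema. The file path and column names are placeholders, not the dataset used in the Learn module; in a Databricks notebook the `spark` SparkSession is already created for you.

```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Hypothetical path -- replace with the dataset used in the notebook.
tsv_path = "/mnt/training/sample.tsv"

# Read the TSV and let Spark infer the schema (requires an extra pass over the data).
df_inferred = (spark.read
    .option("header", "true")
    .option("sep", "\t")
    .option("inferSchema", "true")
    .csv(tsv_path))

# Read the same file with a user-defined schema (no inference pass needed).
schema = StructType([
    StructField("id", IntegerType(), True),      # placeholder columns
    StructField("name", StringType(), True),
])
df_defined = (spark.read
    .option("header", "true")
    .option("sep", "\t")
    .schema(schema)
    .csv(tsv_path))

df_defined.printSchema()
```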
Milestone 2: Read data in JSON format
Continue with the next notebook to read JSON data after completing the CSV notebook. Go through the steps required to read and load the JSON data into Databricks. In this section you will:
- Load a JSON file into Databricks
- Use schema inference to automatically detect the types of the values in the JSON file
- Create a user-defined schema with `pyspark` and use it to load the JSON
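As a rough sketch of both approaches (the path and field names below are assumptions, not the module’s dataset):

```python
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Hypothetical path to a JSON dataset.
json_path = "/mnt/training/sample.json"

# Let Spark infer the schema by scanning the JSON documents.
df_inferred = spark.read.json(json_path)

# Supply a user-defined schema so Spark can skip the inference scan.
schema = StructType([
    StructField("id", LongType(), True),      # placeholder fields
    StructField("title", StringType(), True),
])
df_defined = spark.read.schema(schema).json(json_path)

df_defined.show(5)
```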
Milestone 3: Read data in Parquet format
Work through the next notebook (3.Reading Data - Parquet) to load Parquet files, which usually come with a pre-defined schema. As in the other milestones, in this section you will:
- Load a Parquet file with a pre-defined schema
- Understand why Parquet files usually come with a schema
- Read data from the Parquet file and visualize some of its details
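Because the schema travels with the file, reading Parquet is typically a one-liner. A minimal sketch, assuming a placeholder path:

```python
# Hypothetical path to a Parquet dataset.
parquet_path = "/mnt/training/sample.parquet"

# Parquet stores its schema in the file metadata, so no inference is needed.
df = spark.read.parquet(parquet_path)

df.printSchema()   # schema comes straight from the file footer
df.show(5)         # preview a few rows
```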
Milestone 4: Read data stored in tables and views
Go to the next notebook (4.Reading Data - Tables and Views) in Databricks to read data stored in tables and views. In this section you will see how to register tables and read from a table or view. You will:
- Register a table into Databricks
- Use a TSV file to create a new table
- Visualize the loaded data in the UI
- Create a temporary view
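A compact sketch of the table-and-view workflow, with placeholder paths and names (the notebook uses its own dataset):

```python
# Load a TSV into a DataFrame (hypothetical path).
df = (spark.read
    .option("header", "true")
    .option("sep", "\t")
    .option("inferSchema", "true")
    .csv("/mnt/training/sample.tsv"))

# Register the data as a table so it appears in the Databricks Data UI.
df.write.mode("overwrite").saveAsTable("sample_table")

# Create a temporary view, visible only within the current SparkSession.
df.createOrReplaceTempView("sample_view")

# Either one can now be queried with SQL.
spark.sql("SELECT * FROM sample_view LIMIT 5").show()
```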
Milestone 5: Write data
Finally, go to the last notebook (5.Writing Data), where you will use Parquet files to write data. You’ll load a TSV file into Databricks and then save it as a Parquet file using the PySpark Python API.
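In sketch form, the write step looks something like the following (paths are placeholders, not the module’s dataset):

```python
# Load a TSV into a DataFrame (hypothetical path).
df = (spark.read
    .option("header", "true")
    .option("sep", "\t")
    .option("inferSchema", "true")
    .csv("/mnt/training/sample.tsv"))

# Save it as Parquet; mode("overwrite") replaces any existing output.
(df.write
    .mode("overwrite")
    .parquet("/mnt/training/sample_parquet"))
```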
Quiz
Verify your knowledge with a short quiz
Next steps
There is a slightly more advanced and involved learning path that covers Data Engineering with Azure Databricks.
Practice
In this workshop you used small datasets that are easy to work with. Try using larger datasets to take full advantage of a platform like Databricks.
Feedback
Be sure to give feedback about this workshop!