Prepare your data using Python and VS Code

Module Source

Manipulate and clean data in Python

Goals

In this workshop, you will learn how to use Python and popular libraries such as NumPy and pandas to manipulate and clean data so that it is ready for analysis.

Goal: Description
What you will learn: How to find information about, clean, and prepare data that’s stored in a pandas DataFrame.
What you’ll need: A Visual Studio Code environment set up to run Python and Jupyter notebooks.
Duration: 1 hr 20 min
Just want to try the app or see the solution?: Solution
Slides: PowerPoint

Video

🎥 Click this image to watch Ornella walk you through the workshop

Pre-Learning

Prerequisites

What students will learn

Say you want to perform some analysis on a dataset that you find interesting – like the squirrel population of Central Park, or various types of French cheese. The first thing you’ll need to do with any dataset is to clean it up. Many datasets have missing information, or won’t be formatted in the exact way you’d like. In this workshop, you will learn how to use data science libraries to prepare your data for analysis and visualization.

Introduction

In this section, you’ll review an introduction and make sure that your data science environment is set up correctly before continuing on to the next part of the workshop.

Explore DataFrame information

Next, you will learn how to use Python libraries to explore an iconic dataset. You will use pandas DataFrame methods to get a quick sense of the size, shape, and content of a dataset.
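
As a rough sketch of what this kind of exploration can look like, the snippet below inspects a small DataFrame. The data and column names are hypothetical and are not the dataset used in the workshop; only the pandas calls (`shape`, `head`, `dtypes`, `info`) are standard.

```python
import pandas as pd

# Hypothetical sample data; the workshop uses its own dataset.
df = pd.DataFrame(
    {
        "species": ["gray", "gray", "cinnamon", "black"],
        "weight_g": [510.0, 480.5, 495.0, 530.2],
        "age": ["adult", "juvenile", "adult", "adult"],
    }
)

n_rows, n_cols = df.shape   # number of rows and columns
first_rows = df.head(2)     # the first two rows
column_types = df.dtypes    # data type of each column

print(n_rows, n_cols)
print(first_rows)
print(column_types)
df.info()                   # prints column names, non-null counts, and dtypes
```

Checking `shape` and `info()` first is a cheap way to spot unexpected column types or missing values before doing any cleaning.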

Work with missing data

Now that you know how to get an overall sense of the dataset you are working with, you will learn how to identify and deal with missing values.
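
One possible way to identify and handle missing values is sketched below. The cheese data is invented for illustration, and filling with the column mean is just one assumed strategy; the right choice depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both columns.
df = pd.DataFrame(
    {
        "name": ["brie", "comté", None, "roquefort"],
        "fat_pct": [28.0, np.nan, 31.0, np.nan],
    }
)

# Count missing values in each column.
missing_per_column = df.isnull().sum()

# Option 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: fill missing numeric values with the column mean.
filled = df.fillna({"fat_pct": df["fat_pct"].mean()})

print(missing_per_column)
print(dropped)
print(filled)
```

Dropping rows is simple but can discard a lot of data, while filling keeps every row at the cost of introducing estimated values.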

Remove duplicate data

Another common task with most datasets is removing duplicate data. In this section of the workshop, you will learn how to use pandas to detect and remove duplicate entries.
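
A minimal sketch of detecting and removing duplicates follows. The data and column names are made up for illustration; `duplicated` and `drop_duplicates` are the standard pandas methods for this.

```python
import pandas as pd

# Hypothetical data in which the row with id 2 appears twice.
df = pd.DataFrame(
    {
        "id": [1, 2, 2, 3],
        "color": ["gray", "black", "black", "gray"],
    }
)

# True for each row that repeats an earlier row exactly.
dup_mask = df.duplicated()

# Keep only the first occurrence of each fully identical row.
deduped = df.drop_duplicates()

# Treat rows as duplicates when they match on a subset of columns.
by_color = df.drop_duplicates(subset="color")

print(dup_mask.tolist())
print(deduped)
print(by_color)
```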

Combine datasets

Sometimes, you will need to combine datasets. Luckily, pandas provides several methods for concatenating, merging, and joining them.
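
The sketch below shows two common ways to combine DataFrames: stacking rows with `pd.concat` and matching rows on a key with `merge`. The tables and the `id` key are hypothetical examples, not the workshop's data.

```python
import pandas as pd

# Hypothetical tables that share an "id" column.
names = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Ben", "Cal"]})
more_names = pd.DataFrame({"id": [4], "name": ["Dee"]})
scores = pd.DataFrame({"id": [2, 3, 5], "score": [88, 92, 75]})

# Stack rows from tables that have the same columns.
stacked = pd.concat([names, more_names], ignore_index=True)

# Inner merge: keep only ids present in both tables.
inner = names.merge(scores, on="id", how="inner")

# Outer merge: keep all ids; unmatched cells become NaN.
outer = names.merge(scores, on="id", how="outer")

print(stacked)
print(inner)
print(outer)
```

The `how` argument controls which keys survive the merge, so it is worth checking the row count of the result against what you expect.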

Exploratory statistics and visualization

So far, you’ve learned how to use pandas methods to examine some aspects of a DataFrame, and fill in, remove, and combine data. The final way we will seek to understand our data is by creating visualizations.
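
As one illustration of exploratory statistics and a simple plot, the sketch below summarizes a numeric column and draws a histogram. It assumes matplotlib is installed, and the weights are invented values; the workshop may use different plots or libraries.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical measurements.
df = pd.DataFrame({"weight_g": [480, 495, 510, 530, 505, 490]})

# Count, mean, standard deviation, min, quartiles, and max in one call.
summary = df["weight_g"].describe()
print(summary)

# Histogram of the same column, grouped into three bins.
fig, ax = plt.subplots()
counts, edges, _ = ax.hist(df["weight_g"], bins=3)
ax.set_xlabel("weight (g)")
ax.set_ylabel("count")
ax.set_title("Distribution of weights")
```

In a Jupyter notebook you can omit the `matplotlib.use("Agg")` line and the figure will render inline.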

Next steps

Practice

To test your knowledge, try downloading a free dataset from Kaggle that you find interesting. Use the techniques that you learned in this workshop to manipulate and clean your data!

Feedback

Be sure to give feedback about this workshop!

Code of Conduct