View on GitHub

Ready-to-use Presentations

Pick a topic to present with ready-made presentations!

Introduction to NLP with TensorFlow

Module Source

Goals

In this workshop, we will cover how text is processed using TensorFlow, a popular platform for machine learning.

Goal	Description
What will you learn	Analyze text using TensorFlow
What you’ll need	<ul><li>Python</li> <li>Visual Studio Code</li><li>Setup VSCode for Data Science</li></ul>
Duration	1 hour
Slides	Powerpoint

Video

🎥 Click the image above to learn how to deliver this workshop

Pre-Learning

Introduction to TensorFlow using Keras

Prerequisites

You can activate the sandbox environments in each module, or alternately setup your local environment:

If you are trying the examples in a local environment, install required libraries and download the dataset in the same directory where the notebooks are:

$ pip install --quiet tensorflow_datasets==4.4.0
$ wget -q -O - https://mslearntensorflowlp.blob.core.windows.net/data/tfds-ag-news.tgz | tar xz

What you will learn

TensorFlow is a popular Machine Learning platform that allows many workflows and operations for Machine Learning in Python. In this workshop, you’ll be learn how to process and analyze text, so that you can create generated text or answer questions based on context.

Image of generated text Image of sequential training

Milestone 1: Represent text as Tensors

Complete the sandboxed Jupyter Notebook which will go through the following:

Perform a text classification task using the AG NEWS dataset
Convert text into numbers that can be represented as tensors
Create a Bag-of-Words text representation
Automatically calculate Bag-of-Words vectors with TensorFlow

Milestone 2: Represent words with embeddings

Go through the sandboxed Jupyter Notebook to work with the AG News dataset and try to represent a semantic meaning of a word.

In this section you will:

Create and train a Classifier Neural Network
Work with semantic embeddings
Use pre-trained embeddings available in the Keras framework
Find about potential pitfalls and limitations of traditional pre-trained embedding representations like Word2Vec

Milestone 3: Capture patterns with recurrent neural networks

Work through the sandboxed Jupyter Notebook to understand not only the aggregated meaning of words but take into account the order. To capture the meaning of a text sequence you’ll use a recurrent neural network.

The notebook will go through the following items:

Load the AG News dataset and train it with TensorFlow
Use masking to minimize the amount of padding
Use LSTM to learn relationships between distant tokens

Milestone 4: Generate text with recurrent networks

Complete the sandboxed Jupyter Notebook to discover how to generate text using RNNs (Recurrent Neural Networks). In this final section of the workshop, you’ll cover the following:

Build character vocabulary with tokenization
Train an RNN to generate titles
Produce output with a custom decoding function
Sample generated strings during training to check for correctness

Quiz

Verify your knowledge with a short quiz

Next steps

There are other Learn Modules for TensorFlow that are grouped in the TensorFlow fundamentals Learning Path

Practice

In this workshop you used pre-trained models which may yield limited results. Try using other data sources to train your own model. What can you discover?

Feedback

Be sure to give feedback about this workshop!

Code of Conduct